Semiparametric Analysis of Big Censored Data

NIH RePORTER · NIH · R01 · $467,993

Abstract

Project Summary
The broad, long-term objectives of this project are to develop semiparametric regression methods for analyzing censored data, which are commonly encountered in biomedical research on chronic diseases. This renewal application focuses on the computational challenges of analyzing big data involving hundreds of thousands to tens of millions of individuals with thousands to tens of millions of variables. The specific aims are to develop:
(1) a communication-efficient, distributed boosting algorithm based on semiparametric efficient score functions for fitting the Cox proportional hazards model to a wide variety of big censored data;
(2) a communication-efficient, distributed boosting algorithm that embeds a random feature-set selection scheme into variable selection in high-dimensional settings;
(3) a communication-efficient, distributed boosting algorithm for fitting a Cox model with latent factors to multiple types of high-dimensional features with missing values; and
(4) a distributed EM algorithm that incorporates both the preconditioned conjugate-gradient method for matrix inversion and a novel modification of the Laplace approximation to numerical integration, for fitting a random-effects Cox model to data on a large number of genetically related individuals.
Each of these aims addresses important new challenges arising from today's big biomedical studies. The proposed methods and algorithms are based on likelihood and other sound statistical principles. The desired asymptotic properties of the estimators will be established rigorously through innovative use of modern empirical process theory and other advanced mathematical tools. The proposed methods and algorithms will be evaluated extensively through simulation studies mimicking real data and tested in a cloud computing environment, which offers strong data-security guarantees and scalable computing infrastructure.
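Aims (1) to (3) all build on boosting for the Cox model. As a rough, single-machine sketch only, the Python/NumPy code below shows generic component-wise gradient boosting of the Cox log partial likelihood: at each step the score with respect to the linear predictor is regressed on each covariate separately, and only the best-fitting coefficient receives a small update. It does not reproduce the project's distributed, communication-efficient scheme, its efficient-score construction, or its feature-set selection; the function names, the step size `nu`, the number of steps, and the no-tied-times assumption are illustrative choices, not taken from the source.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; not the project's algorithm.
# Assumes no tied observation times.


def cox_loglik(eta, time, event):
    """Cox log partial likelihood for linear predictor eta."""
    order = np.argsort(-time)                    # sort by time, latest first
    eta_s = eta[order]
    log_risk = np.logaddexp.accumulate(eta_s)    # log sum of exp(eta_j) over {j: t_j >= t_i}
    return float(np.sum(event[order] * (eta_s - log_risk)))


def cox_score(eta, time, event):
    """Gradient of cox_loglik with respect to eta."""
    order = np.argsort(-time)
    eta_s = eta[order]
    d = event[order].astype(float)
    e = np.exp(eta_s - eta_s.max())              # shift for numerical stability
    risk = np.cumsum(e)                          # (scaled) risk-set sums
    tail = np.cumsum((d / risk)[::-1])[::-1]     # sum over {i: t_i <= t_k} of d_i / risk_i
    grad = np.empty(eta.shape[0])
    grad[order] = d - e * tail                   # scaling of e cancels in e * tail
    return grad


def componentwise_boost(X, time, event, n_steps=300, nu=0.1):
    """Component-wise boosting: regress the score on each single covariate and
    update only the best-fitting coefficient. Assumes standardized columns of X."""
    beta = np.zeros(X.shape[1])
    col_ss = np.sum(X ** 2, axis=0)
    for _ in range(n_steps):
        u = cox_score(X @ beta, time, event)     # working response
        coef = X.T @ u / col_ss                  # univariate least-squares fits
        j = np.argmax(coef ** 2 * col_ss)        # largest drop in residual sum of squares
        beta[j] += nu * coef[j]                  # shrunken update of one coefficient
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 500, 5
    X = rng.standard_normal((n, p))
    true_beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
    t_event = rng.exponential(1.0 / np.exp(X @ true_beta))   # hazard exp(x'beta)
    t_cens = rng.exponential(2.0, size=n)
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)
    print(np.round(componentwise_boost(X, time, event), 2))
```

Running the file fits the sketch to simulated data with two active and three null covariates. In a high-dimensional setting, stopping early (a smaller `n_steps`) is what acts as regularization.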
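Aim (4) uses the preconditioned conjugate-gradient (PCG) method in place of explicit matrix inversion. As a generic textbook illustration, not the project's algorithm, a Jacobi-preconditioned CG solver in Python/NumPy can look like this; the diagonal preconditioner, the tolerance, and the toy kinship-like matrix in the demo are assumptions made for the sketch.

```python
import numpy as np

# Generic Jacobi-preconditioned conjugate gradient; an illustrative sketch only.


def pcg(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A.
    A is used only through matrix-vector products and its diagonal,
    so it is never inverted or factorized."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x                       # trivial system: x = 0
    m_inv = 1.0 / np.diag(A)           # inverse of the Jacobi (diagonal) preconditioner
    r = np.array(b, dtype=float)       # residual b - A x for x = 0
    z = m_inv * r                      # preconditioned residual
    p = z.copy()                       # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)          # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            break
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # next A-conjugate direction
        rz = rz_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((200, 500))          # toy genotype-like matrix
    A = G @ G.T / 500 + np.eye(200)              # kinship-like matrix plus noise variance
    b = rng.standard_normal(200)
    x = pcg(A, b)
    print(np.max(np.abs(x - np.linalg.solve(A, b))))
```

Because A enters the loop only through `A @ p` and its diagonal, the same iteration applies when A is too large to factorize; stronger preconditioners than the diagonal one trade more setup work for fewer iterations.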
In addition, the methods and algorithms will be applied to our ongoing biomedical studies, including the NHLBI Trans-Omics for Precision Medicine program and the UK Biobank. Finally, efficient, reliable, and user-friendly open-source software with proper documentation will be produced. The overall impact of the proposed work will be to create new paradigms for survival analysis, advance biomedical research in the United States and other countries, and accelerate the search for effective strategies to prevent and treat cardiovascular diseases, cancers, AIDS, and other diseases of utmost importance to global public health.

Key facts

NIH application ID: 9966371
Project number: 2R01HL149683-29A1
Recipient: UNIV OF NORTH CAROLINA CHAPEL HILL
Principal Investigator: DANYU LIN
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $467,993
Award type: 2 (competing renewal)
Project period: 2020-04-21 → 2024-03-31