# Semiparametric Analysis of Big Censored Data

> **NIH R01** · UNIV OF NORTH CAROLINA CHAPEL HILL · 2020 · $467,993

## Abstract

The broad, long-term objectives of this project are to develop semiparametric regression methods for analyzing censored data, which are commonly encountered in biomedical research on chronic diseases. This renewal application is focused on addressing the computational challenges in the analysis of big data involving hundreds of thousands to tens of millions of individuals with thousands to tens of millions of variables. The specific aims are to develop: (1) a communication-efficient, distributed boosting algorithm based on semiparametric efficient score functions for fitting the Cox proportional hazards model to a wide variety of big censored data; (2) a communication-efficient, distributed boosting algorithm that embeds a random feature-set selection scheme into variable selection in high-dimensional settings; (3) a communication-efficient, distributed boosting algorithm for fitting a Cox model with latent factors to multiple types of high-dimensional features with missing values; and (4) a distributed EM algorithm that incorporates both the preconditioned conjugate-gradient method for matrix inversion and a novel modification of the Laplace approximation to numerical integration for fitting a random-effect Cox model with a large number of genetically related individuals. Each of these aims addresses important new challenges arising from today's big biomedical studies. The proposed methods and algorithms are based on likelihood and other sound statistical principles. The desired asymptotic properties of the estimators will be established rigorously through innovative use of modern empirical process theory and other advanced mathematical tools. The proposed methods and algorithms will be evaluated extensively through simulation studies mimicking real data and tested in the cloud computing environment, which provides high data security guarantees and scalable computing infrastructures. In addition, the methods and algorithms will be applied to our ongoing biomedical studies, including the NHLBI Trans-Omics for Precision Medicine program and the UK Biobank. Finally, efficient, reliable, and user-friendly open-source software with proper documentation will be produced. The overall impact of the proposed work will be to create new paradigms for survival analysis, advance biomedical research in the United States and other countries, and accelerate the search for effective strategies to prevent and treat cardiovascular diseases, cancers, AIDS, and other diseases of utmost importance to global public health.

## Key facts

- **NIH application ID:** 9966371
- **Project number:** 2R01HL149683-29A1
- **Recipient organization:** UNIV OF NORTH CAROLINA CHAPEL HILL
- **Principal Investigator:** DANYU LIN
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $467,993
- **Award type:** 2 (renewal)
- **Project period:** 2020-04-21 → 2024-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9966371

## Citation

> US National Institutes of Health, RePORTER application 9966371, Semiparametric Analysis of Big Censored Data (2R01HL149683-29A1). Retrieved via AI Analytics 2026-06-11 from https://api.ai-analytics.org/grant/nih/9966371. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
