Protein Signatures of APOE2 and Cognitive Aging

NIH RePORTER · NIH · R01 · $322,757

Abstract

Improving AI/ML readiness of data generated under the R01: Protein signatures of APOE2 and cognitive aging. Funded by the NIA: R01 AG061844 “Protein signatures of APOE2 and cognitive aging”, we are generating proteomic and metabolomics data in a cohort of centenarians, their offspring, and unrelated controls from the New England Centenarian Study (NECS). Study participants have been characterized with detailed medical history, genetic profiles, and longitudinal assessment of physical and cognitive functions. The goal of the parent R01 is to validate a proteomic signature of APOE genotypes, and to evaluate its value together with metabolic profiles to predict patterns of cognitive function change in aging individuals. We plan to share data through the Alzheimer’s disease (AD) portal, and the new extreme longevity (EL) portal that is currently under development. Sharing the data in an unrestricted manner is not possible because they include HIPAA identifiers, particularly age >89. Unrestricted sharing of data would be an attractive option for AI/ML investigators, and the goal of this request for administrative supplement is to use advanced machine learning techniques to generate high-fidelity, privacy-preserving, synthetic versions of the data obtained in the parent R01 so they can be shared without restriction. Machine learning methods have emerged that can be used to generate synthetic data using a model that is trained in the real data. This model can be used to generate a synthetic data set in which no single data point corresponds to a real person in the original data set, but the synthetic data can be analyzed to produce results that are like those derived from the original data. This approach has received substantial attention in the past few years, and it has been adopted to compromise between data sharing and privacy, including generation of synthetic data for the National COVID Cohort Collaborative (N3C).
We have put together a team of data scientists and partners from the company Syntegra, to generate and validate a synthetic data set that matches the data generated with the parent R01. Our proposal is structured in three aims. In Aim 1, we will share with Syntegra real data from the NECS that include proteomics and metabolomics, genetic variables and patients’ characteristics including assessment of cognitive function. This real data will be used to train the data generation model and create synthetic data sets. In Aim 2 we will develop a protocol for validation of the synthetic data sets that includes fidelity to a variety of results of machine learning analyses and metrics to assess the deidentification of data. In Aim 3 we will conduct the analysis in the real and synthetic data sets and compare the results. Impact. This is a high risk, but potentially high return proposal. If the approach works, we will be able to generate data that can be widely shared with the community. The approach will also be applicable to several other stu...
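
As a rough illustration of the train, generate, and validate workflow described in the abstract, the following Python sketch fits a very simple generative model to toy data, samples synthetic records from it, and computes one fidelity check and one privacy check. Everything here is an assumption made for illustration: the bivariate Gaussian stands in for a far more capable model, the toy data are random numbers rather than NECS measurements, and the function names (`fit_gaussian`, `sample_synthetic`, `min_distance_to_real`) are hypothetical. It does not represent the method used by Syntegra or the validation protocol of the project.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    xs, ys = zip(*rows)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, rng):
    """Draw n synthetic records from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 and z2 so that x and y have correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        records.append((x, y))
    return records


def min_distance_to_real(synthetic, real):
    """Smallest Euclidean distance from any synthetic record to any real one."""
    return min(math.dist(s, r) for s in synthetic for r in real)


# Toy "real" data (assumption): two correlated measurements per participant.
rng = random.Random(0)
real = []
for _ in range(500):
    x = rng.gauss(0, 1)
    real.append((x, 0.8 * x + rng.gauss(0, 0.5)))

params = fit_gaussian(real)                   # train on the real data
synthetic = sample_synthetic(params, 500, rng)  # generate synthetic records

# Fidelity: does the same analysis give a similar answer on both data sets?
rho_real = params[4]
rho_syn = fit_gaussian(synthetic)[4]

# Privacy: no synthetic record should coincide with a real record.
closest = min_distance_to_real(synthetic, real)

print(f"correlation (real):      {rho_real:.3f}")
print(f"correlation (synthetic): {rho_syn:.3f}")
print(f"closest synthetic-to-real distance: {closest:.4f}")
```

The two checks mirror the two kinds of validation named in Aim 2: agreement of analysis results between real and synthetic data, and a measure of how far synthetic records are from real individuals. A real protocol would use many more analyses and stronger re-identification metrics than a single nearest-record distance.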

Key facts

NIH application ID
10408304
Project number
3R01AG061844-04S1
Recipient
TUFTS MEDICAL CENTER
Principal Investigator
THOMAS T PERLS
Activity code
R01
Funding institute
NIH
Fiscal year
2021
Award amount
$322,757
Award type
3
Project period
2018-09-30 → 2023-05-31