Curation and analysis of publicly available, molecular profiles from people with Down Syndrome

NIH RePORTER · NIH · R03 · $155,205

Abstract

PROJECT SUMMARY: Due to mandates from funding agencies and publishers, high-throughput molecular data from individuals with Down syndrome and controls (mostly humans and mice) are available in public repositories. Researchers can use such data to corroborate their own findings and pose new research questions. Doing so would help to leverage prior investments and complement efforts by the INCLUDE Data Coordinating Center (DCC) to generate data for new cohorts. Our proposal focuses specifically on mRNA expression and DNA methylation data. These data types shed light on how genes are regulated, how molecular aberrations lead to medical conditions, and how medical outcomes can be predicted, potentially leading to improved diagnostics, treatments, and insights into human health and disease. However, many data-generation platforms are used for these data types, and researchers use a wide range of techniques for normalizing the data, checking data quality (if they check at all), and mapping to gene annotations. To be reused most effectively, the data must be reprocessed from their original form, normalized and quality-checked consistently, and mapped to current annotations. Agencies that manage public repositories lack the resources and expertise to perform these steps. In our first aim, we will address this problem using a data-curation approach. We have identified 148 datasets specific to Down syndrome that we believe should be prioritized for reuse. Using our expertise in molecular-data processing and bioinformatics, we will re-normalize, quality-check, summarize, and annotate the data using an approach that maximizes consistency across all of the datasets. Additionally, we will map the metadata to biomedical-ontology terms in collaboration with the INCLUDE DCC. We expect that these efforts will reduce barriers for researchers in the Down syndrome community to reuse the data and accelerate progress in the field. Our second aim focuses on interoperability.
For many research questions, a single dataset is insufficient. Sample sizes may be small, or a single dataset may not represent the range of phenotypes or other factors necessary to answer a given question. Therefore, it is often crucial to integrate datasets from multiple sources. However, systematic differences between datasets are inevitable due to differences in populations, laboratory conditions, and environmental factors. Failing to adjust for these differences will likely lead to biased conclusions. We will evaluate the feasibility of using generative neural networks, a type of algorithm that is highly configurable and is behind many of the most influential artificial-intelligence advances of the past decade. We will apply these algorithms in the context of studying medical conditions that co-occur with Down syndrome, such as autoimmune conditions, dementia-related disease, congenital heart defects, and leukemias. Our algorithms will search for systematic patterns that differ between datasets and generate a modified vers...

Key facts

NIH application ID: 10878335
Project number: 1R03HL168983-01A1
Recipient: BRIGHAM YOUNG UNIVERSITY
Principal Investigator: Stephen Piccolo
Activity code: R03
Funding institute: NIH
Fiscal year: 2024
Award amount: $155,205
Award type: 1
Project period: 2024-06-01 → 2026-05-31