Children’s Interstitial and Diffuse Lung Diseases (chILD) are disorders of the lung tissue that present in infancy and early childhood. chILD places a major burden on thousands of children across the U.S., on their families, and on the healthcare system more broadly. For most patients, the causal genes have not yet been identified, and current treatments are non-specific and all too often ineffective. The long-term goal of this project is to define the genetic landscape of chILD to inform the development of more specific and biologically rational therapies. Our strategy builds on two complementary approaches that we have implemented: (i) whole-exome sequencing of patients seen in our chILD ambulatory clinics, which has to date established a genetic diagnosis in 20% of cases, including discovery of 6 novel candidate chILD genes; and (ii) in situ gene expression profiling of lung biopsy specimens, which we find distinguishes samples with similar chILD histopathology according to their underlying genetic causes, suggesting that patients with similar profiles may have forms of chILD caused by shared molecular factors. In Aim 1, we will expand on these observations by profiling 300 available chILD lung biopsy samples to define patient groups (clusters) with similar molecular profiles. A chILD molecular classifier will be developed using 50 samples with definitive diagnoses and then validated in an independent set of 50 samples. We will then apply the model to 200 additional samples from patients lacking a molecular diagnosis and assess the biologic significance of each cluster using gene set enrichment and rare variant genetic analyses. We will assess the clinical significance of cluster membership by testing for differences in clinical outcomes across clusters. In Aim 2, we will analyze whole-exome sequence data from a cohort of 400 children with unexplained childhood parenchymal lung disease using family- and population-based methods to map new chILD-causing genes.
To improve power, we will apply an integrative genomic and network-based analytic approach that incorporates in situ transcriptomic data from Aim 1. We will replicate findings in the ChILD Research Network (ChILDRN) registry and will evaluate the functional impact of candidate variants on target protein expression using patient-derived biobanked samples. In Aim 3, to better understand how newly discovered chILD genes contribute to disease pathogenesis, and to characterize the spectrum of their clinical presentations, we will identify patients harboring rare genetic variation in 6 novel chILD genes by screening the Genomic Information Commons (GIC) database, an NIH-supported, federated infrastructure facilitating cross-institutional query of genotype–phenotype databases at leading U.S. children’s hospitals. Children with these variants, together with their parents, will be invited to undergo standardized phenotypic assessments (clinical evaluations, biochemical analyses, and molecular profiling) to characteriz...