PROJECT SUMMARY
The mapping of the human genome and genome-wide association studies have provided great insight into the genetic etiology of hereditary diseases; however, critical gaps remain. One class of genetic variation that has been difficult to detect in genomic studies is structural variants (SVs), variants involving more than 50 base pairs. SVs have been implicated in many inherited diseases and cancers, yet they remain difficult to detect with conventional DNA sequencing methods. Developments in third-generation sequencing (linked-read and long-read sequencing) and single-cell RNA sequencing (scRNA-seq) offer an opportunity to greatly improve the detection of SVs, including copy number variations (CNVs), a common type of SV. However, existing computational tools do not yet fully exploit the potential of these technologies. In this project, drawing on our unique expertise in this rapidly evolving area, we propose to develop a new generation of tools that will greatly improve the detection and phasing of SVs across large populations of samples. We will develop computational tools to generate a high-quality diploid assembly for each individual and to combine data from large populations of controls and patients in order to characterize SVs that confer risk for a particular disease. We will further design a haplotype-based linkage disequilibrium (LD) mapping approach at whole-genome scale to identify unique patterns of haplotype sharing and provide a new perspective for complex disease studies. Detecting SVs together with small variants will further help explain the etiology of complex diseases. We will also develop algorithms to detect CNVs from scRNA-seq datasets, with applications in cancer research. Successful completion of this project will constitute a major step forward in uncovering the genetic causes of complex diseases and cancers.