PROJECT SUMMARY

Technological advances in sequencing and experimental assays have greatly increased the availability of various kinds of genomic data, enabling us to catalog genetic and epigenetic variation in diverse populations and to probe fundamental biological processes (e.g., transcription and translation) in unprecedented detail. This development is providing a number of new opportunities for basic and biomedical research, but the data are often noisy and multifaceted, and the underlying biology is very complex, which presents both theoretical and computational challenges for analysis and interpretation. New efficient and robust statistical inference tools, together with theoretical analysis of mathematical models, are needed to bring the promise of the big data era in biology to full fruition. The central goal of the parent project (R35-GM134922) is to develop a suite of statistical and computational tools that will help to tackle this challenge by enabling inference under complex models and by helping researchers integrate information from different types of data to reveal fundamental biological processes. In particular, the parent project aims to achieve the following goals: (1) Improve and widen deep learning/neural network applications in genomics. (2) Leverage cutting-edge techniques in natural language processing (NLP) and massive protein databases to improve biological sequence representations, which can facilitate downstream prediction tasks. (3) Develop novel computational methods for integrative analysis of genomic data. The proposed diversity supplement will train and mentor an underrepresented minority student through research projects that will help to achieve the above specific objectives of the parent grant.