DESCRIPTION (provided by applicant): Advances in information and high-throughput technologies have set the stage for the 'big data age' in biomedicine. However, there remain unresolved challenges that could limit the impact of big data exploration in the basic, clinical, and biomedical sciences. These challenges range from assuring privacy and security in cloud computing environments, to establishing the integrity and reproducibility of quantitative analysis tools, to proving the validity and generalizability of common probabilistic frameworks used to interpret big data in a biomedical context. Our community can prepare for these challenges by developing a workforce that studies data as a science and engineers scalable technologies. Vanderbilt University is uniquely positioned to establish such a program and train the brightest minds of the next generation in data science. The proposed program lays a foundation in, and emphasizes the symbiotic relationship between, biomedical informatics, computer science, and biostatistics. Data scientists must be highly knowledgeable in 1) computational techniques, technologies, and infrastructure for collecting, processing, and analyzing data on a massive scale, 2) statistical methodologies that accommodate large-scale, complex, high-dimensional biomedical data (e.g., model building and validation, false discovery rates, missing data imputation, recalibration for measurement error, and assessing the strength of statistical evidence), and 3) the scientific method and the specific biomedical and clinical contexts that led to data capture, downstream discovery, and next-generation decision support systems (which govern the generalizability of results and quantitative tools). Because this field is evolving quickly, it is paramount to provide students with pragmatic training environments that emphasize and develop critical thinking skills and expose them to modern biomedical data analysis in real systems.
For over a decade, the biomedical informatics doctoral program at Vanderbilt University has provided students with these experiences, leading to innovations in big data analytics with high impact on the underlying scientific applications in real clinical environments. Despite this, there is no formal program dedicated to big data science where students can study this area in the context of real biomedical collaborations (previous students have managed to do this via a patchwork of goodwill and determination, which necessarily leads to inefficient coursework and laborious research collaborations). This proposal seeks to build on Vanderbilt's strength in this area to establish the Vanderbilt Training Program in Big Biomedical Data Science (BIDS) for the next generation of data scientists. This program will be managed as a track within the existing biomedical informatics doctoral program and led by the three PIs with complementary expertise in 1) computational infrastructure, 2) statistical methodologies, and 3) management...