Program Director/Principal Investigator (Last, First, Middle): Salleb-Aouissi, Ansaf

Summary

Prediction of preterm birth (PTB) has been an exceedingly challenging problem, predominantly due to the inherent complexity of its multifactorial etiology and the lack of approaches capable of integrating and interpreting large multidisciplinary data. It is a major, long-standing public health problem with heavy emotional and financial consequences for families and society [7, 20]. PTB is the leading cause of mortality and long-term disability among neonates. Most studies to date have examined individual risk factors through univariate analyses of their coincidence with PTB. Our previous work [NSF EAGER 1454855, 1454814] developed predictive models for PTB based on non-genetic maternal attributes [29, 30]. A particularly challenging population for determining PTB risk is first-time mothers (nulliparous women), because they have no prior pregnancy history. An important question is whether factors other than a history of PTB can be used to identify a nulliparous patient at risk.

Specific aims of the original project

Our basic specific aims are as follows:

(1) Longitudinal Preterm Birth Prediction: We will first build a series of accurate prediction models for PTB using the nuMoM2b dataset. Such models will handle the challenges common to medical datasets, including (a) class imbalance, (b) missing data, and (c) disparity in data collection. We will achieve this by designing an objective function for Support Vector Machines that captures and corrects for these issues. Second, by leveraging the availability of patient future data, our Learning Using Privileged Information (LUPI)-based approach [22] will significantly increase the rate of convergence of the algorithms and improve prediction with less data.
Our transformative approach is well suited to medical datasets that are both limited in the number of patients and subject to the challenges mentioned above.

(2) Combining clinical and genetic features for risk prediction: In this aim we tackle questions of causality between genetic information and its various phenotypic implications by leveraging the phenotypically rich nuMoM2b dataset. We will first apply standard GWAS analysis, both to gain new insight into how patterns of genetic association change as additional phenotypic data are accumulated and to serve as a baseline. We will then seek to develop improved analyses of the genetic contributions to PTB.

(3) Clinical and social impact: We plan to assess the effectiveness of the methods in clinical practice by (a) testing the effectiveness of the longitudinal models produced in objectives 1 and 2 on existing clinical data at the New York Presbyterian Hospital, and (b) building a sequential decision-making model; this includes optimizing the scheduling of patient visits and diagnostic testing tailored to different classes of patients.

Specific aims for this NIH...