PROJECT SUMMARY
Approaches to the management, treatment, and diagnosis of non-small cell lung cancer (NSCLC) have evolved over the last decade from primarily empirical methodologies to objective strategies that rely on the clinical characteristics of the patient and the morphological features of the nodule [1]. Recent guidance from the United States Preventive Services Task Force (USPSTF) recommends that high-risk individuals be screened annually with low-dose computed tomography (LDCT), as this screening practice provides high sensitivity with acceptable specificity for lung cancer [2]. However, the introduction of LDCT as the primary screening modality for lung cancer has increased the identification of indeterminate nodules. The resulting increase in detection decreases the overall quality of life of at-risk individuals through repeated follow-up and frequent invasive procedures for nodules that are likely benign. In this training grant, we aim to improve these outcomes by enhancing the performance of deep neural networks (DNNs) in data-scarce domains, specifically lung cancer. The overall hypothesis of this proposal is that DNN classification accuracy for indeterminate lung nodules will be significantly improved through the use of pre-specified malignant nodule and parenchymal morphological features that would not be readily extractable by a DNN directly from the LDCT scans. We will test this hypothesis and achieve the goals of this proposal by augmenting the National Lung Screening Trial (NLST) dataset to infer important morphological parenchymal features for malignant nodule classification and by using ancillary data from the COPDGene dataset. The experiments proposed in Aim 1 will explore the impact of using augmented morphological parenchymal features on the classification performance of our DNNs.
Aim 2 will explore the relative contribution of a contextually similar dataset, COPDGene, to classification and parameter tuning. The proposed work will yield improved approaches for classifying indeterminate pulmonary nodules as malignant or benign through an innovative strategy for training DNNs with domain knowledge and contextually related datasets in data-scarce domains. Ultimately, applying these approaches will improve our understanding of the parenchymal morphological features that are most critical for discriminating pulmonary nodules. In addition, the training I will receive in the course of these studies, in generating CT markers, detecting early lung cancer pathogenesis, and computational modeling, will serve as a solid foundation for my future career as an independent biomedical investigator.