Project Summary/Abstract
Diagnosis of lumbar radiculopathy (LR) currently relies on qualitative interpretation of magnetic resonance imaging (MRI) studies and lacks standardization. This has led to inconsistent treatment and rising costs, while quality-of-life metrics have remained stagnant. To standardize the diagnosis of LR, the subjective, qualitative radiologic assessment needs to be augmented with accurate measurements of neuroforamina (NF) and central canal (CC) areas, two anatomical structures that are critical to the etiology of LR. However, precise measurements require manual delineation of these regions on MRI, a tedious and time-consuming process that is not feasible on a daily, large-scale basis in the clinic. Deep learning (DL) is a relatively new machine learning technique that holds the promise of automating NF and CC segmentation. Nonetheless, several challenges remain before DL-based segmentation can become routine in clinical practice. First, training and validating a DL model for segmentation of a given anatomical structure requires a large amount of expert-annotated training data. Such data are expensive and time-consuming to obtain, which has thwarted the development of quantitative imaging diagnostics for LR. To address this, we propose an expert-led manual delineation of NF and CC using de-identified MRI data extracted from UCLA's picture archiving and communication system (PACS). We expect the resulting database to contain data from over 35,000 lumbar MRI scans, with associated clinical history, demographics, and patient outcomes data. In a subset (1000) of these data, NFs and CCs will be annotated by multiple human expert raters. The consensus of these delineations will be used as ground-truth segmentations to train and validate DL models and to improve our understanding of them.
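As a purely illustrative sketch of how a consensus of several expert delineations might be formed and turned into an area measurement, the Python below applies a per-pixel majority vote to binary masks. The majority-vote rule, the function names, the toy 3x3 masks, and the pixel spacing are all assumptions made for illustration; the proposal does not specify the consensus method, and other label-fusion rules could be substituted.

```python
from typing import List, Tuple

# Binary mask as nested lists: 1 = inside the structure, 0 = background.
Mask = List[List[int]]


def majority_vote(masks: List[Mask]) -> Mask:
    """Consensus mask: a pixel is 1 when more than half of the raters marked it.

    Assumption: simple majority voting stands in for the (unspecified)
    consensus procedure.
    """
    if not masks:
        raise ValueError("at least one rater mask is required")
    n_raters = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    consensus = []
    for r in range(rows):
        row = []
        for c in range(cols):
            votes = sum(m[r][c] for m in masks)
            row.append(1 if 2 * votes > n_raters else 0)
        consensus.append(row)
    return consensus


def dice(a: Mask, b: Mask) -> float:
    """Dice overlap between two binary masks (defined as 1.0 if both are empty)."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total


def area_mm2(mask: Mask, pixel_spacing_mm: Tuple[float, float]) -> float:
    """Area of the segmented region: pixel count times the area of one pixel."""
    return sum(map(sum, mask)) * pixel_spacing_mm[0] * pixel_spacing_mm[1]


# Hypothetical delineations of one structure by three raters.
rater_a = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
rater_b = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
rater_c = [[0, 0, 1], [0, 1, 1], [0, 1, 0]]

consensus = majority_vote([rater_a, rater_b, rater_c])
agreement_with_b = dice(consensus, rater_b)
consensus_area = area_mm2(consensus, (0.5, 0.5))  # assumed 0.5 mm x 0.5 mm pixels
```

In this toy case the consensus coincides with the first rater's mask, and the Dice score against the second rater quantifies inter-rater disagreement; the same overlap measure could be used to compare a model's output with the consensus.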
Second, as part of this proposal, we aim to address several technical challenges that limit the deployment of automated image segmentation techniques in the clinic. Chief among these is the failure of automated methods in the face of variation due to factors such as pathology, changes in scanner protocol, and general demographic variation. Additionally, our current understanding of DL does not allow us to state categorically how much expert-annotated data is needed to train a model to a specified level of accuracy. Finally, we do not currently understand how the selection of training cases for expert delineation affects generalization accuracy. To address these challenges, we propose experiments to define the relationship between the accuracy of DL algorithms and the cardinality of the training data. We will also explore the use of unsupervised machine learning strategies, namely clustering and reinforcement learning, to understand how training data selection influences algorithmic accuracy. In summary, we propose to address data availability and technical knowledge gaps to the development of ...