Summary
Conventional αβ T cell receptor (TCR) recognition of a cognate peptide-Major Histocompatibility Complex (pMHC) is central to adaptive immune recognition of pathogens and pathologically associated self-proteins. Despite substantial progress in structural prediction of protein-protein interactions with tools such as AlphaFold and RoseTTAFold, de novo prediction of TCR specificity (the target pMHC) from TCR sequence has not yet been realized. Indeed, even the largest databases of curated TCR specificities contain only ~10^5 unique TCR:pMHC assignments, and these are focused on <100 unique pMHC epitopes. In our previous work, we established that >200 unique receptors recognizing the same epitope are required to confidently predict whether a previously unobserved receptor belongs to the same specificity group. That work demonstrated that once data are sufficiently dense, local prediction of specificity becomes feasible. It follows that the sparsity of the currently available data is the major barrier to advancing the field. Thus, our central hypothesis is that advancing predictive models for TCR specificity requires a dramatic increase in the magnitude and diversity of curated TCR:pMHC data, which in turn requires new approaches for generating such data sets. In three Aims, we will address major limitations of the current epitope discovery and TCR characterization pipelines. In Aim 1, we will improve methods for relating single-chain TCR sequences to specific peptides, in order to generate large libraries of well-curated TCRα or TCRβ associations with individual epitopes. In addition to supporting our central goal, these data will have significant independent utility for immune profiling and diagnostics. In Aim 2, we will establish methods for assigning paired-chain TCRαβ data from single-cell experiments to epitope pools, extending our recently reported reverse epitope discovery pipeline.
Aim 3 will integrate public data and the data generated in Aims 1 and 2 with novel structural and computational approaches to generate improved de novo specificity prediction algorithms. These Aims will be accomplished using our collection of longitudinally sampled peripheral blood mononuclear cells (PBMCs) from >4000 humans across well-curated cohorts with diverse ancestries and infection histories.