PROJECT SUMMARY Overview: We will extend and develop implementations of foundational methods for analyzing populations of attributed connectomes. Our toolbox will enable brain scientists to (1) infer latent structure from individual connectomes, (2) identify meaningful clusters among populations of connectomes, and (3) detect relationships between connectomes and multivariate phenotypes. The methods we develop and extend will naturally overcome the challenges inherent in connectomics: high-dimensional non-Euclidean data with multi-level nonlinear interactions. Our implementations will comply with the highest open-source standards by: providing extensive online documentation and extended tutorials, hosting workshops to demonstrate our tools on an annual basis, and merging our implementations into commonly used packages such as scikit-learn [1], scipy [2], and networkx [3]. All of the code we develop is open source. We strive to ensure that our code is shared in accordance with the strictest guiding principles. We chose to implement these algorithms in Python due to its wide adoption in the neuroscience and data science fields. In particular, many other neuroscience tools applicable to connectomics, including NetworkX DiPy, mindboggle, nilearn, and nipy, are also implemented in Python. This will enable researchers to chain our analysis tools onto pre-existing pipelines for data preprocessing and visualization. Nonetheless, we feel that sharing our code in our own public repositories is insufficient for global reach. We have also begun reaching out to developers of the leading data science packages in python, including scipy, sklearn, networkx, scikit-image, and DiPy. For each of those packages, we have informal approval to begin integrating algorithms that we have developed. Those packages are collectively used by >220,000 other packages, so merging our algorithms into those packages will significantly extend our global reach. All researchers investigating connectomics, including all the authors of the 24,000 papers that mention the word “connectome”, will be able to apply state-of-the-art statistical theory and methods to their data. Currently, we have about 150 open source software projects on our NeuroData GitHub organization. Collectively, these projects get about 2,000 downloads and >11,000 views per month. As we incorporate additional functionality as described in this proposal, we expect far more researchers across disciplines and sectors will utilize our software. 20