Recent large-scale machine learning models have shown impressive capabilities, but they often rely on hundreds of millions of labeled samples. Obtaining high-quality labels is extremely costly in many fields, so most available data remain unlabeled. For example, although millions of images and videos can be easily collected from social media platforms, manually labeling them is a tedious and time-consuming process. To address the challenge of limited labeled data, self-supervised representation learning has emerged as a promising approach in computer vision and natural language processing. It has already played a key role in the success of recent large language models. Despite its strong performance in practice, the theoretical understanding of self-supervised representation learning remains limited. Moreover, the scarcity of labeled data also affects biomedical research, but existing self-supervised methods cannot be applied directly because of the unique nature of biomedical datasets. This project aims to address these gaps by developing new theoretical frameworks for self-supervised representation learning, along with computational tools tailored to biomedical studies. It also includes educational efforts to engage students and the broader public with this growing area of research.

This project aims to advance the theoretical foundations of self-supervised representation learning and transform how unlabeled data are utilized in biomedica