Speech technology, including artificial intelligence (AI) trained on speech data, performs poorly when little or no recorded audio exists to train the required AI models. Building better speech technology in these cases requires creating collections of speech materials and their transcriptions. However, transcription is immensely time-consuming without the assistance of existing AI technologies. This project builds a high-quality speech data set to enable phonetics and phonology research for several low-data languages, and models an approach that uses techniques from AI and natural language processing (NLP) to ease the “transcription bottleneck.” The project jointly engages the expert perspectives of users of the target languages, linguists, and computer scientists, and establishes an infrastructure for collaborative, computationally mediated language work. Other benefits to society include bridging laboratory-style research and real-world applications and providing innovative educational opportunities for trainees.

The project builds a 60-hour corpus of naturalistic and read speech recorded in the field, suitable for both AI/NLP applications and research in acoustic phonetics and phonology. Unsupervised or weakly supervised machine learning techniques are used to semi-automatically transcribe and annotate a portion of the speech corpus. This transcription and annotation process uses a novel human-in-the-loop approach making direct use of expert speaker