Large language models (LLMs) have led to significant progress in natural language processing (NLP) and artificial intelligence more broadly, and have the potential to become a broadly applicable technology with uses across myriad domains. However, current LLMs rely on computationally expensive architectures and algorithms. This CAREER project aims to develop efficient, architecture-aware algorithms for language modeling, which are expected to make existing LLM applications more efficient, enable new applications, and broaden access. To achieve these goals, this project will develop new methods that span the entire training and deployment pipeline, including: (1) architectural primitives that can overcome the computational inefficiencies of transformers, (2) efficient training algorithms that will reduce the amount of resources required to train and fine-tune language models, and (3) quantization algorithms along with flexible kernels that can better utilize the computational resources of modern hardware. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.