# Protein tagging at scale to enable functional genomic studies

> **NIH R21** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2021 · $69,571

## Abstract

One of the grand challenges in biomedical research is understanding the role of each of the
thousands of proteins encoded by the human genome. This challenge is compounded by the fact that
protein behavior is highly context-dependent, necessitating its study in a variety of cellular and
environmental settings. A powerful approach for elucidating protein function is the use of
protein tags. These tags facilitate the identification of interacting partners (via an affinity or epitope
tag), localization dynamics (via a fluorescent marker), and cellular function (via small-molecule-regulated
control of protein stability). Despite their utility, the considerable time and effort
needed to develop endogenously tagged cell lines has limited our ability to capitalize on their
potential. The objective of this proposal is to develop a system for rapidly creating hundreds of
cell lines, each with a different protein tagged, a capability essential to achieving our long-term goal of
enabling the massively parallel examination of protein function. Our central hypothesis is that the
cell’s intrinsic non-homologous end-joining (NHEJ) machinery, in combination with generic donor
templates and a robust selection strategy, will enable the creation of libraries of hundreds of cell
lines, each containing a uniquely tagged protein. The rationale for this proposal is that, if
successful, this work will transform the current, limited approach to protein tagging into a highly scalable
technology, opening a new frontier for the systematic interrogation of gene function. We provide
preliminary data demonstrating the feasibility of our approach and outline the following
aims for further maturing the technology: 1) characterize the rate of off-target tag insertion and
identify strategies to mitigate it; 2) demonstrate the versatility of our approach by
creating hundreds of tagged cell lines in several cellular backgrounds, including induced
pluripotent stem cells. This proposal is innovative because it removes a long-standing bottleneck in
the generation of cell lines with precise genetic insertions, opening the door to the comprehensive
characterization of protein function. This work is significant because it represents a
two-order-of-magnitude (roughly 100-fold) improvement in scalability over the state of the art and has
immediate applications in the generation of designer cell lines. The expected outcome of this work is a
high-throughput method for generating libraries of uniquely modified cell lines (i.e., each with a different
endogenous protein tagged), hundreds at a time. This work will have an immediate positive impact
by delivering a readily adaptable method for simultaneously tagging hundreds of
proteins in their native context, and a long-term impact by achieving the essential first step
toward the parallel interrogation of protein function en masse.

## Key facts

- **NIH application ID:** 10275833
- **Project number:** 1R21HG011855-01
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Alejandro Chavez
- **Activity code:** R21 (Exploratory/Developmental Research Grant)
- **Funding institute:** NIH (National Human Genome Research Institute, per the HG code in the project number)
- **Fiscal year:** 2021
- **Award amount:** $69,571
- **Award type:** 1 (new application)
- **Project period:** 2021-09-01 → 2022-09-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10275833

## Citation

> US National Institutes of Health, RePORTER application 10275833, Protein tagging at scale to enable functional genomic studies (1R21HG011855-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10275833. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
