Project Summary / Abstract Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Thorough and accurate annotation of repetitive content in genomes depends on a comprehensive database of known TEs, along with robust statistical and procedural methods for recognizing decayed instances of elements and disentangling their complex relationships. Annotation of TE instances is usually performed using our RepeatMasker software, which compares a genome to a database containing representations of known repeat families. These have historically been consensus sequences, which generally approximate the sequences of the original TEs. Our Dfam database is an open access collection of repetitive DNA families, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). We have demonstrated that profile HMMs support improved annotation sensitivity, and Dfam provides numerous aids to both curators of TE families and those who make use of the resulting annotations. During the life of this grant, the database has grown to include families belonging to more than 1000 species (from a baseline of 5). This growth has introduced a number of scale-based pressures, which in some cases have forced us to reduce Dfam functionality in response, and in other cases highlighted ways that the resource can better meet the needs of the community. Our proposed efforts largely target these matters while continuing to expand and diversify the resource.