While thousands of genomic loci have been associated with common human disease, translating these associations into clinical impact is often limited by the lack of appropriate human cell lines and by our inability to unambiguously identify orthologous regulatory regions for studies in model organisms. Sequence alignment, the most frequently used method to detect evolutionary conservation of enhancers and promoters, accurately identifies genes and promoters but often fails to detect functionally conserved distal enhancers, even though orthologous TFs generally have conserved binding activity. We have developed a machine learning approach (gkm-SVM) that identifies the set of transcription factor binding sites (TFBS) active in cell-type specific enhancers from epigenomic data, using gapped k-mer features that encompass all possible TFBS; the resulting classifier can distinguish active enhancers from non-active regions. This DNA sequence-based model can accurately predict enhancer activity in massively parallel reporter assays, as well as the impact of variation in regulatory elements associated with human disease (deltaSVM). The central aim of this proposal is to develop a new computational method that uses syntenic gapped k-mer composition to detect functionally conserved regulatory elements missed by conventional sequence alignment methods. Previous studies of enhancer evolution across mammals using sequence alignment have reported that promoters are more conserved than enhancers, and that enhancers are evolving rapidly. In contrast, using gapped k-mers, we find that cell-type specific enhancers and promoters in matched ENCODE/Roadmap tissues are equally functionally conserved, and that gapped k-mers can identify conserved enhancers that are undetectable by sequence alignment.
We hypothesize that the improvements relative to sequence alignment methods arise because the gapped k-mer feature space can detect similarity between rearrangements and variants of TF binding sites, which may differ at gapped positions but retain similar binding affinities. We will develop a method to detect conserved regulatory regions using the gkm-SVM kernel as a metric of sequence conservation, and optimize this method by comparing its predictions to genome-wide functional data. We will then develop algorithms to detect long-range syntenic intervals of similar gapped k-mer composition and generate genome-wide maps of evolutionary conservation. We will validate the predictions with CRISPRi in human and mouse stem cells differentiated to endoderm.
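To illustrate the intuition behind the hypothesis, the following is a minimal sketch of a gapped k-mer similarity score between two sequences, computed as the cosine of their gapped k-mer count vectors. It is not the gkm-SVM implementation: the function names, the default word length `l=6` with `k=4` informative positions, and the brute-force enumeration are assumptions chosen for readability, and the sketch omits reverse complements and the efficient kernel computation a genome-scale method would need.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def gapped_kmer_counts(seq, l=6, k=4):
    """Count gapped k-mers in seq (illustrative, brute force).

    Each length-l window contributes one feature per choice of k
    informative positions; the remaining l - k positions are masked as '.'.
    """
    seq = seq.upper()
    masks = [set(m) for m in combinations(range(l), k)]
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        for keep in masks:
            feature = "".join(
                base if j in keep else "." for j, base in enumerate(window)
            )
            counts[feature] += 1
    return counts


def gkm_similarity(seq_a, seq_b, l=6, k=4):
    """Cosine similarity between the gapped k-mer count vectors of two sequences."""
    counts_a = gapped_kmer_counts(seq_a, l, k)
    counts_b = gapped_kmer_counts(seq_b, l, k)
    dot = sum(n * counts_b[f] for f, n in counts_a.items())
    norm_a = sqrt(sum(n * n for n in counts_a.values()))
    norm_b = sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    site, variant = "ACGTAC", "ACGTTC"  # toy sites differing at one position
    print(gkm_similarity(site, variant, l=6, k=4))  # gapped features
    print(gkm_similarity(site, variant, l=6, k=6))  # contiguous 6-mers only
```

In this toy case the two sites share no contiguous 6-mer, so the ungapped score is zero, while the gapped features that mask the mismatched position still match and give a nonzero score. This is the property the hypothesis relies on: binding-site variants that differ at a few positions remain close in gapped k-mer space.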