Rationale for the Mouse ENCODE project
Knowledge of the function of genomic DNA sequences comes from three
basic approaches. Genetics uses changes in behavior or structure of a cell or organism
in response to changes in DNA sequence to infer function of the altered sequence.
Biochemical approaches monitor states of histone modification, binding of specific
transcription factors, accessibility to DNases and other epigenetic features along
genomic DNA. In general, these are associated with gene activity, but the precise
relationships remain to be established. The third approach is evolutionary, using
comparisons among homologous DNA sequences to find segments that are evolving
more slowly or more rapidly than expected given the local rate of neutral change. These
are inferred to be under negative or positive selection, respectively, and interpreted
as DNA sequences needed for a preserved (negative selection) or adaptive
(positive selection) function.
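The evolutionary comparison described above can be illustrated with a minimal sketch. It assumes that a substitution rate has already been estimated for a segment and for its local neutral background; the function name `classify_segment` and the `tolerance` margin are hypothetical and introduced only for illustration. Real analyses use statistical tests rather than a fixed margin.

```python
from typing import Literal

Selection = Literal["negative", "positive", "neutral"]


def classify_segment(observed_rate: float,
                     neutral_rate: float,
                     tolerance: float = 0.2) -> Selection:
    """Label a segment by comparing its rate of change with the local neutral rate.

    `tolerance` is a hypothetical relative margin: rates within
    +/- tolerance of the neutral rate are treated as neutral.
    """
    if neutral_rate <= 0:
        raise ValueError("neutral_rate must be positive")
    ratio = observed_rate / neutral_rate
    if ratio < 1 - tolerance:
        return "negative"   # evolving more slowly than expected
    if ratio > 1 + tolerance:
        return "positive"   # evolving more rapidly than expected
    return "neutral"
```

A segment changing at one fifth of the neutral rate would be labeled as under negative selection, and one changing at twice the neutral rate as under positive selection.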
The ENCODE project aims to discover all the DNA sequences associated with
various epigenetic features, with the reasonable expectation that these will also be
functional (best tested by genetic methods). However, it is not clear how to relate these
results to those from evolutionary analyses. The mouse ENCODE project aims to
make this connection explicitly and with moderate breadth. Assays identical to those
used in the ENCODE project are performed in mouse cell types that are similar
or homologous to those studied in the human project. The comparison will be used to discover
which epigenetic features are conserved between mouse and human, and to
examine the extent to which these overlap with the DNA sequences under negative
selection. It will also show how much functional DNA is preserved across mammals
and how much functions in only one species. The results will have a
significant impact on the understanding of the evolution of gene regulation.
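One way to quantify the overlap between epigenetic features and sequences under negative selection is the fraction of feature bases covered by constrained intervals. The sketch below is illustrative only: it assumes half-open `(start, end)` intervals on a single chromosome, and the function names are hypothetical rather than part of any ENCODE tool.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_fraction(features: List[Interval],
                     constrained: List[Interval]) -> float:
    """Fraction of feature bases that fall inside constrained intervals."""
    feats = merge_intervals(features)
    cons = merge_intervals(constrained)
    total = sum(end - start for start, end in feats)
    if total == 0:
        return 0.0
    covered = 0
    j = 0
    for start, end in feats:
        # Skip constrained intervals that end before this feature begins.
        while j < len(cons) and cons[j][1] <= start:
            j += 1
        k = j
        # Add up the overlap with every constrained interval that starts
        # before this feature ends.
        while k < len(cons) and cons[k][0] < end:
            covered += min(end, cons[k][1]) - max(start, cons[k][0])
            k += 1
    return covered / total
```

Merging both lists first keeps overlapping input intervals from being counted twice.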
Maps of histone modifications
Levels of five histone modifications were determined.
Methylation of lysine 4 of histone H3 is associated with active chromatin, with monomethylation (H3K4me1)
associated with enhancers and trimethylation (H3K4me3) associated with active promoters.
Trimethylation of lysine 36 (H3K36me3) is associated with elongating RNA polymerase II.
Two marks associated with repressed chromatin were also determined: trimethylation of lysine 27 of histone H3 (H3K27me3), which is
deposited by Polycomb repressive complex 2, and trimethylation of lysine 9 of histone H3 (H3K9me3).
Maps of genomic DNA in chromatin with these histone modifications are generated by ChIP-seq.
This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich
genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies)
followed by massively parallel short-read sequencing to tag the enriched DNA segments. Sequencing was done on the Illumina GAIIx and HiSeq.
The sequence tags are mapped back to the mouse genome (Langmead et al., 2009).
A graph of the enrichment for histone modifications is displayed as the "Signal" track (essentially the counts of mapped reads per interval),
and the deduced probable binding sites from the MACS program (Zhang et al., 2008) are shown in the "Peaks" track.
Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type.
The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.
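The relationship between a ChIP signal and its input control can be sketched as a per-interval comparison of read counts. This is a simplified illustration, not the MACS model and not the procedure used to build the Signal track: the interval width, the pseudocount, and the scaling of the input to the ChIP read depth are all assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_per_interval(read_starts: Iterable[int], interval: int) -> Counter:
    """Count mapped reads per fixed-width interval, keyed by interval index."""
    return Counter(pos // interval for pos in read_starts)


def enrichment(chip_starts: Iterable[int],
               input_starts: Iterable[int],
               interval: int = 100,
               pseudocount: float = 1.0) -> Dict[int, float]:
    """Ratio of ChIP to depth-scaled input counts for each interval with ChIP reads."""
    chip = list(chip_starts)
    inp = list(input_starts)
    chip_counts = count_per_interval(chip, interval)
    input_counts = count_per_interval(inp, interval)
    # Scale the input so that both samples have the same total read count.
    scale = len(chip) / len(inp) if inp else 1.0
    return {
        idx: (chip_counts[idx] + pseudocount)
             / (input_counts[idx] * scale + pseudocount)
        for idx in chip_counts
    }
```

The pseudocount avoids division by zero in intervals where the input has no reads.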
Display Conventions and Configuration
This track is a multi-view composite track that contains multiple data types
(views). For each view, there are multiple subtracks that
display individually on the browser. Instructions for configuring multi-view
tracks are here.
This track contains the following views:
- Peaks: Regions of signal enrichment based on processed data
(usually normalized data from pooled replicates). Intensity is represented in
grayscale; the darker shading shows higher intensity (a solid vertical line
in the peak region represents the point with the highest signal).
ENCODE Peaks tables contain fields for statistical significance, including FDR
- Signal: Density graph (wiggle) of signal enrichment based on
Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.
Methods
Cells were grown according to the approved
ENCODE cell culture protocols.
The chromatin immunoprecipitation followed published methods (Welch et al., 2004).
Information on antibodies used is available via the hyperlinks in the "Select
subtracks" menu. Samples passing initial quality thresholds (enrichment and depletion
for positive and negative controls, if available, by quantitative PCR of ChIP material)
were processed for library construction for Illumina sequencing, using the ChIP-seq
Sample Preparation Kit purchased from Illumina. Starting with a 10 ng sample of ChIP
DNA, DNA fragments were repaired to generate blunt ends and a single A nucleotide
was added to each end. Double-stranded Illumina adaptors were ligated to the
fragments. Ligation products were amplified by 18 cycles of PCR, and DNA fragments of
250-350 bp were gel purified. Completed libraries were quantified with the Quant-iT dsDNA
HS Assay Kit. The DNA library was sequenced on the Illumina Genome Analyzer II
sequencing system, and more recently on the HiSeq. Cluster generation, linearization,
blocking and sequencing primer reagents were provided in the Illumina Cluster
Amplification kits. All samples except the time course samples were assayed as
biological replicates. The data displayed are from the pooled reads for all replicates, but
individual replicates are available by download.
The resulting 36-nucleotide sequence reads (fastq files) were moved to a data
library in Galaxy, and the tools implemented in Galaxy were used for further processing
via workflows (Blankenberg et al., 2010). The reads were mapped to the mouse genome
(mm9 assembly) using the program bowtie (Langmead et al., 2009), and the files of
mapped reads for the ChIP sample and from the "input" control (no antibody) were
processed by MACS (Zhang et al., 2008) to call peaks for occupancy by transcription
factors, using the parameters mfold=15, bandwidth=125. Because the signal for some histone
modifications is not expected to be as tightly localized as that of a transcription factor,
peak-calling programs may not be appropriate. We therefore also provide wiggle
tracks with tag counts for every 10 bp segment. Per-replicate
alignments and sequences are available for download at
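Tag counts for every 10 bp segment can be written in the wiggle `variableStep` format. The sketch below shows that conversion under stated assumptions: read positions are 0-based start coordinates on one chromosome, bins with no reads are omitted, and the function name `tag_counts_to_wiggle` is hypothetical. It is not the Galaxy workflow used to produce the tracks.

```python
from collections import Counter
from typing import Iterable


def tag_counts_to_wiggle(chrom: str,
                         read_starts: Iterable[int],
                         span: int = 10) -> str:
    """Bin 0-based read start positions and emit variableStep wiggle text.

    Wiggle positions are 1-based, so each bin start is shifted by one.
    """
    counts = Counter((pos // span) * span for pos in read_starts)
    lines = [f"variableStep chrom={chrom} span={span}"]
    for bin_start in sorted(counts):
        lines.append(f"{bin_start + 1} {counts[bin_start]}")
    return "\n".join(lines)
```

For example, reads starting at positions 0, 3, 9, 10 and 25 on chr1 yield three data lines: `1 3`, `11 1` and `21 1`.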
Release Notes
This is Release 2 (August 2012). It contains a total of 30 ChIP-seq experiments on histone modifications with the addition of 1 new
Previous versions of files are available for download from the
Credits
Cell growth, ChIP, and Illumina library construction were done primarily by Weisheng Wu,
and sequencing on the Illumina platform was done largely by Cheryl Keller in the laboratory of
Ross Hardison (PSU). Data processing and analysis were overseen by James Taylor (Emory University),
using tools provided in the Galaxy platform (Anton Nekrutenko, PSU, and James Taylor, Emory) enabled
by the Penn State Cyberstar computer (supported by the National Science Foundation).
Generation of these data was supported by National Institutes of Health grants R01DK065806 and RC2HG005573.
References
Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K et al.
A bivalent chromatin structure marks key developmental genes in embryonic stem cells.
Cell. 2006 Apr 21;125(2):315-26.
Blankenberg D, Gordon A, Von Kuster G, Coraor N, Taylor J, Nekrutenko A, Galaxy Team.
Manipulation of FASTQ data with Galaxy.
Bioinformatics. 2010 Jul 15;26(14):1783-5.
ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET et al.
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.
Nature. 2007 Jun 14;447(7146):799-816.
Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, Barrera LO, Van Calcar S, Qu C, Ching KA et al.
Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome.
Nat Genet. 2007 Mar;39(3):311-8.
Langmead B, Trapnell C, Pop M, Salzberg SL.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
Genome Biol. 2009;10(3):R25.
Müller J, Hart CM, Francis NJ, Vargas ML, Sengupta A, Wild B, Miller EL, O'Connor MB, Kingston RE, Simon JA.
Histone methyltransferase activity of a Drosophila Polycomb group repressor complex.
Cell. 2002 Oct 18;111(2):197-208.
Rando OJ, Chang HY.
Genome-wide views of chromatin structure.
Annu Rev Biochem. 2009;78:245-71.
Weiss MJ, Yu C, Orkin SH.
Erythroid-cell-specific properties of transcription factor GATA-1 revealed by phenotypic rescue of a gene-targeted cell line.
Mol Cell Biol. 1997 Mar;17(3):1642-51.
Welch JJ, Watts JA, Vakoc CR, Yao Y, Wang H, Hardison RC, Blobel GA, Chodosh LA, Weiss MJ.
Global regulation of erythroid gene expression by transcription factor GATA-1.
Blood. 2004 Nov 15;104(10):3136-47.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W et al.
Model-based analysis of ChIP-Seq (MACS).
Genome Biol. 2008;9(9):R137.
Wu W, Cheng Y, Keller CA, Ernst J, Kumar SA, Mishra T, Morrissey C, Dorman CM, Chen KB, Drautz D et al.
Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration.
Genome Res. 2011 Oct;21(10):1659-71.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior consent,
submit publications that use an unpublished ENCODE dataset until nine months
following the release of the dataset. This date is listed in the Restricted Until column,
above. The full data release policy for ENCODE is available