Consens Elements Track Settings
 
MSA Consensus Constrained Elements   (All ENCODE Comparative Genomics tracks)

Subtracks:
 Loose: MSA Consensus Loose Constrained Elements
 Moderate: MSA Consensus Moderate Constrained Elements
 Strict: MSA Consensus Strict Constrained Elements
Source data version: ENCODE Oct 2005 Freeze
Assembly: Human May 2004 (NCBI35/hg17)

Description

The consensus elements in this track were generated by the ENCODE Multi-Species Analysis (MSA) group. They combine the results of nine analyses: each of three conservation algorithms (phastCons, binCons, and GERP) applied to the alignments produced by each of three sequence alignment methods (TBA, MLAGAN, and MAVID). The alignments cover the ENCODE region sequences of 28 vertebrate species, as defined in the September 2005 ENCODE MSA sequence freeze and the MSA species guide tree.

Three different stringencies were used. The loose set of constrained sequences contains bases identified as constrained by any conservation algorithm on any alignment. The moderate set contains bases shown to be constrained by at least two of the three conservation algorithms on at least two of the three alignments. Finally, the strict set contains only those bases that were identified as constrained by all three conservation algorithms on all three multi-sequence alignments.
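The three stringency rules can be illustrated with a minimal Python sketch. It is not the MSA group's actual pipeline: the function name `classify_base` and the input representation (a set of (algorithm, alignment) pairs that called a base constrained) are hypothetical. The moderate rule is read here as "on at least two alignments, at least two algorithms call the base constrained on that same alignment", which is one plausible interpretation of the description and should be treated as an assumption.

```python
from itertools import product

ALGORITHMS = ("phastCons", "binCons", "GERP")
ALIGNMENTS = ("TBA", "MLAGAN", "MAVID")


def classify_base(calls):
    """Return the most stringent set a base belongs to, or None.

    `calls` is a hypothetical input: the set of (algorithm, alignment)
    pairs under which the base was called constrained.
    """
    # Count how many algorithms call the base constrained on each alignment.
    per_alignment = {
        aln: sum((alg, aln) in calls for alg in ALGORITHMS)
        for aln in ALIGNMENTS
    }
    # Strict: all three algorithms on all three alignments.
    if all(n == len(ALGORITHMS) for n in per_alignment.values()):
        return "strict"
    # Moderate (assumed reading): at least two alignments on which
    # at least two algorithms agree.
    if sum(n >= 2 for n in per_alignment.values()) >= 2:
        return "moderate"
    # Loose: any algorithm on any alignment.
    if any(n >= 1 for n in per_alignment.values()):
        return "loose"
    return None


# Example calls for three hypothetical bases.
all_nine = set(product(ALGORITHMS, ALIGNMENTS))
two_by_two = {
    ("phastCons", "TBA"), ("GERP", "TBA"),
    ("phastCons", "MAVID"), ("binCons", "MAVID"),
}
single = {("GERP", "MLAGAN")}

print(classify_base(all_nine))
print(classify_base(two_by_two))
print(classify_base(single))
```

Under these rules the sets are nested: every strict base also satisfies the moderate and loose criteria, so the function reports only the most stringent set that applies.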

Display Conventions and Configuration

The locations of constrained elements are indicated by blocks in the graphical display. To show only selected subtracks within this annotation, uncheck the boxes next to the tracks you wish to hide.

Methods

See the description pages for the TBA Elements, MLAGAN Elements and MAVID Elements for additional information about methods used to generate these data.

Verification

See the description pages for the TBA Elements, MLAGAN Elements and MAVID Elements for information about the verification techniques applied to these data.

Credits

The strict, moderate, and loose data shown in these subtracks were contributed by Elliott Margulies of NHGRI representing the ENCODE Multi-Species Analysis group.

Conservation Scoring

PhastCons was developed by Adam Siepel, Cold Spring Harbor Laboratory, while at the Haussler Lab at UCSC.

BinCons was developed by Elliott Margulies, while at the Eric Green lab.

GERP was developed primarily by Greg Cooper in the lab of Arend Sidow at Stanford University (Depts. of Pathology and Genetics), in close collaboration with Eric Stone (Biostatistics, NC State), and George Asimenos and Eugene Davydov in the lab of Serafim Batzoglou (Dept. of Computer Science, Stanford).

Sequence Alignment

TBA and Blastz were developed by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group.

MLAGAN, Shuffle-LAGAN, and SuperMap were written by Mike Brudno while at the Batzoglou lab. MUSCLE was authored by Bob Edgar. AB-BLAST was provided by the Gish lab at the School of Medicine, Washington University in St. Louis.

Mercator was written by Colin Dewey and Lior Pachter at the Pachter Lab Comparative Genomics Group at UC Berkeley. MAVID was authored by Nicholas Bray and Lior Pachter.

The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community.

References

Blanchette, M., Kent, W.J., Reimer, C., Elnitski, L., Smit, A., Roskin, K., Baertsch, R., Rosenbloom, K.R., Clawson, H. et al. Aligning Multiple Genomic Sequences With the Threaded Blockset Aligner. Genome Res 14(4), 708-15 (2004).

Bray, N. and Pachter, L. MAVID: Constrained Ancestral Alignment of Multiple Sequences. Genome Res 14(4), 693-99 (2004).

Brudno, M., Do, C., Cooper, G., Kim, M.F., Davydov, E., NISC Comparative Sequencing Program, Green, E.D., Sidow, A. and Batzoglou, S. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 13(4), 721-31 (2003).

Brudno, M., Malde, S., Poliakov, A., Do, C., Couronne, O., Dubchak, I. and Batzoglou, S. Glocal alignment: finding rearrangements during alignment. Bioinformatics 19(Suppl. 1), i54-i62 (2003).

Burge, C. and Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268(1), 78-94 (1997).

Chiaromonte, F., Yap, V.B. and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput, 115-26 (2002).

Cooper, G.M., Stone, E.A., Asimenos, G., NISC Comparative Sequencing Program, Green, E.D., Batzoglou, S. and Sidow, A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15(7), 901-13 (2005).

Dewey, C.N. and Pachter, L. Mercator: multiple whole-genome orthology map construction. In preparation.

Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32(5), 1792-97 (2004).

Kent, W.J. BLAT - the BLAST-like alignment tool. Genome Res 12(4), 656-64 (2002).

Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C. and Salzberg, S.L. Versatile and open software for comparing large genomes. Genome Biol 5(2), R12 (2004).

Margulies, E.H., Blanchette, M., NISC Comparative Sequencing Program, Haussler, D. and Green, E.D. Identification and characterization of multi-species conserved sequences. Genome Res 13(12), 2507-18 (2003).

Murphy, W.J., et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294(5550), 2348-51 (2001).

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D. and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res 13(1), 103-7 (2003).

Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8), 1034-50 (2005).