MLAGAN Conservation

Subtracks:
MLAGAN PhastCons (MLAGAN PhastCons Conservation)
MLAGAN GERP Cons (MLAGAN GERP Conservation)

Source data version: ENCODE Oct 2005 Freeze
Assembly: Human May 2004 (NCBI35/hg17)

Description

This track displays different measurements of conservation based on the MLAGAN multiple sequence alignments of ENCODE regions shown in the MLAGAN Alignment track. Two programs — phastCons (phylogenetic hidden-Markov model method), and GERP (Genomic Evolutionary Rate Profiling) — generated the conservation scoring used to create this track. A related track, MLAGAN Elements, shows multi-species conserved sequences (MCSs) based on the conservation measurements displayed in this track.

For details on the conservation scores generated by each program, refer to the individual Methods subsections.

Display Conventions and Configuration

The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of the subtracks. A subtrack may be hidden from view by unchecking the box to the left of the track name in the list. For more information about the graphical configuration options, click the Graph configuration help link.

Color differences among the subtracks are arbitrary; they provide a visual cue for distinguishing the different conservation scoring methods. See the Methods section for display information specific to each subtrack.

Methods

The methods used to create the MLAGAN alignments in the ENCODE regions are described in the MLAGAN Alignment track description.

PhastCons

The phastCons program predicts conserved elements and produces base-by-base conservation scores using a two-state phylogenetic hidden Markov model. The model consists of a state for conserved regions and a state for nonconserved regions, each of which is associated with a phylogenetic model. These two models are identical except that the branch lengths of the conserved phylogeny are multiplied by a scaling parameter rho (0 < rho < 1).
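
The relationship between the two phylogenetic models can be illustrated with a minimal sketch. The branch names, branch lengths, and the value of rho below are hypothetical and are not taken from the ENCODE analysis; the helper `scale_branch_lengths` is an illustrative name, not part of phastCons.

```python
def scale_branch_lengths(branch_lengths, rho):
    """Return conserved-model branch lengths: nonconserved lengths times rho."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must satisfy 0 < rho < 1")
    return {branch: length * rho for branch, length in branch_lengths.items()}


# Hypothetical nonconserved branch lengths (substitutions per site).
nonconserved = {"human": 0.10, "mouse": 0.35, "dog": 0.20}

# Hypothetical rho; in the track, rho was estimated by maximum likelihood.
conserved = scale_branch_lengths(nonconserved, rho=0.3)

for branch in nonconserved:
    print(branch, nonconserved[branch], round(conserved[branch], 3))
```

Because every branch is multiplied by the same factor, the conserved phylogeny keeps the topology and relative branch lengths of the nonconserved one and differs only in overall rate.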

To score conservation in the ENCODE alignments, the nonconserved model was estimated from four-fold degenerate coding sites within the ENCODE regions using the program phyloFit. The parameter rho was then estimated by maximum likelihood, conditional on the nonconserved model, using the EM algorithm implemented in phastCons. Parameter estimation was based on a single large alignment, constructed by concatenating the alignments for all ENCODE regions.

PhastCons was run with the options --expected-lengths 15 and --target-coverage 0.01 to obtain the desired level of "smoothing" and a final coverage by conserved elements of 5%.
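
As a rough illustration of how these two options constrain the model, the sketch below converts them into the transition probabilities of the two-state HMM. It assumes the parameterization described in Siepel et al. (2005), in which the expected length equals 1/mu and the target coverage equals nu/(mu + nu), where mu is the probability of leaving the conserved state and nu is the probability of entering it. The function name is hypothetical. Under that parameterization the target coverage acts as a prior expectation, which may explain why it can differ from the final coverage by predicted elements.

```python
def phastcons_transition_rates(expected_length, target_coverage):
    """Convert the two tuning options into two-state transition probabilities.

    Assumed parameterization (see lead-in):
      expected_length = 1 / mu
      target_coverage = nu / (mu + nu)
    """
    mu = 1.0 / expected_length
    nu = target_coverage * mu / (1.0 - target_coverage)
    return mu, nu


mu, nu = phastcons_transition_rates(expected_length=15, target_coverage=0.01)
print(f"mu = {mu:.5f}, nu = {nu:.6f}")
```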

The conservation score at each base is the posterior probability that the base was generated by the conserved state of the phylo-HMM. It can be interpreted as the probability that the base is in a conserved element, given the assumptions of the model and the estimated parameters. Scores range from 0 to 1, with higher scores corresponding to higher levels of conservation.
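
The posterior probability described above can be sketched with the standard forward-backward algorithm for a two-state HMM. This is a simplified illustration, not the phastCons implementation: the per-column likelihoods are hypothetical numbers, whereas in practice they would be computed from the conserved and nonconserved phylogenetic models, and the transition probabilities `mu` (leaving the conserved state) and `nu` (entering it) are assumed values.

```python
def conserved_posteriors(lik_cons, lik_noncons, mu, nu):
    """Posterior probability of the conserved state at each column.

    State 0 is conserved, state 1 is nonconserved.
    """
    n = len(lik_cons)
    emit = list(zip(lik_cons, lik_noncons))
    trans = [[1.0 - mu, mu], [nu, 1.0 - nu]]
    init = [nu / (mu + nu), mu / (mu + nu)]  # stationary distribution

    # Forward pass, rescaled at each column to avoid underflow.
    fwd = []
    cur = [init[s] * emit[0][s] for s in (0, 1)]
    total = sum(cur)
    fwd.append([c / total for c in cur])
    for i in range(1, n):
        prev = fwd[-1]
        cur = [emit[i][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
               for s in (0, 1)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        total = sum(cur)
        bwd[i] = [c / total for c in cur]

    # Combine and normalize per column.
    post = []
    for f, b in zip(fwd, bwd):
        joint = [f[s] * b[s] for s in (0, 1)]
        post.append(joint[0] / sum(joint))
    return post


# Hypothetical per-column likelihoods under each model.
lik_cons = [0.9, 0.9, 0.9, 0.1, 0.1]
lik_noncons = [0.1, 0.1, 0.1, 0.9, 0.9]
post = conserved_posteriors(lik_cons, lik_noncons, mu=1.0 / 15.0, nu=0.000673)
print([round(p, 3) for p in post])
```

Because neighboring columns are linked through the transition probabilities, each score reflects the surrounding columns as well as its own, which is the source of the smoothing mentioned above.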

More details on phastCons can be found in Siepel et al. (2005), cited below.

GERP

The GERP score is the expected substitution rate minus the observed substitution rate at a particular human base. Scores are estimated on a column-by-column basis using multiple sequence alignments of mammalian genomic DNA. Scores can be positive or negative: negative values (i.e., observed > expected) correspond to neutral or unconstrained sites, and positive values (i.e., observed < expected) correspond to constrained or slowly evolving sites. The expected and observed rates are both calculated on a phylogenetic tree using the same fixed topology. The branch lengths of the expected tree are based on the average substitutions at neutral sites. The branch lengths of the observed tree, which is calculated separately for each human base, are based on the substitutions seen in the column of the multiple alignment at that base. Species that have gaps at a particular column are not considered in the scoring for that column.

Higher scores correspond to human bases in alignment columns with higher degrees of similarity, i.e. bases that have evolved slowly, some of which have been under purifying selection. The opposite holds true for swiftly evolving (low similarity) columns.
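
The sign convention can be made concrete with a short arithmetic sketch. The expected and observed rates below are hypothetical values chosen for illustration; estimating them from a neutral tree and an alignment column is not shown, and `gerp_score` is an illustrative name rather than part of the GERP software.

```python
def gerp_score(expected_rate, observed_rate):
    """Expected minus observed substitution rate for one alignment column."""
    return expected_rate - observed_rate


# Hypothetical (expected, observed) rates for three alignment columns.
columns = [(2.0, 0.2), (2.0, 2.1), (1.5, 3.0)]
scores = [gerp_score(e, o) for e, o in columns]

for (e, o), s in zip(columns, scores):
    label = "constrained" if s > 0 else "neutral or unconstrained"
    print(f"expected={e:.1f} observed={o:.1f} score={s:+.1f} ({label})")
```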

Scores are deterministic, given a maximum-likelihood model of nucleotide substitution, species topology, neutral tree, and alignment.

Credits

PhastCons was developed by Adam Siepel (Cold Spring Harbor Laboratory) while at the Haussler Lab at UCSC.

GERP was developed primarily by Greg Cooper in the lab of Arend Sidow at Stanford University (Depts of Pathology and Genetics), in close collaboration with Eric Stone (Biostatistics, NC State), and George Asimenos and Eugene Davydov in the lab of Serafim Batzoglou (Dept. of Computer Science, Stanford).

The GERP data for this track were generated by Greg Cooper. The PhastCons data were generated by Elliott Margulies, with assistance from Adam Siepel.

References

Cooper, G.M., Stone, E.A., Asimenos, G., NISC Comparative Sequencing Program, Green, E.D., Batzoglou, S. and Sidow, A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15(7), 901-13 (2005).

Margulies, E.H., Blanchette, M., NISC Comparative Sequencing Program, Haussler, D. and Green, E.D. Identification and characterization of multi-species conserved sequences. Genome Res 13(12), 2507-18 (2003).

Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8), 1034-50 (2005).

References for the MLAGAN alignment tools can be found on the MLAGAN Alignment track description page.