DLESS Track Settings
 
Detection of LinEage Specific Selection (DLESS)

Source data version: ENCODE Oct 2005 Freeze
Assembly: Human May 2004 (NCBI35/hg17)
Data last updated at UCSC: 2006-05-02

Description

This track shows elements that are predicted to be under lineage-specific selection, according to the DLESS (Detection of LinEage Specific Selection) program. Three types of elements are identified: elements "conserved" across all species, elements "gained" (i.e., that have come under selection) on some branch of the phylogeny, and elements "lost" (i.e., released from selection) on some branch of the phylogeny. Currently, DLESS allows for negative selection only and permits at most one gain or loss event per element. Thus, sequences that are conserved (relative to a model of neutral evolution) in some subtree of the phylogeny, but are not especially conserved in the complementary "supertree," are predicted as "gains," and sequences that are conserved in some supertree but not in the complementary subtree are predicted as "losses." Sequences that are conserved across the whole tree are simply labeled "conserved."
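The labeling rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not DLESS itself: the function name and its two boolean inputs are hypothetical, and DLESS makes these calls probabilistically with a phylo-HMM rather than from yes/no flags.

```python
def classify_element(conserved_in_subtree: bool, conserved_in_supertree: bool) -> str:
    """Label an element from its conservation status in a subtree and in the
    complementary supertree (hypothetical inputs, for illustration only)."""
    if conserved_in_subtree and conserved_in_supertree:
        # Conserved across the whole tree.
        return "conserved"
    if conserved_in_subtree:
        # Conserved only below some branch: selection was gained on that branch.
        return "gain"
    if conserved_in_supertree:
        # Conserved everywhere except below some branch: selection was lost there.
        return "loss"
    # Not conserved anywhere: consistent with neutral evolution, so no element.
    return "neutral"
```

Because at most one gain or loss event is permitted per element, a single subtree/supertree split is enough to describe every lineage-specific prediction.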

Display Conventions

Predicted conserved elements are shown in black, gains in green, and losses in red. Gains and losses are labeled with two species names that identify the branch of the phylogeny on which the event is predicted to have occurred. For example, a gain labeled "rat-mouse" is predicted to have occurred on the branch leading to the most recent common ancestor of rat and mouse (i.e., it is a rodent-specific conserved element). Clicking on an element in "pack" or "full" mode opens a details page summarizing the support for the prediction, including various statistics and p-values computed by the phyloP program.
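To make the labeling convention concrete, the following sketch resolves a two-species label to the ancestral node whose incoming branch carries the event. The five-species tree, its internal node names, and all function names are assumptions made for illustration; this is not the 17-species ENCODE phylogeny.

```python
# Hypothetical toy tree stored as a child -> parent map.
PARENT = {
    "human": "primates", "chimp": "primates",
    "mouse": "rodents", "rat": "rodents",
    "primates": "euarchontoglires", "rodents": "euarchontoglires",
    "euarchontoglires": "root", "dog": "root",
}

def ancestors(node):
    """Return the path from a node up to the root, starting with the node."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def mrca(a, b):
    """Most recent common ancestor of two nodes in the toy tree."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError(f"{a} and {b} share no ancestor")

def leaves_below(node):
    """All leaf species in the subtree rooted at a node."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves_below(child)
    return found

def describe_label(label):
    """Map a label such as 'rat-mouse' to (MRCA node, species in its subtree)."""
    first, second = label.split("-")
    node = mrca(first, second)
    return node, sorted(leaves_below(node))
```

With this toy tree, describe_label("rat-mouse") resolves to the rodent ancestor, and the event is read as occurring on the branch leading to that node.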

Methods

DLESS was run on the NHGRI/PSU TBA alignments of the sequences from the September 2005 ENCODE data freeze. Only the 17 mammals that are well represented across ENCODE targets were included. The non-mammalian vertebrates were excluded because they can only be aligned in conserved regions. The program was given a phylogeny and model of neutral substitution estimated from fourfold degenerate sites in coding regions, using the phyloFit program. (The tree topology was held fixed; we assumed the same topology as for the other ENCODE analyses.) A rendering of the estimated neutral phylogeny, showing the 17 species at the leaves and estimated branch lengths in expected substitutions per site, is available here.

The parameters that define DLESS's HMM and indel model were estimated by maximum likelihood. The following values were used:

--expected-length 20 --target-coverage 0.055 --phi 0.261 --indel-model 0.0334,0.0533,0.0529,0.0117,0.0206,0.0654

After predicted elements were obtained using DLESS, p-values for each element were computed using phyloP. Only predictions with p-values below 0.05 were included in the track (for lineage-specific elements, the conditional p-value was used; see Siepel et al., 2006).
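The filtering step can be sketched as follows. The Prediction record and its field names are hypothetical and do not reflect the actual DLESS or phyloP output format; only the 0.05 threshold and the use of conditional p-values for lineage-specific elements come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

P_THRESHOLD = 0.05  # cutoff stated in the track description

@dataclass
class Prediction:
    kind: str                                    # "conserved", "gain", or "loss"
    p_value: float                               # p-value for the element as a whole
    conditional_p_value: Optional[float] = None  # lineage-specific elements only

def relevant_p_value(pred: Prediction) -> float:
    """Pick the p-value used for filtering: conditional for gains and losses."""
    if pred.kind == "conserved":
        return pred.p_value
    if pred.conditional_p_value is None:
        raise ValueError("lineage-specific element lacks a conditional p-value")
    return pred.conditional_p_value

def keep_for_track(pred: Prediction) -> bool:
    """True if the prediction passes the strict p < 0.05 filter."""
    return relevant_p_value(pred) < P_THRESHOLD
```

For example, a gain with a small overall p-value but a conditional p-value of 0.2 would be dropped under this rule.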

DLESS is based on a phylo-HMM with states for neutrally evolving and conserved regions, and for gains and losses on each branch of the tree. It uses insertions and deletions as well as substitutions in identifying elements under selection. PhyloP computes p-values based on prior and posterior distributions of the number of substitutions, as implied by a model of neutral substitution. These distributions are obtained using a dynamic programming algorithm. Details are given in Siepel et al. (2006).
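As a rough illustration of the prior-distribution idea, the sketch below builds the distribution of the total number of substitutions over a set of branches by dynamic programming. It is not phyloP's algorithm, which handles a general substitution model over the tree and also computes posterior distributions given the alignment (see Siepel et al., 2006). Here it is assumed, for simplicity, that the number of substitutions on each branch is Poisson with mean equal to branch length times number of sites, and all function names are hypothetical.

```python
import math

def poisson_pmf(mean, max_count):
    """Poisson probabilities for counts 0..max_count."""
    return [math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(max_count + 1)]

def convolve(p, q, max_count):
    """Distribution of the sum of two independent counts, truncated at max_count."""
    out = [0.0] * (max_count + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j > max_count:
                break
            out[i + j] += p_i * q_j
    return out

def prior_substitution_counts(branch_lengths, n_sites, max_count):
    """Add branches one at a time, carrying forward the running distribution
    of the total substitution count (the dynamic programming step)."""
    dist = [1.0] + [0.0] * max_count  # zero substitutions before any branch
    for length in branch_lengths:
        dist = convolve(dist, poisson_pmf(length * n_sites, max_count), max_count)
    return dist

def left_tail_p_value(dist, observed):
    """Probability of seeing at most `observed` substitutions under the prior."""
    return sum(dist[: observed + 1])
```

A small left-tail probability means fewer substitutions than the neutral model predicts, which is the signature of conservation. Under the simplifying assumption used here the result matches a single Poisson distribution with the summed mean, which gives a convenient check on the convolution.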

Credits

DLESS and phyloP were written by Adam Siepel, based on ideas worked out in collaboration with Katie Pollard and David Haussler. Thanks to Elliott Margulies for providing the model of neutral substitution, and to Brian Raney for preprocessing the alignments to distinguish between indels and missing data.

References

Siepel, A., Pollard, K.S. and Haussler, D. New methods for detecting lineage-specific selection. In Proc. 10th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '06) (2006).