For background information, please see: NHGRI: ENCODE Target Selection Process
Stratified Random Picks
To make the stratified picks, the human genome was divided into a top 20%,
middle 30%, and bottom 50% stratum along two axes: gene density and
non-exonic conservation. Three random picks were taken from each
stratum, and a fourth pick was made from the strata that were under-represented
in the manual picks. One additional backup pick was made in each stratum as a
contingency for unforeseen technical problems within the region. The backup pick
is the last (parenthesized) entry listed in each table section.
For an explanation of how gene density and non-exonic conservation were
determined, see the Methods section.
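The random-pick procedure described above can be sketched roughly as follows. This is only an illustration, not the code used for the actual picks: it assumes the two axes are crossed to give one stratum per (gene density, conservation) pair, and the function name, the input tuple layout, and the seed are all hypothetical. The fourth pick for strata under-represented in the manual picks is not modeled.

```python
import random
from collections import defaultdict

LEVELS = ("low", "middle", "high")  # bottom 50%, middle 30%, top 20%


def pick_per_stratum(windows, n_picks=3, n_backup=1, seed=0):
    """Draw random picks plus a backup from each stratum.

    windows: iterable of (window_id, gene_density_level, conservation_level),
    where each level is one of LEVELS. This layout is an assumption.
    """
    rng = random.Random(seed)

    # Group candidate windows by their (gene density, conservation) stratum.
    by_stratum = defaultdict(list)
    for window_id, gene_level, cons_level in windows:
        by_stratum[(gene_level, cons_level)].append(window_id)

    picks = {}
    for stratum, members in sorted(by_stratum.items()):
        # Sample without replacement so the backup never repeats a pick.
        k = min(n_picks + n_backup, len(members))
        chosen = rng.sample(members, k)
        picks[stratum] = {
            "picks": chosen[:n_picks],
            "backup": chosen[n_picks:],
        }
    return picks
```

Sampling picks and backup in a single draw keeps the backup distinct from the primary picks in the same stratum.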
Stratification of Manual Picks
The following targets were manually selected as regions of biological
interest. See the Methods section for information on
strata boundaries.
Semi-Manual Picks
These targets were manually selected from regions that have been extensively
studied and help balance the stratification.
Methods
Gene density is defined as the percentage of bases covered either by Ensembl
genes or human mRNA best Blat alignments in the UCSC Genome Browser database.
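As an illustration of the gene density definition, the sketch below computes the percentage of bases in a window covered by the union of a set of intervals, so that a base covered by both an Ensembl gene and an mRNA alignment is counted once. The function name and the 0-based half-open coordinate convention are assumptions made for this example, not details taken from the original analysis.

```python
def percent_covered(window_start, window_end, intervals):
    """Percentage of bases in [window_start, window_end) covered by the
    union of the given (start, end) intervals (half-open, an assumption)."""
    # Keep only intervals that overlap the window, clipped to its bounds.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if s < window_end and e > window_start
    )

    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close out the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    return 100.0 * covered / (window_end - window_start)
```

For example, passing the Ensembl gene intervals and the mRNA alignment intervals together in one list gives the "covered by either" percentage for that window.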
Non-exonic conservation was measured in several steps.
Non-overlapping 125-base sub-windows were taken inside the 500,000-base
windows. Sub-windows with less than 75% of their bases in a mouse
alignment were discarded. Of the remaining sub-windows, those
with at least 80% base identity were counted toward the conservation score. To
calculate the non-exonic conservation score, the mouse alignments in
regions corresponding to the following were discarded: Ensembl genes, all
GenBank mRNA Blastz alignments, Fgenesh++ gene predictions, Twinscan gene
predictions, spliced EST alignments, and repeats.
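The sub-window filtering described above might be sketched as follows. This is one reading of the text, not the original pipeline: it assumes per-base boolean flags as input, treats bases in excluded regions (genes, predictions, ESTs, repeats) as unaligned, and measures identity over the usable aligned bases only. The function and parameter names are hypothetical. Because the text does not spell out how the final score is normalized, the sketch returns raw counts rather than a single percentage.

```python
def subwindow_counts(aligned, identical, masked, sub_size=125,
                     min_aligned_pct=75, min_identity_pct=80):
    """Count sub-windows that pass the alignment and identity thresholds.

    aligned[i]   : base i lies in a mouse alignment
    identical[i] : base i matches the aligned mouse base
    masked[i]    : base i lies in an excluded region (assumed input)
    """
    total = kept = conserved = 0
    # Walk the window in non-overlapping sub-windows of sub_size bases.
    for start in range(0, len(aligned) - sub_size + 1, sub_size):
        total += 1
        usable = [
            i for i in range(start, start + sub_size)
            if aligned[i] and not masked[i]
        ]
        # Discard sub-windows with under 75% of their bases aligned.
        if len(usable) * 100 < min_aligned_pct * sub_size:
            continue
        kept += 1
        # Integer arithmetic avoids floating-point edge cases at exactly 80%.
        matches = sum(1 for i in usable if identical[i])
        if matches * 100 >= min_identity_pct * len(usable):
            conserved += 1
    return {"total": total, "kept": kept, "conserved": conserved}
```

A percentage can then be derived from these counts with whichever denominator matches the intended definition.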
The following table shows the non-exonic conservation and gene density
of non-overlapping 500 kb regions in the manual picks. The
boundaries between strata are:
                           low 50%     middle 30%    high 20%
                           -----------------------------------
Gene Density               0.0-1.9%    1.9-4.2%      4.2-100%
Non-Exonic Conservation    0.0-6.3%    6.3-10.6%     10.6-100%
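A small sketch of how a 500 kb region could be assigned to a stratum from these boundaries. The cut points come from the table above; the function names are hypothetical, and placing a value that falls exactly on a boundary into the higher stratum is an assumption, since the table lists each boundary value in both adjacent ranges.

```python
from bisect import bisect_right

STRATA = ("low", "middle", "high")
GENE_DENSITY_CUTS = (1.9, 4.2)     # percent, from the table above
CONSERVATION_CUTS = (6.3, 10.6)    # percent, from the table above


def stratum(value, cuts):
    """Map a percentage to low/middle/high using two cut points.

    A value equal to a cut point goes to the higher stratum (assumption).
    """
    return STRATA[bisect_right(cuts, value)]


def classify(gene_density_pct, conservation_pct):
    """Return the (gene density, non-exonic conservation) strata for a region."""
    return (
        stratum(gene_density_pct, GENE_DENSITY_CUTS),
        stratum(conservation_pct, CONSERVATION_CUTS),
    )
```

For instance, a region with 3.0% gene density and 7.0% non-exonic conservation would fall in the middle stratum on both axes.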
See also:
Previous version of this web page's data
Please report any problems on this page to Kate Rosenbloom at:
kate@soe.ucsc.edu
The Human Genome Project at UCSC
This page last modified: 2004-1-14