Stratified Random Picks

To make the stratified picks, the genome is divided into the top 20%, middle 30%, and bottom 50% along two axes: gene density and nontranscribed conservation. Three random picks are then taken from each stratum, plus a fourth pick in the strata that are underrepresented in the manual picks. One additional backup pick is made in each stratum in case there is an unforeseen technical problem with a region. The backup pick is the last entry in each table section.
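The picking scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used: the function names, the rank-based assignment of windows to the 50/30/20 bands, and the `underrepresented` argument (the page does not say how underrepresentation was decided) are all hypothetical.

```python
import random
from collections import defaultdict


def rank_stratum(values):
    """Map each value to 0 (bottom 50%), 1 (middle 30%), or 2 (top 20%) by rank.

    Assumption: strata are assigned by rank among all windows.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        frac = rank / n
        strata[i] = 0 if frac < 0.5 else (1 if frac < 0.8 else 2)
    return strata


def stratified_picks(windows, underrepresented=(), base_picks=3, seed=None):
    """windows: list of (name, cons_non_tx, gene_density) tuples.

    Returns {(cons_stratum, gene_stratum): [names]}. The last name in each
    list plays the role of the backup pick. `underrepresented` is a
    hypothetical set of strata that receive one extra pick.
    """
    rng = random.Random(seed)
    cons = rank_stratum([w[1] for w in windows])
    gene = rank_stratum([w[2] for w in windows])

    # Group window names into the 3 x 3 grid of strata.
    cells = defaultdict(list)
    for w, c, g in zip(windows, cons, gene):
        cells[(c, g)].append(w[0])

    picks = {}
    for cell, names in cells.items():
        extra = 1 if cell in underrepresented else 0
        k = base_picks + extra + 1  # +1 for the backup pick
        picks[cell] = rng.sample(names, min(k, len(names)))
    return picks
```

Sampling without replacement within each cell keeps the backup distinct from the primary picks; passing a fixed `seed` makes a run reproducible.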

consNonTx 0% - 50%, gene 0% - 50% (1 manual)
June 2002 November 2002 April 2003 stats
chr13:28500001-29000000 chr13:24500016-25000015 chr13:29450016-29950015 consNonTx 2.8%, gene 0.5%
chr2:51700001-52200000 chr2:51837455-52337454 chr2:51616414-52116413 consNonTx 3.8%, gene 0.0%
chr4:119000001-119500000 chr4:118527386-119027385 chr4:118639860-119139859 consNonTx 3.9%, gene 0.0%
chr10:54300001-54800000 chr10:54489120-54989119 chr10:55376221-55876220 consNonTx 2.8%, gene 1.2%
chr5:15900001-16400000 chr5:16187472-16687471 chr5:15942554-16442553 consNonTx 5.1%, gene 1.7%
consNonTx 0% - 50%, gene 50% - 80% (4 manual)
June 2002 November 2002 April 2003 stats
chr2:115500001-116000000 chr2:116215329-116715328 chr2:118201388-118701387 consNonTx 6.2%, gene 2.3%
chr18:61100001-61600000 chr18:61234622-61734621 chr18:61046295-61546294 consNonTx 3.4%, gene 3.4%
chr12:40500001-41000000 chr12:40239443-40739442 chr12:40056957-40556956 consNonTx 1.7%, gene 3.1%
chr2:196700001-197200000 chr2:197214044-197714043 chr2:198465410-198965409 consNonTx 5.4%, gene 3.3%
consNonTx 0% - 50%, gene 80% - 100% (11 manual)
June 2002 November 2002 April 2003 stats
chr2:232500001-233000000 chr2:233173598-233673597 chr2:234508167-235008166 consNonTx 1.3%, gene 4.6%
chr13:111900001-112400000 chr13:107927238-108427237 chr13:112376702-112876701 consNonTx 1.1%, gene 5.5%
chr21:36900001-37400000 chr21:36983033-37483032 chr21:39242992-39742991 consNonTx 2.3%, gene 5.2%
chr4:47800001-48300000 chr4:48032776-48532775 chr4:47573442-48073441 consNonTx 1.9%, gene 4.4%
consNonTx 50% - 80%, gene 0% - 50% (2 manual)
June 2002 November 2002 April 2003 stats
chr16:25300001-25800000 chr16:25969826-26469825 chr16:25800363-26300362 consNonTx 9.7%, gene 0.5%
chr5:141800001-142300000 chr5:142482586-142982585 chr5:141883116-142383115 consNonTx 6.7%, gene 1.7%
chr18:25400001-25900000 chr18:25196197-25696196 chr18:25353226-25853225 consNonTx 7.4%, gene 0.9%
chr4:124800001-125300000 chr4:124166677-124666676 chr4:124280349-124780348 consNonTx 6.3%, gene 0.9%
consNonTx 50% - 80%, gene 50% - 80% (4 manual)
June 2002 November 2002 April 2003 stats
chr5:56000001-56500000 chr5:57392856-57892855 chr5:55805775-56305774 consNonTx 7.9%, gene 2.2%
chr6:131800001-132300000 chr6:132023965-132523964 chr6:132111977-132611976 consNonTx 6.9%, gene 2.1%
chr6:73700001-74200000 chr6:73699933-74199932 chr6:73683390-74183389 consNonTx 6.4%, gene 3.6%
chr4:53700001-54200000 chr4:53859184-54359183 chr4:53728692-54228691 consNonTx 9.0%, gene 2.1%
consNonTx 50% - 80%, gene 80% - 100% (3 manual)
June 2002 November 2002 April 2003 stats
chr1:149000001-149500000 chr1:146905332-147405331 chr1:147933156-148433155 consNonTx 10.2%, gene 8.4%
chr9:122800001-123300000 chr9:123331831-123831830 chr9:125138972-125638971 consNonTx 8.3%, gene 5.9%
chr15:39100001-39600000 chr15:36628619-37128618 chr15:41311935-41810934 (manually placed)
chr17:33400001-33900000 chr17:35665792-36165791 chr17:33478638-33978637 consNonTx 7.7%, gene 6.1%
consNonTx 80% - 100%, gene 0% - 50% (3 manual)
June 2002 November 2002 April 2003 stats
chr14:51200001-51700000 chr14:47673341-48173340 chr14:51867364-52367363 consNonTx 14.9%, gene 0.1%
chr11:133100001-133600000 chr11:132612235-133112234 chr11:131133068-131633067 consNonTx 13.5%, gene 0.3%
chr16:52600001-53100000 chr16:62362206-62862205 chr16:62010885-62510884 consNonTx 15.4%, gene 0.0%
chrX:41900001-42400000 chrX:42149253-42649252 chrX:42714870-43214869 consNonTx 13.4%, gene 0.7%
consNonTx 80% - 100%, gene 50% - 80% (1 manual)
June 2002 November 2002 April 2003 stats
chr8:117800001-118300000 chr8:118874200-119374199 chr8:118481838-118981837 consNonTx 11.4%, gene 3.2%
chr14:96900001-97400000 chr14:93204045-93704044 chr14:97378512-97878511 consNonTx 15.9%, gene 2.9%
chrX:117500001-118000000 chrX:119675382-120175381 chrX:120734591-121234590 consNonTx 10.7%, gene 2.0%
chr6:108100001-108600000 chr6:108287568-108787567 chr6:108264834-108764833 consNonTx 18.6%, gene 2.3%
consNonTx 80% - 100%, gene 80% - 100% (1 manual)
June 2002 November 2002 April 2003 stats
chr2:218300001-218800000 chr2:218998720-219498719 chr2:220241365-220741364 consNonTx 13.3%, gene 9.1%
chr11:66700001-67200000 chr11:65865884-66365883 chr11:64434365-64934364 consNonTx 13.4%, gene 9.0%
chr20:33600001-34100000 chr20:33559944-34059943 chr20:34509944-35009943 consNonTx 11.5%, gene 9.2%
chr6:41300001-41800000 chr6:41294331-41794330 chr6:41299332-41799331 consNonTx 15.2%, gene 4.8%
chr9:124300001-124800000 chr9:124831831-125331830 chr9:126638972-127138971 consNonTx 11.4%, gene 5.4%

Stratification of Manual Picks

Here is the noncoding conservation and gene density of non-overlapping 500 kb regions in the manual picks. The boundaries between strata are:

            low 50%     middle 30%   high 20%
---------------------------------------------
gene        0.0-1.9%    1.9-4.2%     4.2-100%
consNonTx   0.0-6.3%    6.3-10.6%    10.6-100%
June 2002 November 2002 April 2003 Interest
chr7:114288355-116165780 chr7:114288155-116165580 chr7:115351222-117228647 CFTR 3  
chr5:130778557-131778556 chr5:131703638-132703637 chr5:131287278-132287277 Interleukin_Cluster 3  
chr11:118810001-119310000 chr11:117969240-118469239 chr11:116491019-116991018 Apo_Cluster 3  
chr22:28500001-30200000 chr22:28500001-30200000 chr22:30128508-31828507 Chr22 3  
chr21:30323762-32019746 chr21:30406794-32102778 chr21:32666762-34362746 Chr21 3  
chrX:147250001-148500000 chrX:149572309-150846234 chrX:150700001-151950000 (manually placed) ChrX 3  
chr19:55200001-56200000 chr19:54724484-55728861 chr19:59007794-60008669 (manually placed) Chr19 3  
chr16:79138-579137 chr16:10001-510000 chr16:1-500000 (manually placed) Alpha_Globin 3  
chr11:5550000-6549999 chr11:5076527-6078118 chr11:4733457-5735048 Beta_Globin 3  
chr7:26600001-27100000 chr7:26599801-27099800 chr7:26665793-27165792 HOXA_cluster 3  
chr11:300001-900000 chr11:1941933-2547980 chr11:1702703-2308750 (manually placed) IGF2/H19 4 [PROBLEMATIC REGION - SUBSTANTIALLY REARRANGED BETWEEN JUNE AND NOVEMBER BUILDS]
chr7:112410791-113410790 chr7:112410791-113410790 chr7:113473834-114473833 FOXP2 3  
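
As a small illustration, the strata boundaries listed in this section can be applied to a window's two percentages like this. The function names are hypothetical, and assigning a value that falls exactly on a boundary to the higher stratum is an assumption; the page does not say which side a boundary value belongs to.

```python
# Boundaries between strata, in percent, taken from the table in this section.
GENE_BOUNDS = (1.9, 4.2)
CONS_BOUNDS = (6.3, 10.6)
LABELS = ("0% - 50%", "50% - 80%", "80% - 100%")


def stratum(value, bounds):
    """Return 0, 1, or 2 for the low, middle, or high stratum.

    Assumption: a value equal to a boundary goes to the higher stratum.
    """
    low, high = bounds
    if value < low:
        return 0
    if value < high:
        return 1
    return 2


def classify(cons_non_tx, gene):
    """Format the stratum label in the style of the table section headings."""
    cons_label = LABELS[stratum(cons_non_tx, CONS_BOUNDS)]
    gene_label = LABELS[stratum(gene, GENE_BOUNDS)]
    return f"consNonTx {cons_label}, gene {gene_label}"


# Stats from the first stratified pick: consNonTx 2.8%, gene 0.5%
print(classify(2.8, 0.5))
```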

Semi-Manual Picks

Here's the stratification of the other zoo-seq regions. I recommend picking 7q21.13 and 7q31.33 to round things out.

June 2002 November 2002 April 2003 Chrom band
chr7:88319137-89433560 chr7:88318937-89433360 chr7:89381916-90496339 7q21.13 3
chr7:91589227-92559635 chr7:91589027-92559435 chr7:92652026-93622434 7q21.3 3
chr7:93650712-94868826 chr7:93650512-94868626 chr7:94713518-95931632 7q21.3 3
chr7:124556444-125719632 chr7:124556244-125719432 chr7:125619343-126782531 7q31.33 3
chr7:126427707-127330661 chr7:126427507-127330461 chr7:127490599-128393553 7q32.1 3

Methods

Gene density is defined as the percentage of bases covered by either Ensembl genes or the best blat alignments of human mRNAs in the UCSC browser database.
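
A minimal sketch of this coverage calculation, assuming the gene and mRNA annotations are already available as (start, end) intervals in half-open coordinates. The function name and the input representation are hypothetical; the page does not describe how the annotations were stored or merged.

```python
def percent_covered(window_start, window_end, intervals):
    """Percent of bases in [window_start, window_end) covered by the union
    of the given half-open (start, end) intervals.

    Overlapping intervals (for example an Ensembl gene and an mRNA alignment
    over the same bases) are counted only once.
    """
    # Keep only intervals that touch the window, clipped to its bounds.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if s < window_end and e > window_start
    )
    covered = 0
    cur_end = window_start  # right edge of the region counted so far
    for s, e in clipped:
        if e > cur_end:
            covered += e - max(s, cur_end)
            cur_end = e
    return 100.0 * covered / (window_end - window_start)


# Toy example: two overlapping "genes" and two "mRNA" alignments.
ensembl = [(100, 200), (150, 300)]
mrna = [(250, 400), (900, 1100)]
print(percent_covered(0, 1000, ensembl + mrna))
```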

Nontranscribed conservation was measured by a fairly elaborate process. Non-overlapping 125-base subwindows were taken inside each 500,000-base window. Subwindows with less than 75% of their bases in a mouse alignment were thrown out. For the remaining subwindows, the percentage with at least 80% base identity is used as the conservation score. To get the nontranscribed conservation score, the mouse alignments in regions corresponding to Ensembl genes, all GenBank mRNA blastz alignments, Fgenesh++ gene predictions, twinScan gene predictions, spliced EST alignments, and repeats were thrown out.
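
The scoring steps above might look roughly like the sketch below. It rests on several assumptions that the page does not confirm: the alignment is represented as per-base boolean lists, masked (transcribed or repeat) bases are treated as unaligned, and identity is measured over the aligned, unmasked bases of a subwindow. All names are hypothetical.

```python
def conservation_score(aligned, identical, masked=None,
                       sub=125, min_aligned=0.75, min_identity=0.80):
    """Percent of retained subwindows with at least `min_identity` identity.

    aligned[i]   -- True if base i lies in a mouse alignment
    identical[i] -- True if base i matches the aligned mouse base
    masked[i]    -- True if base i is transcribed or a repeat (assumption:
                    such bases are treated as unaligned)
    """
    n = len(aligned)
    if masked is None:
        masked = [False] * n

    kept = 0       # subwindows with enough aligned bases
    conserved = 0  # kept subwindows that pass the identity threshold
    for start in range(0, n - sub + 1, sub):
        usable = [i for i in range(start, start + sub)
                  if aligned[i] and not masked[i]]
        if len(usable) < min_aligned * sub:
            continue  # fewer than 75% of bases aligned: thrown out
        kept += 1
        matches = sum(1 for i in usable if identical[i])
        if matches >= min_identity * len(usable):
            conserved += 1
    return 100.0 * conserved / kept if kept else 0.0
```

Calling the function without `masked` corresponds to the plain conservation score; supplying the mask corresponds to the nontranscribed variant.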


See also: Previous version of this WEB page data

Please report any problems on this page to Hiram Clawson at: hiram@soe.ucsc.edu
The Human Genome Project at UCSC

This page created: 2003-05-29