Stratified Random Picks

To make the stratified picks, the genome is divided into the top 20%, middle 30%, and bottom 50% along two axes: gene density and nontranscribed conservation. This gives nine strata. Three random picks are then taken from each stratum, plus a fourth pick in the strata that are underrepresented in the manual picks. One additional backup pick is made in each stratum in case there is an unforeseen technical problem with a region. The backup pick is shown in parentheses below.
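
The picking scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script that produced the picks: the function names, the tuple layout of `windows`, and the rule that "underrepresented" means at most one manual pick are all assumptions (the last one matches the counts listed below but is not stated in the text).

```python
import random
from collections import defaultdict

def cut_points(values):
    """Return the (50th, 80th) percentile cut points of a list of scores."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 2], ordered[(n * 8) // 10]

def band(value, cuts):
    """Map a score to 0 (bottom 50%), 1 (middle 30%), or 2 (top 20%)."""
    low_cut, high_cut = cuts
    if value < low_cut:
        return 0
    if value < high_cut:
        return 1
    return 2

def stratified_picks(windows, manual_counts, seed=0):
    """windows: list of (name, cons_non_tx, gene) tuples (hypothetical layout).
    manual_counts: dict mapping (cons_band, gene_band) -> number of manual picks.
    Returns a dict mapping each stratum to (picks, backup_picks)."""
    rng = random.Random(seed)
    cons_cuts = cut_points([w[1] for w in windows])
    gene_cuts = cut_points([w[2] for w in windows])

    # Group windows into the 3 x 3 grid of strata.
    strata = defaultdict(list)
    for name, cons, gene in windows:
        strata[(band(cons, cons_cuts), band(gene, gene_cuts))].append(name)

    result = {}
    for key, names in strata.items():
        # Assumption: a stratum is "underrepresented" with <= 1 manual pick.
        n_picks = 4 if manual_counts.get(key, 0) <= 1 else 3
        # Draw the picks plus one backup without replacement.
        chosen = rng.sample(names, min(n_picks + 1, len(names)))
        result[key] = (chosen[:n_picks], chosen[n_picks:])
    return result
```

Drawing the backup in the same `sample` call as the main picks guarantees that the backup never duplicates a region already chosen.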

The left coordinate is the June genomic position for the feature while
the right coordinate is the November genomic position for the feature.
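
For anyone processing the listing programmatically, a pick line can be parsed roughly as shown here. The regular expression and the names `parse_pick` and `region_length` are hypothetical helpers, and coordinates are treated as 1-based and inclusive, an assumption that makes each random pick come out to exactly 500,000 bases.

```python
import re

# Matches "June: chrN:a-b  Nov: chrN:c-d" with optional scores and an
# optional leading "(" marking a backup pick.
PICK_RE = re.compile(
    r"(\()?June:\s*(chr\w+):(\d+)-(\d+)\s*"
    r"Nov:\s*(chr\w+):(\d+)-(\d+)"
    r"(?:\s+consNonTx\s+([\d.]+)%,\s*gene\s+([\d.]+)%)?"
)

def parse_pick(line):
    """Parse one pick line into a dict, or return None if it does not match."""
    m = PICK_RE.match(line.strip())
    if m is None:
        return None
    cons, gene = m.group(8), m.group(9)
    return {
        "backup": m.group(1) is not None,
        "june": (m.group(2), int(m.group(3)), int(m.group(4))),
        "nov": (m.group(5), int(m.group(6)), int(m.group(7))),
        "cons_non_tx": float(cons) if cons is not None else None,
        "gene": float(gene) if gene is not None else None,
    }

def region_length(region):
    """Length of a (chrom, start, end) region, assuming inclusive ends."""
    _, start, end = region
    return end - start + 1
```

Comparing `region_length` for the June and November coordinates of a pick is a quick way to spot regions whose size changed between the two builds.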

consNonTx 0% - 50%, gene 0% - 50% (1 manual)

June: chr13:28500001-29000000    Nov: chr13:24500016-25000015 consNonTx 2.8%, gene 0.5%
June: chr2:51700001-52200000    Nov: chr2:51837455-52337454 consNonTx 3.8%, gene 0.0%
June: chr4:119000001-119500000   Nov: chr4:118527386-119027385 consNonTx 3.9%, gene 0.0%
June: chr10:54300001-54800000   Nov: chr10:54489120-54989119 consNonTx 2.8%, gene 1.2%
(June: chr5:15900001-16400000    Nov: chr5:16187472-16687471 consNonTx 5.1%, gene 1.7%)

consNonTx 0% - 50%, gene 50% - 80% (4 manual)

June: chr2:115500001-116000000     Nov: chr2:116215329-116715328 consNonTx 6.2%, gene 2.3%
June: chr18:61100001-61600000    Nov: chr18:61234622-61734621 consNonTx 3.4%, gene 3.4%
June: chr12:40500001-41000000    Nov: chr12:40239443-40739442 consNonTx 1.7%, gene 3.1%
(June: chr2:196700001-197200000    Nov: chr2:197214044-197714043 consNonTx 5.4%, gene 3.3%)

consNonTx 0% - 50%, gene 80% - 100% (11 manual)

June: chr2:232500001-233000000    Nov: chr2:233173598-233673597 consNonTx 1.3%, gene 4.6%
June: chr13:111900001-112400000    Nov: chr13:107927238-108427237 consNonTx 1.1%, gene 5.5%
June: chr21:36900001-37400000    Nov: chr21:36983033-37483032 consNonTx 2.3%, gene 5.2%
(June: chr4:47800001-48300000    Nov: chr4:48032776-48532775 consNonTx 1.9%, gene 4.4%)

consNonTx 50% - 80%, gene 0% - 50% (2 manual)

June: chr16:25300001-25800000    Nov: chr16:25969826-26469825 consNonTx 9.7%, gene 0.5%
June: chr5:141800001-142300000    Nov: chr5:142482586-142982585 consNonTx 6.7%, gene 1.7%
June: chr18:25400001-25900000    Nov: chr18:25196197-25696196 consNonTx 7.4%, gene 0.9%
(June: chr4:124800001-125300000    Nov: chr4:124166677-124666676 consNonTx 6.3%, gene 0.9%)

consNonTx 50% - 80%, gene 50% - 80% (4 manual)

June: chr5:56000001-56500000    Nov: chr5:57392856-57892855 consNonTx 7.9%, gene 2.2%
June: chr6:131800001-132300000    Nov: chr6:132023965-132523964 consNonTx 6.9%, gene 2.1%
June: chr6:73700001-74200000    Nov: chr6:73699933-74199932 consNonTx 6.4%, gene 3.6%
(June: chr4:53700001-54200000    Nov: chr4:53859184-54359183 consNonTx 9.0%, gene 2.1%)

consNonTx 50% - 80%, gene 80% - 100% (3 manual)

June: chr1:149000001-149500000    Nov: chr1:146905332-147405331 consNonTx 10.2%, gene 8.4%
June: chr9:122800001-123300000    Nov: chr9:123331831-123831830 consNonTx 8.3%, gene 5.9%
June: chr15:39100001-39600000    Nov: chr15:36628619-37128618 consNonTx 9.7%, gene 10.6%
(June: chr17:33400001-33900000    Nov: chr17:35665792-36165791 consNonTx 7.7%, gene 6.1%)

consNonTx 80% - 100%, gene 0% - 50% (3 manual)

June: chr14:51200001-51700000    Nov: chr14:47673341-48173340 consNonTx 14.9%, gene 0.1%
June: chr11:133100001-133600000    Nov: chr11:132612235-133112234 consNonTx 13.5%, gene 0.3%
June: chr16:52600001-53100000   Nov: chr16:62362206-62862205 consNonTx 15.4%, gene 0.0%
(June: chrX:41900001-42400000    Nov: chrX:42149253-42649252 consNonTx 13.4%, gene 0.7%)

consNonTx 80% - 100%, gene 50% - 80% (1 manual)

June: chr8:117800001-118300000     Nov: chr8:118874200-119374199 consNonTx 11.4%, gene 3.2%
June: chr14:96900001-97400000   Nov: chr14:93204045-93704044 consNonTx 15.9%, gene 2.9%
June: chrX:117500001-118000000    Nov: chrX:119675382-120175381 consNonTx 10.7%, gene 2.0%
June: chr6:108100001-108600000    Nov: chr6:108287568-108787567 consNonTx 18.6%, gene 2.3%

consNonTx 80% - 100%, gene 80% - 100% (1 manual)

June: chr2:218300001-218800000    Nov: chr2:218998720-219498719 consNonTx 13.3%, gene 9.1%
June: chr11:66700001-67200000    Nov: chr11:65865884-66365883 consNonTx 13.4%, gene 9.0%
June: chr20:33600001-34100000    Nov: chr20:33559944-34059943 consNonTx 11.5%, gene 9.2%
June: chr6:41300001-41800000    Nov: chr6:41294331-41794330 consNonTx 15.2%, gene 4.8%
(June: chr9:124300001-124800000    Nov: chr9:124831831-125331830 consNonTx 11.4%, gene 5.4%)

Stratification of Manual Picks

Here are the nontranscribed conservation (consNonTx) and gene density of non-overlapping 500 kb regions in the manual picks. The boundaries between strata are:

            low 50%      middle 30%    high 20%
            ------------------------------------
gene        0.0-1.9%     1.9-4.2%      4.2-100%
consNonTx   0.0-6.3%     6.3-10.6%     10.6-100%
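
The boundaries above are enough to assign any 500 kb window to its stratum. The sketch below does this in Python; the names `band` and `stratum_label` are illustrative, and placing a value that falls exactly on a boundary into the higher stratum is an assumption, since the text does not say how ties are handled.

```python
# Stratum boundaries, in percent, taken from the table above.
GENE_CUTS = (1.9, 4.2)
CONS_CUTS = (6.3, 10.6)
LABELS = ("0% - 50%", "50% - 80%", "80% - 100%")

def band(value, cuts):
    """Return 0, 1, or 2 for the low, middle, or high stratum."""
    low_cut, high_cut = cuts
    if value < low_cut:
        return 0
    if value < high_cut:
        return 1
    return 2  # assumption: boundary values go to the higher stratum

def stratum_label(cons_non_tx, gene):
    """Build a stratum heading in the same style as the pick listing."""
    return "consNonTx %s, gene %s" % (
        LABELS[band(cons_non_tx, CONS_CUTS)],
        LABELS[band(gene, GENE_CUTS)],
    )
```

For example, a window with consNonTx 10.2% and gene 8.4% is labeled "consNonTx 50% - 80%, gene 80% - 100%", which agrees with where the chr1 random pick is listed.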

CFTR

June: chr7:114288355-116165780 Nov: chr7:114288155-116165580

Interleukin_Cluster
June: chr5:130778557-131778556 Nov: chr5:131703638-132703637

Apo_Cluster
June: chr11:118810001-119310000 Nov: chr11:117969240-118469239

Chr22

June: chr22:28500001-30200000 Nov: chr22:28500001-30200000

Chr21
June: chr21:30323762-32019746 Nov: chr21:30406794-32102778

ChrX
June: chrX:147250001-148500000 Nov: chrX:149572309-150846234

Chr19

June: chr19:55200001-56200000 Nov: chr19:54724484-55728861

Alpha_Globin
June: chr16:79138-579137 Nov: chr16:10001-510000

Beta_Globin

June: chr11:5550000-6549999 Nov: chr11:5076527-6078118

HOXA_cluster
June: chr7:26600001-27100000 Nov: chr7:26599801-27099800

IGF2/H19
June: chr11:300001-900000 Nov: chr11:1941933-2547980 PROBLEMATIC REGION - SUBSTANTIALLY REARRANGED BETWEEN JUNE AND NOVEMBER BUILDS

FOXP2

June: chr7:112410791-113410790 Nov: chr7:112410791-113410790

Semi-Manual Picks

Here's the stratification of the other zoo-seq regions. I recommend picking 7q21.13 and 7q31.33 to round things out.

7q21.13
June: chr7:88319137-89433560 Nov: chr7:88318937-89433360

7q21.3
June: chr7:91589227-92559635 Nov: chr7:91589027-92559435

7q21.3
June: chr7:93650712-94868826 Nov: chr7:93650512-94868626

7q31.33
June: chr7:124556444-125719632 Nov: chr7:124556244-125719432

7q32.1
June: chr7:126427707-127330661 Nov: chr7:126427507-127330461

Methods

Gene density is defined as the percentage of bases covered by either Ensembl genes or the best BLAT alignments of human mRNAs in the UCSC browser database.
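
As a rough illustration of this definition, the following Python sketch computes the percentage of a window covered by the union of two interval sets. The function names and the representation of annotations as lists of 1-based inclusive `(start, end)` pairs are assumptions; the original computation was presumably run against the browser database tables directly.

```python
def merged_coverage(intervals, window_start, window_end):
    """Count bases in the window covered by at least one interval.

    All coordinates are assumed to be 1-based and inclusive."""
    # Clip each interval to the window and drop those that miss it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if e >= window_start and s <= window_end
    )
    covered = 0
    current_end = window_start - 1  # last base already counted
    for s, e in clipped:
        if e <= current_end:
            continue  # fully inside a region that was already counted
        covered += e - max(s, current_end + 1) + 1
        current_end = e
    return covered

def gene_density(ensembl, mrna, window_start, window_end):
    """Percent of window bases covered by Ensembl genes or mRNA alignments."""
    size = window_end - window_start + 1
    both = list(ensembl) + list(mrna)
    return 100.0 * merged_coverage(both, window_start, window_end) / size
```

Taking the union before counting matters: a base covered by both an Ensembl gene and an mRNA alignment is counted once, not twice.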

Nontranscribed conservation was measured by a fairly elaborate process. Non-overlapping 125-base subwindows were taken inside each 500,000-base window. Subwindows with less than 75% of their bases in a mouse alignment were thrown out. For the remaining subwindows, the percentage with at least 80% base identity is used as the conservation score. To get the nontranscribed conservation score, the mouse alignments in regions corresponding to Ensembl genes, all GenBank mRNA blastz alignments, Fgenesh++ gene predictions, twinScan gene predictions, spliced EST alignments, and repeats were thrown out.
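
The subwindow procedure can be sketched as below. This is a simplified reading of the description, not the original code: the per-base boolean lists (`aligned`, `identical`, `masked`) are a hypothetical input format, and measuring base identity over the aligned bases of a subwindow rather than over all 125 bases is an assumption the text leaves open.

```python
def conservation_score(aligned, identical, masked=None,
                       sub=125, min_aligned=0.75, min_identity=0.80):
    """Percent of retained subwindows with at least 80% base identity.

    aligned[i]   -- True if base i lies in a mouse alignment
    identical[i] -- True if base i matches the aligned mouse base
    masked[i]    -- True if the alignment at base i is thrown out
                    (genes, mRNAs, predictions, ESTs, repeats)"""
    kept = 0
    conserved = 0
    for start in range(0, len(aligned) - sub + 1, sub):
        n_aligned = 0
        n_identical = 0
        for i in range(start, start + sub):
            if masked is not None and masked[i]:
                continue  # discarded alignment counts as unaligned
            if aligned[i]:
                n_aligned += 1
                if identical[i]:
                    n_identical += 1
        if n_aligned < min_aligned * sub:
            continue  # too little mouse alignment: subwindow thrown out
        kept += 1
        # Assumption: identity is measured over the aligned bases only.
        if n_identical >= min_identity * n_aligned:
            conserved += 1
    return 100.0 * conserved / kept if kept else 0.0
```

Calling the function without `masked` gives the plain conservation score, and passing a mask of transcribed and repeat bases gives the nontranscribed variant.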