Mappability Downloadable Files
  Mappability or Uniqueness of Reference Genome from ENCODE
Additional resources:
• files.txt - lists the name and metadata for each download.
• md5sum.txt - lists the md5sum output for each download.
• downloads server - alternative access to downloadable files (may include obsolete data).
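To check a downloaded file against md5sum.txt, you can recompute its digest locally. The following is a minimal Python sketch. It assumes md5sum.txt uses the usual GNU md5sum layout, with one "<digest>  <filename>" pair per line. The function names are hypothetical helpers and are not part of any UCSC tool:

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so multi-GB bigWigs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_listing(text):
    """Parse md5sum-style lines ("<digest>  <filename>") into {filename: digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*' on the filename.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify(path, expected):
    """True if the file's MD5 matches the entry recorded for its basename."""
    return expected.get(Path(path).name) == md5_of_file(path)
```

The same digest also appears as the md5sum field in each file's Additional Details, so either source can serve as the expected value.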
10 files
PI | Lab | View | Window size | UCSC Accession | Size | File Type | Additional Details
Birney | Batzoglou & Sidow - Stanford | Excludable | n/a | wgEncodeEH001432 | 4.6 KB | bed | dataVersion=ENCODE Mar 2012 Freeze; dateSubmitted=2011-05-04; subId=4039; labVersion=consensus: Duke Excluded and DAC UltraHighSignal blacklist; tableName=wgEncodeDacMapabilityConsensusExcludable; md5sum=895f1dd97b001a338eb25d665c23d571;
Crawford | Crawford - Duke University | Excludable | n/a | wgEncodeEH000322 | 17 KB | bed | dataVersion=ENCODE Mar 2012 Freeze; dateSubmitted=2011-03-28; subId=3840; labVersion=satellite_rna_chrM_500.bed.20080925; tableName=wgEncodeDukeMapabilityRegionsExcludable; md5sum=6a305b0e7c33700fd11fecbbaf62548a;
Crawford | Crawford - Duke University | Uniqueness | 20mer | wgEncodeEH000323 | 1.4 GB | bigWig | dataVersion=ENCODE Mar 2012 Freeze; dateSubmitted=2011-03-28; subId=3840; labVersion=1.0 - 4 or less; tableName=wgEncodeDukeMapabilityUniqueness20bp; md5sum=829c23fa7e5b351ffd85c03a904728ae;
Crawford | Crawford - Duke University | Uniqueness | 35mer | wgEncodeEH000325 | 929 MB | bigWig | dataVersion=ENCODE Mar 2012 Freeze; dateSubmitted=2011-03-28; subId=3840; labVersion=1.0 - 4 or less; tableName=wgEncodeDukeMapabilityUniqueness35bp; md5sum=1d15ddafe2c8df51cf08495db96679e7;
Gingeras | Guigo - CRG, Barcelona | Alignability | 100mer | wgEncodeEH000317 | 99 MB | bigWig | dataVersion=January 2010; subId=4945; uniqueness=no more than 2 mismatches; tableName=wgEncodeCrgMapabilityAlign100mer; md5sum=a1b1a8c99431fedf6a3b4baef028cca4;
Gingeras | Guigo - CRG, Barcelona | Alignability | 24mer | wgEncodeEH000608 | 5.0 GB | bigWig | dataVersion=April 2010; subId=4945; uniqueness=no more than 2 mismatches; tableName=wgEncodeCrgMapabilityAlign24mer; md5sum=ba1c4bb9fb079aa86094572d5a827bb7;
Gingeras | Guigo - CRG, Barcelona | Alignability | 36mer | wgEncodeEH000318 | 1.4 GB | bigWig | dataVersion=January 2010; subId=4945; uniqueness=no more than 2 mismatches; tableName=wgEncodeCrgMapabilityAlign36mer; md5sum=debdda148b79a5b5d22ebd13caaf069c;
Gingeras | Guigo - CRG, Barcelona | Alignability | 40mer | wgEncodeEH000319 | 1.2 GB | bigWig | dataVersion=January 2010; subId=4945; uniqueness=no more than 2 mismatches; tableName=wgEncodeCrgMapabilityAlign40mer; md5sum=97d55005d426e4e4a7c304a7a0f3b2a8;
Gingeras | Guigo - CRG, Barcelona | Alignability | 50mer | wgEncodeEH000320 | 706 MB | bigWig | dataVersion=January 2010; subId=4945; uniqueness=no more than 2 mismatches; tableName=wgEncodeCrgMapabilityAlign50mer; md5sum=2e73d575594ad4aa3b66e2b436930b8d;
Gingeras | Guigo - CRG, Barcelona | Alignability | 75mer | wgEncodeEH000321 | 281 MB | bigWig | dataVersion=January 2010; subId=4945; uniqueness=no more than 2 mismatches; tableName=wgEncodeCrgMapabilityAlign75mer; md5sum=82bbdf76bf5f1df5e73dcbf4480e0f43;
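The Additional Details column stores each file's metadata as semicolon-separated key=value pairs. The sketch below turns one such string into a dictionary. It assumes that no value contains a semicolon, which holds for every entry above. Values may still contain spaces and colons, so each field is split only on its first "=". The name parse_file_metadata is a hypothetical helper:

```python
def parse_file_metadata(details):
    """Split a 'key=value; key=value;' metadata string into a dict.

    Assumes values never contain ';'. Each field is split on its first '=' only,
    so values such as 'consensus: Duke Excluded ...' survive intact.
    """
    metadata = {}
    for field in details.split(";"):
        field = field.strip()
        if not field:
            continue  # trailing ';' leaves an empty field
        key, sep, value = field.partition("=")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata
```

For instance, the tableName value from the parsed dictionary identifies the corresponding browser table, and the md5sum value can be compared with a locally computed digest.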

Description

These tracks display the level of sequence uniqueness of the reference GRCh37/hg19 genome assembly. They were generated using different window sizes, and high signal will be found in areas where the sequence is unique.

Display Conventions and Configuration

This track is a multi-view composite track that separates multiple data types into distinct views. Each view contains multiple subtracks representing different sequence lengths or methods of preparation. Instructions for configuring multi-view tracks are here. Mappability tracks consist of the following views:

Alignability
These tracks provide a measure of how often the sequence found at the particular location will align within the whole genome. Unlike measures of uniqueness, alignability will tolerate up to 2 mismatches. These tracks are in the form of signals ranging from 0 to 1 and have several configuration options.

Uniqueness
These tracks are a direct measure of sequence uniqueness throughout the reference genome. These tracks are in the form of signals ranging from 0 to 1 and have several configuration options.

Blacklisted Regions
Both blacklisted-region tracks attempt to identify regions of the reference genome that are troublesome for high-throughput sequencing aligners. These problematic regions may be due to repetitive elements or other anomalies. Each track contains a set of regions of varying length and has no special configuration options.

Methods

Alignability

The CRG Alignability tracks display how uniquely k-mer sequences align to a region of the genome. The data were generated with the GEM-mappability program. The method is equivalent to mapping sliding windows of k-mers back to the genome with the GEM mapper, allowing up to 2 mismatches. To produce these tracks, k was set to 24, 36, 40, 50, 75 or 100 nt. For each window, a mappability score S = 1/(number of matches found in the genome) was computed. For example, S = 1 means one match in the genome, S = 0.5 means two matches, and so on. The CRG Alignability tracks were generated independently of the ENCODE project, within the framework of the GEM (GEnome Multitool) project.
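The score definition S = 1/(number of matches) can be illustrated with a brute-force Python sketch. This is not GEM and does not reproduce its algorithm. The sketch compares every window against every other window by Hamming distance, so it handles substitutions only and scans only the forward strand. It therefore scales only to toy sequences, whereas the GEM-mappability program uses an indexed search over the whole genome. The function names are hypothetical:

```python
def hamming_within(a, b, max_mismatches):
    """True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def alignability(genome, k, max_mismatches=2):
    """Naive sliding-window score S = 1 / (number of matching k-mers in the genome).

    Every k-mer is compared against every k-mer of the same sequence (forward
    strand, substitutions only). A window always matches itself, so the count
    is at least 1 and S lies in (0, 1].
    """
    windows = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    scores = []
    for query in windows:
        hits = sum(1 for target in windows if hamming_within(query, target, max_mismatches))
        scores.append(1.0 / hits)
    return scores
```

For example, in "ACGTACGT" with k = 4 and no mismatches allowed, ACGT occurs twice. The first and last windows therefore score 0.5, and the three windows between them score 1.0. Allowing mismatches can only increase the match count, which lowers S.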

Uniqueness

The Duke Uniqueness tracks show how unique each sequence of a given length is on the positive strand, starting at a particular base. Thus, the 20 bp track reflects the uniqueness of every 20-base sequence, with the score assigned to the first base of that sequence. Scores range from 0 to 1. A score of 1 represents a completely unique sequence, and a score of 0 represents a sequence that occurs more than 4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates that the sequence occurs exactly twice. Likewise, a score of 0.33 indicates three occurrences and a score of 0.25 indicates four. The Duke Uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin: DNaseI HS, FAIRE, TFBS and Synthesis tracks.
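The Duke scoring rule can be sketched with exact k-mer counts. In this hypothetical helper, the caller supplies only the primary chromosomes, which mirrors the exclusion of chrN_random and alternative haplotypes. K-mers are counted on the positive strand across all supplied chromosomes. Treating lowercase (soft-masked) bases as ordinary bases is an assumption, not something the track description states:

```python
from collections import Counter


def duke_uniqueness(chromosomes, k, max_copies=4):
    """Exact-match uniqueness of every k-mer on the positive strand.

    chromosomes: dict mapping chromosome name -> sequence (primary chromosomes only).
    Returns {chrom: [score per k-mer start position]}, where score = 1/copies if
    the k-mer occurs at most max_copies times across all chromosomes, else 0.
    The score belongs to the k-mer's first base; the last k-1 bases get none.
    """
    def kmers(seq):
        seq = seq.upper()  # assumption: soft-masked bases count as ordinary bases
        return [seq[i:i + k] for i in range(len(seq) - k + 1)]

    per_chrom = {name: kmers(seq) for name, seq in chromosomes.items()}
    # Count genome-wide, without concatenating chromosomes (which would create fake junction k-mers).
    copies = Counter(kmer for kms in per_chrom.values() for kmer in kms)
    return {
        name: [1.0 / copies[km] if copies[km] <= max_copies else 0.0 for km in kms]
        for name, kms in per_chrom.items()
    }
```

Unlike the alignability sketch, this version tolerates no mismatches, and it sets the score to 0 once a k-mer exceeds four copies.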

Blacklisted Regions

The DAC Blacklisted Regions aim to identify a comprehensive set of regions in the human genome that have anomalous, unstructured, high signal or read counts in next-generation sequencing experiments, independent of cell line and type of experiment. A total of 80 open chromatin tracks (DNase and FAIRE datasets) and 20 ChIP-seq input/control tracks, spanning ~60 human tissue types and cell lines, were used to identify these regions with signal artifacts. These regions tend to have a very high ratio of multi-mapping to uniquely mapping reads and high variance in mappability. Some of these regions overlap pathological repeat elements such as satellite, centromeric and telomeric repeats. However, simple mappability-based filters do not account for most of these regions, so this blacklist is recommended for use alongside mappability filters. The DAC Blacklisted Regions track was generated for the ENCODE project.

The Duke Excluded Regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Open Chromatin: DNaseI HS and FAIRE tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke Excluded Regions track was generated for the ENCODE project.
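Applying the Blacklisted or Excluded Regions usually means discarding reads or peaks that overlap them. The following is a minimal Python sketch. It assumes that peaks are given as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates. Overlapping blacklist regions are merged first, so each query needs only one binary search. The helper names are hypothetical:

```python
import bisect
from collections import defaultdict


def load_blacklist(bed_lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlapping regions merged.

    BED coordinates are 0-based, half-open; blank, comment, track and browser
    lines are skipped.
    """
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, regions in raw.items():
        regions.sort()
        starts, ends = [], []
        for start, end in regions:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # extend the previous region
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def overlaps(blacklist, chrom, start, end):
    """True if [start, end) shares at least one base with a blacklisted region."""
    if chrom not in blacklist:
        return False
    starts, ends = blacklist[chrom]
    i = bisect.bisect_left(starts, end)  # regions[:i] all start before `end`
    # Merged regions are disjoint and sorted, so ends[i - 1] is the largest end among them.
    return i > 0 and ends[i - 1] > start


def filter_intervals(intervals, blacklist):
    """Keep (chrom, start, end) intervals that avoid every blacklisted region."""
    return [iv for iv in intervals if not overlaps(blacklist, *iv)]
```

A filter on mappability scores, as the DAC description recommends, would be a separate step applied alongside this one.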

Release Notes

This is Release 3 (October 2011) of this track, which now includes the DAC Blacklisted regions, Duke Uniqueness and Duke Excluded regions.

Credits

The CRG Alignability track was created by Thomas Derrien and Paolo Ribeca in Roderic Guigo's lab at the Centre for Genomic Regulation (CRG), Barcelona, Spain. Thomas Derrien was supported by funds from NHGRI for the ENCODE project, while Paolo Ribeca was funded by a Consolider grant CDS2007-00050 from the Spanish Ministerio de Educación y Ciencia.

The Duke Uniqueness and Duke Excluded Regions tracks were created by Terry Furey and Debbie Winter at Duke University's Institute for Genome Sciences & Policy (IGSP), and by Stefan Graf at the University of Cambridge, Department of Oncology and CR-UK Cambridge Research Institute (CRI). We thank NHGRI for ENCODE funding support.

The DAC Blacklisted Regions were created by Anshul Kundaje at Stanford University in the labs of Batzoglou and Sidow, in cooperation with Ewan Birney at the European Bioinformatics Institute (EBI). We thank NHGRI for ENCODE funding support. (Contact: Anshul Kundaje).

References

Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377.

Data Release Policy

Data users may freely use all data in this track. ENCODE labs that contributed annotations have exempted the data displayed here from the ENCODE data release policy restrictions.