GIS DNA PET Track Settings
 
ENCODE Genome Institute of Singapore DNA Paired-End Ditags

Subtracks are organized by cell line (GM12878 and K562, both ENCODE Tier 1) and DNA fragment size (1k, 5k, 10k, or 20k).
Cell Line  Frag Size  View        Track Name                                                  Restricted Until
GM12878    1k         Alignments  ENCODE GIS DNA PET Alignments (1k frags in GM12878 cells)   2010-01-03
GM12878    5k         Alignments  ENCODE GIS DNA PET Alignments (5k frags in GM12878 cells)   2010-01-03
GM12878    10k        Alignments  ENCODE GIS DNA PET Alignments (10k frags in GM12878 cells)  2009-12-11
K562       1k         Alignments  ENCODE GIS DNA PET Alignments (1k frags in K562 cells)      2009-12-11
K562       10k        Alignments  ENCODE GIS DNA PET Alignments (10k frags in K562 cells)     2009-12-11
K562       20k        Alignments  ENCODE GIS DNA PET Alignments (20k frags in K562 cells)     2011-01-05
Assembly: Human Mar. 2006 (NCBI36/hg18)

Description

This track is produced as part of the ENCODE Transcriptome Project. It shows the start and end positions of DNA fragments from different cell lines. These positions were determined by paired-end ditag (PET) sequencing of libraries built from DNA fragments of different sizes, and are used for the analysis of genome structural variation.

Display Conventions and Configuration

In the graphical display, the two ends of each PET are represented by blocks connected by a horizontal line. In full and packed display modes, arrowheads on the horizontal line indicate the strand. An ID of the format XXXXX-N-M is shown to the left of each PET, where:

- XXXXX is the unique ID of the PET.
- N is the number of locations to which the PET maps in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth).
- M is the number of PET sequences at this location.

PETs that map to multiple locations may represent low-complexity or repetitive sequences.
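To make the label convention concrete, the sketch below splits a label into its three fields. It assumes the label has exactly the form XXXXX-N-M, with N and M as the last two hyphen-separated integer fields. `PetLabel` and `parse_pet_label` are hypothetical names used for illustration; they are not part of any UCSC or GIS tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PetLabel:
    """Fields of a PET label of the form XXXXX-N-M (hypothetical record)."""
    pet_id: str        # XXXXX: unique ID of the PET
    n_locations: int   # N: number of genomic mapping locations
    n_sequences: int   # M: number of PET sequences at this location

    @property
    def is_multi_mapped(self) -> bool:
        # PETs mapping to more than one location may come from
        # low-complexity or repetitive sequence.
        return self.n_locations > 1


def parse_pet_label(label: str) -> PetLabel:
    """Split a label into (ID, N, M); raises ValueError if malformed."""
    # Split from the right so any hyphens inside the ID part stay intact
    # (assumption: only the last two fields are numeric counts).
    pet_id, n, m = label.rsplit("-", 2)
    return PetLabel(pet_id, int(n), int(m))


if __name__ == "__main__":
    label = parse_pet_label("12345-2-1")
    print(label.pet_id, label.n_locations, label.n_sequences, label.is_multi_mapped)
    # prints: 12345 2 1 True
```

A label with fewer than three fields, or with non-numeric counts, raises `ValueError` rather than producing a partial record.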

To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide.

Alignments
The Alignments view shows the alignments of individual PET sequences.

Methods

Sample genomic DNA was isolated, hydrosheared to a given size range, and ligated to specific DNA linker sequences at both ends. Fragments of the desired size (e.g., 1 kb or 10 kb) were then gel-selected. The linker-modified fragments (e.g., 10 kb) were circularized by ligation and digested with the restriction enzyme EcoP15I to generate DNA-PETs (a 25-bp tag from each end of the fragment). The PETs were ligated to SOLiD sequencing adaptors at both ends, then amplified by PCR and purified as complex templates for high-throughput DNA sequencing. Most of the DNA-PET data sets submitted so far were generated on the SOLiD platform. Cells were grown according to the approved ENCODE cell culture protocols.
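After circularization and EcoP15I digestion, only a short tag from each end of the original fragment survives, and the two tags are joined through the linker. The sketch below models that outcome on a plain string, assuming each tag is exactly 25 bp as stated above. It is a conceptual illustration, not the GIS protocol: strand orientation, variability in the enzyme cut site, and adaptor ligation are deliberately ignored. The function names are hypothetical, and any linker sequence passed in is a placeholder.

```python
TAG_LENGTH = 25  # bp retained from each fragment end (per the Methods text)


def make_ditag(fragment: str, linker: str, tag_length: int = TAG_LENGTH) -> str:
    """Return a simplified DNA-PET for one fragment.

    Circularization joins the fragment's 3' end to its 5' end through the
    linker. Keeping tag_length bp of genomic DNA on either side of the
    linker leaves: 3'-end tag + linker + 5'-end tag.
    """
    if len(fragment) < 2 * tag_length:
        raise ValueError("fragment shorter than two tags")
    tail_tag = fragment[-tag_length:]  # last bases of the fragment
    head_tag = fragment[:tag_length]   # first bases of the fragment
    return tail_tag + linker + head_tag


def split_ditag(ditag: str, linker: str,
                tag_length: int = TAG_LENGTH) -> tuple[str, str]:
    """Recover (head_tag, tail_tag) from a ditag built by make_ditag."""
    expected = 2 * tag_length + len(linker)
    if len(ditag) != expected or ditag[tag_length:tag_length + len(linker)] != linker:
        raise ValueError("unexpected ditag layout")
    return ditag[-tag_length:], ditag[:tag_length]
```

For a 10 kb fragment, the resulting PET carries only 50 bp of genomic sequence. Mapping its two tags still recovers both fragment ends, and therefore the ~10 kb span.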

Data: Reads of DNA-PETs were mapped onto the reference genome (NCBI Build 36, hg18). Most PETs mapped to the same chromosome, in the correct orientation, and within the expected span; for example, the two tags of a 10 kb DNA-PET were expected to map about 10 kb apart. A small fraction, called discordant PETs, did not. They mapped too far apart, in the wrong orientation, or to different chromosomes, which indicates structural differences between the sample genome and the reference genome. Such differences can arise from deletions, inversions, tandem repeats, translocations, fusions, and other rearrangements.
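The distinction between concordant and discordant PETs can be written as a small classifier over the two mapped tags. This is a sketch, not the GIS analysis pipeline. `MappedPet` and `classify_pet` are hypothetical. The ±50% span tolerance is an arbitrary illustrative value. The default assumption that both tags of a concordant PET map to the same strand (`same_strand_expected=True`) should be checked against the actual library chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedPet:
    """Mapped positions of the two tags of one PET (hypothetical record)."""
    head_chrom: str
    head_pos: int
    head_strand: str   # "+" or "-"
    tail_chrom: str
    tail_pos: int
    tail_strand: str


def classify_pet(pet: MappedPet, expected_span: int, tolerance: float = 0.5,
                 same_strand_expected: bool = True) -> str:
    """Return "concordant" or a short description of the discordance.

    tolerance and same_strand_expected are illustrative assumptions,
    not parameters of the GIS analysis.
    """
    if pet.head_chrom != pet.tail_chrom:
        return "discordant: different chromosomes"  # e.g. translocation or fusion
    if (pet.head_strand == pet.tail_strand) != same_strand_expected:
        return "discordant: wrong orientation"      # e.g. inversion
    span = abs(pet.tail_pos - pet.head_pos)
    if span > expected_span * (1 + tolerance):
        return "discordant: span too long"          # e.g. deletion in the sample
    if span < expected_span * (1 - tolerance):
        return "discordant: span too short"         # e.g. insertion in the sample
    return "concordant"


if __name__ == "__main__":
    # 10 kb library: tags 10,000 bp apart on the same chromosome and strand.
    print(classify_pet(MappedPet("chr1", 1_000, "+", "chr1", 11_000, "+"), 10_000))
    # prints: concordant
```

The checks run from the coarsest signal (different chromosomes) to the finest (span length), so each PET receives exactly one label.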

Mapping parameters: Mapping was performed with Applied Biosystems' SOLiD alignment and pairing pipeline. The initial mapping allowed two mismatches in color space, and a recovery step during pairing allowed up to four mismatches per pair.

Verification

Representative structural variations identified from the DNA-PET data have been verified by targeted PCR and sequencing to confirm the predicted rearrangement sites. Some have also been validated by FISH.

Credits

The GIS DNA-PET libraries and sequence data for genome structural variation analysis were produced at the Genome Institute of Singapore. The data were mapped and analyzed by Xiaoan Ruan, Atif Shahab, Chialin Wei, and Yijun Ruan at the Genome Institute of Singapore.

Contact: Yijun Ruan

Data Release Policy

Data users may freely use ENCODE data. However, without prior consent, they may not submit publications that use an unpublished ENCODE dataset until nine months after the dataset's release. This date is listed in the Restricted Until column above. The full ENCODE data release policy is published separately.