Schema for Sanger ChIP Hits - Sanger ChIP-chip Hits and Peak Centers

| field | example | SQL type | info | description |
| chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
| chromStart | 147975605 | int(10) unsigned | range | Start position in chromosome |
| chromEnd | 147975606 | int(10) unsigned | range | End position in chromosome |
| name | | varchar(255) | values | Name of item |

| chrom | chromStart | chromEnd | name |
| chr1 | 147975605 | 147975606 | |
| chr1 | 147984070 | 147984071 | |
| chr1 | 148040579 | 148040580 | |
| chr1 | 148068430 | 148068431 | |
| chr1 | 148072475 | 148072476 | |
| chr1 | 148076970 | 148076971 | |
| chr1 | 148078003 | 148078004 | |
| chr1 | 148112644 | 148112645 | |
| chr1 | 148132108 | 148132109 | |
| chr1 | 148136251 | 148136252 | |

Description

This track displays hit regions and peak centers for Sanger ChIP-chip data, as identified by hidden Markov model (HMM) analysis.

Display Conventions and Configuration

This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link.

Methods

Data for each replicate were normalized in R using the Tukey biweight method (as recommended by NimbleGen). The log base 2 ratio of the normalized intensities was used for downstream data processing.
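As a rough illustration of this step, the sketch below computes log2 ratios and centers them on a one-step Tukey biweight location estimate. It is written in Python rather than R. The order of operations (ratio first, then centering), the tuning constant c=5, and the function names are all assumptions for illustration; the source does not give the exact procedure.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate (a robust mean).

    c and eps are conventional tuning constants, assumed here.
    """
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Points far from the median get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def normalized_log_ratios(chip, control):
    """Assumed workflow: log2(ChIP / control), centered on the biweight mean."""
    raw = [math.log2(a / b) for a, b in zip(chip, control)]
    center = tukey_biweight(raw)
    return [r - center for r in raw]
```

The biweight estimate gives outlying probes zero weight, so a few strongly enriched tiles do not shift the center of the ratio distribution.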

A two-state HMM was used to analyze the data. The states of the HMM represent regions of the tile path corresponding to antibody binding locations. State emission probabilities were determined by comparing the cumulative distribution of the experimental data for each replicate on each ENCODE region to a fitted cumulative normal distribution. The fitted distribution was calculated using the Levenberg-Marquardt curve-fitting technique and six fitting points ranging from 0.05 to 0.45 of the cumulative distribution. Initial fitting parameters were set from the experimental data. This model is robust across a range of sensible transition probabilities.
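The curve-fitting step can be sketched as a small Levenberg-Marquardt loop that fits the mean and standard deviation of a cumulative normal to six points taken from the lower part of the empirical distribution. This is a minimal sketch, not the original implementation. The even spacing of the six points, the damping schedule, the stopping thresholds, and the function names are assumptions.

```python
import math


def norm_cdf(x, mu, sigma):
    """Cumulative normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fitting_points(values, n=6, lo=0.05, hi=0.45):
    """Empirical quantiles at n evenly spaced levels (spacing is assumed)."""
    ordered = sorted(values)
    ps = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    xs = [ordered[round(p * (len(ordered) - 1))] for p in ps]
    return xs, ps


def fit_normal_cdf(xs, ps, mu, sigma, iterations=200):
    """Levenberg-Marquardt fit of (mu, sigma) so that norm_cdf(xs) matches ps."""

    def cost(m, s):
        return sum((norm_cdf(x, m, s) - p) ** 2 for x, p in zip(xs, ps))

    lam = 1e-3  # damping factor
    current = cost(mu, sigma)
    for _ in range(iterations):
        if current < 1e-20 or lam > 1e12:
            break
        # Accumulate J^T J (2x2) and J^T r (2-vector).
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, p in zip(xs, ps):
            z = (x - mu) / sigma
            pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            r = norm_cdf(x, mu, sigma) - p
            d_mu = -pdf / sigma
            d_sigma = -pdf * z / sigma
            a11 += d_mu * d_mu
            a12 += d_mu * d_sigma
            a22 += d_sigma * d_sigma
            g1 += d_mu * r
            g2 += d_sigma * r
        # Solve (J^T J + lam * diag(J^T J)) * step = -J^T r.
        b11 = a11 * (1.0 + lam)
        b22 = a22 * (1.0 + lam)
        det = b11 * b22 - a12 * a12
        if det == 0.0:
            break
        step_mu = (-g1 * b22 + g2 * a12) / det
        step_sigma = (-g2 * b11 + g1 * a12) / det
        new_mu, new_sigma = mu + step_mu, sigma + step_sigma
        new_cost = cost(new_mu, new_sigma) if new_sigma > 0 else float("inf")
        if new_cost < current:
            # Accept the step and trust the quadratic model more.
            mu, sigma, current = new_mu, new_sigma, new_cost
            lam /= 10.0
        else:
            # Reject the step and damp more heavily.
            lam *= 10.0
    return mu, sigma
```

Fitting only the lower 5% to 45% of the distribution means the estimate is driven by tiles below the median, which are mostly background, rather than by the enriched upper tail.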

Bound regions were identified by finding the optimal state sequence from the HMM using the Viterbi algorithm, and the resulting region data were post-processed to develop the hit list. Hits were defined as contiguous portions of the tile path identified as bound by the HMM. The score of a hit was determined by summing the median enrichment values of the tiles in the contiguous portion (i.e., the area under the peak). For the purpose of this analysis, hits that were within 1000 base pairs of adjacent hits were combined into hit regions.
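The decoding and post-processing steps might look like the following sketch. It assumes Gaussian emissions, a symmetric transition matrix with a hypothetical stay probability of 0.9, and tiles given as (start, end, median enrichment) tuples. None of these specifics come from the source, and the function names are illustrative.

```python
import math


def log_normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def viterbi_two_state(values, unbound, bound, stay=0.9):
    """Most likely state path (0 = unbound, 1 = bound).

    unbound and bound are (mean, sd) pairs for the assumed Gaussian emissions.
    """
    if not values:
        return []
    params = (unbound, bound)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    score = [math.log(0.5) + log_normal_pdf(values[0], *params[s]) for s in (0, 1)]
    back = []
    for x in values[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cand = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            pointers.append(best)
            new_score.append(cand[best] + log_normal_pdf(x, *params[s]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_hits(tiles, path):
    """Group contiguous bound tiles into hits: (start, end, summed enrichment)."""
    hits, current = [], None
    for (start, end, value), state in zip(tiles, path):
        if state == 1:
            if current is None:
                current = [start, end, value]
            else:
                current[1] = end
                current[2] += value
        elif current is not None:
            hits.append(tuple(current))
            current = None
    if current is not None:
        hits.append(tuple(current))
    return hits


def merge_hits(hits, max_gap=1000):
    """Combine hits separated by at most max_gap bp into hit regions."""
    regions = []
    for start, end, score in sorted(hits):
        if regions and start - regions[-1][1] <= max_gap:
            prev = regions[-1]
            regions[-1] = (prev[0], max(prev[1], end), prev[2] + score)
        else:
            regions.append((start, end, score))
    return regions
```

A region's score here is the sum of its member hit scores, which corresponds to the total score used for ranking.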

The start position of the oligo with the highest enrichment value in the hit region was deemed the center of the peak. The ranking of hits was based on the total score of all hits in a hit region. Analyses based on these data should use the peak centers, expanded to a window size convenient for the analysis.
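A minimal sketch of picking a peak center and expanding it to a fixed window follows. It assumes tiles are (start, end, enrichment) tuples and that coordinates are 0-based and half-open, so a center is stored as a 1 bp interval as in the sample rows. The function names and the 250 bp half-width are illustrative only.

```python
def peak_center(tiles):
    """Start position of the tile with the highest enrichment value."""
    best = max(tiles, key=lambda tile: tile[2])
    return best[0]


def expand_center(center, half_width, chrom_size=None):
    """Expand the 1 bp interval [center, center + 1) by half_width on each side."""
    start = max(0, center - half_width)  # do not run off the chromosome start
    end = center + 1 + half_width
    if chrom_size is not None:
        end = min(end, chrom_size)  # clip at the chromosome end when known
    return start, end


# Example: a 501 bp window around the first peak center in the table.
window = expand_center(147975605, 250)
```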

Credits

The ChIP-chip data were generated by Ian Dunham's lab at the Sanger Institute. Contacts: Ian Dunham and Christoph Koch.

The HMM analysis was performed at the EBI by Paul Flicek.

Raw data may be downloaded from the Sanger Institute website at ftp://ftp.sanger.ac.uk/pub/encode.