Schema for Sanger ChIP Hits - Sanger ChIP-chip Hits and Peak Centers
  Database: hg17    Primary Table: encodeSangerChipCenterH4acGM06990    Row Count: 1,236   Data last updated: 2005-10-26
Format description: Browser extensible data
On download server: MariaDB table dump directory
field       example     SQL type          info    description
chrom       chr1        varchar(255)      values  Reference sequence chromosome or scaffold
chromStart  147975605   int(10) unsigned  range   Start position in chromosome
chromEnd    147975606   int(10) unsigned  range   End position in chromosome
name        1           varchar(255)      values  Name of item

Sample Rows
 
chrom  chromStart  chromEnd   name
chr1   147975605   147975606  1
chr1   147984070   147984071  1
chr1   148040579   148040580  1
chr1   148068430   148068431  1
chr1   148072475   148072476  1
chr1   148076970   148076971  1
chr1   148078003   148078004  1
chr1   148112644   148112645  1
chr1   148132108   148132109  1
chr1   148136251   148136252  1

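Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are 1-based, so chromEnd minus chromStart gives the length of an item; each sample row above therefore describes a single base.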
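As a small illustration of the coordinate convention, the sketch below parses one of the sample rows and converts it to a 1-based position string. The function names parse_bed_line and to_one_based are hypothetical helpers introduced here for illustration only; they are not part of any UCSC tool.

```python
def parse_bed_line(line):
    """Split one whitespace-delimited row into (chrom, start, end, name)."""
    chrom, start, end, name = line.split()[:4]
    return chrom, int(start), int(end), name


def to_one_based(chrom, start, end):
    """Format a 0-based start / 1-based end interval as a 1-based, closed range."""
    # Only the start shifts by one; the end coordinate is already 1-based.
    return f"{chrom}:{start + 1}-{end}"


# First sample row from the table above.
chrom, start, end, name = parse_bed_line("chr1 147975605 147975606 1")
length = end - start                      # number of bases covered by the item
position = to_one_based(chrom, start, end)
```

Because the interval is half-open, the length is a plain subtraction with no off-by-one correction.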

Sanger ChIP Hits (encodeSangerChipHits) Track Description
 

Description

This track displays hit regions and peak centers for Sanger ChIP-chip data, as identified by hidden Markov model (HMM) analysis.

Display Conventions and Configuration

This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link.

Methods

Data for each replicate was normalized with the Tukey-Biweight Method using R (as recommended by NimbleGen). The log base 2 ratio of the normalized intensities was used for downstream data processing.
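The normalization above was done in R; the Python sketch below only illustrates the two ingredients named in the paragraph. It uses the common one-step form of the Tukey biweight location estimate, with tuning constant c = 5 and a small epsilon, both of which are assumptions. The source does not state exactly how the biweight estimate was applied, so centring the log2 ratios on it is likewise an assumption, and the names tukey_biweight and centred_log2_ratios are hypothetical.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of the centre of `values`."""
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])  # median absolute deviation
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Points far from the median get weight 0, so outliers are ignored.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def centred_log2_ratios(chip, control):
    """Log2(ChIP / control) per tile, shifted so the biweight centre is 0 (assumed usage)."""
    ratios = [math.log2(c / i) for c, i in zip(chip, control)]
    shift = tukey_biweight(ratios)
    return [r - shift for r in ratios]
```

Unlike a plain mean, the biweight estimate is barely moved by a handful of strongly enriched tiles, which is the property that makes it suitable for centring ChIP-chip ratios.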

A two-state HMM was used to analyze the data. The states of the HMM represent regions of the tile path corresponding to antibody binding locations. State emission probabilities were determined by comparing the cumulative distribution of the experimental data for each replicate on each ENCODE region to a fitted cumulative normal distribution. The fitted distribution was calculated using the Levenberg-Marquardt curve-fitting technique and six fitting points ranging from 0.05 to 0.45 of the cumulative distribution. Initial fitting parameters were set from the experimental data. This model is robust across a range of sensible transition probabilities.
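The curve-fitting step can be sketched as follows. This is a minimal, pure-Python Levenberg-Marquardt fit of a cumulative normal distribution to six points of a cumulative distribution, not the original implementation. The equal spacing of the six points between 0.05 and 0.45, the damping schedule, and the function names are all assumptions, and the demonstration data are synthetic quantiles rather than ChIP-chip measurements. How the fitted distribution was turned into emission probabilities is not described in the source and is not shown.

```python
import math
from statistics import NormalDist


def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_normal_cdf(xs, qs, mu, sigma, max_iter=200):
    """Fit (mu, sigma) so that normal_cdf(xs[i]) approximates qs[i]."""
    def cost(m, s):
        return sum((normal_cdf(x, m, s) - q) ** 2 for x, q in zip(xs, qs))

    damping = 1e-3
    current = cost(mu, sigma)
    for _ in range(max_iter):
        # Accumulate J^T J (a11, a12, a22) and J^T r (g1, g2).
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, q in zip(xs, qs):
            z = (x - mu) / sigma
            density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
            residual = normal_cdf(x, mu, sigma) - q
            j_mu, j_sigma = -density, -density * z  # partial derivatives of the CDF
            a11 += j_mu * j_mu
            a12 += j_mu * j_sigma
            a22 += j_sigma * j_sigma
            g1 += j_mu * residual
            g2 += j_sigma * residual
        # Damped normal equations: (J^T J + damping * diag(J^T J)) step = -J^T r.
        d11, d22 = a11 * (1.0 + damping), a22 * (1.0 + damping)
        det = d11 * d22 - a12 * a12
        if det <= 0.0:
            break
        step_mu = (-g1 * d22 + g2 * a12) / det
        step_sigma = (-g2 * d11 + g1 * a12) / det
        new_mu, new_sigma = mu + step_mu, sigma + step_sigma
        if new_sigma > 0.0 and cost(new_mu, new_sigma) < current:
            mu, sigma = new_mu, new_sigma      # accept the step, trust the model more
            current = cost(mu, sigma)
            damping /= 10.0
            if abs(step_mu) + abs(step_sigma) < 1e-12:
                break
        else:
            damping *= 10.0                    # reject the step, take a shorter one
            if damping > 1e12:
                break
    return mu, sigma


# Synthetic check: six points from 0.05 to 0.45 of a known distribution.
qs = [0.05 + 0.08 * i for i in range(6)]
xs = [NormalDist(0.2, 0.5).inv_cdf(q) for q in qs]
mu_hat, sigma_hat = fit_normal_cdf(xs, qs, mu=0.0, sigma=1.0)
```

Fitting only the lower part of the cumulative distribution keeps the strongly enriched tiles in the upper tail from influencing the fitted parameters.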

Bound regions were identified by finding the optimal state sequence from the HMM using the Viterbi algorithm, and the resulting region data was post-processed to develop the hit list. Hits were defined as contiguous portions of the tile path identified as bound by the HMM. The score of a hit was determined by taking the summation of the median enrichment values of the tiles in the contiguous portions (i.e. the area under the peak). For the purpose of this analysis, hits that were within 1000 base pairs of adjacent hits were combined into hit regions.
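The decoding and post-processing steps above can be sketched as follows, assuming Gaussian emissions, states numbered 0 (unbound) and 1 (bound), and tiles given as (start, end, median enrichment) tuples. These representations, the function names, and the reading of "within 1000 base pairs" as a gap of at most 1000 are assumptions made for illustration.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def viterbi(obs, emit, log_start, log_trans):
    """Most likely state sequence; emit[s] is the (mu, sigma) pair of state s."""
    states = range(len(emit))
    score = [log_start[s] + log_normal_pdf(obs[0], *emit[s]) for s in states]
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_normal_pdf(x, *emit[s]))
        back.append(pointers)
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):   # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_hits(tiles, path):
    """Contiguous bound tiles become hits scored by their summed enrichment."""
    hits, current = [], None
    for (start, end, value), state in zip(tiles, path):
        if state == 1:
            if current is None:
                current = [start, end, value]
            else:
                current[1] = end
                current[2] += value
        elif current is not None:
            hits.append(tuple(current))
            current = None
    if current is not None:
        hits.append(tuple(current))
    return hits


def merge_hits(hits, max_gap=1000):
    """Combine hits separated by at most max_gap bases into hit regions."""
    regions = []
    for start, end, score in hits:
        if regions and start - regions[-1][1] <= max_gap:
            prev = regions[-1]
            regions[-1] = (prev[0], max(prev[1], end), prev[2] + score)
        else:
            regions.append((start, end, score))
    return regions
```

Working in log space avoids numerical underflow on long tile paths, and summing hit scores during merging keeps the region score equal to the total score of its hits.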

The start position of the oligo with the highest enrichment value in the hit region was deemed the center of the peak. The ranking of hits was based on the total score of all hits in a hit region. It is recommended that analysis based on this data use the peak centers expanded to a convenient size for the analysis.
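A minimal sketch of peak-center selection, ranking, and the recommended expansion is given below. The input layouts, the function names, and the 250-base half-width are hypothetical choices for illustration; the source does not prescribe an expansion size.

```python
def peak_center(oligos):
    """Start of the oligo with the highest enrichment; oligos are (start, enrichment)."""
    start, _ = max(oligos, key=lambda oligo: oligo[1])
    return start


def rank_regions(regions):
    """Order hit regions (start, end, score) from highest to lowest total score."""
    return sorted(regions, key=lambda region: region[2], reverse=True)


def expand_center(center, half_width, chrom_size=None):
    """Grow a single-base peak center into a 0-based, half-open window."""
    start = max(0, center - half_width)
    end = center + 1 + half_width
    if chrom_size is not None:
        end = min(end, chrom_size)   # do not run past the end of the chromosome
    return start, end


# Expand the first sample peak center (chromStart 147975605) by 250 bases each side.
window = expand_center(147975605, 250)
```

Clamping the window at zero and at the chromosome size keeps expanded intervals valid near chromosome ends.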

Credits

The ChIP-chip data were generated by Ian Dunham's lab at the Sanger Institute. Contacts: Ian Dunham and Christoph Koch.

The HMM analysis was performed at the EBI by Paul Flicek.

Raw data may be downloaded from the Sanger Institute website at ftp://ftp.sanger.ac.uk/pub/encode.