GIS ChIP-PET Track Settings
 
GIS ChIP-PET   (All ENCODE Chromatin Immunoprecipitation tracks)

Subtracks:
  • p53 HCT116 +5FU: GIS ChIP-PET: p53 Ab on 5FU-treated HCT116 cells
  • STAT1 HeLa +gIF: GIS ChIP-PET: STAT1 Ab on gIF-treated HeLa cells
  • STAT1 HeLa -gIF: GIS ChIP-PET: STAT1 Ab on untreated HeLa cells
  • cMyc P493: GIS ChIP-PET: c-Myc Ab on P493 B cells
Source data version: ENCODE June 2005 Freeze, ENCODE Oct 2005 Freeze, Oct 2006
Assembly: Human May 2004 (NCBI35/hg17)

Description

This track shows binding sites for p53, STAT1, and c-Myc, as determined by chromatin immunoprecipitation (ChIP) and paired-end di-tag (PET) sequencing. The p53 and c-Myc binding-site data are genome-wide, whereas the STAT1 data are restricted to the ENCODE regions.

The p53 protein is a transcription factor involved in the control of cell growth that is often expressed at high levels in cancer cells. STAT1 is a signal transducer and transcription factor that binds to gamma interferon-activated sequence (GAS) elements. The c-Myc (cellular myelocytomatosis) protein is a transcription factor associated with cell proliferation, differentiation, and neoplastic disease.

The PET sequences in this track are derived from individual ChIP fragments as follows:

Factor  Fragments  Cell line                                                  Treatment
p53     65,572     HCT116                                                     5-fluorouracil (5FU), 6 hrs
STAT1   263,901    HeLa                                                       none
STAT1   327,838    HeLa                                                       gamma interferon (gIFN)
c-Myc   273,566    P493 B cell with tetracycline-repressible c-Myc transgene  none

For the STAT1 experiments, 4,007 PETs from the stimulated cells and 3,180 PETs from the unstimulated cells mapped to the ENCODE regions. The data from the unstimulated cells served as the negative control. Only STAT1 PETs that mapped to the ENCODE regions are shown in this track.

Display Conventions and Configuration

In the graphical display, each PET sequence is shown as two blocks, representing the two ends of the pair, connected by a thin arrowed line. Clusters of overlapping PETs (PET fragments that overlap one another), produced by the ChIP enrichment process, define the genomic loci that are potential transcription factor binding sites (TFBSs). PET singletons, which come from non-specific ChIP fragments that did not cluster, are not shown.
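The clustering rule described above can be sketched as follows. This is a minimal illustration, not the GIS pipeline: it assumes each mapped PET has been reduced to a half-open (start, end) interval on one chromosome, merges transitively overlapping intervals into clusters, discards singletons, and labels each cluster PET-n, where n is the maximal number of PETs covering any single position (the definition used in the Verification section). All function names are hypothetical.

```python
from typing import List, Tuple

# A mapped PET fragment on one chromosome: half-open (start, end).
Fragment = Tuple[int, int]


def cluster_fragments(fragments: List[Fragment]) -> List[List[Fragment]]:
    """Group fragments into clusters of transitively overlapping intervals."""
    clusters: List[List[Fragment]] = []
    current: List[Fragment] = []
    current_end = -1
    for start, end in sorted(fragments):
        if current and start < current_end:
            # Overlaps the running cluster: extend it.
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            # Gap found: close the previous cluster and start a new one.
            if current:
                clusters.append(current)
            current = [(start, end)]
            current_end = end
    if current:
        clusters.append(current)
    return clusters


def max_overlap(cluster: List[Fragment]) -> int:
    """Maximal number of fragments covering any single position (n in PET-n)."""
    events = [(s, 1) for s, _ in cluster] + [(e, -1) for _, e in cluster]
    # At equal coordinates, ends (-1) sort before starts (+1), so abutting
    # half-open intervals are not counted as overlapping.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def pet_clusters(fragments: List[Fragment]) -> List[Tuple[List[Fragment], int]]:
    """Return (cluster, n) pairs for PET-n clusters, dropping singletons."""
    return [
        (cluster, max_overlap(cluster))
        for cluster in cluster_fragments(fragments)
        if len(cluster) > 1
    ]
```

For instance, three mutually overlapping fragments plus one isolated fragment yield a single PET-3 cluster; the isolated fragment is a singleton and is dropped, mirroring the display rule.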

In full and packed display modes, the arrowheads on the horizontal line indicate the orientation of the PET sequence, and an ID of the format XXXXX-M is shown to the left of each PET, where XXXXX is the unique ID of the PET and M is the number of PET sequences at this location. The track coloring reflects the value of M: light gray indicates one or two sequences (score = 333), dark gray indicates three sequences (score = 800), and black indicates four or more PET sequences (score = 1000) at the location.
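The ID and color conventions above reduce to a small lookup. The sketch below assumes the label is the PET ID and the count joined by a final hyphen; `parse_pet_id` and `pet_score` are hypothetical helpers, not part of any Genome Browser API.

```python
from typing import Tuple

# Score and color tiers as described for this track.
COLOR_BY_SCORE = {333: "light gray", 800: "dark gray", 1000: "black"}


def pet_score(m: int) -> int:
    """Map M (number of PET sequences at a location) to the track score."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333
    if m == 3:
        return 800
    return 1000


def parse_pet_id(label: str) -> Tuple[str, int]:
    """Split an 'XXXXX-M' label into (unique ID, M).

    rpartition splits at the last hyphen, so any hyphen inside the ID is kept.
    """
    pet_id, sep, count = label.rpartition("-")
    if not sep or not pet_id or not count.isdigit():
        raise ValueError(f"not an XXXXX-M label: {label!r}")
    return pet_id, int(count)


pet_id, m = parse_pet_id("12345-3")
print(pet_id, m, pet_score(m), COLOR_BY_SCORE[pet_score(m)])
# prints: 12345 3 800 dark gray
```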

Methods

The cross-linked chromatin was sheared and precipitated with a high-affinity antibody. The DNA fragments were end-polished and cloned into the plasmid vector pGIS3, which contains two MmeI recognition sites flanking the cloning site. These sites were used to produce a 36 bp PET from each original ChIP DNA fragment (18 bp from each of the 5' and 3' ends). Multiple 36 bp PETs were concatenated and cloned into pZero-1 for sequencing, with each sequence read yielding 10-15 PETs. The PET sequences were extracted from the raw sequence reads and mapped to the genome, defining the boundaries of each ChIP DNA fragment. The following mapping criteria were used:

  • both 5' and 3' signatures must be present on the same chromosome
  • their 5' to 3' orientation must be correct
  • a minimal 17 bp match must exist for each 18 bp 5' and 3' signature
  • the tags must have genomic alignments within 7 kb of each other
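The four mapping criteria can be expressed as one filter over a pair of tag alignments. This is a sketch under stated assumptions: the `TagHit` record is hypothetical, "correct orientation" is taken to mean both tags on the same strand with the 5' tag upstream of the 3' tag on that strand, and "within 7 kb" is read as the combined span of the two alignments being at most 7,000 bp.

```python
from dataclasses import dataclass

MIN_MATCH_BP = 17    # minimal matching bases per 18 bp signature
MAX_SPAN_BP = 7_000  # assumed reading of "within 7 kb of each other"


@dataclass(frozen=True)
class TagHit:
    """Hypothetical alignment record for one 18 bp PET signature."""
    chrom: str
    start: int       # 0-based leftmost aligned position
    end: int         # exclusive end
    strand: str      # "+" or "-"
    matched_bp: int  # matching bases out of 18


def passes_mapping_criteria(five: TagHit, three: TagHit) -> bool:
    # 1. Both signatures on the same chromosome.
    if five.chrom != three.chrom:
        return False
    # 2. Correct 5'->3' orientation (assumed: same strand, 5' tag upstream).
    if five.strand != three.strand:
        return False
    if five.strand == "+" and five.start > three.start:
        return False
    if five.strand == "-" and five.end < three.end:
        return False
    # 3. At least 17 of 18 bp matched for each signature.
    if five.matched_bp < MIN_MATCH_BP or three.matched_bp < MIN_MATCH_BP:
        return False
    # 4. Alignments within 7 kb (total span of the implied fragment).
    span = max(five.end, three.end) - min(five.start, three.start)
    return span <= MAX_SPAN_BP
```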

Because MmeI is known to slip by ±1 bp, which creates ambiguity at the PET signature boundaries, a minimal 17 bp match was required for each 18 bp signature. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. Only PETs that mapped to a single genomic location were considered. PETs that mapped to multiple locations may represent low-complexity or repetitive sequences and were therefore excluded from further analysis.
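The unique-mapping rule can be illustrated by grouping every reported placement by PET and keeping only PETs with exactly one distinct location. The tuple layout and the `unique_pets` name are assumptions for the example; real aligner output would first need to be parsed into this form.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (pet_id, chrom, start, end): one genomic placement reported for a PET.
Placement = Tuple[str, str, int, int]
Location = Tuple[str, int, int]


def unique_pets(placements: Iterable[Placement]) -> Dict[str, Location]:
    """Keep PETs whose placements resolve to exactly one distinct location."""
    locations: Dict[str, Set[Location]] = defaultdict(set)
    for pet_id, chrom, start, end in placements:
        # A set collapses duplicate reports of the same placement.
        locations[pet_id].add((chrom, start, end))
    return {
        pet_id: next(iter(locs))
        for pet_id, locs in locations.items()
        if len(locs) == 1
    }
```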

Verification

Statistical and experimental verification exercises have shown that the overlapping PET clusters result from ChIP enrichment events.

p53 HCT116

Monte Carlo simulation using the p53 ChIP-PET data estimated that about 27% of PET-2 clusters (PET clusters with two overlapping members), 3% of PET-3 clusters (PET clusters with three overlapping members), and less than 0.0001% of PET clusters with more than three overlapping members were due to random chance. This suggests that the PET clusters most likely represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real ChIP enrichment event. Furthermore, goodness-of-fit analysis of PET cluster reliability estimated that less than 36% of the PET-2 clusters and over 99% of the PET-3+ clusters (clusters with three or more overlapping members) are true ChIP enrichment sites. Thus, the verification rate is nearly 100% for PET-3+ clusters, whereas the PET-2 clusters contain substantial noise.
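A simplified version of such a Monte Carlo background estimate is sketched below. It is not the authors' exact procedure: it assumes fragments of one fixed length placed uniformly at random on a single contiguous sequence, ignores mappability and chromosome boundaries, and interprets "fraction due to random chance" as the expected number of random PET-n clusters divided by the observed number. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def max_depth(intervals: List[Tuple[int, int]]) -> int:
    """Maximal number of half-open intervals covering one position."""
    # Ends (-1) sort before starts (+1) at the same coordinate.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def count_pet_n(starts: List[int], frag_len: int) -> Counter:
    """Count PET-n clusters (n = max overlap) among fixed-length fragments."""
    counts: Counter = Counter()
    cluster: List[Tuple[int, int]] = []
    cluster_end = -1
    for s in sorted(starts):
        e = s + frag_len
        if cluster and s < cluster_end:
            cluster.append((s, e))
            cluster_end = max(cluster_end, e)
        else:
            if len(cluster) > 1:  # singletons are not clusters
                counts[max_depth(cluster)] += 1
            cluster = [(s, e)]
            cluster_end = e
    if len(cluster) > 1:
        counts[max_depth(cluster)] += 1
    return counts


def simulate_random_clusters(n_frags: int, frag_len: int, genome_len: int,
                             trials: int, seed: int = 0) -> Dict[int, float]:
    """Mean number of PET-n clusters per trial under uniform random placement."""
    rng = random.Random(seed)
    total: Counter = Counter()
    for _ in range(trials):
        starts = [rng.randrange(genome_len - frag_len + 1) for _ in range(n_frags)]
        total.update(count_pet_n(starts, frag_len))
    return {n: c / trials for n, c in sorted(total.items())}


def fraction_by_chance(expected: Dict[int, float],
                       observed: Dict[int, int]) -> Dict[int, float]:
    """Expected random clusters / observed clusters for each n, capped at 1."""
    return {n: min(1.0, expected.get(n, 0.0) / obs)
            for n, obs in observed.items() if obs > 0}
```

For example, `simulate_random_clusters(n_frags=65_572, frag_len=500, genome_len=3_000_000_000, trials=100)` would approximate a library the size of the p53 dataset; the fragment length and genome length are illustrative placeholders, not values from the original analysis.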

In addition to these statistical analyses, 40 genomic locations identified by PET-3+ clusters were randomly selected and analyzed by quantitative real-time PCR. The relative enrichment of each candidate region compared to control GST ChIP DNA was determined, and all 40 regions (100%) were confirmed to be significantly enriched in the p53 ChIP DNA.
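Relative enrichment from quantitative real-time PCR is commonly computed with the comparative Ct (ΔCt) method. The text does not state which calculation was used, so the sketch below is an assumption: it raises a per-cycle amplification factor to the Ct difference between the control (for example GST) ChIP DNA and the specific ChIP DNA, with 2.0 meaning perfect doubling and equal template input. `fold_enrichment` is a hypothetical helper.

```python
def fold_enrichment(ct_chip: float, ct_control: float, efficiency: float = 2.0) -> float:
    """Relative enrichment of a region in ChIP DNA versus control ChIP DNA.

    Assumes equal template input and a per-cycle amplification factor of
    `efficiency` (2.0 = perfect doubling). A region that crosses threshold
    earlier in the specific ChIP (lower Ct) gives a fold enrichment above 1.
    """
    return efficiency ** (ct_control - ct_chip)
```

For example, a region reaching threshold at cycle 22 in the specific ChIP and at cycle 26 in the control corresponds to 2^4 = 16-fold enrichment under these assumptions.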

STAT1 HeLa

Monte Carlo simulation using the STAT1 ChIP-PET data from the gamma interferon-stimulated cells estimated that random chance accounted for about 58% of PET-3 clusters (clusters in which the maximal number of PETs within the overlap region is three), 21% of PET-4 clusters (clusters with four overlapping members), and less than 0.5% of PET clusters with more than five overlapping members. This suggests that the PET-5+ clusters represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real ChIP enrichment event. Furthermore, goodness-of-fit analysis of PET cluster reliability estimated that less than 30% of the PET-4 clusters and over 90% of the PET-5+ clusters (clusters with five or more overlapping members) are true ChIP enrichment sites.

In addition to these statistical analyses, 9 out of 14 genomic locations (64%) identified by PET-5+ clusters in the ENCODE regions were supported by ChIP-chip data from Yale using the same ChIP DNA as hybridization material.

c-Myc P493

Monte Carlo simulation using the c-Myc ChIP-PET data estimated that about 32% of PET-3 clusters (clusters in which the maximal number of PETs within the overlap region is three) and 4% of PET-4+ clusters (clusters with four or more overlapping members) were due to random chance. This suggests that about 70% of PET-3+ clusters represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real ChIP enrichment event.

In addition to these statistical analyses, 29 genomic locations identified by PET-3+ clusters and 19 genomic locations defined by PET-2 clusters were randomly selected and subjected to quantitative real-time PCR analysis. The relative enrichment of each candidate region compared to control GST ChIP DNA was determined: all 29 PET-3+ regions (100%) and 9 of the 19 PET-2 regions (47%) showed significant c-Myc ChIP enrichment, indicating that all of the PET-3+ regions and 47% of the PET-2 regions tested are true c-Myc bound targets.

Credits

The ChIP-PET library and sequence data were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore, the Bioinformatics Institute, Singapore, and Boston University.

The STAT1 ChIP fragment prep was provided by Ghia Euskirchen from the Snyder lab at Yale. The c-Myc ChIP fragment prep was provided by Karen Zeller from the Dang lab at Johns Hopkins University.

References

Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005 Feb;2(2):105-11.

Wei CL, Wu Q, Vega VB, Chiu KP, Ng P, Zhang T, Shahab A, Yong HC, Fu Y, Weng Z et al. A global map of p53 transcription-factor binding sites in the human genome. Cell. 2006 Jan 13;124(1):207-19.

Chiu KP, Wong CH, Chen Q, Ariyaratne P, Ooi HS, Wei CL, Sung WK, Ruan Y. PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinformatics. 2006 Aug 25;7:390.

Zeller KI, Zhao X, Lee CW, Chiu KP, Yao F, Yustein JT, Ooi HS, Orlov YL, Shahab A, Yong HC et al. Global mapping of c-Myc binding sites and target gene networks in human B cells. Proc Natl Acad Sci U S A. 2006 Nov 21;103(47):17834-9.