GIS ChIP-PET Track Settings
 

Subtracks:
  • p53 HCT116 +5FU: p53 Ab on 5FU-treated HCT116 cells
  • STAT1 HeLa +gIF: STAT1 Ab on gIF-treated HeLa cells
  • STAT1 HeLa -gIF: STAT1 Ab on untreated HeLa cells
  • cMyc P493: c-Myc Ab on P493 B cells
  • H3K4me3 hES3: H3K4me3 Ab on hES3 embryonic stem cells
  • H3K27me3 hES3: H3K27me3 Ab on hES3 embryonic stem cells

Source data version: ENCODE June 2005 Freeze, ENCODE Oct 2005 Freeze, Oct 2006, Aug 2007

Description

This track shows binding sites for the transcription factors p53, STAT1 and c-Myc, and regions carrying the histone modifications H3K4me3 and H3K27me3, as determined by chromatin immunoprecipitation (ChIP) followed by paired-end di-tag (PET) sequencing. The STAT1 data are restricted to the ENCODE regions; the p53, c-Myc, H3K4me3 and H3K27me3 data are genome-wide.

The p53 protein is a transcription factor involved in the control of cell growth and is often expressed at high levels in cancer cells. STAT1 is a signal transducer and transcription factor that binds to gamma interferon activating sequences. The c-Myc (cellular myelocytomatosis) protein is a transcription factor associated with cell proliferation, differentiation and neoplastic disease. H3K4me3 and H3K27me3 (trimethylation of histone H3 at lysine 4 and at lysine 27, respectively) are two key histone modifications associated with chromatin state: H3K4me3 generally marks active promoters, whereas H3K27me3 is associated with Polycomb-mediated gene repression.

Each PET sequence in this track is derived from an individual ChIP fragment. The number of fragments, cell line and treatment for each dataset are:

Factor | Fragments | Cell line | Treatment
p53 | 65,572 | HCT116 | 6 hrs 5-fluorouracil (5FU)
STAT1 | 263,901 | HeLa | none
STAT1 | 327,838 | HeLa | gamma interferon (gIFN)
c-Myc | 273,566 | P493 B cell with tetracycline-repressible c-Myc transgene | none
H3K4me3 | 679,752 | Embryonic stem cell hES3 | none
H3K27me3 | 992,509 | Embryonic stem cell hES3 | none

The human embryonic stem cell line hES3 (46,XX, Chinese origin) was obtained from ES Cell International. The cells were serially cultured according to previously established protocols (Choo et al., 2006). In brief, feeder-free hES3 cultures were maintained at 37 °C/5% CO2 on Matrigel-coated organ culture dishes in medium conditioned by mouse feeder cells (DE-MEF).

For the STAT1 experiments, 4,007 PETs from the stimulated cells and 3,180 PETs from the unstimulated cells mapped to the ENCODE regions. The data from the unstimulated cells were used as the negative control. Only STAT1 PETs that map to the ENCODE regions are shown in this track.

Display Conventions and Configuration

In the graphical display, each PET sequence is shown as two blocks, representing the two ends of the pair, connected by a thin arrowed line. Clusters of overlapping PETs (PET fragments that overlap one another), which arise from ChIP enrichment, define genomic loci that are potential transcription factor binding sites (TFBSs) or, for the histone datasets, regions carrying the modification. PET singletons, which derive from non-specific ChIP fragments that did not cluster, are not shown.
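
The clustering step described above can be sketched as follows. This is a minimal illustration, not the GIS pipeline (PET-Tool): it assumes each PET fragment is already mapped to one chromosome as a half-open interval [start, end), groups fragments that overlap transitively, and drops singletons. The cluster size n in "PET-n" is taken here as the maximal number of PETs overlapping at any single position, following the definition given in the STAT1 and c-Myc verification sections. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def cluster_pets(frags: List[Interval]) -> List[List[Interval]]:
    """Group PET fragments into clusters of transitively overlapping intervals."""
    clusters: List[List[Interval]] = []
    current: List[Interval] = []
    current_end = 0
    for start, end in sorted(frags):
        if current and start < current_end:
            # Overlaps the cluster being built: extend it.
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            # Gap: close the previous cluster and start a new one.
            if current:
                clusters.append(current)
            current = [(start, end)]
            current_end = end
    if current:
        clusters.append(current)
    return clusters


def max_overlap_depth(cluster: List[Interval]) -> int:
    """Largest number of fragments covering any single base (sweep line)."""
    events = [(s, 1) for s, _ in cluster] + [(e, -1) for _, e in cluster]
    # Tuples sort so that, at equal coordinates, ends (-1) come before
    # starts (+1): half-open intervals that merely touch do not overlap.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def pet_clusters(frags: List[Interval]) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, members, depth) per cluster, dropping singletons."""
    out = []
    for c in cluster_pets(frags):
        if len(c) < 2:
            continue  # PET singleton: not shown in the track
        start = min(s for s, _ in c)
        end = max(e for _, e in c)
        out.append((start, end, len(c), max_overlap_depth(c)))
    return out
```

For example, the chained fragments (0, 100), (90, 200) and (190, 300) form one cluster with three members but a maximal overlap depth of only 2, which is why the sketch reports both numbers.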

In full and packed display modes, arrowheads on the horizontal line indicate the orientation of the PET sequence, and an ID of the form XXXXX-M is shown to the left of each PET, where XXXXX is the unique ID of the PET and M is the number of PET sequences at that location. The track coloring reflects the value of M: light gray indicates one or two sequences (score = 333), dark gray three sequences (score = 800) and black four or more sequences (score = 1000) at the location.
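
The labeling and coloring rules above can be expressed as a small lookup. This is a sketch of the convention as described, not Genome Browser code; the function names and the example ID "12345" are hypothetical. The label is split on its last hyphen so that an ID which itself contains a hyphen is still parsed correctly.

```python
from typing import Tuple


def parse_pet_label(label: str) -> Tuple[str, int]:
    """Split a label of the form 'XXXXX-M' into (pet_id, m)."""
    pet_id, m = label.rsplit("-", 1)  # split on the last hyphen only
    return pet_id, int(m)


def score_and_shade(m: int) -> Tuple[int, str]:
    """Map M (PET sequences at the location) to the track score and gray level."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333, "light gray"
    if m == 3:
        return 800, "dark gray"
    return 1000, "black"  # four or more sequences


pet_id, m = parse_pet_label("12345-4")
print(pet_id, m, score_and_shade(m))  # 12345 4 (1000, 'black')
```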

Methods

The cross-linked chromatin was sheared and immunoprecipitated with a high-affinity antibody. The resulting DNA fragments were end-polished and cloned into the plasmid vector pGIS3, which contains two MmeI recognition sites flanking the cloning site. These sites were used to produce a 36 bp PET from each original ChIP DNA fragment (18 bp from each of the 5' and 3' ends). Multiple 36 bp PETs were concatenated and cloned into pZero-1 for sequencing, so that each sequence read can yield 10-15 PETs. The PET sequences were extracted from the raw sequence reads and mapped to the genome, defining the boundaries of each ChIP DNA fragment. The following mapping criteria were used:

  • both 5' and 3' signatures must be present on the same chromosome
  • their 5' to 3' orientation must be correct
  • a minimal 17 bp match must exist for each 18 bp 5' and 3' signature
  • the tags must have genomic alignments within 7 kb of each other
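
The four mapping criteria can be sketched as a filter over a pair of signature alignments. The record type `TagHit` and all names are hypothetical, and two interpretations are assumptions rather than statements from the original: "correct orientation" is taken to mean both signatures align to the same strand with the 5' signature upstream, and "within 7 kb" is taken to mean the inferred fragment spans at most 7,000 bp.

```python
from dataclasses import dataclass

MIN_MATCH = 17    # minimal matching bases per 18 bp signature
MAX_SPAN = 7_000  # 7 kb (assumed to bound the whole fragment span)


@dataclass
class TagHit:
    """One genomic alignment of an 18 bp signature (hypothetical record type)."""
    chrom: str
    strand: str       # '+' or '-'
    start: int        # 0-based leftmost aligned base
    end: int          # exclusive
    matched_bp: int   # bases matching the genome, out of 18


def passes_mapping_criteria(five: TagHit, three: TagHit) -> bool:
    """Apply the four ChIP-PET mapping criteria to a 5'/3' signature pair."""
    # 1. Both signatures on the same chromosome.
    if five.chrom != three.chrom:
        return False
    # 2. Orientation (assumed: same strand, 5' signature upstream of 3').
    if five.strand != three.strand:
        return False
    if five.strand == "+":
        if five.start > three.start:
            return False
        span = three.end - five.start
    else:
        # On the minus strand the fragment's 5' end lies to the right.
        if three.start > five.start:
            return False
        span = five.end - three.start
    # 3. At least 17 of 18 bases must match for each signature.
    if five.matched_bp < MIN_MATCH or three.matched_bp < MIN_MATCH:
        return False
    # 4. Alignments within 7 kb of each other.
    return span <= MAX_SPAN
```

The 17-of-18 threshold in criterion 3 tolerates the ±1 bp MmeI slippage described below.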

Because MmeI can slip by ±1 bp, leading to ambiguities at the PET signature boundaries, a match of at least 17 bp was required for each 18 bp signature. The total count of PET sequences mapped to the same locus, allowing for slight nucleotide differences, may reflect the degree of ChIP enrichment at that locus. Only PETs that mapped to a single genomic location were considered. PETs that mapped to multiple locations may represent low-complexity or repetitive sequences and were therefore excluded from further analysis.
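
The unique-mapping filter and the per-locus count can be sketched as below. This assumes alignments that already passed the mapping criteria are available as (PET ID, locus) pairs, with a locus given as (chromosome, start, end); these names are hypothetical. For simplicity the sketch counts exact coordinates only; tolerating the ±1 bp MmeI slippage would additionally require grouping nearby coordinates.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Locus = Tuple[str, int, int]  # (chrom, start, end) of the inferred ChIP fragment


def keep_unique_pets(hits: Iterable[Tuple[str, Locus]]) -> Dict[str, Locus]:
    """Keep only PETs whose passing alignments all fall at exactly one locus."""
    loci = defaultdict(set)
    for pet_id, locus in hits:
        loci[pet_id].add(locus)
    # PETs with more than one distinct locus are treated as repetitive and dropped.
    return {pet: next(iter(s)) for pet, s in loci.items() if len(s) == 1}


def pets_per_locus(unique: Dict[str, Locus]) -> Counter:
    """Count uniquely mapped PET sequences at each locus."""
    return Counter(unique.values())
```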

Verification

Statistical and experimental verification analyses have shown that the overlapping PET clusters result from ChIP enrichment events.

p53 HCT116

Monte Carlo simulation using the p53 ChIP-PET data estimated that about 27% of PET-2 clusters (PET clusters with two overlapping members), 3% of the PET clusters with 3 overlapping members (PET-3 clusters), and less than 0.0001% of PET clusters with more than 3 overlapping members were due to random chance. This suggests that the PET clusters most likely represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 36% of the PET-2 clusters and over 99% of the PET-3+ clusters (clusters with three or more overlapping members) are true ChIP enrichment sites. Thus, the estimated reliability is nearly 100% for PET-3+ clusters, whereas the PET-2 clusters contain significant noise.
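
The idea behind the Monte Carlo estimate can be sketched as follows: place the same number of fragments at random, count how many clusters of each size arise by chance, and divide by the number observed in the real data. This is a simplified illustration under loudly stated assumptions, not the authors' procedure: fragments are placed uniformly on a single contiguous sequence of length `genome_len`, all fragments have the same length `frag_len`, and mappability is ignored. All names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def cluster_depths(starts: List[int], frag_len: int) -> List[int]:
    """Max overlap depth of each cluster of fixed-length fragments.

    `starts` must be sorted; fragment k covers [starts[k], starts[k] + frag_len).
    """
    depths = []
    i, n = 0, len(starts)
    while i < n:
        # Extend the cluster while the next fragment starts before the
        # rightmost end (with equal lengths, the last start has that end).
        j = i
        while j + 1 < n and starts[j + 1] < starts[j] + frag_len:
            j += 1
        # Sliding window: most fragments covering the base at starts[right].
        best, left = 0, i
        for right in range(i, j + 1):
            while starts[right] >= starts[left] + frag_len:
                left += 1
            best = max(best, right - left + 1)
        depths.append(best)
        i = j + 1
    return depths


def simulate_random_clusters(n_frags: int, genome_len: int, frag_len: int,
                             n_iter: int = 20, seed: int = 0) -> Dict[int, float]:
    """Mean number of random clusters of each depth per simulated genome."""
    rng = random.Random(seed)
    totals: Counter = Counter()
    for _ in range(n_iter):
        starts = sorted(rng.randrange(genome_len - frag_len + 1)
                        for _ in range(n_frags))
        totals.update(cluster_depths(starts, frag_len))
    return {d: c / n_iter for d, c in sorted(totals.items())}


def fraction_due_to_chance(expected: Dict[int, float],
                           observed: Dict[int, int]) -> Dict[int, float]:
    """Expected random clusters of each size over observed ones, capped at 1."""
    return {n: min(1.0, expected.get(n, 0.0) / obs)
            for n, obs in observed.items() if obs > 0}
```

In use, `n_frags` would be the dataset's fragment count (for example 65,572 for p53, from the table above), `frag_len` an assumed typical fragment length, and `observed` the counts of PET-n clusters found in the real data. Depth 1 in the simulated output corresponds to singletons.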

In addition to these statistical analyses, 40 genomic locations identified by PET-3+ clusters were randomly selected and analyzed by quantitative real-time PCR. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined, and all 40 regions (100%) were confirmed to be significantly enriched in the p53 ChIP DNA.
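
The text does not say how relative enrichment was computed from the qPCR data. One common approach, shown here purely as an assumption, is the comparative Ct method: each cycle of difference in threshold cycle (Ct) between the control and ChIP templates corresponds to roughly a doubling of template. The sketch assumes equal template input and a stated amplification efficiency; the names are hypothetical, and the default cut-off of 10 mirrors the 10-fold criterion reported for the histone datasets.

```python
def fold_enrichment(ct_chip: float, ct_control: float, efficiency: float = 1.0) -> float:
    """Relative enrichment of a region in ChIP vs. control DNA (comparative Ct).

    Assumes equal template input; efficiency=1.0 means perfect doubling per cycle.
    """
    return (1.0 + efficiency) ** (ct_control - ct_chip)


def is_enriched(ct_chip: float, ct_control: float, min_fold: float = 10.0) -> bool:
    """Hypothetical pass/fail call against a fold-enrichment cut-off."""
    return fold_enrichment(ct_chip, ct_control) >= min_fold


print(fold_enrichment(26.0, 30.0))  # 16.0: four cycles earlier, about 2**4-fold
```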

STAT1 HeLa

Monte Carlo simulation using the STAT1 ChIP-PET data from the interferon gamma-stimulated cells estimated that random chance accounted for about 58% of PET-3 clusters (where n in PET-n denotes the maximal number of PETs within the overlap region of a cluster), 21% of the PET clusters with 4 overlapping members (PET-4 clusters), and less than 0.5% of PET clusters with more than 5 overlapping members. This suggests that the PET-5+ clusters represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 30% of the PET-4 clusters and over 90% of the PET-5+ clusters (clusters with five or more overlapping members) are true ChIP enrichment sites.
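
The reasoning that turns per-size chance estimates into a cluster-size cut-off (PET-3+ for p53, PET-5+ for STAT1) can be sketched as two small calculations. Both rules are illustrative assumptions, not the authors' stated procedure: the cut-off is taken as the smallest size from which every larger size stays below a chosen chance fraction, and the overall reliability of PET-n+ clusters is the count-weighted average over sizes. The numbers in the usage example are hypothetical and are not taken from the datasets above.

```python
from typing import Dict, Optional


def smallest_reliable_size(chance: Dict[int, float], max_chance: float) -> Optional[int]:
    """Smallest n such that every cluster size >= n has chance fraction <= max_chance."""
    sizes = sorted(chance)
    for n in sizes:
        if all(chance[m] <= max_chance for m in sizes if m >= n):
            return n
    return None


def pooled_true_fraction(chance: Dict[int, float], observed: Dict[int, int],
                         min_size: int) -> float:
    """Count-weighted fraction of PET-(min_size)+ clusters expected to be real."""
    real = total = 0.0
    for n, obs in observed.items():
        if n >= min_size:
            total += obs
            # Sizes missing from `chance` are assumed to have ~0 chance fraction.
            real += obs * (1.0 - chance.get(n, 0.0))
    return real / total if total else float("nan")


# Hypothetical inputs, for illustration only.
chance = {2: 0.40, 3: 0.10, 4: 0.02, 5: 0.005}
cutoff = smallest_reliable_size(chance, max_chance=0.05)            # 4
print(cutoff, pooled_true_fraction(chance, {3: 200, 4: 100, 5: 300}, cutoff))
```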

In addition to these statistical analyses, 9 out of 14 genomic locations (64%) identified by PET-5+ clusters in the ENCODE regions were supported by ChIP-chip data from Yale using the same ChIP DNA as hybridization material.

c-Myc P493

Monte Carlo simulation using the c-Myc ChIP-PET data estimated that about 32% of PET-3 clusters (where n in PET-n denotes the maximal number of PETs within the overlap region of a cluster) and 4% of the PET clusters with 4 or more overlapping members (PET-4+ clusters) were due to random chance. This suggests that ~70% of PET-3+ clusters represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real enrichment event. In addition to these statistical analyses, 29 genomic locations identified by PET-3+ clusters and 19 genomic locations defined by PET-2 clusters were randomly selected and subjected to quantitative real-time PCR analysis. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined: all 29 PET-3+ regions (100%) and 47% of the 19 PET-2 regions were confirmed to have significant c-Myc ChIP enrichment, indicating that all of the PET-3+ clusters and 47% of the PET-2 clusters define true c-Myc-bound targets.

H3K4me3 and H3K27me3 hES3

Monte Carlo simulation on these two datasets estimated that about 24% of PET-5 clusters (PET clusters with five overlapping members), 6% of the PET clusters with 6 overlapping members (PET-6 clusters), and less than 2% of PET clusters with more than 5 overlapping members were due to random chance. Thus, the majority (98.7%) of PET-5+ clusters represent true ChIP enrichment rather than random events, and PET clusters of size 5 and above are reliable readouts of H3K4me3 and H3K27me3 modification regions. In addition to these statistical analyses, 30 genomic locations identified by PET-5+ clusters from each dataset were randomly selected and analyzed by quantitative real-time PCR. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined, and all 30 regions (100%) were confirmed to have significant enrichment (10-fold or more). In addition, 9 out of 10 PET-3 and PET-4 clusters tested were enriched 10-fold or more compared with control Ena1 ChIP DNA.

Credits

The ChIP-PET library and sequence data were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore, the Bioinformatics Institute, Singapore, and Boston University.

The STAT1 ChIP fragment prep was provided by Ghia Euskirchen from the Snyder lab at Yale. The c-Myc ChIP fragment prep was provided by Karen Zeller from the Dang lab at Johns Hopkins University.

References

Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005 Feb;2(2):105-11.

Wei CL, Wu Q, Vega VB, Chiu KP, Ng P, Zhang T, Shahab A, Yong HC, Fu Y, Weng Z et al. A global map of p53 transcription-factor binding sites in the human genome. Cell. 2006 Jan 13;124(1):207-19.

Chiu KP, Wong CH, Chen Q, Ariyaratne P, Ooi HS, Wei CL, Sung WK, Ruan Y. PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinformatics. 2006 Aug 25;7:390.

Choo A, Padmanabhan J, Chin A, Fong WJ, Oh SKW. Immortalized feeders for the scale-up of human embryonic stem cells in feeder and feeder-free conditions. J Biotechnol. 2006 Mar 9;122(1):130-41.

Zeller KI, Zhao X, Lee CW, Chiu KP, Yao F, Yustein JT, Ooi HS, Orlov YL, Shahab A, Yong HC et al. Global mapping of c-Myc binding sites and target gene networks in human B cells. Proc Natl Acad Sci U S A. 2006 Nov 21;103(47):17834-9.