This track shows binding sites for p53, STAT1, and c-Myc, and regions
carrying the histone modifications H3K4me3 and H3K27me3,
as determined by chromatin immunoprecipitation (ChIP)
and paired-end di-tag (PET) sequencing.
The data for STAT1 are restricted to the ENCODE regions,
but the p53, c-Myc, H3K4me3 and H3K27me3 site data are genome-wide.
The p53 protein is a transcription factor involved in the control
of cell growth that is often expressed at high levels in cancer cells.
STAT1 is a signal transducer and transcription
factor that binds to the gamma interferon activating sequence (GAS).
The c-Myc (cellular myelocytomatosis) protein is a transcription
factor associated with cell proliferation, differentiation, and
neoplastic disease. H3K4me3 and H3K27me3 are two key histone
modifications closely associated with chromatin state: H3K4me3 marks
active promoters, whereas H3K27me3 is associated with Polycomb-mediated
gene repression.
The PET sequences in this track are derived from individual ChIP fragments
from the following samples:
- p53: 6 hrs of 5-fluorouracil (5FU) treatment
- STAT1: gamma interferon (gIFN) stimulation
- c-Myc: P493 B cells with a tetracycline-repressible c-Myc transgene
- H3K4me3: embryonic stem cell line hES3
- H3K27me3: embryonic stem cell line hES3
Human embryonic stem cell line hES3 (46XX, Chinese) was obtained from
International. These cells were serially cultured according
to protocols established previously (Choo, 2006).
In brief, feeder-free cultures of hES3 were maintained at 37 °C in 5% CO2
on Matrigel-coated organ culture dishes supplemented with conditioned
medium from mouse feeders (DE-MEF).
For the STAT1 experiments,
a total of 4,007 PETs from the stimulated cells and
3,180 PETs from the unstimulated cells were mapped to the ENCODE regions.
The data from the unstimulated cells were used as the negative control.
Only STAT1 PETs mapped to the ENCODE regions are shown in this track.
Display Conventions and Configuration
In the graphical display, PET sequences are shown as two blocks,
representing the ends of the pair, connected by a thin arrowed
line. Overlapping PET clusters (PET fragments that overlap one
another) originating from the ChIP enrichment process define the
genomic loci that are potential transcription factor binding sites (TFBSs).
PET singletons, from non-specific ChIP fragments that did not cluster, are
In full and packed display modes, the arrowheads on the horizontal line
represent the orientation of the PET sequence, and an ID of the format
XXXXX-M is shown to the left of each PET,
where XXXXX is the unique ID for each PET
and M is the number of PET sequences at this location.
The track coloring reflects the value of M:
light gray indicates one or two sequences (score = 333), dark gray is used for
three sequences (score = 800) and black indicates four or more PET sequences
(score = 1000) at the location.
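As an illustration of the labeling and shading rules above, the following minimal sketch splits an "XXXXX-M" label and maps M to the score and shade listed in this description. The helper names `parse_pet_label` and `score_and_shade` are hypothetical, and treating everything before the last hyphen as the ID is an assumption about the label format.

```python
import re

# Hypothetical pattern: everything before the last "-" is the PET ID,
# the trailing integer is M (number of PET sequences at the location).
LABEL_RE = re.compile(r"^(.+)-(\d+)$")


def parse_pet_label(label: str) -> tuple[str, int]:
    """Split a label such as '12345-4' into (PET ID, M)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not an XXXXX-M label: {label!r}")
    return match.group(1), int(match.group(2))


def score_and_shade(m: int) -> tuple[int, str]:
    """Map M to the track score and display shade described above."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333, "light gray"   # one or two sequences
    if m == 3:
        return 800, "dark gray"    # three sequences
    return 1000, "black"           # four or more sequences
```

For example, a PET labeled `12345-4` belongs to a location with four PET sequences and is therefore drawn in black with score 1000.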
The cross-linked chromatin was sheared and precipitated with a high affinity
The DNA fragments were end-polished and cloned into the plasmid
vector, pGIS3. pGIS3 contains two MmeI recognition sites that
flank the cloning site, which were used to produce a 36 bp PET from the
original ChIP DNA fragments (18 bp from each of the 5' and 3' ends).
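The ditag structure can be pictured with a simplified sketch that treats MmeI cleavage as exact slicing: 18 bp from each end of a ChIP fragment are joined into one 36 bp PET. This ignores MmeI slippage, adapters, linkers, and the strand in which the 3' tag is actually read; both signatures are reported here in the fragment's forward orientation, which is an assumption. `make_pet` and `split_pet` are hypothetical names.

```python
SIGNATURE_LEN = 18  # bp taken from each end of the ChIP fragment


def make_pet(fragment: str, sig_len: int = SIGNATURE_LEN) -> str:
    """Join the 5' and 3' end signatures of a fragment into one ditag."""
    if len(fragment) < 2 * sig_len:
        raise ValueError("fragment is shorter than two signatures")
    return fragment[:sig_len] + fragment[-sig_len:]


def split_pet(pet: str, sig_len: int = SIGNATURE_LEN) -> tuple[str, str]:
    """Recover the (5' signature, 3' signature) pair from a ditag."""
    if len(pet) != 2 * sig_len:
        raise ValueError(f"expected a {2 * sig_len} bp PET")
    return pet[:sig_len], pet[sig_len:]
```

Only the two ends survive, so the original fragment boundaries must be recovered by mapping the signatures back to the genome.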
Multiple 36 bp PETs were concatenated and cloned into pZero-1 for
sequencing, where each sequence read can generate 10-15 PETs. The PET
sequences were extracted from raw sequence reads and mapped to the genome,
defining the boundaries of each ChIP DNA fragment. The following specific
mapping criteria were used:
- both 5' and 3' signatures must be present on the same chromosome
- their 5' to 3' orientation must be correct
- a minimal 17 bp match must exist for each 18 bp 5' and 3' signature
- the tags must have genomic alignments within 7 kb of each other
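The four mapping criteria above can be expressed as a single check on a pair of tag alignments. This is a minimal sketch, not the published pipeline: the `TagHit` record is hypothetical, "correct orientation" is assumed to mean both tags on the same strand with the 5' tag upstream in the direction of that strand, and the 7 kb distance is assumed to be measured between alignment start coordinates.

```python
from dataclasses import dataclass

MIN_MATCH = 17   # minimal matched bases per 18 bp signature
MAX_SPAN = 7_000 # tags must align within 7 kb of each other


@dataclass
class TagHit:
    chrom: str
    start: int       # leftmost aligned genomic coordinate
    strand: str      # "+" or "-"
    matched_bp: int  # signature bases matching the genome (out of 18)


def passes_pet_criteria(five: TagHit, three: TagHit) -> bool:
    # 1. both signatures on the same chromosome
    if five.chrom != three.chrom:
        return False
    # 2. orientation (assumed): same strand, 5' tag upstream on that strand
    if five.strand != three.strand:
        return False
    if five.strand == "+" and five.start > three.start:
        return False
    if five.strand == "-" and five.start < three.start:
        return False
    # 3. at least 17 bp matched for each 18 bp signature
    if five.matched_bp < MIN_MATCH or three.matched_bp < MIN_MATCH:
        return False
    # 4. alignments within 7 kb of each other
    return abs(three.start - five.start) <= MAX_SPAN
```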
Because MmeI cleavage can slip by +/- 1 bp, which creates ambiguity
at the PET signature boundaries, a minimal 17 bp match was required
for each 18 bp signature. The total count of PET sequences
mapped to the same locus but with slight nucleotide differences may reflect
the expression level of the transcripts. Only PETs with specific mapping
(one location) to the genome were considered. PETs that mapped to multiple
locations may represent low complexity or repetitive sequences, and therefore
were not included for further analysis.
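The unique-mapping rule can be sketched as a simple filter: a PET that appears at more than one genomic location is dropped. The input layout, a list of `(pet_id, chrom, start)` hits that already passed the pairing criteria, is a hypothetical choice for illustration.

```python
from collections import Counter


def keep_unique(hits: list[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
    """Keep only PETs whose ID occurs exactly once among the genomic hits.

    A PET ID with several entries maps to multiple locations (for example,
    low-complexity or repetitive sequence) and is excluded.
    """
    counts = Counter(pet_id for pet_id, _, _ in hits)
    return [hit for hit in hits if counts[hit[0]] == 1]
```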
Statistical and experimental verification exercises have shown that the
overlapping PET clusters result from ChIP enrichment events.
Monte Carlo simulation using the p53 ChIP-PET data estimated that
about 27% of PET-2 clusters (PET clusters with two overlapping
members), 3% of the PET clusters with 3 overlapping members (PET-3
clusters), and less than 0.0001% of PET clusters with more than 3
overlapping members were due to random chance. This suggests that the
PET clusters most likely represent the real enrichment events by ChIP
and that a higher number of overlapping fragments correlates with a
higher probability of a real ChIP enrichment event. Furthermore, based
on goodness-of-fit analysis for assessing the reliability of PET
clusters, it was estimated that less than 36% of the PET-2 clusters
and over 99% of the PET-3+ clusters (clusters with three or more
overlapping members) are true enrichment ChIP sites. Thus, the
verification rate is nearly 100% for PET-3+ ChIP clusters, and the
PET-2 clusters contain significant noise.
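The idea behind the Monte Carlo estimate can be illustrated by placing the same number of fragments uniformly at random, with no ChIP enrichment, and counting how many clusters of each size arise by chance. This sketch assumes a single contiguous genome, a fixed fragment length, and cluster size defined as the maximal number of fragments covering one position; the actual simulation parameters (genome model, fragment-length distribution, number of trials) are not given here, and all function names are hypothetical.

```python
import random
from collections import Counter


def _max_depth(group: list[tuple[int, int]]) -> int:
    # Events (position, delta); at equal positions -1 sorts before +1,
    # so half-open intervals that merely touch do not count as overlapping.
    events = sorted([(s, 1) for s, _ in group] + [(e, -1) for _, e in group])
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def cluster_sizes(intervals: list[tuple[int, int]]) -> list[int]:
    """Group overlapping [start, end) fragments; return each group's max depth."""
    sizes = []
    group: list[tuple[int, int]] = []
    group_end = 0
    for start, end in sorted(intervals):
        if group and start >= group_end:
            sizes.append(_max_depth(group))
            group = []
        if not group:
            group_end = end
        group.append((start, end))
        group_end = max(group_end, end)
    if group:
        sizes.append(_max_depth(group))
    return sizes


def simulate_random_clusters(n_fragments: int, genome_len: int, frag_len: int,
                             trials: int, seed: int = 0) -> dict[int, float]:
    """Average number of clusters of each size per trial under random placement."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(trials):
        starts = [rng.randrange(genome_len - frag_len) for _ in range(n_fragments)]
        totals.update(cluster_sizes([(s, s + frag_len) for s in starts]))
    return {size: count / trials for size, count in sorted(totals.items())}
```

Dividing the expected number of random clusters of size k by the number of size-k clusters actually observed gives a rough estimate of the fraction of PET-k clusters attributable to chance.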
In addition to these statistical analyses, 40 genomic locations
identified by PET-3+ clusters were randomly selected and analyzed by
quantitative real-time PCR. The relative enrichment of candidate
regions compared to control GST ChIP DNA was determined, and all 40
regions (100%) were confirmed to have significant enrichment of p53.
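How the relative enrichment was calculated is not stated here. A common approach, shown below as an assumption rather than the method used, compares threshold cycles (Ct) between the specific ChIP and the control ChIP for the same region, assuming equal template input and a fixed per-cycle amplification factor (2.0 for perfect doubling). `fold_enrichment` is a hypothetical helper.

```python
def fold_enrichment(ct_chip: float, ct_control: float, efficiency: float = 2.0) -> float:
    """Relative enrichment of a region in the specific ChIP over the control ChIP.

    A lower Ct means more starting template, so each cycle of difference
    corresponds to one factor of `efficiency`.
    """
    return efficiency ** (ct_control - ct_chip)
```

For example, a region reaching threshold 3 cycles earlier in the p53 ChIP than in the control ChIP corresponds to about 8-fold enrichment under these assumptions.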
Monte Carlo simulation using the STAT1 ChIP-PET data from interferon
gamma-stimulated dataset estimated that random chance accounted for about
58% of PET-3 clusters (maximal number of PETs within the overlap region
of any cluster), 21% of the PET clusters with 4 overlapping members (PET-4
clusters), and less than 0.5% of PET clusters with more than 5 overlapping
members. This suggests that the PET-5+ clusters represent the real enrichment
events by ChIP and that a higher number of overlapping fragments correlates
with a higher probability of a real ChIP enrichment event. Furthermore, based
on goodness-of-fit analysis for assessing the reliability of PET clusters, it
was estimated that less than 30% of the PET-4 clusters and over 90% of the
PET-5+ clusters (clusters with five or more overlapping members) are true
enrichment ChIP sites.
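For both p53 and STAT1, the simulation results are used to pick a minimum cluster size above which random clusters become rare. A minimal sketch of that selection step, assuming the random-chance fraction falls as cluster size rises and using a hypothetical 5% cutoff (`max_random`), could look like this; `min_reliable_size` is a hypothetical name.

```python
def min_reliable_size(random_fraction: dict[int, float], max_random: float = 0.05):
    """Smallest cluster size whose estimated random-chance fraction is at most
    `max_random`, or None if no listed size qualifies."""
    for size in sorted(random_fraction):
        if random_fraction[size] <= max_random:
            return size
    return None
```

With the p53 estimates quoted earlier (27% for PET-2, 3% for PET-3, under 0.0001% for larger clusters) and a 5% cutoff, this returns 3, matching the PET-3+ conclusion; the STAT1 estimates for PET-3 (58%) and PET-4 (21%) alone would not pass the same cutoff.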
In addition to these statistical analyses, 9 out of 14 genomic locations (64%)
identified by PET-5+ clusters in the ENCODE regions were supported by ChIP-chip
data from Yale using the same ChIP DNA as hybridization material.
Monte Carlo simulation using the c-Myc ChIP-PET data estimated that
about 32% of PET-3 clusters (maximal number of PETs within the overlap
region of any cluster) and 4% of the PET clusters with 4 or more overlapping
members (PET-4+ clusters) were due to random chance. This suggests
that ~70% of PET-3+ clusters represent the real enrichment events by
ChIP and that a higher number of overlapping fragments correlates with
a higher probability of a real ChIP enrichment event. In addition to these
statistical analyses, 29 genomic locations identified by PET-3+ clusters
and 19 genomic locations defined by PET-2 clusters were randomly selected
and subjected to quantitative real-time PCR analysis. The relative
enrichment of candidate regions compared to control GST ChIP DNA was
determined, and all 29 PET-3+ regions (100%) and 9 of the 19 PET-2 regions
(47%) were confirmed to have significant enrichment in the c-Myc ChIP,
indicating that all of the PET-3+ regions and 47% of the PET-2 regions
tested are true c-Myc-bound regions.
H3K4me3 and H3K27me3 (hES3)
Monte Carlo simulation on these two datasets estimated that about
24% of PET-5 clusters (PET clusters with five overlapping members),
6% of the PET clusters with 6 overlapping members (PET-6 clusters),
and less than 2% of PET clusters with more than 5 overlapping members
were due to random chance. Thus, the majority (98.7%) of PET-5+
clusters represent true enrichment from the ChIP process rather than
random events, and PET clusters of size 5 and above are reliable
readouts for H3K4me3 and H3K27me3 modification regions. In addition to
these statistical analyses, 30 genomic locations identified by PET-5+
clusters from each dataset were randomly selected and analyzed by
quantitative real-time PCR. The relative enrichment of candidate regions
compared to control GST ChIP DNA was determined, and all 30 regions (100%)
were confirmed to have significant enrichment (10-fold or more). In
addition, 9 of 10 PET-3 and PET-4 clusters tested showed enrichment of
10-fold or more compared with control Ena1 ChIP DNA.
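The minimum cluster sizes supported by the analyses above differ by dataset: PET-3+ for p53 and c-Myc, and PET-5+ for STAT1, H3K4me3, and H3K27me3. A minimal sketch of applying those cutoffs, assuming clusters are represented as hypothetical `(cluster_id, size)` pairs, is:

```python
# Minimum cluster size treated as reliable for each dataset, as suggested
# by the simulation and validation results described in this section.
MIN_CLUSTER_SIZE = {"p53": 3, "c-Myc": 3, "STAT1": 5, "H3K4me3": 5, "H3K27me3": 5}


def reliable_clusters(dataset: str, clusters: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Keep (cluster_id, size) pairs at or above the dataset's cutoff.

    Raises KeyError for a dataset not listed in MIN_CLUSTER_SIZE.
    """
    threshold = MIN_CLUSTER_SIZE[dataset]
    return [cluster for cluster in clusters if cluster[1] >= threshold]
```

Keeping the cutoffs in one table makes the per-dataset choice explicit; smaller clusters (for example, the c-Myc PET-2 class, of which 47% validated) remain available for separate, more cautious inspection.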
The ChIP-PET library and sequence data were produced at the
Genome Institute of Singapore.
The data were mapped
and analyzed by scientists from the Genome Institute of Singapore,
Institute, Singapore, and Boston University.
The STAT1 ChIP fragment prep was provided by Ghia Euskirchen from the
Snyder lab at Yale.
The c-Myc ChIP fragment prep was provided by
Karen Zeller from the
Dang lab at Johns Hopkins University.
Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A,
Wong CH et al.
Gene identification signature (GIS) analysis for
transcriptome characterization and genome annotation.
Nat Methods. 2005 Feb;2(2):105-11.
Wei CL, Wu Q, Vega VB, Chiu KP, Ng P, Zhang T, Shahab A, Yong HC, Fu Y, Weng Z et al.
A Global Map of p53 Transcription-Factor Binding Sites in the
Cell. 2006 Jan 13;124(1):207-19.
Chiu KP, Wong CH, Chen Q, Ariyaratne P, Ooi HS, Wei CL, Sung WK, Ruan Y.
PET-Tool: a software suite for comprehensive processing and
managing of Paired-End diTag (PET) sequence data.
BMC Bioinformatics. 2006 Aug 25;7:390.
Choo A, Padmanabhan J, Chin A, Fong WJ, Oh SKW.
Immortalized feeders for the scale-up of human embryonic stem
cells in feeder and feeder-free conditions.
J Biotechnol. 2006 Mar 9;122(1):130-41.
Zeller KI, Zhao X, Lee CW, Chiu KP, Yao F, Yustein JT, Ooi HS, Orlov YL,
Shahab A, Yong HC et al.
Global mapping of c-Myc binding sites and target gene networks in
human B cells.
Proc Natl Acad Sci U S A. 2006 Nov 21;103(47):17834-9.