Schema for Yale TFBS - ENCODE Transcription Factor Binding Sites by ChIP-seq from Yale/UC-Davis/Harvard

Database: hg18
Primary Table: wgEncodeYaleChIPseqPeaksK562bZnf263V2
Row Count: 27,549
Data last updated: 2009-06-30
Format description: BED6+4 Peaks of signal enrichment based on pooled, normalized (interpreted) data.
On download server: MariaDB table dump directory

field	example	SQL type	info	description
`bin`	591	`smallint(5) unsigned`	range	Indexing field to speed chromosome range queries.
`chrom`	chr1	`varchar(255)`	values	Reference sequence chromosome or scaffold
`chromStart`	791427	`int(10) unsigned`	range	Start position in chromosome
`chromEnd`	792511	`int(10) unsigned`	range	End position in chromosome
`name`	.	`varchar(255)`	values	Name given to a region (preferably unique). Use . if no name is assigned
`score`	1000	`int(10) unsigned`	range	Indicates how dark the peak will be displayed in the browser (0-1000)
`strand`	.	`char(2)`	values	+ or - or . for unknown
`signalValue`	77.5881	`float`	range	Measurement of average enrichment for the region
`pValue`	150.623	`float`	range	Statistical significance of signal value (-log10). Set to -1 if not used.
`qValue`	147.334	`float`	range	Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used.
`peak`	805	`int(11)`	range	Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called.
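
The `bin` field can be reproduced from `chromStart` and `chromEnd`. The sketch below assumes this table uses the standard five-level UCSC binning scheme (smallest bins of 128 kb, each level eight times larger); the schema page itself does not say which scheme is used, and the function name is illustrative only.

```python
# Assumed constants of the standard UCSC binning scheme (not stated on this page).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up covers 8x more sequence


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the standard scheme (max 512 Mb)")


print(bin_from_range(791427, 792511))  # 591, the bin value of the example row
```

Under that assumption, a range query only needs to inspect the few bins that can overlap the requested interval rather than scanning every row on the chromosome.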

Sample Rows

bin	chrom	chromStart	chromEnd	name	score	strand	signalValue	pValue	qValue	peak
591	chr1	791427	792511	.	1000	.	77.5881	150.623	147.334	805
591	chr1	795098	795298	.	515	.	12.6962	7.74365	6.1646	119
591	chr1	827453	827539	.	413	.	10.8153	4.35666	3.23206	2
591	chr1	829817	830055	.	534	.	10.2991	8.33092	6.70422	80
591	chr1	837622	838027	.	474	.	4.1537	6.44146	4.9943	261
591	chr1	846383	846619	.	475	.	5.51452	6.46425	5.01432	74
591	chr1	864590	864605	.	422	.	26.8032	4.69838	3.50956	0
591	chr1	866079	866248	.	416	.	5.01579	4.45435	3.31809	124
591	chr1	870251	870738	.	661	.	6.014	12.2075	10.3274	135
591	chr1	876059	876120	.	424	.	15.5176	4.74615	3.54528	28
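
A row in this layout can be read into a typed record as follows. This is a minimal sketch that assumes a tab-delimited dump with the eleven columns in the order shown above; the class and function names are hypothetical and not part of any UCSC tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeakRow:
    bin: int
    chrom: str
    chrom_start: int      # 0-based start
    chrom_end: int        # end (exclusive)
    name: str
    score: int            # 0-1000 display score
    strand: str
    signal_value: float
    p_value: float        # -log10 scale, -1 if not used
    q_value: float        # -log10 scale, -1 if not used
    peak: int             # offset from chrom_start, -1 if no point-source called


def parse_peak_row(line: str) -> PeakRow:
    """Parse one tab-delimited row; raise ValueError if the column count is wrong."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    return PeakRow(int(f[0]), f[1], int(f[2]), int(f[3]), f[4], int(f[5]),
                   f[6], float(f[7]), float(f[8]), float(f[9]), int(f[10]))


def raw_p_value(neg_log10: float) -> Optional[float]:
    """Convert a -log10 value back to a probability; None when the field is unused (-1)."""
    return None if neg_log10 < 0 else 10.0 ** -neg_log10


SAMPLE = "591\tchr1\t791427\t792511\t.\t1000\t.\t77.5881\t150.623\t147.334\t805"
row = parse_peak_row(SAMPLE)
print(row.chrom, row.chrom_start, row.chrom_end, row.peak)
```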

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
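
As an illustration of the 0-based convention, the sketch below converts a table row to a 1-based position string and locates the point-source (summit) from the `peak` offset. It assumes end coordinates are exclusive (half-open intervals), which is the usual BED convention; the function names are hypothetical.

```python
from typing import Optional


def to_one_based_position(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a 0-based, half-open interval as a 1-based, inclusive position string."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def summit_zero_based(chrom_start: int, peak: int) -> Optional[int]:
    """Absolute 0-based summit coordinate, or None if no point-source was called (-1)."""
    return None if peak < 0 else chrom_start + peak


# First sample row: chromStart=791427, chromEnd=792511, peak=805
print(to_one_based_position("chr1", 791427, 792511))  # chr1:791428-792511
print(summit_zero_based(791427, 805))                 # 792232
```

Only the start coordinate shifts by one when moving between the two conventions; the end coordinate is numerically the same in both.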

Yale TFBS (wgEncodeYaleChIPseq) Track Description

Description

This track shows probable binding sites of the specified transcription factors (TFs) in the given cell types as determined by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). Included for each cell type is the input signal, which represents the control condition where no antibody targeting was performed. For each experiment (cell type vs. antibody) this track shows a graph of enrichment for TF binding (Signal), along with the sites that have the greatest evidence of transcription factor binding (Peaks).

The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. ENCODE tracks typically contain one or more of the following views:

Peaks: Regions of signal enrichment based on processed data (usually normalized data from pooled replicates). ENCODE Peaks tables contain fields for statistical significance, including FDR (qValue).
Signal: Density graph (wiggle) of signal enrichment based on processed data.

Methods

Cells were grown according to the approved ENCODE cell culture protocols. Further preparations were similar to those previously published (Euskirchen et al., 2007) with the exceptions that the cells were unstimulated and sodium orthovanadate was omitted from the buffers. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007) and Rozowsky et al. (2009).

DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data were scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed in which the signal height is the number of overlapping fragments at each nucleotide position in the genome.
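
The construction of the signal map can be sketched as a fragment pileup. The example below is not the pipeline's actual code: it assumes each uniquely mapped tag is extended from its 5' end to the average fragment length in the direction of its strand, and the function name and input layout are hypothetical.

```python
from typing import Iterable, List, Tuple


def fragment_pileup(tags: Iterable[Tuple[int, str]],
                    region_length: int,
                    fragment_length: int = 200) -> List[int]:
    """Count overlapping extended fragments at each position of a region.

    tags: (five_prime_position, strand) pairs, 0-based within the region.
    """
    # Difference array: +1 where a fragment starts, -1 just past where it ends.
    diff = [0] * (region_length + 1)
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + fragment_length
        else:
            start, end = pos - fragment_length + 1, pos + 1
        start = max(start, 0)           # clip fragments to the region
        end = min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum of the difference array gives the signal height per base.
    signal, height = [], 0
    for delta in diff[:region_length]:
        height += delta
        signal.append(height)
    return signal


# Toy example with 5 bp fragments in a 10 bp region.
print(fragment_pileup([(0, "+"), (3, "+"), (9, "-")], 10, fragment_length=5))
```

The difference array keeps the cost proportional to the number of tags plus the region length, instead of touching every base of every fragment.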

For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.05, comparing the number of peaks above threshold with the number obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region was then compared with the normalized number of mapped tags in the same region from an input DNA control (normalization was done by correlating tag counts in genomic 10 kb windows). Using a binomial test, only regions with a p-value <= 0.05 were considered significantly enriched compared with the input DNA control.
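
The comparison against the input control can be illustrated with a one-sided binomial test. This is a sketch under stated assumptions rather than the published implementation: it assumes that, after normalization, a tag in a non-enriched region is equally likely to come from the ChIP sample or the scaled control (null probability 0.5), and the function names and `scaling` parameter are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), computed with exact integer arithmetic."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def is_enriched(chip_tags: int, input_tags: int,
                scaling: float = 1.0, alpha: float = 0.05) -> bool:
    """Return True if the ChIP tag count is significantly above the scaled input count."""
    scaled_input = round(input_tags * scaling)  # assumed normalization by one factor
    n = chip_tags + scaled_input
    if n == 0:
        return False  # no tags in either sample, nothing to test
    return binomial_upper_tail(chip_tags, n) <= alpha


print(is_enriched(30, 10))  # strong excess of ChIP tags over input
print(is_enriched(10, 10))  # equal counts, no evidence of enrichment
```

In practice the scaling factor would come from the normalization step described above, and a p-value would be computed for every putative binding region.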

Expression data generated as confirmation of the TFBS data can be found in the Yale Poly-A tracks (coming soon).

Release Notes

Update to Release 4 (Feb 2012): the GM12878/NFKB (IgG-rab) experiments and files have been revoked because incorrect raw data files were used to generate the processed data.

This is Release 4 (June 2011) of this track. It includes 2 additional experiments; 2 experiments that were present in earlier releases, K562/NF-YA and K562/NF-YB, have been removed.

A number of previously released datasets have been replaced by updated versions. The affected database tables and files include 'V3' in the name, and the metadata is marked with "submittedDataVersion=V3", followed by the specific reason. The specific reason is: Includes previously missing sequence data. Previous versions of the files are available for download from the FTP site.

Credits

These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University; Peggy Farnham at UC Davis; and Kevin Struhl at Harvard. Contact: the Gerstein lab.

References

Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14.

Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB et al. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 2007 Jun;17(6):898-909.

Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52.

Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007 Aug;4(8):651-7.

Rozowsky J, Euskirchen G, Auerbach RK, Zhang ZD, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein MB. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009 Jan;27(1):66-75.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.