MAX Prediction of MAX in kidney Track Settings
 
Virtual ChIPseq predictions of MAX in kidney

Track collection: Publicly available ChIPseq data and predictions for MAX

Assembly: Human Dec. 2013 (GRCh38/hg38)
Data last updated at UCSC: 2018-10-02 10:24:46

Virtual ChIP-seq: Predicting transcription factor binding by learning from the transcriptome

Karimzadeh M, Hoffman MM. 2017. Virtual ChIP-seq: Predicting transcription factor binding by learning from the transcriptome. in prep; doi: https://doi.org/.

The free Virtual ChIP-seq software package efficiently predicts binding of 40 TFs in any cell type for which RNA-seq and ATAC-seq (or DNase-seq) data are available.

Predicting transcription factor binding

Virtual ChIP-seq uses a multi-layer perceptron to predict binding of individual TFs. It draws on data on chromatin accessibility, genomic conservation, and binding characteristics of TFs from previous experiments in other cell types. It also learns from the association of gene expression and TF binding at different genomic regions. Because the model incorporates existing ChIP-seq data, there is no longer a need to represent TF sequence preferences in the form of position weight matrices. For a new cell type with data on chromatin accessibility and gene expression, Virtual ChIP-seq predicts indirect TF binding, as well as binding of TFs without known sequence preference.
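
As a rough illustration of how a multi-layer perceptron turns per-bin features into a binding probability, here is a minimal sketch of a forward pass with one hidden layer. The feature names, layer sizes, weights, and the function `mlp_forward` are all hypothetical and chosen only for illustration; they are not the trained Virtual ChIP-seq model or its actual feature set.

```python
import math


def relu(x):
    """Rectified linear activation."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer perceptron.

    Returns a value in (0, 1) that can be read as a binding probability.
    """
    hidden = [
        relu(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical features for one genomic bin (assumed names and scaling):
# [chromatin accessibility, conservation, bound in other cell types, expression score]
bin_features = [0.8, 0.6, 1.0, 0.3]

# Hypothetical weights; a real model would learn these from training cell types.
hidden_weights = [[1.0, 0.5, 1.5, 0.2], [-0.5, 0.1, 0.3, 1.0]]
hidden_biases = [-1.0, 0.0]
output_weights = [2.0, 0.5]
output_bias = -1.5

prob = mlp_forward(bin_features, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"Predicted binding probability: {prob:.3f}")
```

In this toy setup, bins that are accessible and were bound in other cell types push the first hidden unit up, which raises the output probability.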

Accuracy of predictions

To build a generalizable classifier that performs well on new cell types with only transcriptome and chromatin accessibility data, we train the multi-layer perceptron on the training cell types (A549, GM12878, HCT-116, HepG2, HeLa-S3). We assess the performance of the model in the validation cell types (IMR90, K562, MCF-7, NHEK, H1, Ishikawa, BJ, T47D, PANC-1, Jurkat). Below, we report the median and standard deviation of performance across validation cell types.

TF        Median auROC  S.D. auROC  Median auPR  S.D. auPR  Median MCC  S.D. MCC
BACH1     0.977         0.00923     0.429        0.0508     0.384       0.0923
BHLHE40   0.918         0.00224     0.378        0.0325     0.398       0.0196
BRCA1     0.991         0.00388     0.356        0.0322     0.369       0.0223
CEBPB     0.965         0.0254      0.392        0.0735     0.371       0.042
CHD2      0.98          0.0213      0.462        0.0606     0.451       0.047
CREB1     0.98          0.107       0.519        0.164      0.448       0.109
CTCF      0.989         0.0385      0.81         0.101      0.605       0.15
E2F4      0.993         0.00786     0.502        0.0867     0.322       0.161
EGR1      0.974         0.034       0.418        0.186      0.456       0.176
ELF1      0.954         0.0374      0.496        0.0709     0.455       0.0403
ESRRA     0.939         0.0288      0.308        0.047      0.309       0.0185
FOS       0.858         0.00542     0.334        0.0152     0.369       0.02
FOXA1     0.966         0.0279      0.584        0.0133     0.453       0.0903
GABPA     0.978         0.0272      0.434        0.0605     0.414       0.0533
GATA3     0.916         0.0314      0.241        0.0627     0.312       0.0597
GTF2F1    0.991         0.0123      0.29         0.0709     0.341       0.0624
H2AZ      0.932         0.0728      0.304        0.141      0.317       0.129
HCFC1     0.988         0.00668     0.499        0.0419     0.44        0.0583
JUND      0.992         0.00984     0.319        0.18       0.346       0.142
MAFF      0.964         0.00405     0.361        0.0987     0.374       0.102
MAFK      0.983         0.00458     0.523        0.0958     0.478       0.0398
MAX       0.968         0.0269      0.459        0.115      0.416       0.0645
MAZ       0.987         0.00437     0.546        0.0798     0.455       0.063
MXI1      0.991         0.00456     0.426        0.0318     0.43        0.0305
MYC       0.978         0.114       0.312        0.191      0.319       0.154
NRF1      0.997         0.0127      0.72         0.0508     0.359       0.0593
RAD21     0.986         0.0135      0.75         0.0552     0.581       0.0952
REST      0.985         0.0181      0.562        0.126      0.439       0.0759
RFX5      0.971         0.0138      0.32         0.0461     0.305       0.0536
SIN3A     0.977         0.0095      0.413        0.0399     0.394       0.0384
SMC3      0.998         0.00005     0.779        0.0177     0.723       0.0184
SRF       0.971         0.0355      0.363        0.0833     0.398       0.0584
TAF1      0.992         0.0216      0.541        0.0558     0.484       0.0457
TBP       0.982         0.00548     0.365        0.111      0.387       0.0704
TEAD4     0.947         0.0367      0.392        0.0208     0.352       0.0445
USF1      0.917         0.0223      0.411        0.0858     0.401       0.0785
USF2      0.97          0.0128      0.471        0.0371     0.409       0.0893
YY1       0.93          0.0334      0.46         0.049      0.485       0.0665
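
To make the MCC columns concrete, the sketch below computes the Matthews correlation coefficient from a confusion matrix and then summarizes it across cell types as a median and standard deviation. The confusion counts are invented for illustration and are not taken from the evaluation above. The use of the sample standard deviation (`statistics.stdev`) is also an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal is empty and the coefficient is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical (tp, fp, tn, fn) counts of genomic bins per validation cell type.
confusion_by_cell_type = {
    "IMR90": (40, 60, 840, 60),
    "K562": (50, 50, 850, 50),
    "MCF-7": (60, 40, 860, 40),
}

mcc_by_cell_type = {
    cell: mcc(*counts) for cell, counts in confusion_by_cell_type.items()
}

median_mcc = statistics.median(mcc_by_cell_type.values())
sd_mcc = statistics.stdev(mcc_by_cell_type.values())  # sample S.D. (assumed)

print(f"Median MCC: {median_mcc:.3f}, S.D. MCC: {sd_mcc:.3f}")
```

MCC is useful here because bound bins are rare relative to unbound bins, and unlike accuracy it only scores well when the classifier does well on both classes.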

Virtual ChIP-seq accepts chromatin accessibility data in narrowPeak format and RNA-seq data as a matrix in which rows are human gene symbols and columns are cell types (at least one column, for your cell type of interest). The RNA-seq measure must be normalized for gene length and library size (RPKM, FPKM, and TPM are accepted; raw read counts are not). Generating the input tables for your TF of interest takes an average of 6 CPU hours (depending on the TF) and requires at least 8 GB of RAM. Applying the trained model takes less than 20 minutes for most TFs and datasets.
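
The input requirements above can be sketched as follows, assuming you start from raw read counts and a narrowPeak file. The helper names `counts_to_tpm` and `parse_narrowpeak_line` are hypothetical and are not part of the Virtual ChIP-seq package. The TPM formula and the 10-column narrowPeak layout follow their standard definitions. The example counts, gene lengths, and peak line are invented.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM (normalized for gene length and library size).

    counts: dict of gene symbol -> raw read count
    lengths_bp: dict of gene symbol -> gene (or transcript) length in base pairs
    """
    # Reads per kilobase corrects for gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Dividing by the per-million scaling factor corrects for library size.
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


def parse_narrowpeak_line(line):
    """Parse one line of a narrowPeak file (10 tab-separated columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"Expected 10 narrowPeak columns, found {len(fields)}")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3],
        "score": int(fields[4]),
        "strand": fields[5],
        "signal_value": float(fields[6]),
        "p_value": float(fields[7]),
        "q_value": float(fields[8]),
        "summit_offset": int(fields[9]),
    }


# Hypothetical raw counts and gene lengths for one cell type.
raw_counts = {"MAX": 100, "MYC": 300}
gene_lengths = {"MAX": 1000, "MYC": 3000}
tpm = counts_to_tpm(raw_counts, gene_lengths)

# One column of the expression matrix: gene symbol -> normalized value.
for gene, value in sorted(tpm.items()):
    print(f"{gene}\t{value:.1f}")

# Hypothetical narrowPeak record.
peak = parse_narrowpeak_line("chr1\t100\t250\tpeak1\t500\t.\t12.5\t8.1\t6.3\t75")
print(peak["chrom"], peak["start"], peak["end"], peak["signal_value"])
```

Both genes end up with the same TPM here because MYC has three times the reads but is also three times as long, which is the length correction that raw counts lack.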

Track hub, file access, and software

UCSC Genome Browser

View the Virtual ChIP-seq track hub in the UCSC genome browser.

There are 40 supertracks, one for each transcription factor. Each supertrack contains two bigBed9 files: one showing genomic bins with TF binding in Cistrome DB datasets, and one showing Virtual ChIP-seq predictions in the Roadmap consortium datasets.

Direct links

Download Virtual ChIP-seq predictions in the Roadmap datasets directly:

Software and documentation

Read the documentation for Virtual ChIP-seq software, which begins with a quick start.

Support

Please ask questions about Virtual ChIP-seq on our mailing list. To report a bug or request a feature, use the Virtual ChIP-seq issue tracker. We welcome all comments on the package, including on the ease of installation and the quality of the documentation.

Source code

Credits

Virtual ChIP-seq was developed by Mehran Karimzadeh during his PhD in the Michael Hoffman Lab.