Caltech RNA-seq Track Settings
 
ENCODE Caltech RNA-seq   (All Expression tracks)

Subtracks by read type and cell line:

Replicates: 1, 2, 3, 4
Read types: Single 32nt (1x32); Single Strand-Specific 75nt (1x75D); Paired 75nt (2x75)
Cell lines: GM12878 (Tier 1), H1-hESC (Tier 1), K562 (Tier 1), HeLa-S3 (Tier 2), HepG2 (Tier 2), HUVEC (Tier 2), NHEK

Cell Line  Read Type  Replicate  View  Track Name  Restricted Until
GM12878  Single 32nt (1x32)  1  Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Raw Signal  2009-12-06
GM12878  Single 32nt (1x32)  2  Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Raw Signal  2009-12-06
GM12878  Single 32nt (1x32)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Splice Aligns  2009-12-06
GM12878  Single 32nt (1x32)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Splice Aligns  2009-12-06
GM12878  Single Strand-Specific 75nt (1x75D)  1  Plus Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Plus Raw Signal  2010-10-04
GM12878  Single Strand-Specific 75nt (1x75D)  2  Plus Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Plus Raw Signal  2010-10-06
GM12878  Single Strand-Specific 75nt (1x75D)  1  Minus Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Minus Raw Signal  2010-10-04
GM12878  Single Strand-Specific 75nt (1x75D)  2  Minus Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Minus Raw Signal  2010-10-06
GM12878  Single Strand-Specific 75nt (1x75D)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 BLAT Stranded Splice Aligns  2010-10-04
GM12878  Single Strand-Specific 75nt (1x75D)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 BLAT Stranded Splice Aligns  2010-10-06
GM12878  Single Strand-Specific 75nt (1x75D)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Splice Aligns  2010-10-04
GM12878  Single Strand-Specific 75nt (1x75D)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Splice Aligns  2010-10-06
GM12878  Paired 75nt (2x75)  2  Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Raw Signal  2010-10-14
GM12878  Paired 75nt (2x75)  1  Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Raw Signal  2009-12-06
GM12878  Paired 75nt (2x75)  2  Raw Signal  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Raw Signal  2009-12-06
GM12878  Paired 75nt (2x75)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Splice Aligns  2010-10-14
GM12878  Paired 75nt (2x75)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 BLAT Splice Aligns  2009-12-06
GM12878  Paired 75nt (2x75)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 BLAT Splice Aligns  2009-12-06
GM12878  Paired 75nt (2x75)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Splice Aligns  2009-12-06
GM12878  Paired 75nt (2x75)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Splice Aligns  2009-12-06
K562  Single 32nt (1x32)  1  Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Raw Signal  2009-12-06
K562  Single 32nt (1x32)  2  Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Raw Signal  2009-12-06
K562  Single 32nt (1x32)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Splice Aligns  2009-12-06
K562  Single 32nt (1x32)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Splice Aligns  2009-12-06
K562  Single Strand-Specific 75nt (1x75D)  1  Plus Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Plus Raw Signal  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  2  Plus Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Plus Raw Signal  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  1  Minus Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Minus Raw Signal  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  2  Minus Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Minus Raw Signal  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 BLAT Stranded Splice Aligns  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 BLAT Stranded Splice Aligns  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Splice Aligns  2010-10-06
K562  Single Strand-Specific 75nt (1x75D)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Splice Aligns  2010-10-06
K562  Paired 75nt (2x75)  1  Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Raw Signal  2009-12-06
K562  Paired 75nt (2x75)  2  Raw Signal  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Raw Signal  2009-12-06
K562  Paired 75nt (2x75)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 BLAT Splice Aligns  2009-12-06
K562  Paired 75nt (2x75)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 BLAT Splice Aligns  2009-12-06
K562  Paired 75nt (2x75)  1  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Splice Aligns  2009-12-06
K562  Paired 75nt (2x75)  2  Splice Sites  ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Splice Aligns  2009-12-06
Assembly: Human Mar. 2006 (NCBI36/hg18)

Description

This track is produced as part of the ENCODE Project. RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high-throughput DNA sequencing, which was done here on an Illumina Genome Analyzer (GA2) (Mortazavi et al., 2008). The transcriptome measurements shown on these tracks were performed on polyA-selected RNA from total cellular RNA. Data have been produced in two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends of cDNAs resulting from random priming.

The resulting sequence reads are then computationally mapped onto the genome sequence (Alignments). Those that don't map to the genome are mapped to known RNA splice junctions (Splice Sites). These mapped reads are then counted to determine their frequency of occurrence at known gene models. Sequence reads that cluster at genome locations lacking an existing transcript model are also identified and quantified. RNA-Seq is especially suited to giving information about RNA splicing patterns and to determining unequivocally the presence or absence of lower-abundance RNAs. As performed here, internal RNA standards are used to assist in quantification and to provide internal process controls.

Apart from the strand-specific 1x75D reads, this RNA-Seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. The "randomly primed" reverse transcription is apparently not fully random. This is inferred from a sequence bias in the first residues of the read population, and it likely contributes to the observed unevenness in sequence coverage across transcripts.

These tracks show 1x32 nt, 2x75 nt, or 1x75 nt directed sequence reads of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The 32 nt sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions using different sequence alignment programs. The 1x75D reads are strand-specific. The 2x75 nt reads were mapped serially: first with the Bowtie program (Langmead et al., 2009) against the genome and UCSC known-gene splice junctions (Splice Sites), and then, for reads that Bowtie left unmapped, with BLAT to find evidence of novel splicing, requiring at least 10 bp on the short side of the splice.
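
The short-side requirement for BLAT splice evidence can be illustrated with a minimal sketch. The function name and the block-size input (modeled on the blockSizes column of a PSL record) are assumptions for illustration, not the pipeline's actual code; only the 10 bp threshold comes from the text above. For reads with more than one splice, this sketch applies the threshold to every aligned block, which is one plausible reading of "short side".

```python
from typing import Sequence

MIN_SHORT_SIDE = 10  # bp; the threshold stated in the track description


def passes_short_side(block_sizes: Sequence[int],
                      min_anchor: int = MIN_SHORT_SIDE) -> bool:
    """Return True if a spliced alignment keeps at least `min_anchor`
    aligned bases in its shortest block.

    `block_sizes` is a hypothetical list of aligned block lengths in
    read order, for example parsed from a PSL blockSizes field.
    """
    if len(block_sizes) < 2:
        # A single block means the read does not cross a splice at all.
        return False
    return min(block_sizes) >= min_anchor
```

Under these assumptions, a 75 nt read split 65/10 across a junction would be kept, while a 68/7 split would be rejected.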

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track:

Plus Raw Signal
Density graph (wiggle) of signal enrichment on the positive strand for strand-specific reads based on a normalized aligned read density (RPKM). The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples.
Minus Raw Signal
Density graph (wiggle) of signal enrichment on the negative strand for strand-specific reads based on a normalized aligned read density (RPKM). The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples.
Raw Signal
Density graph (wiggle) of signal enrichment based on a normalized aligned read density (RPKM) for non-strand-specific reads. The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples.
Splice Sites
RNA-seq tags aligning to mRNA splice sites.
Alignments
The Alignments view shows reads mapped to the genome. Alignments are colored by cell type.
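
The RPKM measure used by the signal views is reads per kilobase of feature per million mapped reads (Mortazavi et al., 2008). The sketch below shows the arithmetic only; the function and parameter names are illustrative and are not taken from ERANGE.

```python
def rpkm(read_count: float, feature_length_bp: int,
         total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = reads * 1e9 / (feature length in bp * total mapped reads),
    which is the same as (reads / kb) / (millions of mapped reads).
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)
```

For example, 500 reads on a 2,000 bp transcript in a sample with 10 million mapped reads gives rpkm(500, 2000, 10_000_000) == 25.0. Because both length and sequencing depth are divided out, the value can be compared across transcripts and samples.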

Methods

Cells were grown according to the approved ENCODE cell culture protocols. The cells (either 2 × 10⁷ or 4 × 10⁷ cells for GM12878 and K562, and 8 × 10⁷ cells for HepG2) were lysed in either 4 ml (GM12878 and K562) or 12 ml (HepG2) of RLT buffer (Qiagen RNeasy kit), and processed on either 2 (GM12878 and K562) or 3 (HepG2) RNeasy midi columns according to the manufacturer's protocol, with the inclusion of the "on-column" DNase digestion step to remove residual genomic DNA.

75 µg of total RNA was selected twice with oligo-dT beads (Dynal) according to the manufacturer's protocol to isolate mRNA from each of the preparations. 100 ng of mRNA was then processed according to the protocol in Mortazavi et al. (2008), and prepared for sequencing on the Genome Analyzer flow cell according to the protocol for the ChIPSeq DNA genomic DNA kit (Illumina).

Following alignment of the sequence reads to the genome assembly as described above, the sequence reads were further analyzed using the ERANGE 3.0 software package, which quantifies the number of reads falling within the mapped boundaries of known transcripts from the Gencode annotations. For quantification, ERANGE assigns both genomically unique reads and reads that occur in 2-10 genomic locations. ERANGE also contains a subroutine (RNAFAR) that consolidates reads aligning close to, but outside, the mapped borders of known transcripts, and identifies novel transcribed regions of the genome using either a 20 kb radius for the 1x32 datasets or paired-end information for the 2x75 datasets.
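
The distinction between unique reads and reads with 2-10 genomic locations can be sketched as a simple classification step. This is an illustration under stated assumptions, not ERANGE's implementation: the function name and category labels are hypothetical, and how ERANGE distributes multi-location reads among their candidate sites is not covered here.

```python
from collections import Counter
from typing import Iterable


def classify_read(n_locations: int, max_multi: int = 10) -> str:
    """Label a read by its number of reported genomic locations.

    The 1 and 2-10 boundaries follow the track description; the labels
    themselves are made up for this sketch.
    """
    if n_locations <= 0:
        return "unmapped"
    if n_locations == 1:
        return "unique"
    if n_locations <= max_multi:
        return "multi"
    return "excluded"


def tally(location_counts: Iterable[int]) -> Counter:
    """Count how many reads fall into each category."""
    return Counter(classify_read(n) for n in location_counts)
```

For instance, tally([1, 1, 3, 10, 11, 0]) reports two unique reads, two multi reads, one excluded read, and one unmapped read.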

For 2x75 datasets, raw Illumina reads (RawData files on the download page, fasta format) are run through bowtie 0.9.8.1 with up to 2 mismatches, and the resulting mappings to the genome, spiked controls, and UCSC knownGene splice junctions are stored (RawData2 files, bowtie format) for up to ten matches per read. Reads that were not mapped by bowtie (RawData3 files, fasta format) are then mapped onto the genome using blat and filtered using pslReps (RawData4 files, psl format). The bowtie and blat mappings are then analyzed by ERANGE 3.0.2 to generate wiggles (Raw Signal view, wiggle format), bed files of all reads and splices (Alignments and Paired Alignments views, bed format), all bowtie and blat splices (Splice Sites view, bed format), and blat-only splices (Splice Sites view, bed format), as well as RPKM expression level measurements at the gene level (RawData5 files, rpkm format), at the exon level (RawData6 files, rpkm format), and for candidate novel exons (RawData7 files, rpkm format). Fasta files for splice sites (hg18splice75.fa.gz) and spikes (spikes.fa.gz) can be found on the downloads page.

Verification

  • Known exon maps as displayed on the genome browser are confirmed by the alignment of sequence reads.
  • Known spliced exons are detected at the expected frequency for transcripts of given abundance.
  • Linear range detection of spiked-in RNA transcripts from Arabidopsis and phage lambda over 5 orders of magnitude.
  • Endpoint RT-PCR confirms presence of selected RNAFAR 3′UTR extensions.
  • Correlation to published microarray data: r = 0.62.
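
The correlation figure in the last item is a Pearson-style r. As a rough sketch of how such a value is computed from paired expression measurements, the helper below implements the standard formula; it is not the authors' analysis code, and any preprocessing they applied (such as log transformation or gene filtering) is not described in this document.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

Here xs might hold RPKM values and ys the matching microarray intensities for the same genes; perfectly proportional inputs give a value close to 1.0.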

Release Notes

This is release 2 of the Caltech RNA-seq track. This release adds five new cell types: H1-hESC, HeLa-S3, HepG2, HUVEC, and NHEK. Also, stranded 75 nt reads are now provided for each cell type.

Credits

Wold Group: Ali Mortazavi, Brian Williams, Diane Trout, Brandon King, Ken McCue, Lorian Schaeffer.

Myers Group: Norma Neff, Florencia Pauli, Fan Zhang, Tim Reddy, Rami Rauch.

Illumina gene expression group: Gary Schroth, Shujun Luo, Eric Vermaas.

Contacts: Diane Trout (informatics) and Brian Williams (experimental).

References

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold BJ. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008 Jul;5(7):621-628.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009 Mar;10:R25.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.
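
The nine-month window can be worked out with simple calendar arithmetic. The sketch below is illustrative only: the function name is hypothetical, and clamping to the last day of a shorter month is an assumption, since the policy text here does not say how such dates are handled.

```python
import calendar
from datetime import date


def restricted_until(release: date, months: int = 9) -> date:
    """Return the date `months` calendar months after `release`.

    If the target month is shorter than the release day (for example
    the 31st), the day is clamped to the month's last day. That
    clamping rule is an assumption of this sketch.
    """
    month_index = release.month - 1 + months
    year = release.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(release.day, last_day))
```

For example, a hypothetical release date of 2009-03-06 gives restricted_until(date(2009, 3, 6)) == date(2009, 12, 6).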