RNA sequencing, or RNA-seq, is a method for mapping and quantifying the
complete set of RNA transcripts present in a cell at a given time, known as
the transcriptome, for any organism that has a genomic DNA sequence
assembly. Compared to microarrays, which detect and quantify transcripts by
hybridization against known sequences, RNA-seq directly sequences
transcripts and is especially well suited for de novo
discovery of RNA splicing patterns and for determining unequivocally
the presence or absence of lower-abundance classes of RNA.
RNA-seq is performed by reverse-transcribing an RNA sample into
cDNA, followed by high-throughput DNA sequencing. Most data are produced
as either single reads or paired-end reads.
In the single-read format, each sequence read comes from one end
of a randomly primed cDNA molecule (and represents one end of one cDNA
segment), while paired-end reads are obtained as pairs
from both ends of a randomly primed cDNA (and represent two opposite
ends of one cDNA segment). The resulting sequence reads are then
informatically mapped onto the genome sequence (Alignments).
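The difference between the two read formats can be illustrated with a minimal Python sketch. It is not part of the ENCODE pipeline: the function names, the toy fragment, and the read length are hypothetical, and it assumes the second mate of a pair is read from the opposite strand, so it appears as the reverse complement of the far end of the fragment.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def single_read(fragment, read_length):
    """Single-read format: one read taken from one end of the cDNA fragment."""
    return fragment[:read_length]


def paired_end_reads(fragment, read_length):
    """Paired-end format: one read from each end of the same cDNA fragment.

    Assumption: the second mate is sequenced from the opposite strand, so it
    is the reverse complement of the fragment's far end.
    """
    mate1 = fragment[:read_length]
    mate2 = reverse_complement(fragment)[:read_length]
    return mate1, mate2


# Hypothetical 10-base cDNA fragment, for illustration only.
fragment = "AACCGGTTAC"
print(single_read(fragment, 4))       # AACC
print(paired_end_reads(fragment, 4))  # ('AACC', 'GTAA')
```

Because both mates come from the same fragment, a mapper can use the expected distance and orientation between them as an extra constraint when placing the pair on the genome.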
The current mappers (TopHat and STAR) have the ability to map
reads to annotated and unannotated genomic regions.
Reads mapped to annotated or novel RNA splice junctions are
reported separately (Splice Sites). Earlier versions of this software did not map
reads to unannotated genomic regions.
Some RNA-seq protocols do not specify the coding strand. As a result,
there can be ambiguity at loci where both strands are transcribed.
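The strand ambiguity described above can be sketched in Python as follows. The gene names, coordinates, and the `candidate_genes` helper are hypothetical and serve only as illustration. The sketch uses half-open coordinates and assumes that a stranded protocol reports the read on the same strand as its transcript, which is not true of every library type.

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def candidate_genes(genes: List[Gene], read_start: int, read_end: int,
                    read_strand: Optional[str] = None) -> List[str]:
    """Return names of genes a mapped read could have come from.

    read_strand=None models an unstranded protocol: any overlapping gene
    is a candidate, regardless of its strand.
    """
    hits = []
    for gene in genes:
        overlaps = read_start < gene.end and gene.start < read_end
        strand_ok = read_strand is None or read_strand == gene.strand
        if overlaps and strand_ok:
            hits.append(gene.name)
    return hits


# Two hypothetical genes on opposite strands that overlap at 400-500.
genes = [Gene("geneA", 100, 500, "+"), Gene("geneB", 400, 800, "-")]

print(candidate_genes(genes, 420, 470))       # ['geneA', 'geneB'] (ambiguous)
print(candidate_genes(genes, 420, 470, "-"))  # ['geneB']
```

Without strand information, a read in the overlapping region cannot be assigned to a single gene. With it, the assignment is unique.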
These tracks are multi-view composite tracks that contain multiple
data types (views). Each view within a track
has separate display controls.
Most ENCODE tracks contain multiple subtracks, corresponding to
multiple experimental conditions. If a track contains a large
number of subtracks, only some subtracks will be displayed by default.
The user can select which subtracks are displayed via the display controls
on the track details pages.
These data were generated and analyzed as part of the ENCODE project, a
genome-wide consortium project with the aim of cataloging all
functional elements in the human genome. This effort includes
collecting a variety of data across related experimental conditions to
facilitate integrative analysis. Consequently, additional ENCODE tracks may
contain data that is relevant to the data in these tracks.
Morozova O, Hirst M, Marra MA. Applications of new sequencing
technologies for transcriptome analysis. Annual Review of
Genomics and Human Genetics. 2009;10:135-51.
Metzker ML. Sequencing technologies - the next generation.
Nature Reviews Genetics. 2010
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset
until nine months following the release of the dataset. This date is
listed in the Restricted Until column on the track configuration page
and the download page. The full data release policy for ENCODE is