PSU RNA-seq Track Settings
RNA-seq from ENCODE/PSU

Views: Signal, Minus Raw Signal, Plus Raw Signal, Alignments
Cell lines: CH12, Erythroblast, FV-progenitor, FVL-stem, G1E, G1E-ER4, G1E-ER4 (Time Course), Megakaryocyte, MEL, MEP

Subtracks:
Cell Line      Treatment    ReadType  Replicate  View              Restricted Until  Track Name
CH12                        1x41      1          Signal            2012-12-28        CH12 1x41 RNA-seq Signal Rep 1 from ENCODE/PSU
Erythroblast                2x99D     1          Plus Raw Signal   2013-04-20        Erythroblast 2x99D RNA-seq Plus Raw Signal Rep 1 from ENCODE/PSU
Erythroblast                2x99D     1          Minus Raw Signal  2013-04-20        Erythroblast 2x99D RNA-seq Minus Raw Signal Rep 1 from ENCODE/PSU
G1E                         2x99D     1          Plus Raw Signal   2013-04-22        G1E 2x99D RNA-seq Plus Raw Signal Rep 1 from ENCODE/PSU
G1E                         2x99D     1          Minus Raw Signal  2013-04-22        G1E 2x99D RNA-seq Minus Raw Signal Rep 1 from ENCODE/PSU
Megakaryocyte               2x99D     1          Plus Raw Signal   2013-04-28        Megakaryocyte 2x99D RNA-seq Plus Raw Signal Rep 1 from ENCODE/PSU
Megakaryocyte               2x99D     1          Minus Raw Signal  2013-04-28        Megakaryocyte 2x99D RNA-seq Minus Raw Signal Rep 1 from ENCODE/PSU
MEL                         1x45      1          Signal            2013-04-26        MEL 1x45 RNA-seq Signal Rep 1 from ENCODE/PSU
MEL            DMSO 2.0pct  1x45      1          Signal            2013-04-26        MEL DMSO 2.0pct 1x45 RNA-seq Signal Rep 1 from ENCODE/PSU
MEP                         2x99D     1          Plus Raw Signal   2013-04-22        MEP 2x99D RNA-seq Plus Raw Signal Rep 1 from ENCODE/PSU
MEP                         2x99D     1          Minus Raw Signal  2013-04-22        MEP 2x99D RNA-seq Minus Raw Signal Rep 1 from ENCODE/PSU


Rationale for the Mouse ENCODE project
Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function.

The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered.

Transcriptome Maps
One of the epigenetic features most closely related to genomic activity is the production of stable RNA, including transcripts from both protein-coding genes and noncoding transcripts. These genomic compilations of transcripts, or transcriptomes, are primary determinants of the way cells function, respond and differentiate, both by the production of proteins translated from coding transcripts and by the regulatory activity of untranslated non-coding transcripts. Non-coding RNAs regulate gene expression through diverse mechanisms, ranging from reducing chromatin accessibility (affecting large regions or whole chromosomes) to precise fine-tuning of transcription from specific genes, e.g. via RNAi.

Even though a large proportion of mammalian genomes is transcribed, many of the transcribed segments have yet to be assigned any function. The ENCODE project aims to create a comprehensive, quantitative annotation of the human transcriptome in several cell and tissue types, and to understand the regulation of transcriptomes by establishing the relationship between regulatory factors and their targets. Mapping the mouse transcriptome in similar tissues will allow us to discern conservation of transcriptome profiles between mouse and human, to discover species-specific transcription patterns, and to infer conserved versus species-specific regulatory mechanisms. The results will have a significant impact on our understanding of the evolution of gene regulation.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. This track contains the following views:

Raw Signals
The Plus Raw Signal and Minus Raw Signal views show the density of mapped reads on the plus and minus strands (wiggle format), respectively.
Signal
Density graph (wiggle) of signal enrichment based on processed data.
Alignments
Mappings of short reads to the genome.
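
As a rough illustration of what a per-strand read-density view represents, the following Python sketch counts how many reads cover each base on each strand and collapses the result into bedGraph-style intervals. It is a toy model under stated assumptions, not the pipeline used to build these tracks: the function names `strand_coverage` and `to_intervals` are hypothetical, reads are given as simple (chrom, start, end, strand) tuples with 0-based half-open coordinates, and spliced reads are not handled.

```python
from collections import defaultdict


def strand_coverage(alignments):
    """Count per-base read depth separately for the plus and minus strands.

    alignments: iterable of (chrom, start, end, strand) tuples with
    0-based, half-open coordinates (an assumption of this sketch).
    """
    depth = {"+": defaultdict(int), "-": defaultdict(int)}
    for chrom, start, end, strand in alignments:
        for pos in range(start, end):
            depth[strand][(chrom, pos)] += 1
    return depth


def to_intervals(depth_for_strand):
    """Collapse per-base depth into (chrom, start, end, depth) runs."""
    runs = []
    for chrom, pos in sorted(depth_for_strand):
        value = depth_for_strand[(chrom, pos)]
        last = runs[-1] if runs else None
        if last and last[0] == chrom and last[2] == pos and last[3] == value:
            # Extend the current run by one base.
            runs[-1] = (chrom, last[1], pos + 1, value)
        else:
            runs.append((chrom, pos, pos + 1, value))
    return runs


# Toy reads: two overlapping plus-strand reads and one minus-strand read.
reads = [
    ("chr1", 0, 4, "+"),
    ("chr1", 2, 6, "+"),
    ("chr1", 10, 12, "-"),
]
coverage = strand_coverage(reads)
plus_signal = to_intervals(coverage["+"])
minus_signal = to_intervals(coverage["-"])
```

Here `plus_signal` plays the role of a Plus Raw Signal subtrack and `minus_signal` that of a Minus Raw Signal subtrack; a real track would be computed from the aligned reads with dedicated tools.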

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.


Methods

Cells were grown according to the approved ENCODE cell culture protocols.

Total RNA was extracted from 5-10 million cells using TRIzol reagent. This was followed by mRNA selection, fragmentation and cDNA synthesis, which were performed as described previously (Mortazavi et al., 2008). Double-stranded cDNA samples were processed for library construction for Illumina sequencing, using the Illumina ChIP-seq Sample Preparation Kit.

Strand-specific libraries were generated in a similar manner, with the modifications described previously (Parkhomchuk et al., 2009). Briefly, dUTP was used instead of dTTP during second-strand cDNA synthesis to label the second-strand cDNA. During library preparation, the dUTP-labeled cDNA was treated with uracil-N-glycosylase prior to the PCR amplification step. This removed uracil from the second strand, after which the DNA was subjected to high heat to facilitate abasic scission of the second strand, so that only the first strand was amplified.
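
Because the second strand is destroyed, the sequenced molecule derives from first-strand cDNA, which is complementary to the original RNA. The Python sketch below shows how the transcript strand could be inferred from the flag field of an aligned read in SAM format under that convention. It assumes that read 1 (or a single-end read) aligns antisense to the transcript and read 2 aligns sense to it; the source does not state how strand was assigned for these tracks, and the function name `transcript_strand` is hypothetical.

```python
# SAM flag bits defined by the SAM format specification.
FLAG_REVERSE = 0x10      # read aligned to the reverse (minus) strand
FLAG_SECOND_IN_PAIR = 0x80


def transcript_strand(flag: int) -> str:
    """Infer the strand of the originating transcript for a dUTP library.

    Assumption of this sketch: read 1 and single-end reads are antisense
    to the transcript, while read 2 is sense to it.
    """
    read_on_plus = not (flag & FLAG_REVERSE)
    if flag & FLAG_SECOND_IN_PAIR:
        # Read 2 has the same orientation as the transcript.
        return "+" if read_on_plus else "-"
    # Read 1 or single-end read: opposite orientation to the transcript.
    return "-" if read_on_plus else "+"


# Both mates of a properly paired fragment should agree.
pair_a = (transcript_strand(99), transcript_strand(147))   # ('-', '-')
pair_b = (transcript_strand(83), transcript_strand(163))   # ('+', '+')
```

Splitting alignments by the inferred strand before computing read density is one way to obtain separate plus and minus signals.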

Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples are considered biological replicates.

Sequencing was done on the Illumina Genome Analyzer IIx and on the Illumina HiSeq 2000. FastQ files for the resulting sequence reads (single read and paired-end, directional and non-directional) were moved to a data library in Galaxy, and tools implemented in Galaxy were used for further processing via workflows (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010). Data processing was also performed on the CyberSTAR high-performance computing system at Penn State. The reads were mapped to the mouse genome (mm9 assembly) using the program TopHat (Langmead et al., 2009; Trapnell et al., 2009). Signal tracks were created using BEDTools (Quinlan et al., 2010) and SAMtools (Li, Handsaker et al., 2009).


Credits

Cell growth and RNA isolation were done in the laboratories of Ross Hardison, Robert Paulson, David Bodine and Mitchell J. Weiss (PSU, NHGRI and Children's Hospital of Philadelphia). Isolation of mRNA, cDNA synthesis and Illumina library construction were done primarily by Tejaswini Mishra, and sequencing on the Illumina instruments was done largely by Cheryl Keller, both in the laboratory of Ross Hardison. Mapping and transcript assembly were done by Belinda Giardine and Tejaswini Mishra on Galaxy and on CyberSTAR, the Penn State high-performance computing system. Data processing and analysis were overseen by James Taylor (Emory University) and Ross Hardison (PSU). Generation of these data was supported by National Institutes of Health grants R01DK065806 and RC2HG005573. This work was supported in part through instrumentation funded by the National Science Foundation through grant OCI-0821527.

Contact: Ross Hardison


References

Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010 Jan;Chapter 19:Unit 19.10.1-21.

Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005 Oct;15(10):1451-5.

Goecks J, Nekrutenko A, Taylor J, Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008 Jul;5(7):621-8.

Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 2009 Oct;37(18):e123.

Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2.

Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009 May 1;25(9):1105-11.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.
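
The nine-month window can be sketched as simple date arithmetic. The Python example below is illustrative only: the helper names `add_months` and `is_restricted` are hypothetical, the release date used is an invented input rather than a date taken from this track, and the sketch assumes the restriction no longer applies on the Restricted Until date itself. The ENCODE data release policy remains the authoritative statement.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    The day is clamped to the last day of the target month
    (for example, 31 May plus nine months gives 28 February).
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_restricted(restricted_until: str, today: date) -> bool:
    """Check a 'Restricted Until' value (YYYY-MM-DD) against a given day."""
    return today < date.fromisoformat(restricted_until)


# Hypothetical release date, used only to illustrate the arithmetic.
release = date(2012, 3, 28)
restricted_until = add_months(release, 9)   # date(2012, 12, 28)
```

For instance, `is_restricted("2013-04-20", date(2013, 1, 15))` returns `True`, while the same check on `date(2013, 4, 20)` returns `False` under the assumption stated above.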