Known+Pred RNA Track Settings
 
Known and Predicted RNA Transcription in the ENCODE Regions   (All Pilot ENCODE Regions and Genes tracks)


Show only items with score at or above:   (range: 0 to 1000)

Source data version: ENCODE June 2005 Freeze
Assembly: Human Mar. 2006 (NCBI36/hg18)
Data coordinates converted via liftOver from: July 2003 (NCBI34/hg16)
Data last updated at UCSC: 2007-06-14

Description

This track shows the locations of known and predicted non-protein-coding RNA genes and pseudogenes that fall within the ENCODE regions. It contains all information in Sean Eddy's RNA Genes track for these regions, combined with computational predictions generated by Jakob Skou Pedersen's EvoFold algorithm. In addition to the fields contained in the RNA Genes track, this track also includes ENCODE-related fields describing overlap with transcribed regions and repeats.

Feature types in this annotation include:

  • tRNA: transfer RNA (or pseudogene)
  • rRNA: ribosomal RNA (or pseudogene)
  • scRNA: small cytoplasmic RNA (or pseudogene)
  • snRNA: small nuclear RNA (or pseudogene)
  • snoRNA: small nucleolar RNA (or pseudogene)
  • miRNA: microRNA (or pseudogene)
  • misc_RNA: miscellaneous other RNA, such as Xist (or pseudogene)
  • "-": unknown RNA

Display Conventions and Configuration

The locations of the RNA genes and pseudogenes are represented by blocks in the graphical display, color-coded as follows:

  • Black: region is Repeatmasked.
  • Green: region is transcribed.
  • Red: region is from the RNA Genes track and is not transcribed.
  • Blue: region is an EvoFold prediction and is not transcribed.
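The color scheme above can be sketched as a small lookup. This is an illustration only: the function name, the argument names, and the source labels are hypothetical, and the order of the checks (repeat-masked first, then transcribed) is an assumption, since the page lists the colors without stating which takes precedence.

```python
def item_color(repeatmasked: bool, transcribed: bool, source: str) -> str:
    """Return the display color for one item in the track.

    Assumption: the checks run in the order the colors are listed on the
    page (repeat-masked, then transcribed, then by source).
    """
    if repeatmasked:
        return "black"
    if transcribed:
        return "green"
    # Not transcribed: the color depends on where the item came from.
    if source == "RNA Genes":
        return "red"
    if source in ("EvoFold", "TBA23_EvoFold"):
        return "blue"
    raise ValueError(f"unknown source: {source!r}")
```

For example, under these assumptions an untranscribed item from the RNA Genes track maps to red, and any repeat-masked item maps to black regardless of its source.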

The display may be filtered to show only those items with unnormalized scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page.
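The same filter can be applied to downloaded data. The sketch below assumes each item is a tuple whose last element is the score; the record layout and the function name are hypothetical, and only the 0 to 1000 range comes from the settings shown at the top of this page.

```python
from typing import List, Tuple

# Hypothetical record layout: (chrom, start, end, name, score)
Item = Tuple[str, int, int, str, int]

def filter_by_score(items: List[Item], min_score: int) -> List[Item]:
    """Keep only items whose score is at or above min_score."""
    if not 0 <= min_score <= 1000:
        raise ValueError("min_score must be in the range 0 to 1000")
    return [item for item in items if item[4] >= min_score]
```

An item whose score equals the threshold is kept, matching the "at or above" wording of the filter.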

Methods

The RNA Genes track was supplemented with EvoFold predictions and filtered to include only those items that lie within the ENCODE regions. Regions that are at least 10 percent Repeatmasked are flagged because no transcriptional data is available for them. A region is considered transcribed if at least 10 percent of it overlaps with any Affymetrix transcribed fragment (transfrag), derived from six microarray experiments, or any Yale transcriptionally active region (TAR), derived from 15 microarray experiments. For each transcribed region, every array from which the overlapping transfrags and TARs were derived is listed.
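The 10 percent overlap rule can be sketched as follows. This is a minimal illustration, not the code used to build the track: it assumes half-open integer coordinates, it reads "any" literally (a single transfrag or TAR must cover at least 10 percent of the region, rather than the union of several), and the function names are hypothetical.

```python
from typing import Iterable, Tuple

def overlap_fraction(start: int, end: int, other_start: int, other_end: int) -> float:
    """Fraction of the region [start, end) covered by [other_start, other_end)."""
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")
    overlap = min(end, other_end) - max(start, other_start)
    return max(0, overlap) / length

def is_transcribed(start: int, end: int,
                   fragments: Iterable[Tuple[int, int]],
                   threshold: float = 0.10) -> bool:
    """True if any single fragment covers at least `threshold` of the region."""
    return any(overlap_fraction(start, end, s, e) >= threshold
               for s, e in fragments)
```

Under these assumptions, a 100-base region that shares 10 bases with one fragment counts as transcribed, while one that shares only 5 bases does not. The same `overlap_fraction` helper could be applied to repeat annotations to flag Repeatmasked regions.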

EvoFold is a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments to identify conserved functional RNA structures. The method makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. Each prediction consists of a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score comparing a phylo-SCFG that models the constrained evolution of stem-pairing regions with one that models only unpaired regions.
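As a rough guide to what a log-odds score of this kind looks like, the comparison can be written as below. The symbols are assumptions introduced here for illustration: x is the alignment segment, and the two models are the structural phylo-SCFG and the unpaired-only phylo-SCFG. The source says only that the score is "essentially" of this form, so any normalization or scaling used by EvoFold is not captured.

```latex
S(x) = \log \frac{P(x \mid \phi_{\mathrm{struct}})}{P(x \mid \phi_{\mathrm{unpaired}})}
```

A positive value means the alignment is better explained by the model that includes stem pairing than by the model with unpaired regions only.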

Two sets of EvoFold predictions are included in this track. The first, labeled EvoFold, contains predictions based on the conserved elements of an 8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and Fugu assemblies. The second, labeled TBA23_EvoFold, contains predictions based on the conserved elements of the 23-way TBA alignments present in the ENCODE regions. When predictions from the two sets overlap, only the EvoFold prediction is shown.
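The overlap rule between the two prediction sets can be sketched like this. It assumes each prediction is a (chrom, start, end) tuple with half-open coordinates and treats any shared base as an overlap; the function name and record layout are hypothetical, and the source does not say how overlap was defined when the track was built.

```python
from typing import List, Tuple

Prediction = Tuple[str, int, int]  # hypothetical layout: (chrom, start, end)

def resolve_overlaps(evofold: List[Prediction],
                     tba23: List[Prediction]) -> List[Prediction]:
    """Keep all EvoFold predictions; drop TBA23_EvoFold ones that overlap them."""
    def overlaps(a: Prediction, b: Prediction) -> bool:
        # Same chromosome and at least one shared base.
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

    kept = [p for p in tba23 if not any(overlaps(p, q) for q in evofold)]
    return list(evofold) + kept
```

This quadratic scan is fine for the small ENCODE regions; a sorted sweep or interval tree would be the usual choice for genome-wide data.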

Credits

These data were kindly provided by Sean Eddy at Washington University, Jakob Skou Pedersen at UC Santa Cruz, and the ENCODE Consortium.

This annotation track was generated by Matt Weirauch.

References

Knudsen, B. and Hein, J.J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6), 446-54 (1999).

Pedersen, J.S., Bejerano, G. and Haussler, D. Identification and classification of conserved RNA secondary structures in the human genome. (In preparation).