Description
The GENCODE track is composed of all the gene models in the GENCODE v24 release.
It includes both protein-coding genes and
non-coding RNA genes.
Display Conventions and Configuration
This track generally follows the display conventions for
gene prediction tracks. The exons
for putative non-coding genes and untranslated regions are represented by relatively thin blocks,
while those for coding open reading frames are thicker. The following color key is used:
- Black -- feature has a corresponding entry in the Protein Data Bank (PDB)
- Dark blue -- transcript has been reviewed or validated by either the RefSeq or SwissProt staff
- Medium blue -- other RefSeq transcripts
- Light blue -- non-RefSeq transcripts
This track contains an optional codon coloring
feature that allows users to quickly validate and compare gene predictions.
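As a rough illustration of the thin and thick blocks described above, the following Python sketch splits a transcript's exons into untranslated (thin) and coding (thick) segments. It assumes genePred-style, zero-based half-open coordinates, and that a non-coding transcript is marked by a coding start equal to its coding end. The function name split_exon_blocks and the example coordinates are hypothetical and are not taken from the browser's own code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def split_exon_blocks(
    exons: List[Interval], cds_start: int, cds_end: int
) -> Tuple[List[Interval], List[Interval]]:
    """Split exons into (thin, thick) segments.

    thin  = exon bases outside the coding region (UTRs, non-coding exons)
    thick = exon bases inside the coding region [cds_start, cds_end)
    """
    thin: List[Interval] = []
    thick: List[Interval] = []
    for start, end in exons:
        # Clip the exon to the coding region.
        c_start = max(start, cds_start)
        c_end = min(end, cds_end)
        if c_start >= c_end:
            # No coding overlap; this also covers cds_start == cds_end.
            thin.append((start, end))
            continue
        if start < c_start:
            thin.append((start, c_start))  # UTR on the low-coordinate side
        thick.append((c_start, c_end))
        if c_end < end:
            thin.append((c_end, end))  # UTR on the high-coordinate side
    return thin, thick


# Hypothetical three-exon transcript with coding region 150..550.
example_thin, example_thick = split_exon_blocks(
    [(100, 200), (300, 400), (500, 600)], cds_start=150, cds_end=550
)
```

In this example the first and last exons are each split into one thin and one thick piece, while the middle exon is drawn entirely thick.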
Methods
All the GENCODE genes in the comprehensive set are downloaded from the GENCODE
website.
Data from other sources are correlated with the GENCODE data to build the knownTo tables.
Related Data
The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a
downloadable file. These
include tables that link GENCODE Genes transcripts to external datasets (such as
knownToLocusLink, which maps GENCODE Genes transcripts to Entrez identifiers, previously known
as Locus Link identifiers), and tables that detail some property of GENCODE Genes transcript sequences
(such as knownToPfam, which identifies any Pfam domains found in the GENCODE Genes
protein-coding transcripts). One can see a full list of the associated tables in the
Table Browser by selecting GENCODE Genes from the track menu;
this list is then available on the table menu. Note that some of these tables refer to GENCODE
Genes by its former name of Known Genes, sometimes abbreviated as known or kg.
While the complete set of annotation tables is too long to describe, some of the more important
tables are described below.
- kgXref identifies the RefSeq, SwissProt, Rfam, or tRNA sequences (if any) which are
associated with each transcript.
- knownToRefSeq identifies the RefSeq transcript that each GENCODE Genes transcript is most
closely associated with, defined as the RefSeq transcript that shares the most overlapping bases
with the GENCODE Genes transcript.
- knownGeneMrna contains the genomic sequence for each of the GENCODE Genes models.
This may not be the same as the actual mRNA used to validate the gene model.
- knownGenePep contains the protein sequences derived from the knownGeneMrna transcript
sequences. Any protein-level annotations, such as the contents of the knownToPfam table, are based
on these sequences.
- knownIsoforms maps each transcript to a cluster ID, where each cluster groups the isoforms of
the same gene.
- knownCanonical identifies the canonical isoform of each cluster, where each cluster is defined by
its ENSEMBL gene ID. The canonical transcript is the APPRIS principal transcript when one is available.
If no transcript in the cluster carries an APPRIS tag, then a transcript in the
BASIC set is chosen. If no BASIC transcript exists, then the longest isoform is used.
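The selection order described for knownCanonical can be sketched as follows. This is a minimal illustration, not the UCSC pipeline itself: the Transcript record, its field names, and the choose_canonical and canonical_by_gene helpers are hypothetical. Picking the longest candidate when several transcripts fall in the same tier is an assumption that the description does not state.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    transcript_id: str
    gene_id: str            # ENSEMBL gene ID, used as the cluster key
    length: int             # transcript length in bases
    appris_principal: bool = False
    basic: bool = False


def choose_canonical(cluster: List[Transcript]) -> Transcript:
    """Pick one transcript: APPRIS principal, else BASIC, else longest."""
    if not cluster:
        raise ValueError("cluster must contain at least one transcript")
    tiers = (
        lambda t: t.appris_principal,  # first choice
        lambda t: t.basic,             # second choice
        lambda t: True,                # fallback: any isoform
    )
    for in_tier in tiers:
        candidates = [t for t in cluster if in_tier(t)]
        if candidates:
            # Assumption: ties within a tier are broken by length.
            return max(candidates, key=lambda t: t.length)
    raise AssertionError("unreachable: the last tier accepts everything")


def canonical_by_gene(transcripts: Iterable[Transcript]) -> Dict[str, Transcript]:
    """Group transcripts by gene ID and choose a canonical one per group."""
    clusters: Dict[str, List[Transcript]] = defaultdict(list)
    for t in transcripts:
        clusters[t.gene_id].append(t)
    return {gene: choose_canonical(ts) for gene, ts in clusters.items()}


# Hypothetical input covering all three tiers.
example = [
    Transcript("T1", "G1", 3000),
    Transcript("T2", "G1", 1000, basic=True),
    Transcript("T3", "G1", 500, appris_principal=True),
    Transcript("T4", "G2", 800),
    Transcript("T5", "G2", 1200),
    Transcript("T6", "G3", 700, basic=True),
    Transcript("T7", "G3", 900),
]
canonical = canonical_by_gene(example)
```

Here G1 resolves to its APPRIS principal transcript even though it is the shortest, G3 resolves to its BASIC transcript, and G2, which has neither tag, falls back to its longest isoform.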
Credits
The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive
gene set using a computational pipeline developed by Jim Kent and Brian Raney.
References
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa
A, Searle S et al.
GENCODE: the reference human genome annotation for The ENCODE Project.
Genome Res. 2012 Sep;22(9):1760-74.
PMID: 22955987; PMC: PMC3431492
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R,
Swarbreck D et al.
GENCODE: producing a reference annotation for ENCODE.
Genome Biol. 2006;7 Suppl 1:S4.1-9.
PMID: 16925838; PMC: PMC1810553
A full list of GENCODE publications is available
at The GENCODE Project
website.
Data Release Policy
GENCODE data are available for use without restrictions.