Gencode Genes Track Settings
 
ENCODE Gencode Gene Annotations   (All Genes and Gene Predictions tracks)

 
Subtracks:
  • Gencode Manual: ENCODE Gencode Manual Gene Annotations (level 1+2) (Oct 2009)
  • Gencode Auto: ENCODE Gencode Automated Gene Annotations (level 3) (Oct 2009)
  • Gencode PolyA: ENCODE Gencode PolyA Transcript Annotations (level 2) (Oct 2009)

Assembly: Human Mar. 2006 (NCBI36/hg18)

Release Notes

This release of the Gencode Genes track (Version 3c, October 2009) shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project.

Version 3 of the Gencode gene set presents a full merge between HAVANA and ENSEMBL, giving priority to the manually curated HAVANA objects and using ENSEMBL objects where they are different or fall into un-annotated regions. The annotation was carried out on genome assembly GRCh37 (hg19); features are projected back to NCBI36 (hg18) where possible. Gencode 3c is a small update of version 3b (July 09 freeze), mainly for chromosomes 3 & 4, for which the latest annotation was held back and QC'ed again to be used in the RNASeq Genome Annotation Assessment Project. Statistics about this release can be found here.

Display Conventions and Configuration

The annotations are divided into separate tracks based on source and confidence level. The Gencode project recommends that the annotations from levels 1 & 2 be used as the reference gene annotation; level 3 was added to fill gaps for methods that analyze the entire genome and require a full set.

  • Level 1: validated
At this time, this level contains only pseudogene loci that were predicted by the analysis pipelines from Yale and UCSC as well as by HAVANA manual annotation from WTSI.
  • Level 2: manual annotation
HAVANA manual annotation from WTSI.
The following regions are considered "fully annotated" and contain level 2 annotation from HAVANA only, although they will still be updated: chromosomes 1, 2, 6, 9, 10, 13, 20, 21, 22, X, Y, ENCODE pilot regions, chr11:2353995-3878750.
  • Level 3: automated annotation
ENSEMBL loci in regions where no HAVANA annotation can be found.
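
The level scheme and the "fully annotated" regions listed above can be expressed as a small lookup. The following is a minimal Python sketch, not part of any GENCODE or UCSC tooling. The names FULLY_ANNOTATED_CHROMS, CHR11_REGION, is_fully_annotated and reference_set are hypothetical, annotation records are assumed to be plain dicts with a "level" key, and the chr11 coordinates are assumed to be 1-based and inclusive. The ENCODE pilot regions are left out because their coordinates are not given in this description.

```python
# Hypothetical helper names; the values are copied from the list above.
FULLY_ANNOTATED_CHROMS = {
    "chr1", "chr2", "chr6", "chr9", "chr10", "chr13",
    "chr20", "chr21", "chr22", "chrX", "chrY",
}
# Assumption: browser-style coordinates, 1-based and inclusive.
CHR11_REGION = ("chr11", 2353995, 3878750)

# Levels recommended as the reference gene annotation.
REFERENCE_LEVELS = {1, 2}


def is_fully_annotated(chrom, pos):
    """Return True if pos lies in a region listed as fully annotated.

    ENCODE pilot regions are not covered here because their
    coordinates are not listed in this description.
    """
    if chrom in FULLY_ANNOTATED_CHROMS:
        return True
    region_chrom, start, end = CHR11_REGION
    return chrom == region_chrom and start <= pos <= end


def reference_set(records):
    """Keep only level 1 and level 2 records (dicts with a 'level' key)."""
    return [r for r in records if r["level"] in REFERENCE_LEVELS]
```

A position on chr3 is reported as not fully annotated, which matches the release note that chromosomes 3 & 4 were still being re-checked for this version.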

NOTE: The release cycles for Gencode, Havana and Ensembl differ. Users are cautioned to compare release dates to determine which annotation is most current.

The gene annotations are colored based on the HAVANA annotation type and the confidence level. See the table below for the color key, as well as more detail about the transcript and feature types.

Class | Color | Description | Transcript Types (see Vega Transcript Types)
Validated_coding | Dark Orange | Level 1 Validated: coding regions | protein_coding
Validated_processed | Light Orange | Level 1 Validated: processed | processed_transcript
Validated_processed_pseudogene | Dark Pink | Level 1 Validated: processed pseudogenes | processed_pseudogene, processed_transcript, transcribed_processed_pseudogene
Validated_unprocessed_pseudogene | Medium Pink | Level 1 Validated: unprocessed pseudogenes | transcribed_unprocessed_pseudogene, unprocessed_pseudogene
Validated_pseudogene | Light Pink | Level 1 Validated: pseudogenes | IG_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene
Havana_coding | Dark Orange | Level 2 Manual annotation: coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding
Havana_nonsense | Medium Orange | Level 2 Manual annotation: nonsense | nonsense_mediated_decay
Havana_non_coding | Light Orange | Level 2 Manual annotation: non-coding | ambiguous_orf, antisense, non_coding, processed_transcript, retained_intron
Havana_polyA | Black | Level 2 Manual annotation: polyA | polyA_signal, polyA_site, pseudo_polyA
Havana_processed_pseudogene | Dark Pink | Level 2 Manual annotation: processed pseudogene | processed_pseudogene, transcribed_processed_pseudogene
Havana_unprocessed_pseudogene | Medium Pink | Level 2 Manual annotation: unprocessed pseudogene | transcribed_unprocessed_pseudogene, unprocessed_pseudogene
Havana_pseudogene | Light Pink | Level 2 Manual annotation: pseudogene | IG_pseudogene, TR_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene
Havana_TEC | Grey | Level 2 Manual annotation: TEC | TEC, artifact
Ensembl_coding | Dark Red | Level 3 Automated annotation: coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding
Ensembl_non_coding | Light Orange | Level 3 Automated annotation: non-coding | antisense, non_coding, processed_transcript, retained_intron
Ensembl_pseudogene | Dark Pink | Level 3 Automated annotation: pseudogene | IG_pseudogene, miRNA_pseudogene, misc_RNA_pseudogene, pseudogene, retrotransposed, unitary_pseudogene
Ensembl_processed_pseudogene | Medium Pink | Level 3 Automated annotation: processed pseudogene | processed_pseudogene
Ensembl_unprocessed_pseudogene | Light Pink | Level 3 Automated annotation: unprocessed pseudogene | unprocessed_pseudogene
Ensembl_RNA | Light Red | Level 3 Automated annotation: RNA transcripts | Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene, miRNA, misc_RNA, rRNA, rRNA_pseudogene, scRNA_pseudogene, snRNA, snRNA_pseudogene, snoRNA, snoRNA_pseudogene, tRNA_pseudogene, tRNAscan
2way_consensus_pseudogene | Dark Purple | Level 3 Automated annotation: pseudogenes | pseudogenes
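
In the color key above, the class name prefix tracks the annotation level: Validated_ classes are level 1, Havana_ classes are level 2, and Ensembl_ classes are level 3, with 2way_consensus_pseudogene as the one unprefixed class (listed as level 3). The sketch below shows one way to recover the level from a class name in Python. The function name level_for_class and both lookup tables are hypothetical and written for illustration only; they are not part of the track's own software.

```python
# Prefix-to-level rules read off the Class and Description columns above.
PREFIX_LEVELS = (
    ("Validated_", 1),  # Level 1: validated
    ("Havana_", 2),     # Level 2: manual annotation
    ("Ensembl_", 3),    # Level 3: automated annotation
)
# The one class without a source prefix; the key lists it as level 3.
SPECIAL_LEVELS = {"2way_consensus_pseudogene": 3}


def level_for_class(class_name):
    """Return the annotation level (1, 2 or 3) implied by a class name."""
    if class_name in SPECIAL_LEVELS:
        return SPECIAL_LEVELS[class_name]
    for prefix, level in PREFIX_LEVELS:
        if class_name.startswith(prefix):
            return level
    raise ValueError(f"unrecognized class name: {class_name!r}")
```

For example, level_for_class("Havana_polyA") returns 2, so that class belongs to the recommended reference set of levels 1 & 2.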

This track uses filtering by category to select subsets of transcripts and has additional advanced features. Help with these features can be found here.

Methods

We aim to annotate all evidence-based gene features at high accuracy on the human reference sequence. This includes identifying all protein-coding loci with associated alternative variants, non-coding loci which have transcript evidence, and pseudogenes. We integrate computational approaches (including comparative methods), manual annotation and targeted experimental verification.

For a detailed description of the methods and references used, see Harrow et al. (2006).

Verification

See Harrow et al. (2006) for information on verification techniques.

Credits

This GENCODE release is the result of a collaborative effort among the following laboratories: (contact: Felix Kokocinski)

Lab/Institution | Contributors
HAVANA annotation group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Adam Frankish, James Gilbert, Jennifer Harrow, Felix Kokocinski, Stephen Trevanion, Tim Hubbard (GENCODE Principal Investigator)
Genome Bioinformatics Lab (CRG), Barcelona, Spain | Thomas Derrien, Tyler Alioto, Roderic Guigó
Genome Bioinformatics, University of California Santa Cruz (UCSC), USA | Rachel Harte, Mark Diekhans, Robert Baertsch, David Haussler
Comp. Genomics Lab, Washington University St. Louis (WUSTL), USA | Jeltje van Baren, Charlie Comstock, David Lu, Michael Brent
Computer Science and Artificial Intelligence Lab, Broad Institute of MIT and Harvard, USA | Mike Lin, Manolis Kellis
Bioinformatics, Yale University (Yale), USA | Philip Cayting, Mark Gerstein
Center for Integrative Genomics, University of Lausanne, Switzerland | Cedric Howald, Alexandre Reymond
ENSEMBL genebuild group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Bronwen Aken, Julio Fernandez Banet, Stephen Searle
Structural Computational Biology Group, Centro Nacional de Investigaciones Oncologicas (CNIO), Madrid, Spain | Manuel Rodríguez José, Jan-Jaap Wesselink, Michael Tress, Alfonso Valencia

References

Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S et al. The GENCODE exome: sequencing the complete human exome. Eur J Hum Genet. 2011;19:827-831. Epub March 2011.

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.

Data Release Policy

GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available here.