Schema for CCDS - Consensus CDS
  Database: hg38    Primary Table: ccdsGene    Row Count: 32,506   Data last updated: 2019-10-03
Format description: A gene prediction with some additional info.
On download server: MariaDB table dump directory
field | example | SQL type | info | description
bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries.
name | CCDS30547.1 | varchar(255) | values | Name of gene (usually transcript_id from GTF)
chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold
strand | + | char(1) | values | + or - for strand
txStart | 69090 | int(10) unsigned | range | Transcription start position (or end position for minus strand item)
txEnd | 70008 | int(10) unsigned | range | Transcription end position (or start position for minus strand item)
cdsStart | 69090 | int(10) unsigned | range | Coding region start (or end position for minus strand item)
cdsEnd | 70008 | int(10) unsigned | range | Coding region end (or start position for minus strand item)
exonCount | 1 | int(10) unsigned | range | Number of exons
exonStarts | 69090, | longblob |  | Exon start positions (or end positions for minus strand item)
exonEnds | 70008, | longblob |  | Exon end positions (or start positions for minus strand item)
score | 0 | int(11) | range | score
name2 |  | varchar(255) | values | Alternate name (e.g. gene_id from GTF)
cdsStartStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS start annotation (none, unknown, incomplete, or complete)
cdsEndStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS end annotation (none, unknown, incomplete, or complete)
exonFrames | 0, | longblob |  | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region.
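
The bin field can be recomputed from a feature's start and end. The following is a minimal sketch of the standard UCSC binning scheme (five bin levels, the smallest covering 128 kb), assuming that scheme is what populates this table; the function name bin_from_range is illustrative and not part of any UCSC API. It reproduces the bin values shown in the sample rows for this table.

```python
# Offsets of the first bin at each level, from the smallest bins (128 kb) upward.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the smallest bin
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end).

    start is 0-based and end is exclusive, as in the table above.
    """
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the standard binning scheme")


print(bin_from_range(69090, 70008))    # 585, matches CCDS30547.1
print(bin_from_range(925941, 944153))  # 592, matches CCDS2.2
```

A range query would typically restrict on bin first and then on txStart and txEnd, so that the index on bin narrows the candidate rows before coordinates are compared.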

Connected Tables and Joining Fields
      hg38.ccdsInfo.ccds (via ccdsGene.name)
      hg38.ccdsKgMap.ccdsId (via ccdsGene.name)
      hg38.ccdsNotes.ccds (via ccdsGene.name)
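
As an illustration of how these joining fields are used, here is a small sketch that builds two toy tables in an in-memory SQLite database and joins them on ccdsGene.name = ccdsInfo.ccds. Only the table names and the joining fields come from the list above. The cut-down column sets, the payload column, and the inserted ccdsInfo row are hypothetical placeholders, and SQLite stands in for the MariaDB server purely so the example is self-contained.

```python
import sqlite3


def join_ccds_tables():
    """Join toy ccdsGene and ccdsInfo tables on the documented joining field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ccdsGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
        -- 'payload' is a hypothetical column; the real ccdsInfo schema is not shown here.
        CREATE TABLE ccdsInfo (ccds TEXT, payload TEXT);
        """
    )
    # The ccdsGene values are taken from the sample rows; the ccdsInfo row is made up.
    conn.execute("INSERT INTO ccdsGene VALUES ('CCDS30547.1', 'chr1', 69090, 70008)")
    conn.execute("INSERT INTO ccdsInfo VALUES ('CCDS30547.1', 'placeholder')")
    rows = conn.execute(
        """
        SELECT g.name, g.chrom, g.txStart, g.txEnd, i.payload
        FROM ccdsGene AS g
        JOIN ccdsInfo AS i ON i.ccds = g.name
        """
    ).fetchall()
    conn.close()
    return rows


rows = join_ccds_tables()
print(rows)
```

The same ON clause pattern applies to ccdsKgMap (joining on ccdsId) and ccdsNotes (joining on ccds).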

Sample Rows
 
bin | name | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts | exonEnds | score | name2 | cdsStartStat | cdsEndStat | exonFrames
585 | CCDS30547.1 | chr1 | + | 69090 | 70008 | 69090 | 70008 | 1 | 69090, | 70008, | 0 |  | cmpl | cmpl | 0,
588 | CCDS72675.1 | chr1 | - | 450739 | 451678 | 450739 | 451678 | 1 | 450739, | 451678, | 0 |  | cmpl | cmpl | 0,
590 | CCDS41221.1 | chr1 | - | 685715 | 686654 | 685715 | 686654 | 1 | 685715, | 686654, | 0 |  | cmpl | cmpl | 0,
592 | CCDS2.2 | chr1 | + | 925941 | 944153 | 925941 | 944153 | 13 | 925941,930154,931038,935771,939039,939274,941143,942135,942409,942558,943252,943697,943907, | 926013,930336,931089,935896,939129,939460,941306,942251,942488,943058,943377,943808,944153, | 0 |  | cmpl | cmpl | 0,0,2,2,1,1,1,2,1,2,1,0,0,
592 | CCDS3.1 | chr1 | - | 944693 | 959240 | 944693 | 959240 | 19 | 944693,945056,945517,946172,946401,948130,948489,951126,951999,952411,953174,953781,954003,955922,956094,956893,957098,958928,95 ... | 944800,945146,945653,946286,946545,948232,948603,951238,952139,952600,953288,953892,954082,956013,956215,957025,957273,959081,95 ... | 0 |  | cmpl | cmpl | 1,1,0,0,0,0,0,2,0,0,0,0,2,1,0,0,2,2,0,
592 | CCDS30550.1 | chr1 | + | 960693 | 965191 | 960693 | 965191 | 12 | 960693,961292,961628,961825,962354,962703,963108,963336,963919,964106,964348,964962, | 960800,961552,961750,962047,962471,962917,963253,963504,964008,964180,964530,965191, | 0 |  | cmpl | cmpl | 0,2,1,0,0,0,1,2,2,1,0,2,
592 | CCDS53256.1 | chr1 | + | 966531 | 974575 | 966531 | 974575 | 15 | 966531,966703,970276,970520,970685,970878,971076,971323,972074,972287,972860,973499,973832,974315,974441, | 966614,966803,970423,970601,970758,971006,971208,971404,972150,972424,973010,973640,974051,974364,974575, | 0 |  | cmpl | cmpl | 0,2,0,0,0,1,0,0,0,1,0,0,0,0,1,
592 | CCDS4.1 | chr1 | + | 966531 | 974575 | 966531 | 974575 | 16 | 966531,966703,970276,970520,970685,970878,971112,971323,972074,972287,972860,973185,973499,973832,974315,974441, | 966614,966803,970423,970601,970758,971006,971208,971404,972150,972424,973010,973326,973640,974051,974364,974575, | 0 |  | cmpl | cmpl | 0,2,0,0,0,1,0,0,0,1,0,0,0,0,0,1,
592 | CCDS76083.1 | chr1 | - | 976171 | 981029 | 976171 | 981029 | 3 | 976171,976498,978880, | 976269,976624,981029, | 0 |  | cmpl | cmpl | 1,1,0,
592 | CCDS44034.1 | chr1 | - | 999058 | 999973 | 999058 | 999973 | 3 | 999058,999525,999691, | 999432,999613,999973, | 0 |  | cmpl | cmpl | 1,0,0,
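
To make the row layout concrete, the sketch below parses one row into a small record and recomputes exonFrames from the exon coordinates. It assumes the row arrives as a single tab-separated line with the 16 fields in schema order, which is an assumption about the dump format rather than something stated on this page. The names GenePredRow, parse_row and expected_frames are illustrative. The frame rule used (coding bases already consumed in the direction of transcription, modulo 3) is inferred from the sample rows, not taken from a specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePredRow:
    name: str
    chrom: str
    strand: str
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]
    exon_frames: List[int]


def int_list(text: str) -> List[int]:
    """Parse a comma-terminated list such as '976171,976498,978880,'."""
    return [int(item) for item in text.split(",") if item]


def parse_row(line: str) -> GenePredRow:
    """Parse one tab-separated row (assumed format) in schema field order."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 16:
        raise ValueError(f"expected 16 fields, got {len(f)}")
    return GenePredRow(
        name=f[1], chrom=f[2], strand=f[3],
        cds_start=int(f[6]), cds_end=int(f[7]),
        exon_starts=int_list(f[9]), exon_ends=int_list(f[10]),
        exon_frames=int_list(f[15]),
    )


def expected_frames(row: GenePredRow) -> List[int]:
    """Recompute exonFrames: coding bases seen so far, modulo 3, per exon."""
    count = len(row.exon_starts)
    frames = [-1] * count
    # Walk exons in the direction of transcription.
    order = range(count) if row.strand == "+" else range(count - 1, -1, -1)
    coding_bases = 0
    for i in order:
        start = max(row.exon_starts[i], row.cds_start)
        end = min(row.exon_ends[i], row.cds_end)
        if start < end:  # exon overlaps the CDS
            frames[i] = coding_bases % 3
            coding_bases += end - start
    return frames


# Sample row for CCDS76083.1, rebuilt as a tab-separated line.
line = ("592\tCCDS76083.1\tchr1\t-\t976171\t981029\t976171\t981029\t3\t"
        "976171,976498,978880,\t976269,976624,981029,\t0\t\tcmpl\tcmpl\t1,1,0,")
row = parse_row(line)
print(expected_frames(row) == row.exon_frames)  # True
```

Because this is a minus-strand item, the walk starts at the last exon in the list, which is why the final exon has frame 0.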

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
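
In practice this means a start value must be incremented by one before it is shown as a 1-based position, while the end value is used unchanged. The sketch below assumes the usual UCSC convention that end coordinates are exclusive (half-open intervals), so the length of a feature is simply end minus start. The helper names are illustrative.

```python
def to_one_based(chrom: str, start: int, end: int) -> str:
    """Format a 0-based, half-open database range as a 1-based position string."""
    return f"{chrom}:{start + 1}-{end}"


def span_length(start: int, end: int) -> int:
    """Length in bases of a 0-based, half-open range."""
    return end - start


# txStart and txEnd of CCDS30547.1 from the sample rows.
print(to_one_based("chr1", 69090, 70008))  # chr1:69091-70008
print(span_length(69090, 70008))           # 918
```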

CCDS (ccdsGene) Track Description
 

Description

This track shows human genome high-confidence gene annotations from the Consensus Coding Sequence (CCDS) project. This project is a collaborative effort to identify a core set of human protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations on the human genome.

Collaborators include:

For more information on the different gene tracks, see our Genes FAQ.

Methods

CDS annotations of the human genome were obtained from two sources: NCBI RefSeq and a union of the gene annotations from Ensembl and Vega, collectively known as Hinxton.

Genes with identical CDS genomic coordinates in both sets become CCDS candidates. The genes undergo a quality evaluation, which must be approved by all collaborators. The following criteria are currently used to assess each gene:

  • an initiating ATG (Exception: a non-ATG translation start codon is annotated if it has sufficient experimental support), a valid stop codon, and no in-frame stop codons (Exception: selenoproteins, which contain a TGA codon that is known to be translated to a selenocysteine instead of functioning as a stop codon)
  • ability to be translated from the genome reference sequence without frameshifts
  • recognizable splicing sites
  • no intersection with putative pseudogene predictions
  • supporting transcripts and protein homology
  • conservation evidence with other species
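
The first two criteria can be illustrated with a simplified check on a spliced CDS sequence. This is only a sketch, not the CCDS project's evaluation pipeline: it ignores the documented exceptions (supported non-ATG starts and selenoprotein TGA codons) and does not address splice sites, pseudogenes, transcript support or conservation. The function name check_cds is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str) -> dict:
    """Run simple sequence-level checks on a spliced CDS (5' to 3', coding strand)."""
    seq = seq.upper()
    complete = len(seq) >= 3 and len(seq) % 3 == 0
    # All complete codons, reading from the first base.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Every codon except the terminal one must be a sense codon.
    internal = codons[:-1] if complete else codons
    return {
        "length_multiple_of_3": complete,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": complete and codons[-1] in STOP_CODONS,
        "no_inframe_stop": not any(c in STOP_CODONS for c in internal),
    }


print(check_cds("ATGAAATAA"))
print(check_cds("ATGTAAAAATAA")["no_inframe_stop"])  # False
```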

A unique CCDS ID is assigned to the CCDS, which links together all gene annotations with the same CDS. CCDS gene annotations are under continuous review, with periodic updates to this track.
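
The idea of linking annotations that share an identical CDS can be sketched as grouping by a key built from the chromosome, strand and CDS block coordinates. The annotation identifiers below (refseq_tx1 and so on) are hypothetical, the coordinates are borrowed from the CCDS76083.1 sample row with one deliberately altered variant, and the functions are an illustration of the matching idea rather than the project's actual procedure.

```python
from collections import defaultdict


def cds_key(chrom, strand, cds_blocks):
    """Build a hashable key from CDS genomic coordinates."""
    return (chrom, strand, tuple(cds_blocks))


def shared_cds(source_a, source_b):
    """Return CDS keys annotated in both sources, with the IDs from each."""
    groups = defaultdict(lambda: {"a": [], "b": []})
    for ann_id, key in source_a.items():
        groups[key]["a"].append(ann_id)
    for ann_id, key in source_b.items():
        groups[key]["b"].append(ann_id)
    return {key: ids for key, ids in groups.items() if ids["a"] and ids["b"]}


blocks = [(976171, 976269), (976498, 976624), (978880, 981029)]
altered = [(976171, 976269), (976498, 976630), (978880, 981029)]  # made-up variant

# Hypothetical annotation IDs from two sources.
source_a = {"refseq_tx1": cds_key("chr1", "-", blocks)}
source_b = {
    "hinxton_tx1": cds_key("chr1", "-", blocks),
    "hinxton_tx2": cds_key("chr1", "-", altered),
}

candidates = shared_cds(source_a, source_b)
print(len(candidates))  # 1
```

Only the CDS present in both sources survives, and its entry carries every annotation ID that shares it, which mirrors how a single CCDS ID ties those annotations together.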

Credits

This track was produced at UCSC from data downloaded from the CCDS project web site.

References

Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161

Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009 Jul;19(7):1316-23. PMID: 19498102; PMC: PMC2704439

Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979