Schema for Vega Genes - Vega Annotations
Database: hg19
Primary Table: vegaPseudoGene
Row Count: 12,012
Data last updated: 2010-08-25
Format description: A gene prediction with some additional info.
On download server: MariaDB table dump directory
field | example | SQL type | info | description |
bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries. |
name | OTTHUMT00000362751 | varchar(255) | values | Name of gene (usually transcript_id from GTF) |
chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
strand | + | char(1) | values | + or - for strand |
txStart | 11868 | int(10) unsigned | range | Transcription start position (or end position for minus strand item) |
txEnd | 14409 | int(10) unsigned | range | Transcription end position (or start position for minus strand item) |
cdsStart | 14409 | int(10) unsigned | range | Coding region start (or end position for minus strand item) |
cdsEnd | 14409 | int(10) unsigned | range | Coding region end (or start position for minus strand item) |
exonCount | 3 | int(10) unsigned | range | Number of exons |
exonStarts | 11868,12612,13220, | longblob | | Exon start positions (or end positions for minus strand item) |
exonEnds | 12227,12721,14409, | longblob | | Exon end positions (or start positions for minus strand item) |
score | 0 | int(11) | range | score |
name2 | RP11-34P13.1 | varchar(255) | values | Alternate name (e.g. gene_id from GTF) |
cdsStartStat | none | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS start annotation (none, unknown, incomplete, or complete) |
cdsEndStat | none | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS end annotation (none, unknown, incomplete, or complete) |
exonFrames | -1,-1,-1, | longblob | | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region. |
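The exonStarts and exonEnds columns hold comma-terminated lists, so a row has to be split before the exon intervals can be used. The following is a minimal Python sketch of one way to do that, not UCSC code: the class and function names are illustrative, and treating an empty CDS (cdsStart equal to cdsEnd) as a non-coding item is an assumption based on the example values in the table above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GenePredRow:
    """Illustrative container for the coordinate fields of one table row."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end) pairs taken from the two lists

    @property
    def is_coding(self) -> bool:
        # Assumption: cdsStart == cdsEnd means the item has no coding region.
        return self.cds_start < self.cds_end


def parse_int_list(blob: str) -> List[int]:
    """Parse a comma-terminated list such as '11868,12612,13220,'."""
    return [int(x) for x in blob.split(",") if x]


def parse_row(fields: List[str]) -> GenePredRow:
    """Build a GenePredRow from the 16 columns in table order (bin first)."""
    (_bin, name, chrom, strand, tx_start, tx_end, cds_start, cds_end,
     exon_count, exon_starts, exon_ends, _score, _name2,
     _cds_start_stat, _cds_end_stat, _exon_frames) = fields
    starts = parse_int_list(exon_starts)
    ends = parse_int_list(exon_ends)
    if not (len(starts) == len(ends) == int(exon_count)):
        raise ValueError(f"{name}: exon lists do not match exonCount")
    return GenePredRow(name, chrom, strand, int(tx_start), int(tx_end),
                       int(cds_start), int(cds_end), list(zip(starts, ends)))


# The example values from the field table.
example = parse_row([
    "585", "OTTHUMT00000362751", "chr1", "+", "11868", "14409", "14409",
    "14409", "3", "11868,12612,13220,", "12227,12721,14409,", "0",
    "RP11-34P13.1", "none", "none", "-1,-1,-1,",
])
```

Checking the list lengths against exonCount is a cheap guard against truncated or misaligned dumps.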
Connected Tables and Joining Fields
hg19.vegaGene.name (via vegaPseudoGene.name)
hg19.vegaGtp.transcript (via vegaPseudoGene.name)
hg19.vegaPep.name (via vegaPseudoGene.name)
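The joining fields above can be used directly in a SQL join. The sketch below shows the shape of such a query in Python, using an in-memory SQLite database as a stand-in for the real MariaDB tables. The toy tables contain only a few columns, the `gene` column of vegaGtp and its value are hypothetical placeholders, and only the join key (vegaGtp.transcript = vegaPseudoGene.name) comes from the list above.

```python
import sqlite3

# In-memory stand-in for the real tables; the column sets are reduced.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vegaPseudoGene (name TEXT, chrom TEXT, name2 TEXT);
    -- 'gene' is a hypothetical column added for illustration only.
    CREATE TABLE vegaGtp (transcript TEXT, gene TEXT);
    INSERT INTO vegaPseudoGene
        VALUES ('OTTHUMT00000362751', 'chr1', 'RP11-34P13.1');
    INSERT INTO vegaGtp
        VALUES ('OTTHUMT00000362751', 'HYPOTHETICAL_GENE_ID');
""")

# Join on the documented key: vegaGtp.transcript = vegaPseudoGene.name.
JOIN_SQL = """
    SELECT p.name, p.name2, g.gene
    FROM vegaPseudoGene AS p
    JOIN vegaGtp AS g ON g.transcript = p.name
"""
rows = conn.execute(JOIN_SQL).fetchall()
```

The same pattern applies to vegaGene.name and vegaPep.name, with the join condition changed to match the field listed for each table.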
Sample Rows
bin | name | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts | exonEnds | score | name2 | cdsStartStat | cdsEndStat | exonFrames |
---|
585 | OTTHUMT00000362751 | chr1 | + | 11868 | 14409 | 14409 | 14409 | 3 | 11868,12612,13220, | 12227,12721,14409, | 0 | RP11-34P13.1 | none | none | -1,-1,-1, |
585 | OTTHUMT00000002844 | chr1 | + | 12009 | 13670 | 13670 | 13670 | 6 | 12009,12178,12612,12974,13220,13452, | 12057,12227,12697,13052,13374,13670, | 0 | RP11-34P13.1 | none | none | -1,-1,-1,-1,-1,-1, |
585 | OTTHUMT00000002839 | chr1 | - | 14403 | 29570 | 29570 | 29570 | 11 | 14403,15004,15795,16606,16857,17232,17605,17914,18267,24737,29533, | 14501,15038,15947,16765,17055,17368,17742,18061,18366,24891,29570, | 0 | RP11-34P13.2 | none | none | -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, |
585 | OTTHUMT00000003224 | chr1 | + | 62947 | 63887 | 63887 | 63887 | 1 | 62947, | 63887, | 0 | OR4G11P | none | none | -1, |
586 | OTTHUMT00000003691 | chr1 | + | 131103 | 133923 | 133923 | 133923 | 1 | 131103, | 133923, | 0 | RP11-34P13.10 | none | none | -1, |
586 | OTTHUMT00000007034 | chr1 | - | 135246 | 138039 | 138039 | 138039 | 2 | 135246,137568, | 136006,138039, | 0 | RP11-34P13.11 | none | none | -1,-1, |
586 | OTTHUMT00000007241 | chr1 | - | 228318 | 228775 | 228775 | 228775 | 1 | 228318, | 228775, | 0 | AP006222.1 | none | none | -1, |
587 | OTTHUMT00000008000 | chr1 | + | 326095 | 328112 | 328112 | 328112 | 2 | 326095,327347, | 326569,328112, | 0 | RP4-669L17.8 | none | none | -1,-1, |
587 | OTTHUMT00000127609 | chr1 | - | 329425 | 332243 | 332243 | 332243 | 1 | 329425, | 332243, | 0 | RP4-669L17.9 | none | none | -1, |
587 | OTTHUMT00000007996 | chr1 | + | 329783 | 334271 | 334271 | 334271 | 2 | 329783,334128, | 329976,334271, | 0 | RP4-669L17.4 | none | none | -1,-1, |
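The bin values in the rows above can be reproduced from txStart and txEnd. The following is a minimal Python sketch, assuming the standard UCSC binning scheme (five levels, with the smallest bins spanning 128 kb and each level eight times larger than the one below). The function name is illustrative, and the extended scheme for coordinates beyond 512 Mb is not covered.

```python
# Offsets of the first bin at each level, from the smallest bins (128 kb)
# up to the single bin that covers the first 512 Mb.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the smallest bin
NEXT_SHIFT = 3    # each level is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the standard scheme")
```

A range query would compute the set of bins that can overlap the region of interest and filter on bin before comparing txStart and txEnd, which is what makes the indexed column useful.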
Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
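As an illustration of what the 0-based convention means in practice, the Python sketch below converts a stored interval to the 1-based form typically used for display. It assumes the end coordinate is exclusive (a half-open interval), which the note above does not state explicitly; the function names are illustrative.

```python
from typing import Tuple


def to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval to 1-based, fully closed."""
    # Only the start moves; an exclusive 0-based end equals the inclusive
    # 1-based end.
    return start + 1, end


def interval_length(start: int, end: int) -> int:
    """Length in bases of a 0-based, half-open interval."""
    return end - start


def position_string(chrom: str, start: int, end: int) -> str:
    """Format a stored interval as a 1-based position such as chr1:11869-14409."""
    one_start, one_end = to_one_based(start, end)
    return f"{chrom}:{one_start}-{one_end}"
```

Under this assumption, the first sample row (txStart 11868, txEnd 14409) covers 2,541 bases and would be displayed as chr1:11869-14409.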
Vega Genes (vegaGeneComposite) Track Description
Description and Methods
This track shows gene annotations from the Vertebrate Genome Annotation (Vega)
database. Annotations are divided into two subtracks from the
Vega Human Genome Annotation project:
- Vega Protein-Coding and Non-Coding Gene Annotations
- Vega Annotated Pseudogenes and Immunoglobulin Segments
The following information is an excerpt from the
Vertebrate Genome Annotation home page:
"The Vega database
is designed to be a central repository for high-quality, frequently updated
manual annotation of different vertebrate finished genome sequence.
Vega attempts to present consistent high-quality curation of the published
chromosome sequences. Finished genomic sequence is analysed on a
clone-by-clone basis using
a combination of similarity searches against DNA and protein databases
as well as a series of ab initio gene predictions (GENSCAN, Fgenes).
The annotation is based on supporting evidence only."
"In addition, comparative analysis using vertebrate datasets such as
the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores
(Evolutionary Conserved Regions) are used for novel gene discovery."
Display Conventions and Configuration
This track follows the display conventions for gene prediction tracks.
Transcript type (and other details) may be found by clicking on the
transcript identifier, which forms the outside link to the Vega transcript
details page. Further information on the gene and transcript classification
may be found here.
Credits
Thanks to Steve Trevanion at the Wellcome Trust Sanger Institute for
providing the GTF and FASTA files for the Vega annotations. Vega
acknowledgements and publications are listed here.