Schema for SIB Genes - Swiss Institute of Bioinformatics Gene Predictions from mRNA and ESTs
  Database: hg38    Primary Table: sibGene    Row Count: 208,508   Data last updated: 2014-08-27
Format description: A gene prediction.
On download server: MariaDB table dump directory
field       example             SQL type              info    description
bin         585                 smallint(5) unsigned  range   Indexing field to speed chromosome range queries.
name        HTR067749.1.1.1     varchar(255)          values  Name of gene
chrom       chr1                varchar(255)          values  Reference sequence chromosome or scaffold
strand      +                   char(1)               values  + or - for strand
txStart     11873               int(10) unsigned      range   Transcription start position (or end position for minus strand item)
txEnd       14409               int(10) unsigned      range   Transcription end position (or start position for minus strand item)
cdsStart    12189               int(10) unsigned      range   Coding region start (or end position for minus strand item)
cdsEnd      13259               int(10) unsigned      range   Coding region end (or start position for minus strand item)
exonCount   3                   int(10) unsigned      range   Number of exons
exonStarts  11873,12645,13220,  longblob                      Exon start positions (or end positions for minus strand item)
exonEnds    12227,12697,14409,  longblob                      Exon end positions (or start positions for minus strand item)
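
The schema says only that bin speeds up chromosome range queries. As an illustration, the sketch below computes a bin number with the standard UCSC binning scheme, which is assumed here to be the scheme behind this field. The constants and the function name bin_from_range are part of that assumption and are not taken from this page.

```python
# Constants of the standard UCSC binning scheme (assumed to apply to sibGene.bin).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # the smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles two bins at this level, so try the next larger level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the standard scheme (512 Mb limit)")


# txStart and txEnd of the example row.
print(bin_from_range(11873, 14409))  # prints 585, matching the example bin value
```

A range query can then restrict itself to the few bins that can overlap the region of interest instead of scanning every row on the chromosome.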

Sample Rows
 
bin  name              chrom  strand  txStart  txEnd  cdsStart  cdsEnd  exonCount  exonStarts  exonEnds
585  HTR067749.1.1.1   chr1   +       11873    14409  12189     13259   3          11873,12645,13220,  12227,12697,14409,
585  HTR067749.1.1.0   chr1   +       11873    14409  12189     13259   3          11873,12612,13220,  12227,12721,14409,
585  HTR067740.1.0.22  chr1   -       13419    30054  16744     29380   6          13419,16857,17232,17605,17914,29320,  16765,17055,17368,17742,18061,30054,
585  HTR067740.1.0.21  chr1   -       13419    30054  16744     18378   8          13419,16857,17232,17605,17685,17914,18267,29320,  16765,17055,17368,17631,17742,18061,24891,30054,
585  HTR067740.1.0.20  chr1   -       13419    30054  16744     29380   7          13419,16857,17232,17605,17914,18267,29320,  16765,17055,17368,17742,18061,18366,30054,
585  HTR067740.1.0.19  chr1   -       13419    30054  16744     24743   11         13419,14969,15795,16606,16857,17232,17605,17914,18267,24737,29320,  14829,15038,15947,16765,17055,17368,17742,18061,18366,24891,30054,
585  HTR067740.1.0.18  chr1   -       13419    30054  16744     17361   5          13419,16875,17232,17914,29320,  16765,17055,17368,18379,30054,
585  HTR067740.1.0.17  chr1   -       13419    30054  16744     24886   9          13419,16857,17232,17605,17914,18267,18496,24737,29320,  16765,17055,17368,17742,18061,18366,18554,24891,30054,
585  HTR067740.1.0.16  chr1   -       13419    30054  17547     18363   12         13419,14969,15795,16606,16857,17232,17914,18267,18496,18912,24737,29320,  14829,15038,15947,16765,17055,17742,18061,18366,18554,20286,24891,30054,
585  HTR067740.1.0.14  chr1   -       13419    30054  17893     19020   7          13419,16857,17605,18267,18912,24737,29320,  16765,17055,18061,18366,19139,25008,30054,
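
To show how a row maps onto the schema, here is a minimal parsing sketch. It assumes the row arrives as one tab-separated line, as in a typical table dump. The GenePred class and the parse_row and parse_int_list helpers are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePred:
    """One sibGene row, with fields in schema order."""
    bin: int
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]

    def exon_lengths(self) -> List[int]:
        # Starts are 0-based and ends are exclusive, so length is end - start.
        return [end - start for start, end in zip(self.exon_starts, self.exon_ends)]


def parse_int_list(text: str) -> List[int]:
    # The lists carry a trailing comma, so skip the empty final item.
    return [int(item) for item in text.split(",") if item]


def parse_row(line: str) -> GenePred:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    gene = GenePred(
        int(fields[0]), fields[1], fields[2], fields[3],
        int(fields[4]), int(fields[5]), int(fields[6]), int(fields[7]),
        int(fields[8]), parse_int_list(fields[9]), parse_int_list(fields[10]),
    )
    if not (len(gene.exon_starts) == len(gene.exon_ends) == gene.exon_count):
        raise ValueError(f"{gene.name}: exon lists do not match exonCount")
    return gene


# First sample row, rebuilt as a tab-separated line.
row = "\t".join([
    "585", "HTR067749.1.1.1", "chr1", "+", "11873", "14409",
    "12189", "13259", "3", "11873,12645,13220,", "12227,12697,14409,",
])
gene = parse_row(row)
print(gene.name, gene.exon_lengths())  # HTR067749.1.1.1 [354, 52, 1189]
```

The consistency check on exonCount is a cheap way to catch a row whose fields were split at the wrong place.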

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are 1-based, so each stored range is a half-open interval that includes its start and excludes its end.
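
As a small worked example of the note above, the sketch below converts a stored range to the 1-based form used in position strings. It assumes that only the start needs adjusting because the stored end is already the last base in 1-based terms. The function names to_one_based and interval_length are hypothetical.

```python
def to_one_based(chrom: str, start: int, end: int) -> str:
    """Format a 0-based, half-open range as a 1-based, closed position string."""
    return f"{chrom}:{start + 1}-{end}"


def interval_length(start: int, end: int) -> int:
    """Length of a 0-based, half-open range; no +1 correction is needed."""
    return end - start


# txStart and txEnd of the example row.
position = to_one_based("chr1", 11873, 14409)
length = interval_length(11873, 14409)
print(position, length)  # chr1:11874-14409 2536
```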

SIB Genes (sibGene) Track Description
 

Description

The SIB Genes track is a transcript-based set of gene predictions based on data from RefSeq and EMBL/GenBank. Every gene is supported by at least one full-length GenBank RNA sequence, one RefSeq RNA, or one spliced EST. The track includes both protein-coding and non-coding transcripts. The coding regions are predicted using ESTScan.

Display Conventions and Configuration

In general, this track follows the display conventions for gene prediction tracks. Exons of putative non-coding genes and untranslated regions are drawn as relatively thin blocks, while exons within coding open reading frames are drawn thicker.
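
The thin and thick blocks can be derived from a row by clipping each exon against the coding region. The sketch below is one way to do that and is not the browser's own drawing code. The function name split_thick_thin is hypothetical, and treating a row with cdsStart equal to cdsEnd as non-coding is an assumption.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]


def split_thick_thin(exon_starts: List[int], exon_ends: List[int],
                     cds_start: int, cds_end: int) -> List[Segment]:
    """Split exons into 'thick' (coding) and 'thin' (non-coding) segments."""
    segments: List[Segment] = []
    for start, end in zip(exon_starts, exon_ends):
        thick_start = max(start, cds_start)
        thick_end = min(end, cds_end)
        if thick_start >= thick_end:
            # The exon lies entirely outside the coding region.
            segments.append((start, end, "thin"))
            continue
        if start < thick_start:
            segments.append((start, thick_start, "thin"))
        segments.append((thick_start, thick_end, "thick"))
        if thick_end < end:
            segments.append((thick_end, end, "thin"))
    return segments


# Exons and coding region of the first sample row (HTR067749.1.1.1).
segments = split_thick_thin([11873, 12645, 13220], [12227, 12697, 14409], 12189, 13259)
for segment in segments:
    print(segment)
```

For this row, the first and last exons are each split in two because the coding region starts and ends inside them, while the middle exon is entirely thick.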

This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature.

Further information on the predicted transcripts can be found on the Transcriptome Web interface.

Methods

The SIB Genes are built using a multi-step pipeline:

  1. RefSeq and GenBank RNAs and ESTs are aligned to the genome with SIBsim4, keeping only the best alignments for each RNA.
  2. Alignments are broken up at non-intronic gaps, with small isolated fragments thrown out.
  3. A splicing graph is created for each set of overlapping alignments. This graph has an edge for each exon or intron, and a vertex for each splice site, start, and end. Each RNA that contributes to an edge is kept as evidence for that edge.
  4. The graph is traversed to generate all unique transcripts. The traversal is guided by the initial RNAs to avoid a combinatorial explosion in alternative splicing.
  5. Protein predictions are generated.
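
Steps 3 and 4 can be illustrated with a toy sketch. It is a simplified illustration under stated assumptions and not the SIB pipeline's code: alignments are given as ordered exon lists per RNA, every intron has positive length, and "guided by the initial RNAs" is read as following only the edges a given RNA supports. The names build_splicing_graph and guided_transcripts and the RNA identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]
Edge = Tuple[str, int, int]  # (kind, from_position, to_position)


def build_splicing_graph(alignments: Dict[str, List[Exon]]) -> Dict[Edge, Set[str]]:
    """Create one edge per exon or intron and record the RNAs supporting it."""
    graph: Dict[Edge, Set[str]] = defaultdict(set)
    for rna, exons in alignments.items():
        for i, (start, end) in enumerate(exons):
            graph[("exon", start, end)].add(rna)
            if i + 1 < len(exons):
                graph[("intron", end, exons[i + 1][0])].add(rna)
    return graph


def guided_transcripts(graph: Dict[Edge, Set[str]],
                       alignments: Dict[str, List[Exon]]) -> Dict[Tuple[Exon, ...], Set[str]]:
    """Walk the graph once per RNA, following only edges that RNA supports."""
    out_edges = defaultdict(list)
    for (kind, src, dst), evidence in graph.items():
        out_edges[src].append((kind, dst, evidence))
    transcripts: Dict[Tuple[Exon, ...], Set[str]] = defaultdict(set)
    for rna, exons in alignments.items():
        position, last = exons[0][0], exons[-1][1]
        path: List[Exon] = []
        while position != last:
            kind, nxt = next((k, d) for k, d, ev in out_edges[position] if rna in ev)
            if kind == "exon":
                path.append((position, nxt))
            position = nxt
        # Identical exon chains collapse into one transcript with merged evidence.
        transcripts[tuple(path)].add(rna)
    return transcripts


# Exon coordinates borrowed from the first two sample rows; RNA names are made up.
alignments = {
    "rnaA": [(11873, 12227), (12645, 12697), (13220, 14409)],
    "estC": [(11873, 12227), (12645, 12697), (13220, 14409)],
    "rnaB": [(11873, 12227), (12612, 12721), (13220, 14409)],
}
graph = build_splicing_graph(alignments)
transcripts = guided_transcripts(graph, alignments)
print(len(graph), len(transcripts))  # 8 2
```

Walking only RNA-supported edges yields two transcripts here. An unguided traversal of a larger graph would also enumerate every other combination of alternative exons, which is the combinatorial explosion that step 4 avoids.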

Credits

The SIB Genes track was produced on the Vital-IT high-performance computing platform using a computational pipeline developed by Christian Iseli with help from colleagues at the Ludwig Institute for Cancer Research and the Swiss Institute of Bioinformatics. It is based on data from NCBI RefSeq and GenBank/EMBL. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them.

References

Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779