Schema for Poly(A) - Poly(A) Sites, Both Reported and Predicted
  Database: hg19    Primary Table: polyaPredict    Row Count: 52,169   Data last updated: 2011-01-05
Format description: Browser extensible data
On download server: MariaDB table dump directory
field       example            SQL type              info    description
bin         585                smallint(5) unsigned  range   Indexing field to speed chromosome range queries.
chrom       chr1               varchar(255)          values  Reference sequence chromosome or scaffold
chromStart  14331              int(10) unsigned      range   Start position in chromosome
chromEnd    14416              int(10) unsigned      range   End position in chromosome
name        NM_198943.polyA-1  varchar(255)          values  Name of item
score       813                int(10) unsigned      range   Optional score, nominal range 0-1000
strand      -                  char(1)               values  + or -
thickStart  14361              int(10) unsigned      range   Start of where display should be thick (start codon)
thickEnd    14362              int(10) unsigned      range   End of where display should be thick (stop codon)
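
The schema says only that bin speeds up chromosome range queries. As an illustration, the sketch below computes a bin number from chromStart and chromEnd, assuming this table uses the standard UCSC binning scheme (five levels, smallest bins of 128 kb). The function name and constants are illustrative and are not taken from this page.

```python
# Standard UCSC binning scheme (an assumption here; the schema only says
# that `bin` speeds up range queries).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level spans 8 times more


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")


# Coordinates taken from the example values and sample rows of this table.
print(bin_from_range(14331, 14416))    # 585
print(bin_from_range(368543, 368588))  # 587
```

Both results match the bin values stored with those coordinates in this table, which is consistent with the assumed scheme.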

Sample Rows
 
bin  chrom  chromStart  chromEnd  name                  score  strand  thickStart  thickEnd
585  chr1   14331       14416     NM_198943.polyA-1     813    -       14361       14362
585  chr1   14331       14416     NM_182905.polyA-1     813    -       14361       14362
585  chr1   69877       69910     NM_001005484.polyA-1  588    +       69904       69905
585  chr1   70196       70300     NM_001005484.polyA-2  837    +       70214       70215
585  chr1   70318       70376     NM_001005484.polyA-3  756    +       70355       70356
587  chr1   368543      368588    NM_001005221.polyA-1  634    +       368582      368583
587  chr1   368543      368588    NM_001005224.polyA-1  634    +       368582      368583
587  chr1   368543      368588    NM_001005277.polyA-1  634    +       368582      368583
587  chr1   368776      368814    NM_001005221.polyA-2  704    +       368803      368804
587  chr1   368776      368814    NM_001005224.polyA-2  704    +       368803      368804
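
For readers working with the table dump, here is a minimal sketch of reading one row into a typed record. It assumes the dump is tab-separated with columns in the schema order shown above. The PolyASite class and the parse_row and split_name helpers are hypothetical names, and the "<accession>.polyA-<n>" naming pattern is inferred from the sample rows rather than documented.

```python
from typing import NamedTuple, Tuple


class PolyASite(NamedTuple):
    """One row of polyaPredict (hypothetical record type; fields follow the schema)."""
    bin: int
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    strand: str
    thick_start: int
    thick_end: int


def parse_row(line: str) -> PolyASite:
    """Parse one tab-separated row (assumed dump format) into a PolyASite."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 9:
        raise ValueError(f"expected 9 fields, got {len(f)}")
    return PolyASite(int(f[0]), f[1], int(f[2]), int(f[3]), f[4],
                     int(f[5]), f[6], int(f[7]), int(f[8]))


def split_name(name: str) -> Tuple[str, int]:
    """Split 'NM_198943.polyA-1' into ('NM_198943', 1); pattern inferred from sample rows."""
    accession, sep, index = name.rpartition(".polyA-")
    if not sep:
        raise ValueError(f"unexpected name format: {name!r}")
    return accession, int(index)


# First sample row, written out with explicit tabs.
row = "585\tchr1\t14331\t14416\tNM_198943.polyA-1\t813\t-\t14361\t14362"
site = parse_row(row)
print(site.chrom, site.score, split_name(site.name))
```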

Note: all start coordinates in our database are 0-based, not 1-based.
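
A short sketch of what the 0-based convention means in practice. It assumes the usual UCSC half-open representation, in which chromEnd is exclusive; the note above states only that starts are 0-based. The helper names are illustrative.

```python
def item_length(chrom_start: int, chrom_end: int) -> int:
    """Length of an item stored as a 0-based, half-open interval (assumed convention)."""
    return chrom_end - chrom_start


def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a stored interval as a 1-based, fully closed position string."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


# First sample row: chromStart=14331, chromEnd=14416.
print(item_length(14331, 14416))           # 85
print(to_one_based("chr1", 14331, 14416))  # chr1:14332-14416
```

Only the start shifts by one when converting; the end value is the same in both conventions.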

Poly(A) (polyA) Track Description
 

Description

The polyA_DB database is a set of human mRNA polyadenylation sites based on EST/cDNA evidence. A site is a single base denoting the beginning of a poly(A) tail in a nascent mRNA transcript and is typically 10-30 nucleotides downstream of a polyadenylation signal (most commonly AAUAAA). The polyA_DB web server is found at http://exon.umdnj.edu/polya_db/.

The Poly(A) composite track consists of two subtracks: a polyA_DB subtrack that displays reported poly(A) sites, and a poly(A) prediction subtrack that displays poly(A) sites predicted using a support vector machine (SVM).

The poly(A) predictions are made using 1500-base DNA sequences centered at the end of each RefSeq gene. The sequences serve as input into the SVM described in Cheng et al., 2006. The SVM scores each base using a model derived from 15 different cis-elements and reports, for each region of DNA, an E-value between 0 (excellent) and 0.5 (worst). This E-value is then normalized to an integer score between 0 (worst) and 1000 (excellent). High-scoring regions are highlighted, with the highest-scoring base indicated by a thicker line. The median length of these regions is 48 bases.
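
The two numeric steps in this paragraph can be sketched as follows. The text does not give the exact window arithmetic or the normalization formula, so both functions are assumptions: the window is taken as 750 bases on either side of the gene end, and the E-value is mapped linearly onto 0-1000. The function names are hypothetical and this is not the track authors' code.

```python
from typing import Tuple

WINDOW_SIZE = 1500  # from the description: 1500-base input sequences


def prediction_window(gene_end: int, size: int = WINDOW_SIZE) -> Tuple[int, int]:
    """Half-open window of `size` bases centered on the gene end (assumed arithmetic)."""
    half = size // 2
    return max(0, gene_end - half), gene_end + half


def normalize_e_value(e_value: float, worst: float = 0.5) -> int:
    """Map an E-value in [0, worst] to an integer in [0, 1000].

    ASSUMPTION: a linear mapping; the source says only that 0 (excellent)
    becomes 1000 and 0.5 (worst) becomes 0.
    """
    if not 0.0 <= e_value <= worst:
        raise ValueError(f"E-value {e_value} is outside [0, {worst}]")
    return round(1000 * (1 - e_value / worst))


print(prediction_window(10000))  # (9250, 10750)
print(normalize_e_value(0.0))    # 1000
print(normalize_e_value(0.25))   # 500
print(normalize_e_value(0.5))    # 0
```

Under this assumed linear mapping, the two endpoints agree with the ranges stated above; intermediate values may differ from the track's actual scores.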

References

Cheng Y, Miura RM, Tian B. Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics. 2006 Oct 1;22(19):2320-5. PMID: 16870936

Zhang H, Hu J, Recce M, Tian B. PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D116-20. PMID: 15608159; PMC: PMC540009