Schema for Poly(A) - Poly(A) Sites, Both Reported and Predicted

| field | example | SQL type | info | description |
|---|---|---|---|---|
| bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries. |
| chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
| chromStart | 14331 | int(10) unsigned | range | Start position in chromosome |
| chromEnd | 14416 | int(10) unsigned | range | End position in chromosome |
| name | NM_198943.polyA-1 | varchar(255) | values | Name of item |
| score | 813 | int(10) unsigned | range | Optional score, nominal range 0-1000 |
| strand | | char(1) | values | + or - |
| thickStart | 14361 | int(10) unsigned | range | Start of where display should be thick (start codon) |
| thickEnd | 14362 | int(10) unsigned | range | End of where display should be thick (stop codon) |
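
The schema only says that bin speeds up chromosome range queries. As a minimal sketch, assuming this table uses the standard UCSC binning scheme (five levels, smallest bins 128 kb wide, each level 8 times wider than the one below), the bin for a 0-based half-open range could be computed as follows. The function name `bin_from_range` is chosen here for illustration and is not part of the schema.

```python
# Sketch of the standard UCSC binning scheme (an assumption: the schema itself
# does not state which scheme this table uses).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)  # finest level first
FIRST_SHIFT = 17  # smallest bins cover 2**17 = 128 kb
NEXT_SHIFT = 3    # each coarser level is 2**3 = 8 times wider


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")


# The example row in the schema (chromStart 14331, chromEnd 14416) has bin 585.
print(bin_from_range(14331, 14416))
```

A range query can then restrict its search to the bins that could overlap the region of interest before comparing chromStart and chromEnd, which is what makes the field useful as an index.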

| bin | chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd |
|---|---|---|---|---|---|---|---|---|
| 585 | chr1 | 14331 | 14416 | NM_198943.polyA-1 | 813 | | 14361 | 14362 |
| 585 | chr1 | 14331 | 14416 | NM_182905.polyA-1 | 813 | | 14361 | 14362 |
| 585 | chr1 | 69877 | 69910 | NM_001005484.polyA-1 | 588 | | 69904 | 69905 |
| 585 | chr1 | 70196 | 70300 | NM_001005484.polyA-2 | 837 | | 70214 | 70215 |
| 585 | chr1 | 70318 | 70376 | NM_001005484.polyA-3 | 756 | | 70355 | 70356 |
| 587 | chr1 | 368543 | 368588 | NM_001005221.polyA-1 | 634 | | 368582 | 368583 |
| 587 | chr1 | 368543 | 368588 | NM_001005224.polyA-1 | 634 | | 368582 | 368583 |
| 587 | chr1 | 368543 | 368588 | NM_001005277.polyA-1 | 634 | | 368582 | 368583 |
| 587 | chr1 | 368776 | 368814 | NM_001005221.polyA-2 | 704 | | 368803 | 368804 |
| 587 | chr1 | 368776 | 368814 | NM_001005224.polyA-2 | 704 | | 368803 | 368804 |
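
To make the sample rows easier to read, here is a small sketch that turns one row into a typed record and derives the region length and the position of the thick base within the region. It assumes the usual UCSC convention of 0-based, half-open coordinates; the `PolyASite` class and its method names are hypothetical, introduced only for this illustration, and the strand column is omitted because it is empty in the rows shown.

```python
from typing import NamedTuple


class PolyASite(NamedTuple):
    """One row of the table (hypothetical helper; strand omitted, see lead-in)."""
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    thick_start: int
    thick_end: int

    def region_length(self) -> int:
        # Half-open coordinates: length is simply end minus start.
        return self.chrom_end - self.chrom_start

    def thick_width(self) -> int:
        # Number of bases drawn thick.
        return self.thick_end - self.thick_start

    def thick_offset(self) -> int:
        # Distance of the thick base from the start of the region.
        return self.thick_start - self.chrom_start


# First sample row.
site = PolyASite("chr1", 14331, 14416, "NM_198943.polyA-1", 813, 14361, 14362)
print(site.region_length(), site.thick_width(), site.thick_offset())  # 85 1 30
```

In every sample row the thick interval is exactly one base wide, which matches the idea of marking a single base inside a wider region.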

Description

The polyA_DB database is a set of human mRNA polyadenylation sites based on EST/cDNA evidence. A site is a single base denoting the beginning of a poly(A) tail in a nascent mRNA transcript and is typically 10-30 nucleotides downstream of a polyadenylation signal (most commonly AAUAAA). The polyA_DB web server is found at http://exon.umdnj.edu/polya_db/.

The Poly(A) composite track consists of two subtracks: a polyA_DB subtrack that displays reported poly(A) sites, and a poly(A) prediction subtrack that displays poly(A) sites predicted using a support vector machine (SVM).

The poly(A) predictions are made using 1500-base DNA sequences centered at the end of each RefSeq gene. The sequences serve as input into the SVM described in Cheng et al., 2006. The SVM scores each base using a model derived from 15 different cis-elements and reports, for each region of DNA, an E-value between 0 (excellent) and 0.5 (worst). This E-value is then normalized to an integer score between 0 (worst) and 1000 (excellent). High-scoring regions are highlighted, with the highest-scoring base in each region indicated by a thicker line. The median length of these regions is 48 bases.
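
The two numeric steps in this paragraph can be sketched in a few lines. This is an illustration under stated assumptions, not the track's actual pipeline: the text does not say how the gene end is chosen for minus-strand genes (the sketch takes the 3' end, i.e. the start coordinate on the minus strand), nor does it give the normalization formula (the sketch assumes a simple linear map that sends 0 to 1000 and 0.5 to 0). Both function names are hypothetical.

```python
def prediction_window(tx_start: int, tx_end: int, strand: str, size: int = 1500):
    """Return a size-base window [start, end) centered at the gene's 3' end.

    Assumes 0-based half-open coordinates. The window is clamped at 0 but not
    at the chromosome end, since chromosome sizes are not available here.
    """
    center = tx_end if strand == "+" else tx_start  # assumption: 3' end of the gene
    half = size // 2
    return max(0, center - half), center + half


def normalize_e_value(e_value: float) -> int:
    """Map an E-value in [0, 0.5] to an integer score in [0, 1000].

    Assumption: linear scaling; the source states only the two endpoints.
    """
    if not 0.0 <= e_value <= 0.5:
        raise ValueError("E-value must lie between 0 and 0.5")
    return round(1000 * (1 - e_value / 0.5))


print(prediction_window(10000, 20000, "+"))  # (19250, 20750)
print(normalize_e_value(0.25))               # 500
```

Under the linear assumption a lower E-value always gives a higher score, which is consistent with the endpoints stated above; the real mapping may differ in shape.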

References

Cheng Y, Miura RM, Tian B. Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics. 2006 Oct 1;22(19):2320-5. PMID: 16870936

Zhang H, Hu J, Recce M, Tian B. PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D116-20. PMID: 15608159; PMC: PMC540009