Description
This track shows gene predictions using the N-SCAN gene structure prediction
software provided by the Computational Genomics Lab at Washington University
in St. Louis, MO, USA.
Methods
N-SCAN combines biological-signal modeling in the target genome sequence with
information from a multiple-genome alignment to generate de novo gene
predictions. It extends the TWINSCAN target-informant genome pair to allow for
an arbitrary number of informant sequences as well as richer models of
sequence evolution. N-SCAN models the phylogenetic relationships between the
aligned genome sequences, context-dependent substitution rates, insertions,
and deletions.
The PASA clusters were used as 'EST' sequences in N-SCAN PASA-EST. In
addition, the xenoRefSeq track was downloaded and split into unique exon
pairs. All exon pairs that had valid splice sites were then added to the EST
track. The resulting gene models were updated with the input PASA clusters
using the assembly tool of the PASA pipeline. These updates consist of
automatically generated alternative splices, UTR features and occasional
merging of two gene models. In addition, PASA assigned open reading frames
to clusters that did not overlap a gene prediction, but that did contain a
full-length cDNA, and output them as 'novel genes'. Note that PASA does not
use any cDNA annotation from the input; it assigns the ORF itself.
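The exon-pair step described above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names are hypothetical, coordinates are assumed to be 0-based and half-open on the forward strand, and "valid splice sites" is assumed here to mean a canonical GT...AG intron, which the description does not specify.

```python
from typing import Iterable, List, Set, Tuple

Exon = Tuple[int, int]          # (start, end), 0-based, half-open
ExonPair = Tuple[Exon, Exon]    # two adjacent exons of one transcript


def exon_pairs(exons: Iterable[Exon]) -> List[ExonPair]:
    """Return every pair of adjacent exons in a transcript."""
    ordered = sorted(exons)
    return list(zip(ordered, ordered[1:]))


def has_canonical_splice_sites(seq: str, pair: ExonPair) -> bool:
    """Assumption: a pair is 'valid' if its intron starts with GT and ends with AG."""
    (_, left_end), (right_start, _) = pair
    intron = seq[left_end:right_start].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def unique_valid_pairs(seq: str, transcripts: Iterable[Iterable[Exon]]) -> List[ExonPair]:
    """Collect each exon pair once, keeping only pairs with valid splice sites."""
    seen: Set[ExonPair] = set()
    kept: List[ExonPair] = []
    for exons in transcripts:
        for pair in exon_pairs(exons):
            if pair not in seen and has_canonical_splice_sites(seq, pair):
                seen.add(pair)
                kept.append(pair)
    return kept
```

Deduplicating on the exon-pair coordinates means that an intron shared by several aligned transcripts contributes a single entry to the EST evidence.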
No manual annotation was performed to generate any of the gene models.
Important note: It is possible that real genes were merged by N-SCAN.
Therefore, when looking at this track it is advisable to also open the
'Other RefSeq' track. Merged genes will often overlap two separate RefSeqs.
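The same check can be approximated programmatically. The sketch below is an illustration only, not part of the track's pipeline: the function names are hypothetical, and all intervals are assumed to be 0-based, half-open, and on the same chromosome and strand. It flags a prediction that overlaps two or more RefSeq alignments that do not overlap each other.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def possibly_merged(prediction: Interval, refseqs: Iterable[Interval]) -> bool:
    """Flag a prediction that spans two or more mutually disjoint RefSeq loci."""
    hits = sorted(r for r in refseqs if overlaps(prediction, r))
    loci = 0
    current_end = None
    for start, end in hits:
        if current_end is None or start >= current_end:
            # This hit begins after the previous locus ends: a new locus.
            loci += 1
            current_end = end
        else:
            # This hit overlaps the current locus: extend it.
            current_end = max(current_end, end)
    return loci >= 2
```

A flagged prediction is only a candidate for inspection; overlapping RefSeq alignments from the same locus are grouped together so that alternative transcripts of one gene do not trigger the flag.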
Zebra Finch N-SCAN uses chicken (galGal3) as the informant.
Credits
Thanks to Michael Brent's Computational Genomics Group at Washington
University in St. Louis for providing this data.
Special thanks for this implementation of N-SCAN to Aaron Tenney in
the Brent lab, and Robert Zimmermann, currently at Max F. Perutz
Laboratories in Vienna, Austria.
References
Gross SS, Brent MR.
Using multiple alignments to improve gene prediction.
J Comput Biol. 2006 Mar;13(2):379-93.
PMID: 16597247
Korf I, Flicek P, Duan D, Brent MR.
Integrating genomic homology into gene structure prediction.
Bioinformatics. 2001;17 Suppl 1:S140-8.
PMID: 11473003
van Baren MJ, Brent MR.
Iterative gene prediction and pseudogene removal improves genome annotation.
Genome Res. 2006 May;16(5):678-85.
PMID: 16651666; PMC: PMC1457044