Description
The ExonWalk program merges cDNA evidence together to predict full
length isoforms, including alternative transcripts. To predict
transcripts that are biologically functional, rather than the result
of technical or biological noise, ExonWalk requires that every intron
and exon be either: 1) Present in cDNA libraries of another organism
(i.e. also present in mouse), 2) Have three separate cDNA GenBank
entries supporting it, or 3) Be evolving like a coding exon as
determined by Exoniphy.
Once the transcripts are predicted a orf finder is used (BESTORF from
Softberry) is used to find the
best open reading frame. By default transcripts that are targets for
nonsense mediated decay (NMD) are filtered out as they are less likely
to be translated into proteins.
Methods
The input to the ExonWalk program is the AltSplice track which has
filtered out exons and introns that are not: 1) Present in cDNA
libraries of another organism (i.e. also present in mouse), 2) Have
three separate cDNA GenBank entries supporting it, or 3) Be evolving
like a coding exon as determined by Exoniphy.
The ExonWalk algorithm takes these filtered sequences and
constructs a graph where the exons are the nodes and the introns are
the edges. The goal of the program is to produce all full length
transcripts implied by the transcripts. Full length transcripts are
defined as transcripts that are not a subsequences of another
transcript. The stages of the algorithm can be divided into three
steps as illustrated in Figure 1 below:
- Detection and connection of compatible transcripts (Figure 1B).
- Merging of vertices that are identical in terms of splicing (Figure 1C).
- Exploration of all paths in the resulting graph (Figure 1D).
|
Different stages of the ExonWalk Program. A. Different
transcripts for a particular gene have been aligned to the genome to
give an order and orientation. B. Exons in the overlapping
section of compatible transcripts are joined to form new
edges. C. Vertices which are redundant are pruned from the
graph, being replaced by edges from other, equivalent, vertices. This
simplifies the initial graph and yet retains splicing specific
information. D. The maximal paths through the graph are
explored to produce a set of maximal (full length) transcripts.
|
Initially each each transcript is an independent sub-graph in the
exon graph. Individual transcripts are then compared pairwise to
determine if they are compatible. If they are compatible, an edge is
created between exons of the overlap, called a compatibility edge.
This results in a directed graph where overlapping exons are connected
together, and thus compatible transcripts have been connected as well
(Figure 1B). The algorithm then makes use of the
implicit order provided by the genome sequence and the fact that
splicing occurs in order to explore all of the paths present in the
graph.
Comments/Questions? Email sugnet@soe.ucsc.edu
|