Description
This track displays regions of the human genome assembly (hg16) that are deleted
in the chimpanzee draft assembly (panTro1). Only regions between
80 and 12000 bases long are included. The name of each deletion is a unique identifier for
that deletion followed by an underscore and then its length. A similar track,
showing human deletions in the chimpanzee assembly, appears in the chimp
Genome Browser.
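As an illustration of the naming convention described above, the following minimal sketch splits a deletion name into its identifier and length. The function name and the example name "del42_350" are hypothetical and do not come from the track data; the sketch assumes only that the length is the text after the last underscore.

```python
def parse_deletion_name(name):
    """Split a deletion name of the form <identifier>_<length>.

    Assumption: the length is the part after the LAST underscore, so
    identifiers that themselves contain underscores are still handled.
    """
    identifier, _, length_text = name.rpartition("_")
    if not identifier or not length_text.isdigit():
        raise ValueError(f"not of the form <identifier>_<length>: {name!r}")
    return identifier, int(length_text)


# Hypothetical name, used only to show the call.
identifier, length = parse_deletion_name("del42_350")
```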
Methods
The human/chimpanzee alignments were created at UCSC with
blastz and
blat,
using a reciprocal best strategy with chaining and
netting. The initial alignments were generated using blastz on repeat-masked
sequence with the following scoring matrix:
      A     C     G     T
A   100  -300  -150  -300
C  -300   100  -300  -150
G  -150  -300   100  -300
T  -300  -150  -300   100
O = 400, E = 30, K = 4500, L = 4500, M = 50
The overall score of an alignment is the sum of the scores over all aligned pairs.
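To make the scoring concrete, here is a minimal sketch that sums the matrix scores over the aligned base pairs of an ungapped alignment. It is not the blastz implementation: the function and variable names are illustrative, and gaps (the O and E parameters) and the K, L, and M settings are deliberately left out.

```python
BASES = "ACGT"

# Substitution scores copied from the matrix above (rows and columns in A, C, G, T order).
MATRIX_ROWS = [
    [100, -300, -150, -300],
    [-300, 100, -300, -150],
    [-150, -300, 100, -300],
    [-300, -150, -300, 100],
]

SCORE = {
    (row_base, col_base): MATRIX_ROWS[i][j]
    for i, row_base in enumerate(BASES)
    for j, col_base in enumerate(BASES)
}


def ungapped_score(human, chimp):
    """Sum the substitution scores over all aligned pairs (no gaps allowed)."""
    if len(human) != len(chimp):
        raise ValueError("sequences must have equal length in an ungapped alignment")
    return sum(SCORE[(h, c)] for h, c in zip(human.upper(), chimp.upper()))
```

For example, four identical bases score 4 x 100 = 400, while replacing one match with an A/G transition gives 300 - 150 = 150.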
The resulting alignments were processed by the axtChain program. To place
additional chimp scaffolds that weren't initially aligned by blastz, a DNA blat
of the unmasked sequence was performed. The resulting blat alignments were also
chained, and then merged with the blastz-based chains produced in the previous
step to produce "all chains", which were further processed by the
chainNet and netSyntenic programs. Finally, a "reciprocal best"
strategy was employed to minimize paralog fill-in for missing orthologous chimp
sequence. Details of the alignment methods can be found in the descriptions of the Chimp Chain and Chimp Net tracks.
Chimp deletions relative to human were determined from the collection of indels implied by these alignments. The criteria for inclusion in the list of deletions were (i) within, not between, scaffolds; (ii) simple gaps only (no opposing, unmatched bases or double gaps); (iii) 80-12000 bp long; and (iv) not a missed overlap or incorrect gap size in the assembly. These criteria aim to include plausible repeat insertions and exclude assembly and alignment artifacts.
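The four inclusion criteria can be read as a simple filter over candidate gaps. The sketch below is only an illustration under stated assumptions: the CandidateGap record and its field names are hypothetical, the size bounds are treated as inclusive, and the hard part (deciding whether a gap is simple or an assembly artifact) is assumed to have been done upstream.

```python
from dataclasses import dataclass

MIN_LENGTH = 80      # lower size bound in bp (assumed inclusive)
MAX_LENGTH = 12000   # upper size bound in bp (assumed inclusive)


@dataclass
class CandidateGap:
    """Hypothetical record for one gap implied by the alignments."""
    within_scaffold: bool    # (i) lies within one scaffold, not between scaffolds
    simple_gap: bool         # (ii) no opposing unmatched bases, not a double gap
    length: int              # (iii) gap length in bp
    assembly_artifact: bool  # (iv) missed overlap or incorrect gap size in the assembly


def is_reported_deletion(gap):
    """Return True only if the gap passes all four criteria."""
    return (
        gap.within_scaffold
        and gap.simple_gap
        and MIN_LENGTH <= gap.length <= MAX_LENGTH
        and not gap.assembly_artifact
    )
```

A gap that fails any single criterion is dropped, which matches the stated aim of excluding assembly and alignment artifacts while keeping plausible repeat insertions.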
Credits
The chimpanzee sequence used in this track was obtained from the 13 Nov. 2003
Arachne assembly. This sequence was provided by the National Human Genome
Research Institute (NHGRI), the Eli & Edythe L. Broad Institute at MIT/Harvard,
and Washington University School of Medicine.
The BLASTZ program was created by Webb Miller of the
Penn State Bioinformatics
Group.
Jim Kent at UCSC wrote the blat program, the chaining and netting programs, and
the scripts for displaying the alignments in this browser.
The list of mid-sized (80-12000 bp) chimp deletions relative to human was
provided by Tarjei Mikkelsen at MIT. The UCSC alignments of complete
chimpanzee scaffolds to the human genome assembly were used to generate this list.
References
ARACHNE: A Whole-Genome Shotgun Assembler.
Serafim Batzoglou, David B. Jaffe, Ken Stanley, Jonathan Butler, Sante Gnerre,
Evan Mauceli, Bonnie Berger, Jill P. Mesirov, and Eric S. Lander.
Genome Research 2002 Jan;12:177-189.
Whole-Genome Sequence Assembly for Mammalian Genomes: ARACHNE 2.
David B. Jaffe, Jonathan Butler, Sante Gnerre, Evan Mauceli, Kerstin Lindblad-Toh,
Jill P. Mesirov, Michael C. Zody, and Eric S. Lander.
Genome Research 2003 Jan;13(1):91-96.
Human-Mouse Alignments with BLASTZ. Schwartz S, Kent WJ,
Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W.
Genome Research 2003 Jan;13(1):103-7.
Scoring pairwise genomic sequence alignments.
Chiaromonte F, Yap VB, Miller W. Pac Symp Biocomput 2002;:115-26.