Description
This annotation shows regions detected as putative copy number polymorphisms
(CNP) and sites of detected intermediate-sized structural variation (ISV).
The CNPs and ISVs were determined by various methods, displayed in
individual subtracks within the annotation:
- BAC microarray analysis (Sharp): 154 putative CNP regions detected by BAC
microarray analysis in a population of 47 individuals comprised of 8
Chinese, 4 Japanese, 10 Czech, 2 Druze, 7 Biaka, 9 Mbuti, and 7 Amerindians.
- BAC microarray analysis (Iafrate): 249 putative CNP regions detected by
BAC microarray analysis in a population of 55 individuals, 16 of whom had
previously characterized chromosomal abnormalities. The group consisted of 10
Caucasians, 4 Amerindians, 2 Chinese, 2 Indo-Pakistani, 2 Sub-Saharan
African, and 35 of unknown ethnic origin.
- Representational oligonucleotide microarray analysis (ROMA) (Sebat): 72 putative
CNP regions detected by ROMA in a population of 20 normal individuals comprised
of 1 Biaka, 1 Mbuti, 1 Druze, 1 Melanesian, 4 French, 1 Venezuelan, 1 Cambodian,
1 Mayan and 9 of unknown ethnicity.
- Fosmid mapping (Tuzun): 297 ISV sites detected by mapping paired-end sequences
from a human fosmid DNA library.
- Deletions from genotype analysis (McCarroll): 538 deletions detected
by analysis of SNP genotypes, using the HapMap Phase I data, release 16a.
- Deletions from genotype analysis (Conrad): 910 deletions detected
by analysis of SNP genotypes, using the HapMap Phase I data, release 16c.1,
CEU and YRI samples.
- Deletions from haploid hybridization analysis (Hinds): 100 deletions
from haploid hybridization analysis in 24 unrelated individuals from the
Polymorphism Discovery Resource, selected for SNP LD study.
- SNP and BAC microarray analysis of HapMap data (Redon): 1,447 copy number
variable regions found in the HapMap Phase II data.
Display Conventions and Configuration
CNP and ISV regions are indicated by solid blocks that are color-coded to
indicate the type of variation detected:
- Green: gain (duplications)
- Red: loss (deletions)
- Blue: gain and loss (both deletion and duplication)
- Black: inversion
- Gray: gain or loss (unknown direction)
Note that display IDs are not preserved between assemblies.
Sharp subtrack
On the details pages for elements in this subtrack,
the table shows value/threshold data for each individual in the population.
"Value" is defined as the log2 ratio of fluorescence intensity of
test versus reference DNA. "Threshold" is defined as 2 standard
deviations from the mean log2 ratio of all autosomal clones per
hybridization.
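The "Value" and "Threshold" definitions above can be illustrated with a short Python sketch. The function names, the use of the sample standard deviation, and the gain/loss labels are assumptions made for illustration; they are not taken from Sharp et al.

```python
import math
import statistics


def log2_ratio(test_intensity, reference_intensity):
    """Log2 ratio of test versus reference fluorescence intensity ("Value")."""
    return math.log2(test_intensity / reference_intensity)


def call_clones(autosomal_values, n_sd=2.0):
    """Label each clone in one hybridization as gain, loss, or none.

    The threshold is n_sd standard deviations from the mean log2 ratio of
    all autosomal clones (sample standard deviation: an assumption).
    """
    mean = statistics.mean(autosomal_values)
    sd = statistics.stdev(autosomal_values)
    calls = []
    for value in autosomal_values:
        if value > mean + n_sd * sd:
            calls.append("gain")
        elif value < mean - n_sd * sd:
            calls.append("loss")
        else:
            calls.append("none")
    return calls
```

For example, in a hybridization where twenty clones have a log2 ratio of 0.0 and one has a log2 ratio of 1.0, only the last clone exceeds the threshold.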
The "Disease Percent" value reflects the percent of the BAC that lies
within a "rearrangement hotspot", as defined in Sharp et al. (2005). A
rearrangement hotspot is defined by the presence of flanking intrachromosomal
duplications >10 kb in length with >95% similarity and separated by
50 kb - 10 Mb of intervening sequence.
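The hotspot criteria can be written as a simple predicate. The following Python sketch is illustrative only: the `DuplicationPair` record, its field names, and the assumption that copy A lies upstream of copy B are hypothetical and do not reflect the data structures used by Sharp et al.

```python
from dataclasses import dataclass


@dataclass
class DuplicationPair:
    """A pair of duplicated segments (hypothetical record layout)."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    identity: float  # fraction of identical bases, e.g. 0.97


def flanks_hotspot(pair, min_len=10_000, min_identity=0.95,
                   min_gap=50_000, max_gap=10_000_000):
    """True if the pair meets the stated rearrangement hotspot criteria."""
    if pair.chrom_a != pair.chrom_b:
        return False  # duplications must be intrachromosomal
    len_a = pair.end_a - pair.start_a
    len_b = pair.end_b - pair.start_b
    gap = pair.start_b - pair.end_a  # assumes copy A is upstream of copy B
    return (len_a > min_len and len_b > min_len
            and pair.identity > min_identity
            and min_gap <= gap <= max_gap)
```

A pair of 20 kb duplications with 97% identity separated by 100 kb on the same chromosome satisfies the predicate; the same pair with 90% identity does not.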
Tuzun subtrack
Items are labeled using the following naming convention:
- First letter: rearrangement type (D=deletion, I=insertion,
V=inversion).
- Second letter: association with repeat or duplication
(R=human-specific repeat, D=duplication, N=neither
(unique)).
- Third letter: second haplotype support (N=variant site lacking
support from the human genome reference, S=variant site with support
from the human genome reference).
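The naming convention above can be decoded mechanically. This Python sketch assumes that the three code letters are the first three characters of an item label and that any remaining characters can be ignored; the function name is hypothetical.

```python
REARRANGEMENT = {"D": "deletion", "I": "insertion", "V": "inversion"}
CONTEXT = {
    "R": "human-specific repeat",
    "D": "duplication",
    "N": "neither (unique)",
}
SUPPORT = {
    "N": "variant site lacking support from the human genome reference",
    "S": "variant site with support from the human genome reference",
}


def decode_label(label):
    """Decode the three leading letters of a Tuzun item label.

    Raises ValueError for labels shorter than three characters and
    KeyError for letters outside the documented convention.
    """
    if len(label) < 3:
        raise ValueError("label must start with three code letters")
    first, second, third = label[0], label[1], label[2]
    return (REARRANGEMENT[first], CONTEXT[second], SUPPORT[third])
```

Under these assumptions, a label beginning with `DRS` would denote a deletion associated with a human-specific repeat at a site supported by the human genome reference.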
Conrad subtrack
The method used to identify these deletions approximates the breakpoints of each
event; therefore, a set of minimal and maximal endpoints is associated with each
deletion. Thick lines delineate
the minimally deleted region; thin lines delineate the maximally deleted region.
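One common way to draw thick and thin extents in the Genome Browser is through the `thickStart` and `thickEnd` fields of the BED format. The Python sketch below shows how a deletion with minimal and maximal endpoints could be written as a BED8 line. Whether this subtrack is actually stored that way is an assumption, and the function and parameter names are hypothetical.

```python
def deletion_to_bed8(chrom, name, max_start, max_end, min_start, min_end):
    """Format one deletion as a tab-separated BED8 line.

    chromStart/chromEnd carry the maximally deleted region (thin line);
    thickStart/thickEnd carry the minimally deleted region (thick line).
    """
    if not (max_start <= min_start <= min_end <= max_end):
        raise ValueError("minimal region must lie within maximal region")
    fields = [chrom, max_start, max_end, name, 0, ".", min_start, min_end]
    return "\t".join(str(field) for field in fields)
```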
Methods
Sharp BAC microarray analysis
All hybridizations were performed in duplicate incorporating a dye-reversal
using a custom array consisting of 2,194 end-sequence or FISH-confirmed BACs,
targeted to regions of the genome flanked by segmental duplications.
The false positive rate was estimated at ~3 clones per 4,000 tested.
Note that CNP intervals, as detailed by Sharp et al., were
converted from the July 2003 human genome assembly (NCBI Build 34) to the
May 2004 assembly (NCBI Build 35) using BLAT alignments of BAC end
pairs and the UCSC liftOver tool.
Iafrate BAC microarray analysis
All hybridizations were performed in duplicate incorporating a dye-reversal
using proprietary 1 Mb GenomeChip V1.2 Human BAC Arrays consisting of 2,632 BAC
clones (Spectral Genomics, Houston, TX). The false positive rate was estimated
at ~1 clone per 5,264 tested.
Further information is available from the Database of Genomic Variants website.
Note that CNP intervals, as detailed by Iafrate et al., were
converted from the July 2003 human genome assembly (NCBI Build 34) to the
May 2004 assembly (NCBI Build 35) using the UCSC liftOver tool.
Sebat ROMA
Following digestion with BglII or HindIII, genomic DNA was hybridized to a
custom array consisting of 85,000 oligonucleotide probes. The probes were
selected to be free of common repeats and have unique homology within the human
genome. The average resolution of the array was ~35 kb; however, only intervals
in which three consecutive probes showed concordant signals were scored as
CNPs. All hybridizations were performed in duplicate incorporating a
dye-reversal, with the false positive rate estimated to be ~6%.
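The scoring rule of three consecutive concordant probes can be sketched as a run-detection pass. In this Python example, the per-probe encoding (+1 for gain, -1 for loss, 0 for no change) and the function name are assumptions for illustration, not the procedure published by Sebat et al.

```python
def concordant_runs(calls, min_run=3):
    """Find runs of consecutive probes that share the same non-zero call.

    calls: per-probe states in genomic order (+1 gain, -1 loss, 0 none).
    Returns a list of (first_index, last_index, state) tuples for runs of
    at least min_run probes.
    """
    runs = []
    start = 0
    for i in range(1, len(calls) + 1):
        # Close the current run at the end of the list or on a state change.
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != 0 and i - start >= min_run:
                runs.append((start, i - 1, calls[start]))
            start = i
    return runs
```

With this rule, a stretch of only two concordant probes is not scored, which trades some sensitivity for a lower false positive rate.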
Note that CNP intervals, as detailed by Sebat et al., were
converted from the April 2003 human genome assembly (NCBI Build 33) to the
July 2003 assembly (NCBI Build 34) and the May 2004 assembly
(NCBI Build 35) using the UCSC liftOver tool.
Tuzun fosmid mapping
Paired-end sequences from a human fosmid DNA library were mapped to the assembly.
The average resolution of this technique was ~8 kb, and the sites it detected
included 56 sites of inversion not detectable by the array-based approaches.
However, because of the physical constraints of fosmid insert size, this
technique was unable to detect insertions greater than 40 kb in size.
McCarroll genotype analysis
A segregating deletion can leave "footprints" in SNP genotype data, including
apparent deviations from Mendelian inheritance, apparent deviations from
Hardy-Weinberg equilibrium and null genotypes. Using these clues to discover
true variants is challenging, however, because the vast majority of such observations
represent technical artifacts and genotyping errors.
To determine whether a subset of "failed" SNP genotyping assays in the HapMap data
might reflect structural variation, the authors examined whether such failures
were physically clustered in a manner that is specific to individuals. Consistent
with this hypothesis, the rate of Mendelian-inconsistent genotypes was elevated
near other Mendelian-inconsistent genotypes in the same individual but was unrelated to
Mendelian inconsistencies in other individuals.
The authors systematically looked for regions of the genome in which the
same failure profile appeared repeatedly at nearby markers in a manner that
was statistically unexpected based on chance. A set of statistical thresholds was
tailored to each mode of failure, genotyping center and genotyping platform used in the
project. The same procedure could readily apply to dense SNP data from any
platform or study.
Note that deletions, as detailed by McCarroll et al., were
converted from the July 2003 human genome assembly (NCBI Build 34) to the
May 2004 assembly (NCBI Build 35) using the UCSC liftOver tool.
Conrad genotype analysis
SNPs in regions that are hemizygous for a deletion are generally miscalled as homozygous
for the allele that is present. Hence, when a deletion is transmitted from parent to child,
the genotypes at SNPs within the deletion region will often appear to violate the rules of Mendelian
transmission. The authors developed a simple algorithm for scanning trio data for unusual runs of
consecutive SNPs that, in a single family, have genotype configurations consistent with the presence of a deletion.
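The idea of scanning a trio for runs of deletion-like genotype configurations can be sketched as follows. This Python example is a deliberate simplification and not the algorithm of Conrad et al.: it assumes biallelic genotypes written as two-letter strings (for example "AA", "AB", "BB"), treats only one kind of configuration as deletion-like, and uses a hypothetical minimum run length.

```python
def is_homozygous(genotype):
    """True for a two-letter genotype with identical alleles, e.g. 'AA'."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def deletion_like(father, mother, child):
    """True if the child appears homozygous for an allele that an
    apparently homozygous parent does not carry. Such a pattern can arise
    when that parent is hemizygous and transmits the deletion."""
    if not is_homozygous(child):
        return False
    for parent in (father, mother):
        if is_homozygous(parent) and parent[0] != child[0]:
            return True
    return False


def scan_trio(genotypes, min_run=2):
    """Find runs of consecutive deletion-like SNPs in one family.

    genotypes: list of (father, mother, child) tuples in genomic order.
    Returns (first_index, last_index) pairs for runs of >= min_run SNPs.
    """
    runs = []
    start = None
    for i, trio in enumerate(genotypes):
        if deletion_like(*trio):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_run:
        runs.append((start, len(genotypes) - 1))
    return runs
```

Requiring a run of several SNPs, rather than a single inconsistent genotype, helps separate candidate deletions from isolated genotyping errors.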
Note that deletions, as detailed by Conrad et al., were
converted from the July 2003 human genome assembly (NCBI Build 34) to the
May 2004 assembly (NCBI Build 35) using the UCSC liftOver tool.
Hinds haploid hybridization analysis
Approximately 600 Mb of genomic DNA from 24 unrelated individuals
were obtained from the Polymorphism Discovery Resource.
Haploid hybridization was used to identify genomic intervals
showing a reduced hybridization signal in comparison to the reference
assembly. PCR amplification was performed on 215 candidate deletions, and the
100 deletions that were unambiguously confirmed were selected.
Redon analysis of HapMap data
Experiments were performed with the International HapMap DNA and cell-line collection
using two technologies: comparative analysis of hybridization intensities on
Affymetrix GeneChip Human Mapping 500K early access arrays (500K EA)
and comparative genomic hybridization with a Whole Genome TilePath (WGTP)
array.
Validation
McCarroll genotype analysis
Four methods of validation were used:
fluorescent in situ hybridization (FISH),
two-color fluorescence intensity measurements, PCR amplification and quantitative PCR.
The authors performed fluorescent in situ hybridization for five
candidate deletions large enough to span available FISH probes. In all five cases,
FISH assays confirmed the deletions in the predicted individuals.
The authors examined two-color allele-specific fluorescence data from SNP genotyping
assays from a data subset available at the Broad Institute, looking for a
reduction in fluorescence intensity in individuals predicted to carry a
deletion. At most SNPs
in the genome, fluorescence intensity measurements clustered into two or three
discrete groups corresponding to homozygous and heterozygous genotypes.
At 15 of 17 candidate deletion loci, fluorescence intensity data for one or more
SNPs clustered into additional groups that corresponded to the predicted deletion
genotypes.
The authors used PCR amplification to query 60 loci for which the pattern of genotypes
suggested multiple individuals with homozygous deletions. Variants were considered
confirmed if the pattern of amplification success and failure matched prediction
across a set of 12-24 individuals. The authors confirmed 51 of 60 candidate
variants by this criterion.
The authors performed quantitative PCR in all 269 HapMap DNA samples for 11 candidate
deletions that overlapped the coding exons of genes and that were discovered in
many individuals. At 10/11 loci, the authors observed three discrete clusters, identifying
individuals with zero, one and two gene copies.
All 60 trios displayed Mendelian inheritance for the ten deletions, as well as
Hardy-Weinberg equilibrium in all four populations surveyed, and transmission rates
close to 50%. This suggests that the deletions behave as a stable, heritable
genetic polymorphism.
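A Hardy-Weinberg check on copy-number genotype counts can be sketched with a chi-square statistic. The source does not state which test the authors used, so the Python example below is only one conventional possibility; the function name and argument order are hypothetical.

```python
def hwe_chi_square(n_two, n_one, n_zero):
    """Chi-square statistic for Hardy-Weinberg equilibrium.

    n_two, n_one, n_zero: numbers of individuals carrying two, one and
    zero copies of the locus (zero copies = homozygous deletion).
    """
    n = n_two + n_one + n_zero
    p = (2 * n_two + n_one) / (2 * n)  # frequency of the intact allele
    q = 1.0 - p                        # frequency of the deletion allele
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_two, n_one, n_zero)
    return sum((o - e) ** 2 / e
               for o, e in zip(observed, expected) if e > 0)
```

With one degree of freedom, a statistic above roughly 3.84 would indicate a departure from equilibrium at the 5% level.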
Conrad genotype analysis
The authors first tested 12 predicted deletions using quantitative PCR.
For all 12 deletions, DNA concentrations consistent with transmission of a
deletion from parent to child were observed.
To provide more extensive validation by comparative genome hybridization (CGH), the authors designed a
custom oligonucleotide microarray comprised of 380,000 probes that tile across all 134 candidate deletions
identified in 9 HapMap offspring (8 YRI and 1 CEU).
The results of this CGH analysis indicate that the majority (about 85%) of candidate deletions detected
by the method are real.
Redon analysis of HapMap data
The authors used numerous quality measures, including
repeated experiments on the WGTP array for 82 individuals and on the 500K EA
array for 15 individuals.
The average false-positive rate per experiment was held below 5%. Aberrant chromosomes were
removed from the analysis. Further details are available in the Nature paper cited below.
References
Conrad, D., Andrews, T.D., Carter, N.P., Hurles, M.E., Pritchard, J.K.
A high-resolution survey of deletion polymorphism in the human genome.
Nature Genet 38(1), 75-81 (2006).
Hinds, D., Kloek, A.P., Jen, M., Chen, X., Frazer, K.A.
Common deletions and SNPs are in linkage disequilibrium in the human genome.
Nature Genet 38(1), 82-85 (2006).
Iafrate, J.A., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y.,
Scherer, S.W. and Lee, C.
Detection of large-scale variation in the human genome.
Nature Genet 36(9), 949-51 (2004).
McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C.,
Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S., Lee, C., Daly, M.J.,
Altshuler, D.M.
Common deletion polymorphisms in the human genome.
Nature Genet 38(1), 86-92 (2006).
Redon, R., Ishikawa, S., Fitch, K., Feuk, L., Perry, G., Andrews, T., Fiegler, H.,
Lee, C., Jones, K., Scherer, S., Hurles, M. et al.
Global variation in copy number in the human genome.
Nature 444(7118), 444-454 (2006).
Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P.,
Maner, S., Massa, H., Walker, M., Chi, M. et al.
Large-scale copy number polymorphism in the human genome.
Science 305(5683), 525-8 (2004).
Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Samonte, R.V.,
Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R. et al.
Segmental duplications and copy number variation in the human
genome.
Am J Hum Genet 77(1), 78-88 (2005).
Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M.,
Haugen, E., Hayden, H., Albertson, D., Pinkel, D. et al.
Fine-scale structural variation of the human genome.
Nature Genet 37(7), 727-32 (2005).