Schema for HapMap SNPs - HapMap SNPs (rel27, merged Phase II + Phase III genotypes)
  Database: hg19    Primary Table: hapmapSnpsMEX    Row Count: 1,409,912   Data last updated: 2011-01-23
Format description: HapMap genotype summary
On download server: MariaDB table dump directory
field        example    SQL type                     description
bin          589        int(10) unsigned             Indexing field to speed chromosome range queries.
chrom        chr1       varchar(255)                 Chromosome
chromStart   564476     int(10) unsigned             Start position in chrom (0 based)
chromEnd     564477     int(10) unsigned             End position in chrom (1 based)
name         rs6650104  varchar(255)                 Reference SNP identifier from dbSnp
score        6          int(10) unsigned             Minor allele frequency normalized (0-500)
strand       +          enum('+', '-', '?')          Which genomic strand contains the observed alleles
observed     A/G        varchar(255)                 Observed string from genotype file
allele1      A          enum('A', 'C', 'G', 'T')     This allele has been observed
homoCount1   76         int(10) unsigned             Count of individuals who are homozygous for allele1
allele2      G          enum('C', 'G', 'T', 'none')  This allele may not have been observed
homoCount2   0          int(10) unsigned             Count of individuals who are homozygous for allele2
heteroCount  1          int(10) unsigned             Count of individuals who are heterozygous
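
The bin field can be recomputed from chromStart and chromEnd. The following is a minimal sketch that assumes this table uses the standard UCSC binning scheme (smallest bins of 128 kb, each level eight times larger than the one below); the function name bin_from_range is an illustrative choice, not part of the schema. Under that assumption it reproduces the bin shown in the example column.

```python
# Sketch of the standard UCSC binning scheme (assumed to apply to this table).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for the standard scheme")


# chromStart and chromEnd taken from the example column of the schema.
print(bin_from_range(564476, 564477))
```

A range query can then restrict its search to the bins that overlap the region of interest before comparing chromStart and chromEnd, which is why the field speeds up such queries.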

Sample Rows
 
bin  chrom  chromStart  chromEnd  name        score  strand  observed  allele1  homoCount1  allele2  homoCount2  heteroCount
589  chr1   564476      564477    rs6650104   6      +       A/G       A        76          G        0           1
590  chr1   721289      721290    rs12565286  13     +       C/G       C        0           G        74          2
590  chr1   728950      728951    rs11240767  6      +       C/T       C        76          T        0           1
590  chr1   752565      752566    rs3094315   182    +       A/G       A        52          G        3           22
590  chr1   752720      752721    rs3131972   240    +       A/G       A        4           G        44          29
590  chr1   753404      753405    rs3115860   169    +       A/C       A        54          C        3           20
590  chr1   754181      754182    rs3131969   227    +       A/G       A        4           G        46          27
590  chr1   754333      754334    rs3131967   221    +       C/T       C        47          T        4           26
590  chr1   760911      760912    rs1048488   182    +       C/T       C        3           T        52          22
590  chr1   761146      761147    rs3115850   171    +       C/T       C        53          T        3           20
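
The score column can be related to the genotype counts in each row. The sketch below assumes that "normalized (0-500)" means the minor allele frequency multiplied by 1000 and rounded to the nearest integer; this scaling is inferred from the sample rows rather than stated in the schema, and the function name maf_score is hypothetical.

```python
def maf_score(homo_count1: int, homo_count2: int, hetero_count: int) -> int:
    """Minor allele frequency scaled to 0-500 (assumed scaling: MAF * 1000, rounded)."""
    # Each homozygous individual carries two copies of an allele, each heterozygote one of each.
    allele1_count = 2 * homo_count1 + hetero_count
    allele2_count = 2 * homo_count2 + hetero_count
    total = allele1_count + allele2_count
    if total == 0:
        raise ValueError("no genotyped individuals")
    return round(1000 * min(allele1_count, allele2_count) / total)


# (homoCount1, homoCount2, heteroCount, score) copied from four of the sample rows.
sample_rows = [(76, 0, 1, 6), (0, 74, 2, 13), (52, 3, 22, 182), (4, 44, 29, 240)]
for homo1, homo2, hetero, score in sample_rows:
    print(maf_score(homo1, homo2, hetero), "computed vs", score, "stored")
```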

Note: all start coordinates in our database are 0-based, not 1-based.
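
As a small illustration of this convention, the sketch below converts a row's chromStart and chromEnd into the 1-based position that a reader would normally quote for a SNP. The helper names are hypothetical.

```python
from typing import Tuple


def to_one_based(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a 0-based start and 1-based end into a 1-based, inclusive range."""
    return chrom_start + 1, chrom_end


def position_string(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a table row's coordinates as a 1-based position string."""
    first, last = to_one_based(chrom_start, chrom_end)
    return f"{chrom}:{first}-{last}"


# rs6650104 from the sample rows: chromStart 564476, chromEnd 564477.
print(position_string("chr1", 564476, 564477))  # chr1:564477-564477
```

Because only the start is shifted, the length of a feature is simply chromEnd minus chromStart, which is 1 for every SNP in this table.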

HapMap SNPs (hapmapSnps) Track Description
 

Description

The HapMap Project identified a set of approximately four million common SNPs, and genotyped these SNPs in four populations in Phase II of the project. In Phase III, it genotyped approximately 1.4 to 1.5 million SNPs in eleven populations. This track shows the combined data from Phases II and III. The intent is that this data can be used as a reference for future studies of human disease. This track displays the genotype counts and allele frequencies of those SNPs, and (when available) shows orthologous alleles from the chimp and macaque reference genome assemblies.

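The four million HapMap Phase II SNPs were genotyped on individuals from these four human populations:

  • Yoruba in Ibadan, Nigeria (YRI)
  • Japanese in Tokyo, Japan (JPT)
  • Han Chinese in Beijing, China (CHB)
  • Utah residents with Northern and Western European ancestry from the CEPH collection (CEU)

Phase III expanded to eleven populations: the four above, plus the following:

  • African ancestry in Southwest USA (ASW)
  • Chinese in Metropolitan Denver, Colorado (CHD)
  • Gujarati Indians in Houston, Texas (GIH)
  • Luhya in Webuye, Kenya (LWK)
  • Mexican ancestry in Los Angeles, California (MEX)
  • Maasai in Kinyawa, Kenya (MKK)
  • Toscani in Italia (TSI)

Each of the populations is displayed in a separate subtrack.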

The HapMap assays provide biallelic results. Over 99.8% of HapMap SNPs are described as biallelic in dbSNP build 129; approximately 6,800 are described as more complex types (in-del, mixed, etc.). Of the HapMap SNPs, 70% are transitions: 35% are A/G and 35% are C/T.

The orthologous alleles in chimp (panTro2) and macaque (rheMac2) were derived using liftOver.

No two HapMap SNPs occupy the same position. Aside from 430 SNPs from the pseudoautosomal region of chrX and chrY, no SNP is mapped to more than one location in the reference genome. No HapMap SNPs occur on "random" chromosomes (concatenations of unordered and unoriented contigs).

Display Conventions and Configuration

Note: calculation of heterozygosity has changed since the Phase II (rel22) version of this track. Observed heterozygosity is calculated as follows: each population's heterozygosity is computed as the proportion of heterozygous individuals in the population. The population heterozygosities are averaged to determine the overall observed heterozygosity. [For Phase II genotypes, expected heterozygosity was calculated as follows: the allele counts from all populations were summed (not normalized for population size) and used to determine overall major and minor allele frequencies. Assuming Hardy-Weinberg equilibrium, overall expected heterozygosity was calculated as two times the product of major and minor allele frequencies (see Modern Genetic Analysis, section 17-2).]
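
The two calculations described above can be sketched as follows. This is an illustration only: the function names are hypothetical, skipping populations with no genotyped individuals is an assumption, and the second population in the usage example is made-up data (the first is taken from the rs3094315 row of the MEX table).

```python
from typing import List, Tuple

# One (homoCount1, homoCount2, heteroCount) triple per population.
Counts = Tuple[int, int, int]


def observed_heterozygosity(populations: List[Counts]) -> float:
    """Average, over populations, of the proportion of heterozygous individuals."""
    per_population = []
    for homo1, homo2, hetero in populations:
        individuals = homo1 + homo2 + hetero
        if individuals > 0:  # assumption: populations without genotypes are skipped
            per_population.append(hetero / individuals)
    if not per_population:
        raise ValueError("no genotyped individuals in any population")
    return sum(per_population) / len(per_population)


def expected_heterozygosity(populations: List[Counts]) -> float:
    """Phase II style: pool allele counts, then 2pq under Hardy-Weinberg equilibrium."""
    allele1 = sum(2 * homo1 + hetero for homo1, _, hetero in populations)
    allele2 = sum(2 * homo2 + hetero for _, homo2, hetero in populations)
    total = allele1 + allele2
    if total == 0:
        raise ValueError("no genotyped individuals in any population")
    return 2 * (allele1 / total) * (allele2 / total)


# First triple: rs3094315 in MEX. Second triple: hypothetical population.
populations = [(52, 3, 22), (30, 10, 20)]
print(observed_heterozygosity(populations))
print(expected_heterozygosity(populations))
```

Note that the observed value weights every population equally, whereas the pooled allele counts used for the expected value give larger populations more influence.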

The human SNPs are displayed in gray using a color gradient based on minor allele frequency. The higher the minor allele frequency, the darker the display. By definition, the maximum minor allele frequency is 50%. When zoomed to base level, the major allele is displayed for each population.

The orthologous alleles from chimp and macaque are displayed in brown using a color gradient based on quality score. Quality scores range from 0 to 100 representing low to high quality. For orthologous alleles, the higher the quality, the darker the display. Quality scores are not available for chimp chromosomes chr21 and chrY; these were set to 98, consistent with the panTro2 browser quality track.

Filters are provided for the data attributes described above. Additionally, a filter is provided for observed heterozygosity (the average of all populations' observed heterozygosities). Filters are applied to all subtracks, even if a subtrack is not displayed.

Notes on orthologous allele filters:

  • If a SNP's major allele is different between populations, no overall major allele for human is determined, thus the "matches major human allele" and "matches minor human allele" filters for orthologous alleles do not apply.
  • If a SNP is monomorphic in all populations, the minor allele is not verified in the HapMap dataset. In these cases, the filter to match orthologous alleles to the minor human allele will yield no results.
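
The two notes above can be read as a rule for deriving the overall human major and minor alleles from the per-population tables. The sketch below is one possible reading, not the browser's actual implementation: the function name is hypothetical, and treating a tie within a population as "undetermined" is an assumption.

```python
from typing import List, Optional, Tuple

# One (allele1, homoCount1, allele2, homoCount2, heteroCount) tuple per population.
Row = Tuple[str, int, str, int, int]


def overall_major_minor(populations: List[Row]) -> Tuple[Optional[str], Optional[str]]:
    """Return (major, minor); either is None when it cannot be determined."""
    majors = set()
    minor = None
    for allele1, homo1, allele2, homo2, hetero in populations:
        count1 = 2 * homo1 + hetero
        count2 = 2 * homo2 + hetero
        if count1 == count2:
            return None, None  # assumption: a tie leaves the major allele undetermined
        if count1 > count2:
            majors.add(allele1)
            if count2 > 0:
                minor = allele2  # minor allele actually observed in this population
        else:
            majors.add(allele2)
            if count1 > 0:
                minor = allele1
    if len(majors) != 1:
        return None, None  # major allele differs between populations
    return majors.pop(), minor  # minor stays None if monomorphic everywhere


# Same major allele in both populations, minor allele observed.
print(overall_major_minor([("A", 52, "G", 3, 22), ("A", 40, "G", 5, 15)]))
# Major allele differs between populations.
print(overall_major_minor([("A", 52, "G", 3, 22), ("A", 4, "G", 44, 29)]))
# Monomorphic in all populations: the minor allele is not verified.
print(overall_major_minor([("A", 76, "none", 0, 0), ("A", 60, "none", 0, 0)]))
```

When the function returns None for an allele, the corresponding "matches major human allele" or "matches minor human allele" filter has nothing to compare against.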

Credits

This track is based on International HapMap Project release 27 data, provided by the HapMap Data Coordination Center.

References

HapMap Project

The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61.

The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320.

The International HapMap Consortium. The International HapMap Project. Nature. 2003 Dec 18;426(6968):789-96.

HapMap Data Coordination Center

Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project Web site. Genome Res. 2005 Nov;15(11):1592-3.

A Sampling of HapMap Literature

Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006 Mar 1;15(5):789-95.

Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al. Global variation in copy number in the human genome. Nature. 2006 Nov 23;444(7118):444-54.

Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. Common genetic variants account for differences in gene expression among ethnic groups. Nature Genet. 2007 Feb;39(2):226-31.

Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007 Apr;17(4):520-6.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. A Map of Recent Positive Selection in the Human Genome. PLoS Biol. 2006 Mar;4(3):e72.

Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005 Nov;15(11):1468-76.

Data Source

The genotypes_chr*_*_r27_nr.b36_fwd.txt.gz files from the HapMap FTP site were processed to make this track.