Schema for HapMap SNPs - HapMap SNPs (rel27, merged Phase II + Phase III genotypes)
  Database: hg18    Primary Table: hapmapSnpsCHB    Row Count: 3,871,247   Data last updated: 2007-05-22
Format description: HapMap genotype summary
On download server: MariaDB table dump directory
field        example     SQL type                     description
bin          585         int(10) unsigned             Indexing field to speed chromosome range queries.
chrom        chr1        varchar(255)                 Chromosome
chromStart   45161       int(10) unsigned             Start position in chrom (0 based)
chromEnd     45162       int(10) unsigned             End position in chrom (1 based)
name         rs10399749  varchar(255)                 Reference SNP identifier from dbSnp
score        0           int(10) unsigned             Minor allele frequency normalized (0-500)
strand       +           enum('+', '-', '?')          Which genomic strand contains the observed alleles
observed     C/T         varchar(255)                 Observed string from genotype file
allele1      C           enum('A', 'C', 'G', 'T')     This allele has been observed
homoCount1   44          int(10) unsigned             Count of individuals who are homozygous for allele1
allele2                  enum('C', 'G', 'T', 'none')  This allele may not have been observed
homoCount2   0           int(10) unsigned             Count of individuals who are homozygous for allele2
heteroCount  0           int(10) unsigned             Count of individuals who are heterozygous
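
The three count fields are enough to recover allele counts and a minor allele frequency for one population. The following is a minimal sketch, not the track's own code: the function names are hypothetical, and it assumes each homozygous individual contributes two copies of an allele and each heterozygous individual one copy of each. How the track rescales this frequency into the 0-500 score field is not spelled out here, so the sketch stops at the raw fraction.

```python
def allele_counts(homo_count1, homo_count2, hetero_count):
    """Return (copies of allele1, copies of allele2) for one population."""
    # Homozygotes carry two copies of their allele; heterozygotes carry one of each.
    return 2 * homo_count1 + hetero_count, 2 * homo_count2 + hetero_count


def minor_allele_frequency(homo_count1, homo_count2, hetero_count):
    """Return the minor allele frequency as a fraction between 0 and 0.5."""
    n1, n2 = allele_counts(homo_count1, homo_count2, hetero_count)
    total = n1 + n2
    if total == 0:
        raise ValueError("no genotyped individuals")
    return min(n1, n2) / total


# Counts from the example column above (homoCount1=44, homoCount2=0, heteroCount=0):
# the SNP is monomorphic in this population, so the frequency is 0.
print(minor_allele_frequency(44, 0, 0))
```

A population with a single heterozygous individual among 45 would give 1/90, since one of the 90 allele copies is the minor allele.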

Connected Tables and Joining Fields
      hg18.hapmapAllelesChimp.name (via hapmapSnpsCHB.name)
      hg18.hapmapAllelesMacaque.name (via hapmapSnpsCHB.name)
      hg18.hapmapLdPhCeu.name (via hapmapSnpsCHB.name)
      hg18.hapmapLdPhChbJpt.name (via hapmapSnpsCHB.name)
      hg18.hapmapLdPhYri.name (via hapmapSnpsCHB.name)
      hg18.hapmapSnpsCEU.name (via hapmapSnpsCHB.name)
      hg18.hapmapSnpsJPT.name (via hapmapSnpsCHB.name)
      hg18.hapmapSnpsYRI.name (via hapmapSnpsCHB.name)

Sample Rows
 
bin  chrom  chromStart  chromEnd  name        score  strand  observed  allele1  homoCount1  allele2  homoCount2  heteroCount
585  chr1   45161       45162     rs10399749  0      +       C/T       C        44                   0           0
585  chr1   45256       45257     rs2949420   0      +       A/T       T        45                   0           0
585  chr1   72433       72434     rs4030303   12     -       C/T       C        44          T        0           1
585  chr1   72514       72515     rs4030300   0      -       G/T       G        43                   0           0
585  chr1   77688       77689     rs3855952   0      -       C/T       T        38                   0           0
585  chr1   78031       78032     rs940550    12     +       C/T       C        0           T        44          1
585  chr1   81467       81468     rs13328714  0      +       C/T       C        45                   0           0
586  chr1   222076      222077    rs11490937  0      +       A/G       G        45                   0           0
589  chr1   524445      524446    rs6683466   0      +       C/G       C        45                   0           0
589  chr1   530302      530303    rs1538941   0      +       C/T       T        45                   0           0

Note: all start coordinates in our database are 0-based, not 1-based.
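
The bin field can be recomputed from a row's coordinates. The sketch below assumes the table follows the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times larger than the one below); that assumption is consistent with the sample rows above but is not stated on this page. The function name and constants are illustrative. Coordinates are taken as stored: chromStart 0-based, chromEnd as the exclusive end of the range.

```python
# Offsets of the first bin at each level, from the finest level to the coarsest.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # finest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 = 8 times larger


def bin_from_range(chrom_start, chrom_end):
    """Return the smallest bin that fully contains [chrom_start, chrom_end)."""
    start_bin = chrom_start >> BIN_FIRST_SHIFT
    end_bin = (chrom_end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range is outside the span covered by this binning scheme")


# Coordinates taken from the sample rows.
print(bin_from_range(45161, 45162))    # rs10399749
print(bin_from_range(222076, 222077))  # rs11490937
print(bin_from_range(524445, 524446))  # rs6683466
```

Because every row in this table covers a single base (chromEnd minus chromStart is 1), each SNP lands in a finest-level bin, which is why the sample rows only show bins from 585 upward.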

HapMap SNPs (hapmapSnps) Track Description
 

Description

The HapMap Project identified a set of approximately four million common SNPs, and genotyped these SNPs in four populations in Phase II of the project. In Phase III, it genotyped approximately 1.4 to 1.5 million SNPs in eleven populations. This track shows the combined data from Phases II and III. The intent is that this data can be used as a reference for future studies of human disease. This track displays the genotype counts and allele frequencies of those SNPs, and (when available) shows orthologous alleles from the chimp and macaque reference genome assemblies.

The four million HapMap Phase II SNPs were genotyped on individuals from four human populations, identified by the codes CEU, CHB, JPT, and YRI.

Phase III expanded to eleven populations: these four plus seven others. Each of the populations is displayed in a separate subtrack.

The HapMap assays provide biallelic results. Over 99.8% of HapMap SNPs are described as biallelic in dbSNP build 129; approximately 6,800 are described as more complex types (in-del, mixed, etc). 70% of the HapMap SNPs are transitions: 35% are A/G, 35% are C/T.

The orthologous alleles in chimp (panTro2) and macaque (rheMac2) were derived using liftOver.

No two HapMap SNPs occupy the same position. Aside from 430 SNPs from the pseudoautosomal region of chrX and chrY, no SNP is mapped to more than one location in the reference genome. No HapMap SNPs occur on "random" chromosomes (concatenations of unordered and unoriented contigs).

Display Conventions and Configuration

Note: calculation of heterozygosity has changed since the Phase II (rel22) version of this track. Observed heterozygosity is calculated as follows: each population's heterozygosity is computed as the proportion of heterozygous individuals in the population. The population heterozygosities are averaged to determine the overall observed heterozygosity. [For Phase II genotypes, expected heterozygosity was calculated as follows: the allele counts from all populations were summed (not normalized for population size) and used to determine overall major and minor allele frequencies. Assuming Hardy-Weinberg equilibrium, overall expected heterozygosity was calculated as two times the product of major and minor allele frequencies (see Modern Genetic Analysis, section 17-2).]
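
The two calculations described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used to build the track: the function names are hypothetical, each population is represented as a (homoCount1, homoCount2, heteroCount) tuple, allele1 and allele2 are assumed to refer to the same two alleles in every population, and populations with no genotyped individuals are skipped when averaging.

```python
def observed_heterozygosity(populations):
    """Average the per-population proportion of heterozygous individuals."""
    per_population = []
    for homo1, homo2, hetero in populations:
        individuals = homo1 + homo2 + hetero
        if individuals > 0:  # assumption: skip populations without genotypes
            per_population.append(hetero / individuals)
    if not per_population:
        raise ValueError("no genotyped populations")
    return sum(per_population) / len(per_population)


def expected_heterozygosity(populations):
    """Phase II style: pool allele counts, then apply 2pq under Hardy-Weinberg."""
    # Allele counts are summed across populations without normalizing for size.
    n1 = sum(2 * homo1 + hetero for homo1, _homo2, hetero in populations)
    n2 = sum(2 * homo2 + hetero for _homo1, homo2, hetero in populations)
    total = n1 + n2
    if total == 0:
        raise ValueError("no genotyped individuals")
    p, q = n1 / total, n2 / total
    return 2 * p * q


# Two illustrative populations: one heterozygote among 45, and a monomorphic one.
pops = [(44, 0, 1), (45, 0, 0)]
print(observed_heterozygosity(pops))
print(expected_heterozygosity(pops))
```

The difference in weighting is the point of the note: the observed value gives every population equal weight, while the pooled expected value lets larger populations contribute more allele copies.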

The human SNPs are displayed in gray using a color gradient based on minor allele frequency. The higher the minor allele frequency, the darker the display. By definition, the maximum minor allele frequency is 50%. When zoomed to base level, the major allele is displayed for each population.

The orthologous alleles from chimp and macaque are displayed in brown using a color gradient based on quality score. Quality scores range from 0 to 100 representing low to high quality. For orthologous alleles, the higher the quality, the darker the display. Quality scores are not available for chimp chromosomes chr21 and chrY; these were set to 98, consistent with the panTro2 browser quality track.

Filters are provided for the data attributes described above. Additionally, a filter is provided for observed heterozygosity (average of all populations' observed heterozygosities). Filters are applied to all subtracks, even if a subtrack is not displayed.

Notes on orthologous allele filters:

  • If a SNP's major allele is different between populations, no overall major allele for human is determined, thus the "matches major human allele" and "matches minor human allele" filters for orthologous alleles do not apply.
  • If a SNP is monomorphic in all populations, the minor allele is not verified in the HapMap dataset. In these cases, the filter to match orthologous alleles to the minor human allele will yield no results.
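
The two cases in these notes can be sketched in code. The example is a rough model of the rules as described, not the browser's implementation: the function names are hypothetical, each population row is an (allele1, homoCount1, allele2, homoCount2, heteroCount) tuple, and treating an exact tie within a population as "no major allele" is an assumption.

```python
def population_major_allele(allele1, homo1, allele2, homo2, hetero):
    """Return the more frequent allele in one population, or None on a tie."""
    n1 = 2 * homo1 + hetero
    n2 = 2 * homo2 + hetero
    if n1 == n2:
        return None  # assumption: a tie yields no major allele
    return allele1 if n1 > n2 else allele2


def overall_alleles(rows):
    """Return (major, minor) across populations; either may be None."""
    majors = {population_major_allele(*row) for row in rows}
    if len(majors) != 1 or None in majors:
        # Populations disagree on the major allele: no overall major or minor.
        return None, None
    major = majors.pop()
    observed = set()
    for allele1, homo1, allele2, homo2, hetero in rows:
        if 2 * homo1 + hetero > 0:
            observed.add(allele1)
        if 2 * homo2 + hetero > 0:
            observed.add(allele2)
    others = observed - {major}
    # Monomorphic in every population: the minor allele is never observed.
    minor = others.pop() if len(others) == 1 else None
    return major, minor


print(overall_alleles([("C", 44, "T", 0, 1), ("C", 45, "none", 0, 0)]))
print(overall_alleles([("C", 44, "none", 0, 0), ("C", 45, "none", 0, 0)]))
print(overall_alleles([("C", 44, "T", 0, 1), ("C", 0, "T", 44, 1)]))
```

In the second call the minor allele comes back as None, which corresponds to the case where the "matches minor human allele" filter yields no results. In the third call both values are None because the populations disagree on the major allele.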

Credits

This track is based on International HapMap Project release 27 data, provided by the HapMap Data Coordination Center.

References

HapMap Project

The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61.

The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320.

The International HapMap Consortium. The International HapMap Project. Nature. 2003 Dec 18;426(6968):789-96.

HapMap Data Coordination Center

Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project Web site. Genome Res. 2005 Nov;15(11):1592-3.

A Sampling of HapMap Literature

Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006 Mar 1;15(5):789-95.

Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al. Global variation in copy number in the human genome. Nature. 2006 Nov 23;444(7118):444-54.

Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 2007 Feb;39(2):226-31.

Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007 Apr;17(4):520-6.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. A Map of Recent Positive Selection in the Human Genome. PLoS Biol. 2006 Mar;4(3):e72.

Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005 Nov;15(11):1468-76.

Data Source

The genotypes_chr*_*_r27_nr.b36_fwd.txt.gz files from the HapMap FTP site were processed to make this track.