1000G Archive 1000 Genomes Trios Track Settings
 
Thousand Genomes Project Family VCF Trios

Track collection: 1000 Genomes Archive

Subtracks:

  • 1463 CEU Trio - 1000 Genomes Utah CEPH Trio
  • m004 MXL Trio - 1000 Genomes m004 Mexican Ancestry from Los Angeles Trio
  • m011 MXL Trio - 1000 Genomes m011 Mexican Ancestry from Los Angeles Trio
  • PR05 PUR Trio - 1000 Genomes Puerto Ricans from Puerto Rico Trio
  • SH089 CHS Trio - 1000 Genomes Southern Han Chinese Trio
  • VN049 KHV Trio - 1000 Genomes Kinh in Ho Chi Minh City, Vietnam Trio
  • Y117 YRI Trio - 1000 Genomes Yoruban in Ibadan, Nigeria Trio

Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This track shows approximately 4.5 million single nucleotide variants (SNVs) and 0.6 million short insertions/deletions (indels) from 7 different parent/child trios as produced by the International Genome Sample Resource (IGSR), from sequence data generated by the 1000 Genomes Project in its Phase 3 sequencing of 2,504 genomes from 26 populations worldwide.

Variants were called on the autosomes (chromosomes 1 through 22) and on the Pseudo-Autosomal Regions (PARs) of chromosome X. Therefore this track has no annotations on alternate haplotype sequences, fix patches, chromosome Y, or the non-PAR portion (the majority) of chromosome X.

The variant genotypes have been phased (i.e., the two alleles of each diploid genotype have been assigned to two haplotypes, one inherited from each parent). This information allows us to illustrate which haplotypes in the child have been inherited from which parent.
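As a minimal sketch of what phasing means at the level of a single VCF record, the snippet below splits a phased genotype string into its two haplotype alleles. The function name and the example alleles are hypothetical and serve only as illustration; the only convention relied on is that VCF marks phased genotypes with the `|` separator.

```python
def split_phased_genotype(gt: str) -> tuple[str, str]:
    """Split a phased diploid VCF GT string such as '0|1' into two allele indices."""
    if "|" not in gt:
        # Unphased genotypes use '/', so the haplotype assignment is unknown.
        raise ValueError(f"genotype {gt!r} is not phased")
    first, second = gt.split("|")
    return first, second


# Hypothetical record: REF=A, ALT=G, child genotype 0|1.
alleles = ["A", "G"]  # index 0 is REF, index 1 is the first ALT
hap1, hap2 = (alleles[int(i)] for i in split_phased_genotype("0|1"))
print(hap1, hap2)  # A G
```

Here the first haplotype carries the reference allele and the second carries the alternate allele; which parent each haplotype came from is what the trio display illustrates.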

Trios from six different populations are available, including:

  • YRI - Yoruban in Ibadan, Nigeria
  • KHV - Kinh in Ho Chi Minh City, Vietnam
  • PUR - Puerto Ricans from Puerto Rico
  • CEU - CEPH Utah
  • CHS - Southern Han Chinese
  • MXL - Mexican Ancestry from Los Angeles

Display Conventions and Configuration

This track illustrates the vcfPhasedTrio track type, where two lines, one for each chromosome copy in the diploid genome, are drawn per sample in the underlying VCF. Variants in the window are then drawn on the line of the haplotype to which they belong, so that variants on the same line were likely inherited together. The sorting routine is the same one used to draw the haplotype-sorted display in the non-trio 1000 Genomes track, and is described here.

The child haplotypes are drawn in the center of each group, flanked above and below by parent haplotypes, and variants are sorted to show the transmitted alleles:

parent 1 untransmitted haplotype
parent 1 transmitted haplotype
child haplotype inherited from parent 1
child haplotype inherited from parent 2
parent 2 transmitted haplotype
parent 2 untransmitted haplotype

Track configuration options include:

  • Showing the child haplotypes below the parent(s)
  • Toggling the haplotype labels between mother/father/child and VCF sample IDs
  • Hiding the parent samples

Allele coloring options include:

  • No shading - the default option
  • Shading by functional effect of the variant relative to NCBI RefSeq Curated Transcripts:
    • reference alleles invisible
    • alternate alleles in red for non-synonymous
    • alternate alleles in green for synonymous
    • alternate alleles in blue for UTR/noncoding
    • alternate alleles in black otherwise
  • Child de novo alleles in red - all alternate alleles black except for cases where the child has an allele not present in either parent
  • Child alleles that are "inconsistent" with phasing in red - all alternate alleles black except for cases where the "inherited" child allele does not match the "transmitted" parent allele. Note that whether an allele is marked as inconsistent can change with the genomic location, because the alleles available for sorting change. All the variants present in the window are treated as one haplotype, so which haplotypes are considered "inherited" and "transmitted" varies as the viewing location changes.
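The de novo coloring rule above can be sketched for a single site as follows. This is an illustration under stated assumptions, not the browser's actual implementation: the function name is hypothetical, only alternate alleles are tested (matching the statement that alternate alleles are the ones colored), and a site where either parent has no called allele is treated as undecidable.

```python
def is_de_novo(child_gt: str, mother_gt: str, father_gt: str) -> bool:
    """Return True if the child carries an alternate allele seen in neither parent."""

    def called_alleles(gt: str) -> set[str]:
        # Accept phased ('|') or unphased ('/') separators; drop missing calls ('.').
        return {a for a in gt.replace("/", "|").split("|") if a != "."}

    mother = called_alleles(mother_gt)
    father = called_alleles(father_gt)
    if not mother or not father:
        # Assumption: without both parental genotypes the site cannot be judged.
        return False

    parental = mother | father
    # Only alternate alleles (index other than '0') are candidates.
    child_alts = {a for a in called_alleles(child_gt) if a != "0"}
    return any(a not in parental for a in child_alts)


print(is_de_novo("0|1", "0|0", "0|0"))  # True
print(is_de_novo("0|1", "0|1", "0|0"))  # False
```

The "inconsistent with phasing" option is not sketched here because it depends on the window-wide haplotype sorting described above rather than on a single site.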

From the subtrack configure menu, there is the option to manually rearrange the family order for each trio by dragging haplotypes.

Clicking on a variant takes one to a details page with the standard VCF details, including INFO column annotations, the REF and ALT alleles, and the genotypes from all three samples.

Methods

The genomes of 2,504 individuals were sequenced using both whole-genome sequencing (mean depth = 7.4x) and targeted exome sequencing (mean depth = 65.7x). Sequence reads were aligned to the reference genome using alt-aware BWA-MEM (Zheng-Bradley et al.). Variant discovery and quality control were performed as described in Lowy-Gallego et al.

See also:

UCSC Methods

Trio samples were extracted from both the main 1000 Genomes set and the related samples set, using the pedigree information from 1000 Genomes. Variants that were homozygous reference across all three samples were removed.
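The homozygous-reference filter described above could look roughly like the following. This is a sketch, not the script UCSC used: the function names and example records are hypothetical, the input is assumed to be uncompressed, tab-separated VCF text already restricted to the three trio samples, and genotypes with missing calls are kept because they are not demonstrably homozygous reference.

```python
from typing import Iterable, Iterator


def is_hom_ref(gt: str) -> bool:
    """True when every allele in the genotype is the reference allele ('0')."""
    alleles = gt.replace("/", "|").split("|")
    return all(a == "0" for a in alleles)


def filter_trio_lines(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and records where at least one sample is not hom-ref."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Sample columns start at index 9; GT is the first colon-separated subfield.
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        if not all(is_hom_ref(gt) for gt in genotypes):
            yield line


# Hypothetical two-record example: the first record is dropped, the second kept.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tmother\tfather\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|0\t0|0\n",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t0|0\t0|1\n",
]
kept = list(filter_trio_lines(example))
print(len(kept))  # 2
```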

Data Access

Trio VCFs are available for download from our download server.

Credits

Thanks to the International Genome Sample Resource (IGSR) for making these variant calls freely available.

References

Zheng-Bradley X, Streeter I, Fairley S, Richardson D, Clarke L, Flicek P, 1000 Genomes Project Consortium. Alignment of 1000 Genomes Project reads to reference assembly GRCh38. Gigascience. 2017 Jul 1;6(7):1-8. PMID: 28531267; PMC: PMC5522380

Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res. 2019 Oct 4. PMID: 31584097

Lowy-Gallego E, Fairley S, Zheng-Bradley X, Ruffier M, Clarke L, Flicek P, 1000 Genomes Project Consortium. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project [version 1; peer review: 2 not approved]. Wellcome Open Research. 2019 Mar 11.

1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA et al. A global reference for human genetic variation. Nature. 2015 Oct 1;526(7571):68-74. PMID: 26432245