Nextstrain Mutations Track Settings
 
Nextstrain Subset of GISAID EpiCoV™ Sample Mutations


Sample sorting display

Enable Sample sorting display
Sample sorting order:
  • using the tree specified in the file associated with the track
  • using the middle variant in the viewing window as anchor. Samples are clustered by similarity around a central variant. Samples are reordered for display using the clustering tree, which is drawn in the left label area. To anchor the sorting to a particular variant, click on the variant in the genome browser, and then click on the 'Use this variant' button on the next page.
  • using the order in which samples appear in the underlying VCF file
Allele coloring scheme:
  • reference alleles invisible, alternate alleles in black
  • reference alleles invisible, alternate alleles in red for non-synonymous, green for synonymous, blue for UTR/noncoding, black otherwise
  • reference alleles in blue, alternate alleles in red
  • first base of allele (A = red, C = blue, G = green, T = magenta)
Sample sorting display height:
Minimum minor allele frequency (if INFO column includes AF or AC+AN):


Display data as a density graph:


Subtracks:

  • Rec Bi-allelic: Recurrent Bi-allelic Mutations in Nextstrain Subset of GISAID EpiCoV™ Samples
  • All: Mutations in Nextstrain Subset of GISAID EpiCoV™ Samples
  • 19A Mutations: Mutations in Clade 19A Nextstrain Subset of GISAID EpiCoV™ Samples
  • 19B Mutations: Mutations in Clade 19B Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20A Mutations: Mutations in Clade 20A Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20B Mutations: Mutations in Clade 20B Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20C Mutations: Mutations in Clade 20C Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20D Mutations: Mutations in Clade 20D Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20E/EU1 Mutations: Mutations in Clade 20E/EU1 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20F Mutations: Mutations in Clade 20F Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20G Mutations: Mutations in Clade 20G Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20H/Beta Mutations: Mutations in Clade 20H/501Y.V2/Beta Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20I/Alpha Mutations: Mutations in Clade 20I/501Y.V1/Alpha Nextstrain Subset of GISAID EpiCoV™ Samples
  • 20J/Gamma Mutations: Mutations in Clade 20J/501Y.V3/Gamma Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21A/Delta Mutations: Mutations in Clade 21A/Delta Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21B/Kappa Mutations: Mutations in Clade 21B/Kappa Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21C/Epsilon Mutations: Mutations in Clade 21C/Epsilon Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21D/Eta Mutations: Mutations in Clade 21D/Eta Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21E/Theta Mutations: Mutations in Clade 21E/Theta Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21F/Iota Mutations: Mutations in Clade 21F/Iota Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21G/Lambda Mutations: Mutations in Clade 21G/Lambda Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21H/Mu Mutations: Mutations in Clade 21H/Mu Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21I/Delta Mutations: Mutations in Clade 21I/Delta Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21J/Delta Mutations: Mutations in Clade 21J/Delta Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21K/BA.1 Mutations: Mutations in Clade 21K/Omicron/BA.1 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21L/BA.2 Mutations: Mutations in Clade 21L/Omicron/BA.2 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 21M/B.1.1.529 Mutations: Mutations in Clade 21M/Omicron/B.1.1.529 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 22A/BA.4 Mutations: Mutations in Clade 22A/Omicron/BA.4 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 22B/BA.5 Mutations: Mutations in Clade 22B/Omicron/BA.5 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 22C/BA.2.12.1 Mutations: Mutations in Clade 22C/Omicron/BA.2.12.1 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 22D/BA.2.75 Mutations: Mutations in Clade 22D/Omicron/BA.2.75 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 22E/BQ.1 Mutations: Mutations in Clade 22E/Omicron/BQ.1 Nextstrain Subset of GISAID EpiCoV™ Samples
  • 22F/XBB Mutations: Mutations in Clade 22F/Omicron/XBB Nextstrain Subset of GISAID EpiCoV™ Samples
  • 23A/XBB.1.5 Mutations: Mutations in Clade 23A/Omicron/XBB.1.5 Nextstrain Subset of GISAID EpiCoV™ Samples
    
Assembly: SARS-CoV-2 Jan. 2020 (NC_045512.2)


Note: Now updated daily

Description

Nextstrain.org displays data about mutations in the SARS-CoV-2 RNA and protein sequences that have occurred in different samples of the virus since the outbreak began in 2019. Nextstrain has a powerful user interface for viewing the evolutionary tree that it infers from the patterns of mutations in sequences worldwide, but it does not offer a detailed plot of mutations along the genome that can be correlated with other molecular information. We have therefore processed its data into this track, which displays the mutations called by Nextstrain for each sample that Nextstrain has obtained from GISAID.

Click on the vertical column in the display for any position in the SARS-CoV-2 genome to see more details about the mutation(s) that occur at that position, including the protein change (if applicable; protein changes use gene names in the Nextstrain Genes track), the number of samples with the mutation, and a list giving the nucleotide (allele) at that position in each GISAID sample.

Nextstrain identifies certain clades within the phylogenetic tree according to a set of defining mutations. The Nextstrain Clades track provides more information about these clades and serves as a useful color key for the clade colors in the phylogenetic tree display.

This track is composed of several subtracks so that different subsets of mutations may be viewed:

  • Recurrent Bi-allelic: This is the only subtrack displayed by default. It is limited to mutations that have been observed in at least two samples, and excludes positions at which more than one alternate allele has been observed in more than one sample.
  • All: All mutations found in all samples.
  • <Clade> Mutations: All mutations found in samples belonging to <Clade>, which is one of Nextstrain's clades (19A, 19B, 20A, etc.).
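
The Recurrent Bi-allelic rule above can be read as a filter over per-position allele counts. The following is a minimal sketch of that reading, not the pipeline UCSC actually runs: the function name, the input layout (position mapped to alternate allele mapped to sample count), and the example counts are all hypothetical. It assumes that a position is kept when exactly one alternate allele is seen in at least two samples, and that alternate alleles seen in a single sample are ignored.

```python
from typing import Dict


def recurrent_biallelic(alt_counts: Dict[int, Dict[str, int]]) -> Dict[int, str]:
    """Return {position: alt_allele} for positions that pass the filter.

    alt_counts maps a genome position to {alternate allele: number of samples}.
    This layout is an assumption made for illustration only.
    """
    kept = {}
    for pos, counts in alt_counts.items():
        # Alternate alleles observed in at least two samples are "recurrent".
        recurrent = [alt for alt, n in counts.items() if n >= 2]
        # Keep the position only if exactly one alternate allele is recurrent;
        # positions with two or more recurrent alternates are excluded.
        if len(recurrent) == 1:
            kept[pos] = recurrent[0]
    return kept


# Made-up positions and counts, purely illustrative.
example = {
    100: {"T": 1500},          # one recurrent alternate: kept
    200: {"T": 1400, "A": 1},  # second alternate seen only once: kept as T
    300: {"T": 300, "C": 2},   # two recurrent alternates: excluded
    400: {"G": 1},             # seen in a single sample: excluded
}
print(recurrent_biallelic(example))
```

Under these assumptions, only positions 100 and 200 survive the filter.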

Display Conventions

In "dense" mode, a vertical line is drawn at each position where there is a mutation. In "pack" mode, the display shows a plot of all samples' mutations, with samples ordered using Nextstrain's phylogenetic tree in order to highlight patterns of linkage.

Each sample is placed in a horizontal row of pixels; when the number of samples exceeds the number of vertical pixels for the track, multiple samples fall in the same pixel row and pixels are averaged across samples.
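
As a rough illustration of this averaging, the sketch below bins one column of alleles (one value per sample, in display order) into a fixed number of pixel rows. The function name, the 0/1 encoding of reference and alternate alleles, and the even assignment of samples to rows are assumptions for the example; the browser's actual drawing code may differ.

```python
from typing import List


def average_into_rows(alleles: List[int], height: int) -> List[float]:
    """Average per-sample allele values (0 = reference, 1 = alternate)
    into `height` pixel rows, keeping samples in display order."""
    if height <= 0:
        raise ValueError("height must be positive")
    n = len(alleles)
    if n <= height:
        # Enough rows for every sample: nothing to average.
        return [float(a) for a in alleles]
    sums = [0.0] * height
    counts = [0] * height
    for i, allele in enumerate(alleles):
        row = i * height // n  # samples are spread evenly over the rows
        sums[row] += allele
        counts[row] += 1
    # Because n > height, every row receives at least one sample.
    return [s / c for s, c in zip(sums, counts)]


# Six samples squeezed into three pixel rows.
print(average_into_rows([0, 0, 1, 1, 1, 0], 3))  # [0.0, 1.0, 0.5]
```

Each output value is the fraction of samples in that pixel row carrying the alternate allele.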

Each mutation is a vertical bar at its position in the SARS-CoV-2 genome with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each mutation's vertical bar to make the bar more visible when most alleles are reference alleles. Insertions and deletions are not shown as these are removed from the data by Nextstrain.

The phylogenetic tree for the samples built by Nextstrain is depicted in the left column of the display. Mousing over this will show the GISAID identifiers for the different samples. When the vertical height of the track is set sufficiently high (10 pixels per sample with the default font), sample names are drawn to the right of the tree; however, with thousands of samples in the Nextstrain tree, and a maximum track height of 2500 pixels, the full Nextstrain tree is too large for sample names to be displayed. In the track controls, the user can choose to display subtracks containing the phylogenetic trees and mutations for individual clades. Some clades have few enough samples that they can be made tall enough to display sample names. Branches of the phylogenetic tree are colored by clade using the same color scheme as nextstrain.org.
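
The relationship between track height and sample names is simple arithmetic. The sketch below uses only the two figures quoted above (10 pixels per sample with the default font, and a maximum track height of 2500 pixels); the constant and function names are hypothetical.

```python
PIXELS_PER_NAMED_SAMPLE = 10  # from the text, with the default font
MAX_TRACK_HEIGHT = 2500       # from the text, in pixels


def min_height_for_names(n_samples: int) -> int:
    """Smallest track height at which every sample name fits."""
    return n_samples * PIXELS_PER_NAMED_SAMPLE


def names_can_be_shown(n_samples: int) -> bool:
    """True if the required height does not exceed the maximum."""
    return min_height_for_names(n_samples) <= MAX_TRACK_HEIGHT


print(names_can_be_shown(250))   # True
print(names_can_be_shown(3000))  # False
```

With these figures, a subtrack can show names for at most 250 samples, which is why only the smaller clade subtracks qualify.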

Methods

Nextstrain downloads SARS-CoV-2 genomes from GISAID as they are submitted by labs worldwide, and downsamples to a subset of several thousand sequences in order to provide an interactive display. The selected subset of GISAID sequences is processed by an automated pipeline, producing an annotated phylogenetic tree data structure underlying the Nextstrain display; UCSC downloads the results and extracts annotations for display.

Data Access

SARS-CoV-2 mutations displayed by Nextstrain are derived from a subset of GISAID sequences, and the GISAID Terms and Conditions prohibit the redistribution of GISAID-derived data. They also require that the submitters of all sequences be acknowledged when the mutations are used. Nextstrain.org offers phylogenetic trees, author credits and other files: scroll to the bottom of the page and click "DOWNLOAD DATA", and a dialog with download options appears.

All GISAID SARS-CoV-2 genome sequences and metadata are available for download from GISAID EpiCoV™ by registered users. We have a program faToVcf that can extract VCF from a multi-sequence FASTA alignment such as the "msa_date" download file from GISAID. faToVcf is available for Linux and MacOSX on the download server: https://hgdownload.soe.ucsc.edu/admin/exe. It requires at least 4GB of memory to process the complete msa_date file. Here are some steps to get started using faToVcf:

  • This command enables faToVcf to be run as a program (otherwise the command would say "Permission denied"):
    chmod a+x faToVcf
  • This command shows basic usage instructions and describes the options:
    ./faToVcf
  • This command converts msa fasta to VCF without per-sample genotype columns (substitute correct date for "0925" in filenames):
    ./faToVcf -includeRef \
        -ref='hCoV-19/Wuhan/Hu-1/2019|EPI_ISL_402125|2019-12-31|Asia' \
        -vcfChrom=NC_045512.2 \
        -noGenotypes \
        msa_0925.fasta msa_0925.sites.vcf
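
The track controls offer a minimum minor allele frequency filter that applies when the INFO column includes AF or AC+AN. The sketch below shows one way such a value could be computed from a single VCF data line. It is an illustration, not the browser's code: the function names are hypothetical, only sites with a single alternate allele are handled, and whether a particular VCF (including faToVcf output) carries these INFO keys should be checked rather than assumed.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'AC=30;AN=120' into key/value pairs."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def minor_allele_frequency(vcf_line: str) -> Optional[float]:
    """Return the minor allele frequency for a bi-allelic VCF data line,
    or None if it cannot be derived from AF or AC+AN."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:  # INFO is the eighth tab-separated column
        return None
    info = parse_info(columns[7])
    try:
        if "AF" in info:
            alt_freqs = [float(x) for x in info["AF"].split(",")]
        elif "AC" in info and "AN" in info:
            total = int(info["AN"])
            if total == 0:
                return None
            alt_freqs = [int(x) / total for x in info["AC"].split(",")]
        else:
            return None
    except ValueError:
        return None
    if len(alt_freqs) != 1:
        return None  # multi-allelic sites are outside this sketch
    return min(alt_freqs[0], 1.0 - alt_freqs[0])


# Made-up example line: 30 alternate alleles out of 120 called alleles.
line = "NC_045512.2\t100\t.\tC\tT\t.\t.\tAC=30;AN=120"
print(minor_allele_frequency(line))  # 0.25
```

A site would then pass the filter when this value is at least the chosen minimum.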
    

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to nextstrain.org for sharing its analysis of genomes collected by GISAID.

Data usage policy

The data presented here is intended to rapidly disseminate analysis of important pathogens. Unpublished data is included with permission of the data generators, and does not impact their right to publish. Please contact the respective authors if you intend to carry out further research using their data. Author contact info is available via nextstrain.org: scroll to the bottom of the page, click "DOWNLOAD DATA" and click "ALL METADATA (TSV)" in the resulting dialog.

References

Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018 Dec 1;34(23):4121-4123. PMID: 29790939; PMC: PMC6247931

Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018 Jan;4(1):vex042. PMID: 29340210; PMC: PMC5758920

Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015 Jan;32(1):268-74. PMID: 25371430; PMC: PMC4271533