B.1.1.7 in USA Track Settings
 
Early Lineage B.1.1.7 (a.k.a. 501Y.V1, VOC 202012/01) Sequences in the United States: SNVs and Deletions

Description

Lineage B.1.1.7 (Rambaut et al.), also known as 20B/501Y.V1 (Nextstrain) and Variant of Concern (VOC) 202012/01 (Public Health England), spread rapidly in England in November and December 2020 (Volz et al.). It carries a large number of mutations, including non-synonymous substitutions and deletions. As of Jan. 4, 2021, over 7,000 sequences from 29 countries had been submitted to GISAID. The first confirmed B.1.1.7 sequence in the United States was announced Dec. 29, 2020.

This track shows single-nucleotide substitutions and deletions, relative to the SARS-CoV-2 reference genome, in the B.1.1.7 consensus sequence and in the first nine genome sequences in the United States that were assigned to B.1.1.7.

The track was generated using hgPhyloPlace, the Genome Browser's web front end to UShER (Turakhia et al., see Methods below). UShER places uploaded sequences in a global phylogenetic tree and also extracts subtrees showing each sample's local phylogenetic context. hgPhyloPlace generates a JSON file for each subtree which can be displayed using nextstrain.org. The first nine U.S. B.1.1.7 sequences have been placed in five clusters which correlate with geographic location. Here are links to view the subtrees at nextstrain.org:

Display Conventions

In "dense" mode, a vertical line is drawn at each position where there is a mutation. In "squish" and "pack" modes, the display shows a plot of all samples' mutations, with samples ordered using the phylogenetic tree in order to highlight patterns of linkage. "Full" display mode shows each mutation on its own row, ordered by position instead of lineage.

Each sample is placed in a horizontal row of pixels; when the number of samples exceeds the number of vertical pixels for the track, multiple samples fall in the same pixel row and pixels are averaged across samples.
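The row-averaging behavior described above can be illustrated with a small sketch. It assumes each sample is a list of 0/1 values (0 = reference allele, 1 = alternate allele) and that samples are assigned to pixel rows in contiguous blocks; the function name and the binning rule are hypothetical and are not taken from the Genome Browser's source code, which may blend pixel colors differently.

```python
def average_into_pixel_rows(genotypes, n_rows):
    """Collapse per-sample allele rows into at most n_rows pixel rows.

    genotypes: list of samples, each a list of 0/1 allele values per site.
    Returns a list of rows holding the fraction of alternate alleles per site.
    """
    n_samples = len(genotypes)
    if n_samples <= n_rows:
        # Enough vertical pixels: every sample keeps its own row.
        return [[float(a) for a in row] for row in genotypes]

    n_sites = len(genotypes[0])
    sums = [[0.0] * n_sites for _ in range(n_rows)]
    counts = [0] * n_rows
    for i, row in enumerate(genotypes):
        r = i * n_rows // n_samples  # contiguous blocks of samples share a row
        counts[r] += 1
        for j, allele in enumerate(row):
            sums[r][j] += allele
    return [[s / counts[r] for s in sums[r]] for r in range(n_rows)]


# Four samples squeezed into two pixel rows.
rows = average_into_pixel_rows([[0, 1], [1, 1], [0, 0], [0, 1]], 2)
print(rows)
```

A value between 0 and 1 in the output corresponds to a pixel where only some of the samples sharing that row carry the alternate allele.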

Each mutation is a vertical bar at its position in the SARS-CoV-2 genome with white (invisible) representing the reference allele; the non-reference allele is shown in red if it changes the protein sequence of a gene, green if it falls within a gene but does not change the protein, and black if it does not fall within a gene. Tick marks are drawn at the top and bottom of each mutation's vertical bar to make the bar more visible when most alleles are reference alleles.
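The coloring rule described above can be restated as a small decision function. This is only an illustrative sketch: the function name and its boolean parameters are hypothetical, and it covers just the default scheme described in this section.

```python
def allele_color(is_reference, in_gene, changes_protein):
    """Return the display color for one allele under the default scheme."""
    if is_reference:
        return "white"  # reference alleles are drawn as invisible (white)
    if not in_gene:
        return "black"  # alternate allele outside any gene
    if changes_protein:
        return "red"    # alternate allele that changes the protein sequence
    return "green"      # alternate allele within a gene, protein unchanged


print(allele_color(is_reference=False, in_gene=True, changes_protein=True))
```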

The phylogenetic tree showing inferred relationships between the samples is depicted in the left column of the display. Mousing over this will show the sample identifiers. With the default font size (or smaller), the leaves of the tree are labeled by sample identifiers. For larger font sizes, the track height will need to be increased in order for the labels to fit. The track height can be adjusted in the track controls, which can be reached by clicking on the gray button to the left of the tree or by right-clicking on the image.

Methods

The B.1.1.7 consensus sequence was determined from COG-UK sequences assigned to B.1.1.7 with early sample collection dates. The nine U.S. B.1.1.7 genome sequences available as of Jan. 2, 2021 were downloaded from GenBank and GISAID and uploaded to hgPhyloPlace. hgPhyloPlace uses UShER (Turakhia et al.) to place uploaded SARS-CoV-2 genome sequences in a global phylogenetic tree, and generates custom tracks for the Genome Browser showing single-nucleotide substitutions in the uploaded sequences. hgPhyloPlace ignores insertion/deletion mutations and works only with substitutions, because substitutions are adequate for inferring phylogeny. However, since B.1.1.7 has four deletions, three of which cause amino acid deletions from genes, minimap2 (Li) was used to align B.1.1.7 to the reference genome so that deletions could be displayed in addition to substitutions.
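Deletions can be read off a pairwise alignment's CIGAR string, the notation that aligners such as minimap2 can emit for each alignment. The sketch below shows the general technique only; it is not the pipeline used to build this track, the function name is hypothetical, and the CIGAR string in the example is a toy value rather than a real B.1.1.7 alignment.

```python
import re

# CIGAR operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def deletions_from_cigar(cigar, ref_start=1):
    """List deletions as (1-based reference position, length) tuples.

    cigar: CIGAR string such as "100M9D50M".
    ref_start: 1-based reference coordinate of the first aligned base.
    """
    deletions = []
    ref_pos = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "D":
            deletions.append((ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return deletions


# Toy alignment: 100 aligned bases, a 9-base deletion, 50 aligned bases,
# a 3-base deletion, then 20 aligned bases.
print(deletions_from_cigar("100M9D50M3D20M"))
```

Insertions (I) and soft clips (S) consume only the query sequence, so they do not shift the reference coordinate.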

Data Access

The first sequences from California, Colorado, Florida and New York are available from GenBank:

All nine sequences are available from GISAID. GISAID data displayed in the Genome Browser are subject to GISAID's Terms and Conditions. SARS-CoV-2 genome sequences and metadata are available for download from GISAID EpiCoV™.

COG-UK releases daily updates of sequences and metadata; scroll down to the "Latest Sequence Data" section of the Data page for links.

The mutations in the B.1.1.7 consensus sequence and the sequences available from GenBank may be downloaded in Variant Call Format (VCF): lineageB_1_1_7_US_first7.vcf.gz
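A downloaded VCF file can be inspected with a few lines of code. The sketch below separates single-nucleotide substitutions from deletions and counts the samples carrying each alternate allele. It assumes one ALT allele per record and genotype fields such as "0" or "1" in the sample columns; the function name, sample names, and records in the toy input are hypothetical and are not copied from the track's file.

```python
def summarize_vcf(lines):
    """Return (pos, ref, alt, kind, n_carriers) for each VCF record."""
    non_alt = {"0", ".", "0/0", "0|0", "./.", ".|."}
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        if len(ref) == 1 and len(alt) == 1:
            kind = "SNV"
        elif len(ref) > len(alt):
            kind = "deletion"  # VCF writes deletions with a longer REF
        else:
            kind = "other"
        # Sample columns start at the 10th field; GT is the first subfield.
        carriers = sum(1 for gt in fields[9:] if gt.split(":")[0] not in non_alt)
        records.append((pos, ref, alt, kind, carriers))
    return records


# Toy input with two hypothetical samples.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "sampleA", "sampleB"]),
    "\t".join(["NC_045512v2", "3267", ".", "C", "T", ".", ".", ".", "GT", "1", "1"]),
    "\t".join(["NC_045512v2", "21764", ".", "ATACATG", "A", ".", ".", ".", "GT", "1", "0"]),
]
for record in summarize_vcf(toy_vcf):
    print(record)
```

For a gzip-compressed file, the lines can be supplied by opening it with Python's gzip.open(path, "rt") and passing the file object to the function.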

The mutation-annotated phylogenetic tree file used by UShER to place the sequences may be downloaded in order to run UShER locally: public-2020-12-08.all.plus.cogUk.12-30.masked.pb.

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge the authors and the originating laboratories where the clinical specimen or virus isolate was first obtained and the submitting laboratories, where sequence data have been generated and submitted to public databases, on which this research is based.

References

Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. PMID: 32669681

Rambaut A, Loman N, Pybus O, Barclay W, Barrett J, Carabelli A, Connor T, Peacock T, Robertson DL, Volz E, et al. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Virological. 2020 Dec 18.

Volz E, Mishra S, Chand M, Barrett JC, Johnson E, Geidelberg L, Hinsley WR, Laydon DJ, Dabrera G, O'Toole Á, et al. Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. Virological. 2020 Dec 31.

Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D, Corbett-Detig R. Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic. bioRxiv. 2020 Sep 28.

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. PMID: 29750242; PMC: PMC6137996