Microdeletions Track Settings
Microdeletions in GISAID sequences

Data schema/format description and download
Data last updated at UCSC: 2020-05-20


This track shows deletions that have been found in the sequences uploaded to the GISAID database as of June 6, 2020. Three confidence levels of deletion calls are shown:

  • deletions found in at least 1 GISAID sequence
  • deletions found in at least 2 GISAID sequences
  • deletions found in at least 2 GISAID sequences and validated against raw reads.
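
The three levels above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to build the track: the function name `tier_deletions`, the `(sequence_id, start, end)` call format, and the numeric tier labels 1 to 3 are assumptions made for the example.

```python
from collections import Counter

def tier_deletions(calls, validated):
    """Assign each deletion its highest confidence tier.

    calls:     iterable of (sequence_id, start, end) deletion calls
               (hypothetical format, chosen for this example).
    validated: set of (start, end) deletions confirmed in raw reads.
    Returns a dict mapping (start, end) -> tier, where
      1 = seen in at least 1 sequence,
      2 = seen in at least 2 sequences,
      3 = seen in at least 2 sequences and validated with raw reads.
    """
    # Count distinct sequences supporting each deletion; set() drops
    # repeated calls of the same deletion in the same sequence.
    support = Counter()
    for _seq_id, start, end in set(calls):
        support[(start, end)] += 1

    tiers = {}
    for deletion, n_sequences in support.items():
        if n_sequences >= 2 and deletion in validated:
            tiers[deletion] = 3
        elif n_sequences >= 2:
            tiers[deletion] = 2
        else:
            tiers[deletion] = 1
    return tiers
```

Because the levels are nested, a deletion in tier 3 also satisfies the criteria for tiers 1 and 2; the sketch reports only the highest tier reached.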


We accessed all GISAID SARS-CoV-2 sequences on June 6, 2020. We kept only high-coverage sequences spanning the entire SARS-CoV-2 genome (>=29,000 bp), leaving 12,403 sequences, and aligned them using MAFFT.
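
As a rough illustration of the length filter described above, the following Python sketch keeps only FASTA records of at least 29,000 bases. It is not the code used for this track: the helper names `read_fasta` and `filter_full_length` are hypothetical, the length is assumed to count every character in the record (including any N), and the GISAID high-coverage flag is not modelled.

```python
MIN_LENGTH = 29000  # threshold stated in the track description

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_full_length(records, min_length=MIN_LENGTH):
    """Keep records whose sequence has at least min_length characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

The retained records would then be passed to a multiple sequence aligner such as MAFFT.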


We validated several deletions with the raw reads from NCBI's SRA Run browser. Additionally, NYU Langone Health provided us with the aligned reads for many of their sequences.

Data Access

The raw data can be explored interactively with the Table Browser, combined with other datasets in the Data Integrator tool, or downloaded directly as "microdel.txt.gz" from the download server. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
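
For readers who download the file, a minimal Python sketch for loading it is shown below. It assumes the file is gzipped, tab-separated text without a header row; the column layout is not described here, so the rows are returned as plain lists of strings and should be interpreted using the schema description. The function name `read_table_dump` is hypothetical.

```python
import csv
import gzip

def read_table_dump(path):
    """Return the rows of a gzipped, tab-separated file as lists of strings."""
    # Assumption: one record per line, fields separated by tabs, no quoting.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

For example, `rows = read_table_dump("microdel.txt.gz")` would load a local copy of the file.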


We thank all of the labs that submitted their sequences to the GISAID database. The full acknowledgement table can be found at https://github.com/briannachrisman/SARS-CoV-2_Microdeletions/blob/master/acknowledgments.pdf. We thank the public health laboratories VIDRL and MDU-PHL at The Peter Doherty Institute for Infection and Immunity for providing over 1000 high-quality raw reads to NCBI. We also thank Matthew T Maurano, Matija Snuderl, and Adriana Heguy of the NYU Langone SARS-CoV2 Sequencing Team for providing many of their raw reads.


Chrisman, Brianna Sierra, Kelley Paskov, Nate Stockham, Kevin Tabatabaei, Jae-Yoon Jung, Peter Washington, Maya Varma, Min Woo Sun, Sepideh Maleki, and Dennis P. Wall. "Indels in SARS-CoV-2 occur at template-switching hotspots." BioData Mining 14, no. 1 (2021): 1-16. https://doi.org/10.1186/s13040-021-00251-0