Online tool tracks mutations in COVID-19 virus

COVID CG can help scientists monitor SARS-CoV-2’s evolution around the world and inform vaccine and therapeutics development.

Credit: Zayna Sheikh, Broad Institute Communications

Update (August 26, 2021): In its first year, COVID CG has been accessed more than 47,900 times by users from 180 countries, informing public health efforts and the development of diagnostics and therapeutics globally. New features help users to find out how the different SARS-CoV-2 lineages and variants are related to each other: What are the mutations that distinguish my lineage or variant from the original SARS-CoV-2 genome? How frequently do specific mutations occur in that lineage or variant as compared to other lineages? The upgraded COVID CG allows users to rapidly analyze the 2.2 million and growing SARS-CoV-2 genomes deposited in the GISAID database.

As new variants of the SARS-CoV-2 virus continue to emerge and spread around the world, public health and research communities have emphasized the need for timely tracking of new viral mutations. Seeing where variants crop up and how prevalent they are will help researchers monitor how the virus is evolving and design experiments to test how well current vaccines, therapeutics and diagnostics are targeting the emerging variants.

A team from the Broad Institute of MIT and Harvard has built a powerful new tool to do just that. Called COVID CG, the browser-based genomic tracker allows scientists to survey the global genetic landscape of the SARS-CoV-2 virus at any given point in time. It pulls together all of the sequenced SARS-CoV-2 genomes that have been uploaded to the GISAID database, which scientists have long used to share viral genetic information. Through COVID CG’s interactive graphics, users can detect emerging genetic mutations and viral variants, monitor which mutations and viral genomes are present in specific parts of the world and how their prevalence changes over time, and identify which variants scientists should test their vaccines and therapeutics against.

“We need data from all around the world to get a better understanding of how this pandemic is progressing, and how vaccines, therapeutics, and diagnostics need to be adapted over time to meet these emerging variants,” said Alina Chan, a postdoc in the Vector Engineering group at the Broad Institute and a co-senior author on a paper in eLife that describes COVID CG.

“We're providing a tool that allows users to, in a fairly intuitive and interactive way, answer questions they have around tracking SARS-CoV-2 mutations,” said Ben Deverman, director of the Vector Engineering group and co-senior author on the paper.

People from more than 100 countries have already used the site since its launch in August 2020. And a partnership between Deverman’s team and AstraZeneca, which has developed a COVID-19 vaccine with Oxford University, will allow the team to add more advanced features to the tool and improve its ability to handle large amounts of data.

GISAID growth

The GISAID database contains more than 400,000 sequenced SARS-CoV-2 genomes. Before COVID CG, scientists who wanted to monitor mutations in the virus could either download data from GISAID or turn to browser-based tools to fully explore the genetic data.

Deverman and his team designed COVID CG for any researcher, even someone without any bioinformatics expertise. Work began in May 2020, when GISAID had fewer than 35,000 SARS-CoV-2 genomes. The team has since improved COVID CG so that it can handle the 13 to 14 gigabytes of data now in GISAID.

COVID CG features various search functions and numerous charts and tables. Users can, for example, search for a mutation such as N501Y, which is found in the variants that were first detected in the UK and South Africa and have been widely discussed in the news. Users can then filter the data by the location in the world where the mutation has appeared, find out when it was first sequenced in a particular country, what its co-occurring mutations are, and more. The charts and tables are interactive, allowing for a deep dive into the data.
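The kind of query described above can be sketched in a few lines of Python. This is only an illustration of the idea, not COVID CG's actual code or data model: the record layout, the function names, the placeholder countries, and the mutation labels MUT_A and MUT_B are all assumptions, and the records are made up rather than taken from GISAID.

```python
from collections import Counter
from datetime import date

# Made-up records for illustration only; not real GISAID data.
# "Country A"/"Country B" and MUT_A/MUT_B are hypothetical placeholders.
RECORDS = [
    {"country": "Country A", "date": date(2021, 1, 5), "mutations": {"N501Y", "MUT_A"}},
    {"country": "Country A", "date": date(2021, 1, 20), "mutations": {"N501Y", "MUT_A", "MUT_B"}},
    {"country": "Country B", "date": date(2021, 2, 3), "mutations": {"N501Y", "MUT_B"}},
    {"country": "Country B", "date": date(2021, 2, 10), "mutations": {"MUT_A"}},
]


def with_mutation(records, mutation, country=None):
    """Return records carrying the mutation, optionally limited to one country."""
    return [
        r for r in records
        if mutation in r["mutations"] and (country is None or r["country"] == country)
    ]


def first_seen(records, mutation, country):
    """Earliest date the mutation appears in a country, or None if never seen."""
    hits = with_mutation(records, mutation, country)
    return min((r["date"] for r in hits), default=None)


def co_occurring(records, mutation):
    """Count the other mutations found in the same genomes as the given one."""
    counts = Counter()
    for r in with_mutation(records, mutation):
        counts.update(r["mutations"] - {mutation})
    return counts


if __name__ == "__main__":
    print(first_seen(RECORDS, "N501Y", "Country B"))
    print(co_occurring(RECORDS, "N501Y"))
```

A real tool would run such queries over millions of genomes, so it would need indexed storage rather than a list scan; the sketch only shows the shape of the questions a user can ask.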

“We have designed the site so that it's pretty easy to navigate,” said Albert Chen, first author of the study and a computational associate in Deverman’s group. “A lot of the utility of this site comes from just being able to visualize this volume of data in real time.”

Data delay

COVID CG pulls data from GISAID as soon as it is available in the database. However, labs around the world report a median delay of 20 to 80 days in depositing their data in GISAID.
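To make the delay concrete, one way to measure it is the number of days between when a sample was collected and when its genome was submitted. The short Python sketch below assumes hypothetical field names ("collected", "submitted") and uses made-up dates; it is not how the authors computed their figures.

```python
from datetime import date
from statistics import median

# Made-up collection and submission dates for illustration only.
SAMPLES = [
    {"collected": date(2021, 1, 1), "submitted": date(2021, 1, 21)},
    {"collected": date(2021, 1, 1), "submitted": date(2021, 2, 15)},
    {"collected": date(2021, 1, 1), "submitted": date(2021, 3, 22)},
]


def median_delay_days(samples):
    """Median number of days between sample collection and submission."""
    return median((s["submitted"] - s["collected"]).days for s in samples)


if __name__ == "__main__":
    print(median_delay_days(SAMPLES))
```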

Even when it is uploaded, the authors say that the data could have biases that could paint a slightly distorted picture of the SARS-CoV-2 landscape. For example, some labs focus on sequencing samples most likely to have mutations of interest. Some may prioritize samples from people who have traveled from regions where new variants have been detected and are becoming more prevalent, such as the UK.

The researchers say that the best way to capture the true state of SARS-CoV-2 mutations is through government-funded efforts that support rapid, unbiased sequencing of many samples, along with prompt and open data-sharing. The more up-to-date and representative the GISAID data are, the more valuable the answers gleaned from COVID CG will be, they add.

“This is a democratization of viral genomic data. COVID CG was made to be a resource for everyone to use freely and quickly,” said Chan. “It’s a small part of a big picture, with scientists all over the world trying their best to fight this pandemic.”

Support for this research was provided in part by the National Institute of Neurological Disorders and Stroke, the National Institute of Mental Health, the Stanley Center for Psychiatric Research at the Broad Institute, and the Broad Institute of MIT and Harvard.

Paper(s) cited

Chen, AT et al. eLife. Online February 23, 2021. DOI: 10.7554/eLife.63409