Introduction and spread of SARS-CoV-2 in the greater Boston area

New data from the Broad Institute of MIT and Harvard, Massachusetts General Hospital, and the Massachusetts Department of Public Health provide insights into the entry of the SARS-CoV-2 virus into the Boston area and the epidemiological events that shaped the trajectory of the epidemic in the region.

This scanning electron microscope image shows SARS-CoV-2 (blue) emerging from the surface of lab-cultured cells.
Credit: National Institute of Allergy and Infectious Diseases

The first confirmed case of COVID-19 in Massachusetts, a traveler returning from Wuhan, China, was reported in early February. In the weeks that followed, no new cases were detected. Instead, the local narrative about the virus’s early spread in the greater Boston area — among the worst hit regions in the country — focused on an international professional conference in Boston in late February.

New viral genomic data provide a more complex and detailed story of the pandemic’s arrival and spread across the city and the Commonwealth.

Our dedicated team of researchers, clinicians, and public health professionals (a collaboration representing Massachusetts General Hospital (MGH), the Massachusetts Department of Public Health, and the Broad Institute’s viral genomics group) has been sequencing the genomes of SARS-CoV-2 samples from confirmed COVID-19 cases for genomic epidemiological analysis.

Today, we are releasing the first 331 complete genomes from this ongoing effort. These genomes represent a dense sample from MGH (a large tertiary referral hospital) during the first month of the outbreak, and help tell the story of the introduction and early spread of SARS-CoV-2 in the Boston area and beyond. The data suggest that the virus from the first diagnosed case in the state (sequenced by the CDC) was indeed contained effectively, leading to no subsequent cases. But in the weeks that followed, the virus entered the region many more times (we estimate at least 30 unique introductions), in many cases leading to onward community transmission.

It is clear from the data that no single event or importation alone is responsible for the ongoing spread of COVID-19 in the Boston area; rather, there were multiple entries within a few weeks, from both domestic and international sources.

Combining the genomic data with epidemiological information allowed us to trace the likely source of some of these introductions. SARS-CoV-2 entered the Boston area from multiple sources, particularly through Europe. Within the US, the data suggest that sources included Washington state, where COVID-19 outbreaks had begun weeks before. The data show that the mix of SARS-CoV-2 lineages in the Boston area was similar to that in other parts of the northeastern United States, although it is not yet clear whether this resulted from direct connections between particular cities or states, or whether it represents a common viral genetic fingerprint in the Northeast region.

One example of importation of SARS-CoV-2 into the Boston area was indeed the international conference in February. We reconstructed 28 viral genomes captured from individuals associated with the conference, and all were closely related, suggesting this was a “superspreading” event. These infections led to further community transmission in the Boston area and beyond. The ancestral relationships of these genomes suggest they came from a single source, likely in Europe.

We also investigated a second notable event, this time in a congregate living facility, that demonstrated how quickly and quietly COVID-19 can spread among a vulnerable population. In this case, all of the residents and most of the staff within the facility were tested, initially not because COVID-19 was suspected, but as a precaution prior to a planned relocation. More than half of the residents, along with dozens of staff, tested positive.

Genomic analysis of these infections showed several surprising things. Even though COVID-19 was not suspected, the virus had entered this community on three separate occasions. One of those introductions was responsible for more than 90% of the infections. The limited genetic diversity among these cases indicated a very rapid spread in the facility.

More broadly, this dataset reveals the genetic ancestry of SARS-CoV-2 now present in Massachusetts, which in turn provides the foundation for ongoing monitoring of the spread of COVID-19 within the state. As the world continues to try to control this pandemic, viral genetic sequencing can help us track new routes of importation, identify sources seeding ongoing infection, supplement conventional contact tracing, and dissect the details of local transmission within communities and institutions. With the data in hand, we have already been able to identify critical information about specific transmissions — information that can only come from looking at the genome of the virus itself.

The Broad Institute team has been sharing the sequence data and insights with our clinical and public health partners in real time, helping them to understand how SARS-CoV-2 has spread in our community. A working draft version of our manuscript describing the details of this data and analysis . We have made the data and analysis workflows publicly available in the Broad Institute’s platform, a secure, open source cloud environment for storing, analyzing, and sharing (with tight permission control) genomic and other biomedical data. We are working with the Data Sciences Platform to adapt Terra for , to support and accelerate the use of this approach by public health researchers and practitioners around the world so they can respond to COVID-19 and future public health emergencies. The dataset and analysis workflows used here can be found at the . The genomic data are also being shared openly on and are being visualized on .

The sequencing team includes members of the Broad Institute, the Divisions of Infectious Diseases and Pathology at MGH, and the Massachusetts Department of Public Health. We have received generous support from many sources, including rapid, open, real-time sharing on GISAID and GenBank, helpful conversations and methods through the CDC's SPHERES network, bioinformatic software packages including the tools on nextstrain.org, vibrant collaboration among our research teams, and support from NIAID, CDC, HHMI, Illumina, the Bill and Melinda Gates Foundation, and the Doris Duke Charitable Foundation. We are grateful to the many individuals and groups who have supported this effort.

This is our first cohort of SARS-CoV-2 genomes, and there is more to come. In the coming months, we will continue to prioritize the production and sharing of high-quality genomic data to support clinical and public health operational understanding and decision making during the COVID-19 pandemic. 

* Project team and contributors include, from MGH: Melis Anahtar, John Branda, Regina LaRocque, Jacob Lemieux, Virginia Pierce, Eric Rosenberg, Ed Ryan, Bennett Shaw, Damien Slater, Sarah Turbett; MA Department of Public Health: Catherine Brown, Timelia Fink, Glen Gallagher, Larry Madoff, Sandra Smole; Broad Institute Data Sciences Platform: Sushma Chaluvadi, Christine Loreth, Anthony Philippakis, DSP Field Engineering Team; Broad Institute’s Stanley Center for Psychiatric Research: Felecia Cerrato, Sinéad Chapman, Caroline Cusick, Katelyn Flowers, Anna Neumann; Broad Institute’s Office of Research Subject Protection: Stacey Donnelly, Andrea Saltzman, Susie Weisenburger; Other helpful colleagues: Maha Fahrat (HMS), William Hanage (HSPH), Tami Lierberman (MIT), Tammy Mason, Angela Page, Aviv Regev; Sabeti Lab Viral Genomics Group: Gordon Adams, Matt Bauer, Amber Carter, Kat Deruff, Katherine Figueroa, Adrianne Gladden-Young, Andreas Gnirke, Alan Guttierez, Molly Kemball, Lydia Krasilnikova, Allison Krunnfusz, Kim Lagerborg, Aaron Lin, Bronwyn MacInnis, Hayden Metsky, Erica Normandin, Danny Park, Steve Reilly, Melissa Rudy, Pardis Sabeti, Stephen Schaffner, Katherine Siddle, Chris Tomkins-Tinch