Unsupervised viral antibody escape prediction for future-proof vaccines
Noor Youssef
Debora Marks Lab, Harvard Medical School, Systems Biology
Sarah Faye Gurev
Debora Marks Lab (HMS), MIT
Meeting: Unsupervised viral antibody escape prediction for future-proof vaccines
Effective pandemic preparedness relies on predicting immune-evasive viral mutations to enable early detection of variants of concern and to design future-proof vaccines and therapeutics. However, current experimental strategies for predicting viral evolution are unavailable early in a pandemic because they require host polyclonal antibodies. Furthermore, the existing paradigm for vaccine evaluation relies on retrospective evaluation against past variants rather than proactive evaluation against future viral evolution. To address these limitations, we developed EVEscape, a model that integrates fitness predictions from evolutionary models, structure-based features that assess antibody binding potential, and biochemical distances between mutated and wild-type residues. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans, or 3D structures of antibody complexes are available. Using only information available pre-pandemic, EVEscape is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and generalizes to other viruses. Using EVEscape, we forecast future SARS-CoV-2 evolution and present a novel, proactive approach for evaluating and designing vaccines.
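The abstract describes fusing three per-mutation signals (evolutionary fitness, antibody-binding accessibility, and biochemical dissimilarity) into a single escape score. A minimal sketch of that idea, assuming a simple logistic squashing of each component and a sum of log-probabilities; the function names, weighting scheme, and all numeric values are illustrative assumptions, not the published EVEscape formulation:

```python
import math


def escape_score(fitness, accessibility, dissimilarity):
    """Illustrative fusion: squash each component to (0, 1) with a
    logistic, then sum log-probabilities (i.e., treat the components
    as independent likelihood terms)."""
    logistic = lambda x: 1.0 / (1.0 + math.exp(-x))
    return sum(math.log(logistic(x)) for x in (fitness, accessibility, dissimilarity))


# Rank candidate mutations by escape potential (toy numbers, purely illustrative)
mutations = {
    "E484K": (1.2, 0.8, 1.5),
    "A222V": (0.3, -0.5, 0.1),
    "D614G": (2.0, -1.0, 0.2),
}
ranked = sorted(mutations, key=lambda m: escape_score(*mutations[m]), reverse=True)
```

A mutation scores highly only when all three terms are favorable, which matches the intuition that an escape mutation must be tolerated by the virus, sit in an antibody-accessible region, and change the local biochemistry.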
Pascal Notin
Primer: Hybrid protein language models for fitness prediction and design
The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses, to designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have so far been the most successful approaches to these tasks. Their performance, however, is contingent on the availability of sufficiently deep and diverse alignments for reliable training, so their scope is limited by the fact that many protein families are hard, if not impossible, to align. Large language models trained on massive quantities of non-aligned protein sequences from diverse families seek to address these problems, but their performance has not yet matched that of their alignment-based counterparts. This talk will introduce hybrid strategies that leverage the strengths of both model classes.
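One simple way to picture a hybrid strategy is an ensemble whose weight on the alignment-based score depends on alignment quality, falling back toward the language-model score when the alignment is shallow. The sketch below is a hedged illustration of that idea only; the function, its parameters, and the depth-based weighting are assumptions for exposition, not the method presented in the talk:

```python
def hybrid_fitness(msa_llr, plm_llr, msa_depth, depth_threshold=100):
    """Blend two mutation-effect scores:
    - msa_llr: log-likelihood ratio from an alignment-based generative model
    - plm_llr: log-likelihood ratio from a protein language model
    - msa_depth: number of effective sequences in the alignment
    The alignment-based score dominates when the alignment is deep;
    shallow alignments shift weight to the language model."""
    w = min(1.0, msa_depth / depth_threshold)
    return w * msa_llr + (1.0 - w) * plm_llr


# Deep alignment: trust the alignment-based model entirely
deep = hybrid_fitness(2.0, 1.0, msa_depth=200)
# No usable alignment: fall back to the language model
shallow = hybrid_fitness(2.0, 1.0, msa_depth=0)
```

The design choice here is a linear interpolation for clarity; real hybrid models may instead share parameters, condition one model on the other, or learn the combination end to end.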