EvoAI enables extreme compression and reconstruction of the protein sequence space

Nature Methods
Abstract

Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here we establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.

Year of Publication
2024
Journal
Nature Methods
Date Published
11/2024
ISSN
1548-7105
DOI
10.1038/s41592-024-02504-2
PubMed ID
39528677