EvoAI enables extreme compression and reconstruction of the protein sequence space.
Abstract | Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here we establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10⁴⁸. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution. |
Year of Publication | 2024
Journal | Nature Methods
Date Published | 11/2024
ISSN | 1548-7105
DOI | 10.1038/s41592-024-02504-2
PubMed ID | 39528677