Sparsity, Epistasis, and Models of Fitness Functions

David Brookes
Dyno Therapeutics
Leveraging the Sparsity of Epistatic Interactions to Understand and Improve Models of Fitness Functions

The Walsh-Hadamard (WH) transform provides a powerful tool for analyzing fitness functions in terms of epistatic interactions between amino acids in a sequence. Empirical evidence suggests that many natural fitness functions are substantially sparse when represented in terms of epistatic interactions. Here we examine several ways to leverage this observation, both to design experiments that probe fitness functions and to improve models of such functions learned from fitness data. First, we explain how to extend the WH transform to alphabets with more than two elements using generalized graph Fourier transforms, which makes it possible to analyze fitness functions over the complete nucleotide and amino acid alphabets. Next, we consider how the natural sparsity of fitness functions in the graph Fourier representation can be combined with compressed sensing theory to determine how many experimental measurements are needed to model a fitness function effectively. Finally, we describe Epistatic Net, a method for regularizing the training of neural network models of fitness functions that encourages the model to maintain a sparse representation in terms of epistatic interactions. We show that applying this empirically motivated inductive bias improves the accuracy of fitness models in predicting the fitness of unobserved sequences.
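To make the extension to larger alphabets concrete, here is a minimal sketch, assuming the generalized transform is the graph Fourier transform of the Hamming graph (two sequences are adjacent when they differ at exactly one site). The eigenvectors of that graph's Laplacian are Kronecker products of per-site eigenbases of the complete graph K_q, and each eigenvalue equals q times the number of sites with a non-constant factor, i.e. the order of the epistatic interaction. The function names, the QR-based choice of per-site basis and the toy fitness function are all illustrative assumptions; the talk's exact construction and basis choice may differ.

```python
import itertools
from functools import reduce

import numpy as np


def site_basis(q):
    """Orthonormal eigenbasis of the Laplacian of the complete graph K_q.

    Column 0 is the constant vector (eigenvalue 0); the remaining q - 1
    columns span its orthogonal complement (eigenvalue q). For q > 2 that
    eigenspace is degenerate, so this particular choice is arbitrary."""
    a = np.eye(q)
    a[:, 0] = 1.0                    # first column all ones; matrix stays full rank
    basis, _ = np.linalg.qr(a)       # column 0 becomes +/- ones / sqrt(q)
    if basis[0, 0] < 0:
        basis[:, 0] *= -1.0          # fix the sign of the constant vector
    return basis


def gft_basis(length, q):
    """Columns form a graph Fourier basis of the Hamming graph on q**length sequences.

    Assumed construction: Kronecker products of per-site eigenbases diagonalize
    the Hamming graph Laplacian; for q = 2 this is the normalized WH matrix
    up to column signs."""
    return reduce(np.kron, [site_basis(q)] * length)


def hamming_laplacian(length, q):
    """Laplacian of the graph joining sequences that differ at exactly one site."""
    lap_site = q * np.eye(q) - np.ones((q, q))   # Laplacian of K_q
    total = np.zeros((q**length, q**length))
    for site in range(length):
        factors = [lap_site if i == site else np.eye(q) for i in range(length)]
        total += reduce(np.kron, factors)
    return total


def interaction_order(index, length, q):
    """Number of sites whose basis factor is non-constant (the epistatic order)."""
    digits = np.base_repr(index, base=q).zfill(length)
    return sum(d != "0" for d in digits)


def toy_fitness(length, q, seed=0):
    """Hypothetical fitness: independent site effects plus one pairwise
    interaction between sites 0 and 1. Sequences are enumerated with site 0
    varying slowest, matching the Kronecker ordering used above."""
    rng = np.random.default_rng(seed)
    site_effects = rng.normal(size=(length, q))
    pair_effect = rng.normal(size=(q, q))
    return np.array([
        sum(site_effects[i, s] for i, s in enumerate(seq)) + pair_effect[seq[0], seq[1]]
        for seq in itertools.product(range(q), repeat=length)
    ])


length, q = 3, 3
basis = gft_basis(length, q)
fitness = toy_fitness(length, q)
spectrum = basis.T @ fitness                     # forward transform (basis is orthonormal)
support = np.flatnonzero(np.abs(spectrum) > 1e-9)
print(len(support), "of", q**length, "coefficients are nonzero")
print("highest interaction order:", max(interaction_order(k, length, q) for k in support))
```

Because the toy function has only first-order terms plus a single pairwise term, its spectrum can be nonzero on at most 1 + 3(q - 1) + (q - 1)^2 of the q^3 coefficients, all of order at most two. This is the kind of sparsity in the graph Fourier representation that the abstract proposes to exploit.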

Amirali Aghazadeh
ECE, Georgia Tech
Primer: A Fourier Tour of Protein Function Prediction

Predicting the biological functions of proteins from their amino acid sequences is one of the long-standing challenges in biology. A comprehensive solution has remained elusive due to the vastness of the combinatorial space of sequences and our limited ability to probe that space experimentally. In this primer, we view protein function prediction from a signal recovery perspective through the lens of the Fourier transform, also known as the Walsh-Hadamard (WH) transform for sequence functions. We discuss how the WH transform allows us to view a protein function as a multilinear polynomial and to interpret it in terms of a familiar concept from statistical genetics called epistasis. We demonstrate that an intuitive divide-and-conquer strategy can find this polynomial using a number of samples and a running time that grow only linearly with the length of the protein sequence. Next, we discuss how natural assumptions about the polynomial, such as sparsity, can be leveraged to develop efficient protein function prediction algorithms rooted in signal processing and coding theory.
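For a binary alphabet, the WH coefficients F[k] are exactly the coefficients of the multilinear polynomial f(x) = sum over k of F[k] times the product of (-1)^{x_i} over the sites i where k_i = 1, so an index k with two set bits is a pairwise epistatic term. The sketch below shows one well-known divide-and-conquer scheme for recovering a sparse spectrum, in the "subsample and peel" style of sparse FFT algorithms; it is an illustrative assumption, not the algorithm presented in the talk. Querying f on a random b-dimensional subspace and taking a 2^b-point WH transform hashes the 2^n coefficients into 2^b bins. Repeating the queries with each unit shift e_i reveals, for any bin holding a single coefficient, every bit of its index through a sign flip. Each recovered coefficient is then peeled out of the other hash groups, exposing further single-coefficient bins. The names `sparse_wht` and `make_query`, the random hashing matrix and the delay set are hypothetical. The sketch assumes a spectrum with far fewer than 2^b nonzero coefficients and no accidental cancellations among them.

```python
import numpy as np


def parity(x):
    """Parity of the bits of a non-negative integer (inner product over GF(2))."""
    return bin(x).count("1") & 1


def fwht(values):
    """Normalized fast WH transform: F[k] = mean over x of f(x) * (-1)**<k, x>."""
    a = np.array(values, dtype=float)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            left = a[start:start + h].copy()
            right = a[start + h:start + 2 * h].copy()
            a[start:start + h] = left + right
            a[start + h:start + 2 * h] = left - right
        h *= 2
    return a / len(a)


def hash_index(cols, k):
    """Bin of coefficient k under the hash j = M^T k, with M given by its columns."""
    return sum(parity(c & k) << t for t, c in enumerate(cols))


def sparse_wht(query, n, b, groups=3, seed=0, tol=1e-8):
    """Illustrative sketch (not the talk's algorithm): recover a sparse WH
    spectrum {k: F[k]} of f: {0,1}^n -> R from groups * (n + 1) * 2**b queries."""
    rng = np.random.default_rng(seed)
    delays = [0] + [1 << i for i in range(n)]          # zero shift and unit shifts e_i
    hashes, bins = [], []
    for _ in range(groups):
        cols = [int(c) for c in rng.integers(0, 1 << n, size=b)]  # random n x b matrix M
        points = []
        for ell in range(1 << b):                       # x = M @ ell over GF(2)
            x = 0
            for t in range(b):
                if (ell >> t) & 1:
                    x ^= cols[t]
            points.append(x)
        # Row r, bin j holds the sum of F[k] * (-1)**<k, delays[r]> over all k with M^T k = j.
        bins.append(np.array([fwht([query(p ^ d) for p in points]) for d in delays]))
        hashes.append(cols)

    found, progress = {}, True
    while progress:
        progress = False
        for cols, u in zip(hashes, bins):
            for j in range(1 << b):
                ref = u[0, j]
                if abs(ref) < tol:
                    continue                             # empty (or fully peeled) bin
                ratios = u[1:, j] / ref
                if not np.allclose(np.abs(ratios), 1.0, atol=1e-6):
                    continue                             # multiton: magnitudes disagree
                k = sum(1 << i for i in range(n) if ratios[i] < 0)  # sign flip => bit is 1
                if hash_index(cols, k) != j:
                    continue                             # inconsistent index: skip
                found[k] = found.get(k, 0.0) + ref
                for cols2, u2 in zip(hashes, bins):      # peel k out of every group
                    j2 = hash_index(cols2, k)
                    for row, d in enumerate(delays):
                        u2[row, j2] -= ref * (-1) ** parity(k & d)
                progress = True
    return {k: v for k, v in found.items() if abs(v) > tol}


def make_query(coeffs):
    """Hypothetical oracle evaluating f(x) = sum_k F[k] * (-1)**<k, x>."""
    return lambda x: sum(c * (-1) ** parity(k & x) for k, c in coeffs.items())


rng = np.random.default_rng(1)
n = 16
keys = rng.choice(1 << n, size=8, replace=False)
vals = rng.uniform(1.0, 2.0, size=8) * rng.choice([-1, 1], size=8)
truth = {int(k): float(v) for k, v in zip(keys, vals)}
recovered = sparse_wht(make_query(truth), n=n, b=6)
print(len(recovered), "coefficients recovered using", 3 * (n + 1) * 2**6, "queries instead of", 2**n)
```

With the number of groups and the bin count 2^b held fixed, the query budget groups * (n + 1) * 2^b grows linearly in the sequence length n, and the work per group is n + 1 small transforms plus the peeling steps. The design choice is the usual trade-off in such schemes: more bins or more groups make it likelier that every coefficient eventually lands alone in some bin, at the cost of more queries.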

 

For more information visit: www.broadinstitute.org/mia.