Using random matrix theory to model single-cell RNA; topological data analysis


University of Texas at Austin
Using random matrix theory to extract signals from single-cell expression data

Abstract: I'll describe a method for low-rank approximation of a data matrix arising from single-cell RNA sequencing. Our basic observation is that such data is consistent with a sparse version of the "spike model" studied in random matrix theory, in which a low-rank signal is added to a noise matrix. As a consequence, the contributions of noise to the output of principal component analysis on this data can be characterized in terms of universal distributions and removed. This is joint work with Luis Aparicio, Mykola Bordyuh, and Raul Rabadan.
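
To make the spike model concrete, here is a minimal Python sketch. It is not the method from the talk: it uses dense Gaussian noise rather than the sparse setting the abstract refers to, and the function names, matrix sizes, spike strengths, and the 5% tolerance are all assumptions chosen for illustration. It builds a noise matrix plus a low-rank signal, then counts the sample covariance eigenvalues that lie above the upper edge of the Marchenko-Pastur bulk, which is where pure noise eigenvalues are expected to stop.

```python
import numpy as np


def mp_upper_edge(n_cells, n_genes, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for the covariance X^T X / n_cells."""
    gamma = n_genes / n_cells
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def spiked_matrix(n_cells, n_genes, strengths, rng):
    """Noise matrix plus a low-rank signal (illustrative dense Gaussian version)."""
    strengths = np.asarray(strengths, dtype=float)
    k = strengths.size
    # Unit-variance noise: every entry is an independent standard normal.
    noise = rng.standard_normal((n_cells, n_genes))
    # Orthonormal signal directions in gene space (columns of V).
    V, _ = np.linalg.qr(rng.standard_normal((n_genes, k)))
    # Per-cell loadings on each signal direction.
    Z = rng.standard_normal((n_cells, k))
    # Population covariance of each row is I + sum_k strengths[k] * v_k v_k^T.
    signal = Z @ (np.sqrt(strengths)[:, None] * V.T)
    return noise + signal


def count_signal_eigenvalues(X, tolerance=0.05):
    """Count covariance eigenvalues above the Marchenko-Pastur edge.

    The tolerance is an ad hoc margin for finite-size fluctuations at the
    edge, not a calibrated statistical test.
    """
    n_cells, n_genes = X.shape
    eigenvalues = np.linalg.eigvalsh(X.T @ X / n_cells)
    threshold = mp_upper_edge(n_cells, n_genes) * (1.0 + tolerance)
    return int(np.sum(eigenvalues > threshold))


rng = np.random.default_rng(0)
X = spiked_matrix(n_cells=2000, n_genes=500, strengths=[4.0, 2.0], rng=rng)
n_signal = count_signal_eigenvalues(X)
print("eigenvalues above the noise bulk:", n_signal)
```

In this toy setting both planted directions are strong enough to separate from the bulk, so the eigenvalues above the edge correspond to the signal. A sharper cutoff would use the universal fluctuation law at the edge in place of a fixed margin.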

Abstract: I'll give a gentle introduction to the techniques and core ideas of topological data analysis (TDA), with a focus on the application of these methods to scientific data. I will emphasize the integration of TDA methods with statistics. My goal is to communicate what we can (and cannot) expect TDA to tell us and when TDA is likely to be a meaningful and robust analytic tool.