Topic modeling the transcriptional spectrum in innate lymphoid cells

 


Regev Lab, Broad Institute; Kuchroo Lab, HMS
Topic modeling the transcriptional spectrum in innate lymphoid cells

Abstract: Analyses of immune cell classes, such as innate lymphoid cells (ILCs) or T helper cells, typically treat them as collections of discrete immune cell “types”. Yet these cell types may share important biological signals, and in some contexts they have been observed to span a functional spectrum almost continuously. In single-cell RNA-seq data from skin-resident ILCs, we observed a multi-dimensional spectrum of ILCs that was shifted and functionally reconfigured by induction of psoriasis. To capture and explore these fluid, mixed transcriptional states, we used topic modeling by latent Dirichlet allocation (LDA), a method (covered in the great primer David will give!) designed to analyze the words in a corpus of text documents to discover the themes, or topics, that pervade them. Through an analogy between document analysis and single-cell analysis, we applied LDA to discover each cell’s multiple, non-hierarchical “identities” and their relative importance, and used these features to analyze cellular plasticity during the inflammatory response. Topic weights captured relationships not well described by clusters and, through their functional interpretation, enabled a more nuanced view of similarities among cells. There was no apparent “pseudo-time axis” of progression across steady-state cell states, but focusing on specific topics related to immune repression or activation revealed a temporal “induction” dimension in our data. Using experimental techniques in a mouse model, we validated several computational predictions, including the previously undescribed presence of quiescent-like tissue-resident ILCs and the differentiation of activated skin-resident ILC2s into pathological ILC3s. Approaches like topic modeling should be valuable in representing other continuous cell states and in uncovering dynamic cellular activation in response to a stimulus.
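To make the document/cell analogy concrete, here is a minimal sketch in Python. It assumes the natural mapping (each cell is a "document", each gene is a "word", each transcript count is a word occurrence) and fits LDA by collapsed Gibbs sampling. The abstract does not say which inference method or software was used, so the sampler, the toy count matrix, and the hyperparameters `alpha` and `beta` are all illustrative assumptions.

```python
import random

def fit_lda_gibbs(counts, n_topics, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a cells x genes count matrix.

    Returns per-cell topic weights; each row sums to 1.
    alpha and beta are symmetric Dirichlet priors (illustrative values).
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])

    # Expand the count matrix into individual transcript "tokens".
    tokens = [(c, g)
              for c in range(n_cells)
              for g in range(n_genes)
              for _ in range(counts[c][g])]

    # Random initial topic assignment for every token, plus count tables.
    z = [rng.randrange(n_topics) for _ in tokens]
    cell_topic = [[0] * n_topics for _ in range(n_cells)]
    topic_gene = [[0] * n_genes for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for (c, g), k in zip(tokens, z):
        cell_topic[c][k] += 1
        topic_gene[k][g] += 1
        topic_total[k] += 1

    for _ in range(n_iter):
        for i, (c, g) in enumerate(tokens):
            # Remove the token's current assignment from the counts.
            k = z[i]
            cell_topic[c][k] -= 1
            topic_gene[k][g] -= 1
            topic_total[k] -= 1

            # Conditional probability of each topic for this token.
            weights = [
                (cell_topic[c][t] + alpha)
                * (topic_gene[t][g] + beta)
                / (topic_total[t] + n_genes * beta)
                for t in range(n_topics)
            ]
            k = rng.choices(range(n_topics), weights=weights)[0]

            # Record the new assignment.
            z[i] = k
            cell_topic[c][k] += 1
            topic_gene[k][g] += 1
            topic_total[k] += 1

    # Smoothed topic weights per cell.
    return [
        [(cell_topic[c][t] + alpha) / (sum(cell_topic[c]) + n_topics * alpha)
         for t in range(n_topics)]
        for c in range(n_cells)
    ]

# Toy data (made up): rows are cells, columns are genes.
# Cells 0 and 1 express disjoint gene sets; cell 2 expresses both.
counts = [
    [8, 7, 0, 0],
    [0, 0, 9, 6],
    [4, 4, 4, 3],
]
theta = fit_lda_gibbs(counts, n_topics=2)
for cell, weights in enumerate(theta):
    print(cell, [round(w, 2) for w in weights])
```

Each row of `theta` is a cell's mixture over topics rather than a single cluster label, which is the kind of graded, non-hierarchical "identity" the abstract describes. On real data one would more likely use an established LDA implementation than a hand-written sampler.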

 

David Benjamin
Data Sciences Platform, Broad Institute
Primer: Intro to topic models

Abstract: Starting from a ridiculously simple language model, we will build up the prototypical topic model, latent Dirichlet allocation (LDA), piece by piece. We will discuss why LDA works and ways to elaborate upon it. Finally, we will survey applications of LDA in biology.
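As a rough illustration of the kind of build-up the primer describes, the sketch below writes out the standard LDA generative process in Python. The abstract does not specify the starting model; a single shared word distribution (a unigram model) is assumed here, and the vocabulary size, topic count, and Dirichlet parameters are arbitrary illustrative values.

```python
import random

def sample_dirichlet(rng, alphas):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_unigram_document(rng, word_probs, n_words):
    """Simplest model (assumed starting point): every word in every
    document comes from one shared distribution over the vocabulary."""
    vocab = range(len(word_probs))
    return [rng.choices(vocab, weights=word_probs)[0] for _ in range(n_words)]

def generate_lda_document(rng, topics, alpha, n_words):
    """LDA: each document draws its own topic proportions, and each word
    first picks a topic, then picks a word from that topic."""
    n_topics = len(topics)
    theta = sample_dirichlet(rng, [alpha] * n_topics)  # document's topic mix
    words = []
    for _ in range(n_words):
        k = rng.choices(range(n_topics), weights=theta)[0]  # topic for this word
        w = rng.choices(range(len(topics[k])), weights=topics[k])[0]
        words.append(w)
    return theta, words

rng = random.Random(0)
vocab_size, n_topics = 6, 2

# Each topic is a distribution over the vocabulary, itself Dirichlet-distributed.
topics = [sample_dirichlet(rng, [0.5] * vocab_size) for _ in range(n_topics)]

unigram_words = generate_unigram_document(rng, topics[0], n_words=20)
theta, words = generate_lda_document(rng, topics, alpha=0.5, n_words=20)
print("topic proportions:", [round(t, 2) for t in theta])
print("word ids:", words)
```

With a single topic the LDA generator reduces to the unigram model, which is one way to see LDA as a step-by-step elaboration of a very simple language model.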