Uncertainty-aware analysis of RNA-Seq data using a tree-based framework
University of Maryland -- Department of Computer Science and Center for Bioinformatics and Computational Biology
Short reads are typically much shorter than the spliced transcripts they originate from, which makes it difficult to determine a read's true locus of origin in eukaryotic transcriptomes, especially because transcripts can share overlapping sequences. This ambiguity introduces uncertainty into the abundance estimates of the affected transcripts, which in turn affects downstream analyses such as differential expression testing. To address these challenges, we introduce a data-driven, tree-based framework that incorporates uncertainty into RNA-seq data analysis.
In the first part of the talk, I will discuss existing approaches for handling uncertainty and their limitations in RNA-seq data analysis before introducing TreeTerminus. TreeTerminus constructs a hierarchical, tree-like structure from a given set of RNA-seq samples, where leaf nodes represent individual transcripts and internal nodes correspond to aggregated transcript groups. As one ascends the tree, uncertainty decreases, providing a flexible framework for analyzing data at different levels of resolution, depending on the analysis of interest.
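The idea that uncertainty shrinks as transcripts are aggregated can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the `Node` class, the transcript names, the made-up replicate counts, and the use of the coefficient of variation across replicate estimates as a stand-in uncertainty measure. It is not the TreeTerminus implementation or its actual metric.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: leaves are transcripts, internal nodes are groups."""
    name: str
    replicates: List[float] = field(default_factory=list)  # leaf-only: replicate abundance estimates
    children: List["Node"] = field(default_factory=list)

    def counts(self) -> List[float]:
        # A leaf reports its own replicate estimates.
        if not self.children:
            return self.replicates
        # An internal node sums its children's estimates, replicate by replicate.
        child_counts = [child.counts() for child in self.children]
        return [sum(values) for values in zip(*child_counts)]

    def uncertainty(self) -> float:
        # Assumed stand-in measure: coefficient of variation across replicates.
        values = self.counts()
        return pstdev(values) / mean(values)


# Made-up data: reads shared by two transcripts shift between them across
# replicates, so each estimate is unstable while their sum is stable.
tx_a = Node("txA", replicates=[90.0, 110.0, 70.0, 130.0])
tx_b = Node("txB", replicates=[110.0, 90.0, 130.0, 70.0])
group = Node("txA+txB", children=[tx_a, tx_b])

for node in (tx_a, tx_b, group):
    print(node.name, round(node.uncertainty(), 3))
```

In this toy case the two leaves each vary noticeably across replicates, while the parent node that aggregates them does not vary at all, because the ambiguity only concerns how reads are split between the two transcripts.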
In the second part of the talk, I will introduce mehenDi, a tree-based differential testing method designed to operate on the tree structures generated by TreeTerminus. mehenDi maximizes the signal that can be extracted from RNA-seq data while explicitly controlling for uncertainty, enabling the discovery of novel features that would be missed by existing gene- or transcript-level differential testing methods.