Artificial variables help to avoid over-clustering in single-cell RNA sequencing.
Authors | |
Keywords | |
Abstract | Standard single-cell RNA sequencing (scRNA-seq) pipelines nearly always include unsupervised clustering as a key step in identifying biologically distinct cell types. A follow-up step in these pipelines is to test for differential expression between the identified clusters. When algorithms over-cluster, downstream analyses can produce misleading results. In this work, we present "recall" (calibrated clustering with artificial variables), a method for protecting against over-clustering by controlling for the impact of reusing the same data twice when performing differential expression analysis, commonly known as "double dipping." Importantly, our approach can be applied to a wide range of clustering algorithms. Using real and simulated data, we show that recall provides state-of-the-art clustering performance and can rapidly analyze large-scale scRNA-seq studies, even on a personal laptop. |
Year of Publication | 2025 |
Journal | American Journal of Human Genetics |
Date Published | 03/2025 |
ISSN | 1537-6605 |
DOI | 10.1016/j.ajhg.2025.02.014 |
PubMed ID | 40081375 |