Amplification artifacts in scRNA-seq/Background removal
Mehrtash Babadi
Data Sciences Platform, Broad Institute
CellBender droplet-time-machine: reversing mixed-template cDNA amplification artifacts in droplet-based 3' scRNA-seq assays
In recent years, high-throughput droplet-based single-cell RNA sequencing (scRNA-seq) methods such as 10x Chromium, Drop-seq, and inDrops have replaced low-throughput plate-based methods such as Smart-seq2 and CEL-Seq2 in many applications. High-throughput methods are significantly less labor-intensive and more cost-effective, allowing us to map the transcriptional landscape of tens of thousands of cells in a single experiment. In addition to the known drawbacks of high-throughput scRNA-seq methods relative to plate-based methods (e.g. reduced mappability and fewer detected genes), we show that the increased throughput comes at another surprising and paradoxical cost: when gene expression is quantified with existing methods, sequencing the library more deeply can lead to noisier and less reliable results. We trace this problem back to artifacts of mixed-template cDNA amplification, in particular the formation of chimeric molecules. We propose a probabilistic method, called "droplet-time-machine," for removing these artifacts and estimating the true pre-PCR cDNA abundance, and present it as part of the CellBender suite of tools. Using a variety of existing benchmarking datasets, we show that RNA quantification with the proposed method is significantly more robust to variation in sequencing depth and produces results comparable to those of gold-standard low-throughput plate-based assays.
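To make the chimera problem concrete, here is a minimal sketch of a simple read-support heuristic. It is not the probabilistic "droplet-time-machine" method described in the abstract. It assumes that reads have already been tallied per (cell barcode, UMI, gene) and that a chimeric molecule shows up as a barcode/UMI pair whose reads are split across several genes. The function name `filter_chimeric_reads`, the input layout, and the `min_fraction` threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def filter_chimeric_reads(read_counts, min_fraction=0.8):
    """Keep one gene per (barcode, UMI) molecule when its read support is clear.

    read_counts: dict mapping (barcode, umi, gene) -> number of reads.
    min_fraction: hypothetical threshold; the top gene must account for at
        least this fraction of the molecule's reads to be kept.
    Returns a dict mapping (barcode, umi) -> assigned gene.
    """
    # Group read counts by molecule, i.e. by (barcode, umi).
    by_molecule = defaultdict(dict)
    for (barcode, umi, gene), n_reads in read_counts.items():
        genes = by_molecule[(barcode, umi)]
        genes[gene] = genes.get(gene, 0) + n_reads

    kept = {}
    for molecule, gene_reads in by_molecule.items():
        total = sum(gene_reads.values())
        top_gene, top_reads = max(gene_reads.items(), key=lambda kv: kv[1])
        # Molecules whose reads are spread over several genes are treated
        # as ambiguous (possibly chimeric) and dropped.
        if top_reads / total >= min_fraction:
            kept[molecule] = top_gene
    return kept


# Toy input: one well-supported molecule and one ambiguous molecule.
example_reads = {
    ("AAAC", "TTG", "GeneA"): 9,
    ("AAAC", "TTG", "GeneB"): 1,
    ("AAAC", "CCA", "GeneA"): 1,
    ("AAAC", "CCA", "GeneC"): 1,
}
example_kept = filter_chimeric_reads(example_reads)
```

A hard threshold like this discards information that a probabilistic model could use, which is one reason a heuristic of this kind should be read only as an illustration of the artifact, not as a substitute for the method presented in the talk.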
Stephen Fleming
Data Sciences Platform, Broad Institute
Primer: CellBender remove-background: a deep generative model for unsupervised removal of background noise from scRNA-seq datasets
Droplet-based scRNA-seq assays are known to produce a significant amount of background RNA counts, the hallmark of which is non-zero transcript counts in presumably empty droplets. The presence of background RNA can lead to systematic biases and batch effects in various downstream analyses such as differential expression and marker gene discovery. This talk will take a detailed look at the 10x Genomics droplet-based scRNA-seq protocol, and examine sources of technical background noise in count matrix data. An algorithm, part of the CellBender suite of tools, will be presented for learning the background RNA profile, distinguishing cell-containing droplets from empty ones, and retrieving background-free gene expression profiles. Simulations and investigations of several scRNA-seq datasets will be used to show that processing raw data using "CellBender remove-background" significantly boosts the magnitude and specificity of differential expression across different cell types.
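The idea of learning a background profile from presumably empty droplets can be illustrated with a deliberately naive sketch. This is not the deep generative model used by "CellBender remove-background". It assumes a small droplet-by-gene count matrix stored as nested lists, treats any droplet at or below a hypothetical UMI cutoff (`empty_max_umis`) as empty, and assumes a known, fixed background fraction (`ambient_fraction`) for each cell. All function and parameter names are made up for this example.

```python
def estimate_ambient_profile(counts, empty_max_umis=10):
    """Estimate a per-gene background profile from low-count droplets.

    counts: list of droplets, each a list of per-gene UMI counts.
    empty_max_umis: hypothetical cutoff; droplets with at most this many
        total UMIs are assumed to contain no cell.
    Returns a list of per-gene fractions that sums to 1.
    """
    empty_rows = [row for row in counts if sum(row) <= empty_max_umis]
    n_genes = len(counts[0])
    # Pool counts across all presumed-empty droplets, gene by gene.
    pooled = [sum(row[g] for row in empty_rows) for g in range(n_genes)]
    total = sum(pooled)
    if total == 0:
        raise ValueError("no counts found in presumed-empty droplets")
    return [p / total for p in pooled]


def subtract_ambient(row, profile, ambient_fraction):
    """Remove the expected background counts from one droplet.

    ambient_fraction: assumed share of the droplet's UMIs that come from
        background RNA (a fixed guess here, not something learned).
    """
    total = sum(row)
    return [
        max(0, round(c - ambient_fraction * total * p))
        for c, p in zip(row, profile)
    ]


# Toy matrix: two near-empty droplets followed by two cell-containing ones.
toy_counts = [
    [2, 1, 1],
    [2, 1, 1],
    [90, 5, 5],
    [4, 3, 93],
]
toy_profile = estimate_ambient_profile(toy_counts)
toy_cleaned = [subtract_ambient(row, toy_profile, 0.2) for row in toy_counts[2:]]
```

In this toy setup, the low counts of off-target genes in each cell are attributed to background and removed, which is the kind of effect that sharpens differential expression between cell types. A real method has to infer which droplets are empty and how much background each cell carries, rather than taking both as given.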