Deep dive into multi-omics variational autoencoding / Variational autoencoders for analysis and integration of multi-omics and multi-modal data

Ricardo Hernandez Medina
NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Meeting: Deep dive into multi-omics variational autoencoding

Our group has recently shown how variational autoencoders (VAEs), a class of deep-learning-based generative models, can be leveraged to provide insights into the complex interplay and relationships present in large biological datasets (namely, associations between drugs and omics profiles retrieved from newly diagnosed type 2 diabetes patients). In this primer, we will take a deep dive into our MOVE (multi-omics variational autoencoders) pipeline (Allesøe et al., Nature Biotechnology, 2023). We will start by briefly reviewing the steps of data pre-processing and model optimization, which should allow us to generate a model that can compress and integrate multi-modal data (both categorical and continuous variables, such as clinical measurements, microbiome census data, transcriptomics, proteomics, diet and lifestyle records) into a meaningful latent space. Next, we will focus on the method we devised to determine the associations between omics variables and categorical labels (such as drug intake). After perturbing the original dataset, we inspect our model’s output and identify significant differences between the baseline and perturbed results through two approaches. In one approach, we rely on univariate statistical methods and ensemble modeling, whereas in the other, we draw from Bayesian decision theory. Finally, we discuss the outlook of our pipeline and the forthcoming improvements and additions we are working on.
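
To make the perturbation idea concrete, here is a minimal sketch on synthetic data. It is not the MOVE API or the authors' implementation. It makes several assumptions: a linear autoencoder (PCA) stands in for the VAE, the ensemble is built from bootstrap refits, the cohort is simulated, and all function names are hypothetical. The sketch sets a binary drug label from 0 to 1 for untreated samples, compares baseline and perturbed reconstructions, and summarizes the shift of each continuous feature across the ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort (made up for illustration): one binary "drug" label and
# five continuous features; only feature 0 actually responds to the drug.
n_samples, n_features = 500, 5
drug = rng.integers(0, 2, size=n_samples).astype(float)
omics = rng.normal(size=(n_samples, n_features))
omics[:, 0] = 3.0 * drug + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([drug, omics])  # column 0 holds the drug label


def fit_linear_autoencoder(x, n_latent=2):
    """Stand-in for a trained VAE: PCA used as a linear encoder/decoder."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_latent]


def reconstruct(model, x):
    """Encode to the latent space and decode back to input space."""
    mean, components = model
    latent = (x - mean) @ components.T
    return latent @ components + mean


def perturbation_shift(model, x):
    """Mean change in each reconstructed feature when drug is set 0 -> 1."""
    untreated = x[x[:, 0] == 0.0]
    perturbed = untreated.copy()
    perturbed[:, 0] = 1.0
    diff = reconstruct(model, perturbed) - reconstruct(model, untreated)
    return diff[:, 1:].mean(axis=0)  # drop the drug column itself


# Ensemble: refit the stand-in model on bootstrap resamples of the cohort.
n_models = 20
shifts = np.empty((n_models, n_features))
for i in range(n_models):
    idx = rng.integers(0, n_samples, size=n_samples)
    model = fit_linear_autoencoder(data[idx])
    shifts[i] = perturbation_shift(model, data)

# Summaries across the ensemble: average shift per feature, and the fraction
# of ensemble members whose shift has the same sign as that average.
mean_shift = shifts.mean(axis=0)
sign_agreement = (np.sign(shifts) == np.sign(mean_shift)).mean(axis=0)
top_feature = int(np.argmax(np.abs(mean_shift)))

print("mean shift per feature:", np.round(mean_shift, 3))
print("sign agreement per feature:", sign_agreement)
print("feature most affected by the drug perturbation:", top_feature)
```

In this toy setting, the feature simulated to depend on the drug shows the largest reconstruction shift. A real analysis would replace the summaries above with a formal univariate test or a Bayesian criterion, with appropriate multiple-testing control.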

Simon Rasmussen
NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Primer: Variational autoencoders for analysis and integration of multi-omics and multi-modal data

Unsupervised machine learning is a powerful technique for learning patterns in large datasets. In this talk, I will present my group's journey into developing and applying VAEs for analysis of various types of multi-omics data. First, I will describe our work on integrating microbiome-derived data for identification and reconstruction of bacterial and viral genomes in metagenomics data (Nissen et al., Nature Biotechnology, 2021; Johansen et al., Nature Communications, 2022). Second, I will present how we used VAEs for data-driven stratification of major depressive disorder (MDD) and schizophrenia (SCZ) in a large cohort of 42,000 individuals, integrating genotype and multiple registry data (Allesøe et al., Science Advances, 2022). Finally, I will describe how we integrate patient-level multi-omics data, extensive clinical characterization, diet, accelerometry and medication data from a type 2 diabetes cohort (Allesøe et al., Nature Biotechnology, 2023). Our framework (MOVE) can integrate these into a meaningful latent representation, is resistant to missing data, and is able to identify cross-modality associations. To achieve this, we used virtual perturbations of an ensemble of trained models, similar to gedankenexperiments, to estimate the effect of one feature across the omics data. We use this to identify drug-omics associations, compare predicted drug-omics responses, and estimate the overall effect of each drug across the omics data.

For more information visit: /mia.