Predicting to explaining bio/Unsupervised domain adaptation
Lauren Erdman
Goldenberg Lab, SickKids Research Institute; Dept of Computer Science, UofT; Vector Institute
Primer: Learning biological patterns across domains: investigating and integrating information across data types and sources
In biomedical research, computational models are often used to infer biological knowledge from limited data (e.g., a given tissue, cell line, or patient population) with the intention of generalizing the findings. In some cases, the data can be successfully repurposed to answer a question it was not collected to answer; in others, it falls short of its intended purpose. This talk will serve as a primer for Dr. Goldenberg’s discussion of prediction tasks across domains. First, I will describe how we elucidate tissue-specific vs. tissue-agnostic patterns of regulation using predictive models of gene expression across 21 tissues. From this work, we generate annotations that can be used for post-hoc analyses of regulator-phenotype associations. Then, I will give a theoretical overview of domain adaptation and its application to the problem of predicting patient drug response from cell line data. Here, I will focus on the assumptions of domain adaptation models and the implications of violating them.
Anna Goldenberg
SickKids Research Institute; Dept of Computer Science, UofT; Vector Institute; CIFAR
From predicting to explaining biology using machine learning
There is great potential for machine learning to contribute to our understanding of complex human diseases and to clinical decision making. Rapidly evolving biotechnologies are making it progressively easier to collect multiple and diverse genome-scale datasets to address clinical and biological questions. As machine learners, we have to use our modeling skills responsibly. I will talk about several of our contributions to answering such questions using predictive models. First, I will talk about inferring per-gene regulation for more than 10,000 genes in 21 cancer tissues, with implications for global regulation patterns as well as for interpreting biomarkers in specific tissues. I will then share a model we built to predict, from methylation data, whether a child with a TP53 mutation is likely to get cancer before the age of 6. Here, biological interpretation of the predictors is less straightforward due to the genome-wide nature of the predictive signal. Finally, I will talk about deep learning models to predict drug response in cancer cell lines, as well as our attempts to translate these findings to patients.