A Bayesian method for detecting pairwise associations in compositional data.

PLoS Comput Biol
Abstract

Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further show that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which, in addition to reproducing established ecological results, reveals unique, competition-based roles for Proteobacteria in multiple distinct habitats.
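The information loss from normalization described in the abstract can be illustrated with a small simulation. The sketch below is not BAnOCC and is not taken from the paper; it assumes a hypothetical basis of four independent log-normal count features (the names `close_composition`, `n_samples`, and `n_features` are illustrative) and shows that dividing by the row total alone induces negative correlations among the proportions, even though the underlying features are uncorrelated. For exchangeable parts under a constant-sum constraint, the expected pairwise correlation is -1/(D-1), so roughly -1/3 for D = 4.

```python
import numpy as np


def close_composition(counts):
    """Normalize each row of positive counts so that it sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


# Assumption for illustration only: independent log-normal "unobserved counts".
rng = np.random.default_rng(0)
n_samples, n_features = 5000, 4
log_counts = rng.normal(loc=0.0, scale=1.0, size=(n_samples, n_features))
counts = np.exp(log_counts)

# Observed data: proportions constrained to a constant sum.
composition = close_composition(counts)

# Correlations among the unconstrained (log) basis features vs. the proportions.
basis_corr = np.corrcoef(log_counts, rowvar=False)
comp_corr = np.corrcoef(composition, rowvar=False)

# Indices of the strictly upper-triangular (pairwise) entries.
upper = np.triu_indices(n_features, k=1)
print("mean |basis correlation|:      ", np.abs(basis_corr[upper]).mean())
print("mean composition correlation:  ", comp_corr[upper].mean())
```

The basis correlations stay near zero while the composition correlations are clearly negative, which is why correlations computed directly on proportions cannot be read as associations between the unobserved counts.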

Year of Publication
2017
Journal
PLoS Comput Biol
Volume
13
Issue
11
Pages
e1005852
Date Published
2017 Nov
ISSN
1553-7358
DOI
10.1371/journal.pcbi.1005852
PubMed ID
29140991
PubMed Central ID
PMC5706738
Grant list
T32 GM074897 / GM / NIGMS NIH HHS / United States
U54 DK102557 / DK / NIDDK NIH HHS / United States