Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.

Am J Hum Genet
Abstract

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
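
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the LDpred software and not necessarily the authors' exact procedure. It assumes a Gaussian (infinitesimal) prior on effect sizes, under which the posterior mean given marginal estimates and an LD matrix D can be approximated by solving (M / (N h²) I + D) β = β̂, where M is the number of markers, N the training sample size, and h² the heritability. The abstract does not state this formula. The function names (`thresholded_weights`, `ld_shrunk_weights`, `risk_score`) and all input values are hypothetical, and the LD-pruning step of the standard approach is omitted for brevity.

```python
import numpy as np


def thresholded_weights(beta_hat, p_values, p_threshold):
    """Thresholding step only: keep marginal effects with p < threshold, zero the rest.

    LD-based pruning, which normally precedes this step, is not implemented here.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    keep = np.asarray(p_values, dtype=float) < p_threshold
    return np.where(keep, beta_hat, 0.0)


def ld_shrunk_weights(beta_hat, ld_matrix, n_samples, heritability):
    """Approximate posterior-mean effects under an assumed Gaussian prior.

    Solves (M / (N * h2) * I + D) beta = beta_hat, where D is the LD
    (marker correlation) matrix taken from a reference panel.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld_matrix, dtype=float)
    m = beta_hat.shape[0]
    ridge = m / (n_samples * heritability)
    return np.linalg.solve(ridge * np.eye(m) + ld, beta_hat)


def risk_score(genotypes, weights):
    """Polygenic score per individual: genotype matrix (individuals x markers) times weights."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)
```

With an identity LD matrix (no correlation between markers) the shrinkage reduces to dividing every marginal estimate by 1 + M / (N h²), so all markers contribute to the score, whereas thresholding sets the weights of markers that fail the p value cutoff to exactly zero.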

Year of Publication
2015
Journal
Am J Hum Genet
Volume
97
Issue
4
Pages
576-92
Date Published
2015 Oct 1
ISSN
1537-6605
DOI
10.1016/j.ajhg.2015.09.001
PubMed ID
26430803
PubMed Central ID
PMC4596916
Grant list
104036 / Wellcome Trust / United Kingdom
K25 HL121295 / HL / NHLBI NIH HHS / United States
P01 CA134294 / CA / NCI NIH HHS / United States
R01 GM105857 / GM / NIGMS NIH HHS / United States
R03 CA173785 / CA / NCI NIH HHS / United States
R35 CA197449 / CA / NCI NIH HHS / United States
U19 CA148065-01 / CA / NCI NIH HHS / United States