Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models.

Genetics in medicine : official journal of the American College of Medical Genetics
Abstract

PURPOSE: Studies of the value of genetic information for improving the performance of clinical risk prediction models have yielded variable conclusions. Many methodological decisions have the potential to contribute to these differing results. We performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) with genetic data to understand which decisions may affect performance.

METHODS: Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from two large independent health systems, and polygenic risk scores (PRS) were generated for all patients of European ancestry with genetic data in the corresponding biobanks. Crohn's disease was chosen for study because of its substantial genetic component, established EHR-based definition, and sufficient prevalence for training and testing. We investigated the impact of choices regarding PRS integration method, training sample, model complexity, and performance metrics.

RESULTS: Overall, our results show that including PRS resulted in higher performance, but this gain was robust only in situations with limited clinical information. We find consistent performance increases from more compute-intensive models such as random forest, but the impact of other decisions varies by site.

CONCLUSION: This work highlights the importance of considering methodological decision points when interpreting the impact of PRS on prediction performance in clinical models.
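As a rough illustration of the kind of comparison the abstract describes, the sketch below scores a fully synthetic cohort with and without a PRS term and reports the area under the ROC curve (AUROC) for each. It is not the authors' pipeline: the function names `auroc` and `simulate_cohort`, the single clinical feature, the effect sizes, and the simple additive way of combining the clinical score with the PRS are all assumptions made for illustration, and AUROC is only one of the performance metrics a study like this might use.

```python
import math
import random


def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def simulate_cohort(n, seed=0):
    """Hypothetical cohort: one clinical feature, one PRS, one binary outcome.

    The coefficients below are arbitrary assumptions, not estimates from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        clinical = rng.gauss(0, 1)
        prs = rng.gauss(0, 1)
        logit = -2.0 + 1.0 * clinical + 0.5 * prs
        p = 1.0 / (1.0 + math.exp(-logit))
        label = 1 if rng.random() < p else 0
        rows.append((clinical, prs, label))
    return rows


rows = simulate_cohort(1000)
labels = [label for _, _, label in rows]

# Model A: clinical information only.
clinical_only = [clinical for clinical, _, _ in rows]
# Model B: clinical score plus a weighted PRS (assumed additive integration).
clinical_plus_prs = [clinical + 0.5 * prs for clinical, prs, _ in rows]

auc_clinical = auroc(labels, clinical_only)
auc_combined = auroc(labels, clinical_plus_prs)
print(f"AUROC, clinical only:  {auc_clinical:.3f}")
print(f"AUROC, clinical + PRS: {auc_combined:.3f}")
```

Changing the seed, the cohort size, or the relative weight of the clinical feature changes the size of the gap between the two numbers, which is the sort of sensitivity to modeling choices the abstract points to.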

Year of Publication
2024
Journal
Genetics in medicine : official journal of the American College of Medical Genetics
Pages
101353
Date Published
12/2024
ISSN
1530-0366
DOI
10.1016/j.gim.2024.101353
PubMed ID
39733260