SPA: effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits.

Nature Communications
Authors
Abstract

Sample relatedness is a major confounder in genome-wide association studies (GWAS), potentially leading to inflated type I error rates if not appropriately controlled. A common strategy is to incorporate a random effect related to the genetic relatedness matrix (GRM) into regression models. However, this approach is challenging for large-scale GWAS of complex traits, such as longitudinal traits. Here we propose a scalable and accurate analysis framework, SPA, which controls for sample relatedness via a precise approximation of the joint distribution of genotypes. SPA can utilize GRM-free models and thus is applicable to various trait types and statistical methods, including linear mixed models and generalized estimating equations for longitudinal traits. A hybrid strategy incorporating saddlepoint approximation greatly improves accuracy when analyzing low-frequency and rare genetic variants, especially when phenotypic distributions are unbalanced. We also introduce SPA to aggregate the results from different models via the Cauchy combination test. Extensive simulations and real data analyses demonstrated that SPA maintains well-controlled type I error rates and that SPA can serve as a broadly effective method. Applying SPA to 79 longitudinal traits extracted from UK Biobank primary care data, we identified 7,463 genetic loci, making a pioneering attempt to conduct GWAS for these traits as longitudinal traits.
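
The abstract mentions aggregating results from different models via the Cauchy combination test. As a rough illustration of that general technique only (not the authors' implementation), the sketch below combines a list of p-values using the standard Cauchy combination formula. The function name `cauchy_combination` and the default of equal weights are assumptions made for this example; the abstract does not specify how the method weights its models.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Hypothetical helper for illustration; equal weights are assumed
    when none are given. P-values are expected to lie strictly in (0, 1).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate and take the weighted mean.
    stat = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Two hypothetical per-model p-values for the same variant.
    print(cauchy_combination([0.001, 0.8]))
```

A single p-value is returned unchanged, and the combined value tends to be driven by the smallest inputs, which is why this kind of test is often used to merge results from models with different strengths.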

Year of Publication
2025
Journal
Nature Communications
Volume
16
Issue
1
Pages
1413
Date Published
02/2025
ISSN
2041-1723
DOI
10.1038/s41467-025-56669-1
PubMed ID
39915470