Learning-augmented sketching offers improved performance for privacy preserving and secure GWAS.

iScience
Abstract

Trusted execution environments (TEEs), such as Intel SGX, enable secure, privacy-preserving computation but can be constrained in computational resources. To address this, methods like SkSES use sketching to perform genome-wide association studies (GWAS) across distributed datasets while maintaining privacy. Here, we present a learning-augmented version of SkSES that identifies significant single-nucleotide polymorphisms (SNPs) more accurately. Our method first conducts GWAS on a public training dataset to locally identify significant SNPs. These SNPs are then assigned dedicated memory, which enables more precise selection of significant SNPs over the entire dataset while optimizing memory usage. Our method maintains the stringent privacy guarantees of SkSES, ensuring that sensitive genotype data remains undisclosed to other institutions or cloud providers. Experimental results on benchmark datasets show that the learning-augmented version achieves up to 40% higher accuracy than the original SkSES under identical memory constraints. This advancement improves the scalability and effectiveness of collaborative GWAS in TEEs.
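The dedicated-memory idea described in the abstract can be illustrated with a small, generic example. The code below is only a minimal sketch under stated assumptions, not the SkSES implementation: it assumes a count-min sketch as the underlying structure, treats the quantity tracked per SNP as a simple non-negative count, and uses hypothetical names (`LearnedCountMin`, `predicted_heavy`, and the SNP identifiers) that do not come from the paper. SNPs flagged by the training step get exact counters; all other SNPs share the hashed table.

```python
import random


class LearnedCountMin:
    """Count-min sketch with exact counters for keys predicted to be heavy.

    Illustrative only: the class name, parameters, and the use of count-min
    are assumptions, not details taken from SkSES.
    """

    def __init__(self, predicted_heavy, width, depth, seed=0):
        # Dedicated memory: one exact counter per SNP flagged by the
        # (hypothetical) training-set GWAS.
        self.exact = {key: 0 for key in predicted_heavy}
        self.width = width
        rng = random.Random(seed)
        # One salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, salt, key):
        # Map a key to a column of one row.
        return hash((salt, key)) % self.width

    def add(self, key, count=1):
        # Predicted-heavy keys bypass the shared table entirely.
        if key in self.exact:
            self.exact[key] += count
            return
        for row, salt in zip(self.table, self.salts):
            row[self._col(salt, key)] += count

    def estimate(self, key):
        if key in self.exact:
            return self.exact[key]
        # For non-negative updates, the minimum over rows never
        # underestimates the true count.
        return min(
            row[self._col(salt, key)]
            for row, salt in zip(self.table, self.salts)
        )


# Usage with made-up SNP identifiers.
sketch = LearnedCountMin(predicted_heavy=["rs1", "rs2"], width=8, depth=3)
for _ in range(50):
    sketch.add("rs1")
sketch.add("rs3", 4)
print(sketch.estimate("rs1"), sketch.estimate("rs3"))
```

In this toy version the flagged keys are counted exactly and never collide with other keys, so the shared table only has to absorb the remaining keys. That is the intuition behind spending part of a fixed memory budget on predicted-significant SNPs; how SkSES itself splits the budget is not specified here.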

Year of Publication
2025
Journal
iScience
Volume
28
Issue
3
Pages
112011
Date Published
03/2025
ISSN
2589-0042
DOI
10.1016/j.isci.2025.112011
PubMed ID
40124506