Refining the cis-regulatory grammar learned by sequence-to-activity models by increasing model resolution.

bioRxiv : the preprint server for biology
Abstract

Chromatin accessibility can be measured genome-wide with ATAC-seq, enabling the discovery of regulatory regions that control gene expression and determine cell type. Deep genomic sequence-to-function (S2F) models link underlying genomic sequences to the measured chromatin state and identify motifs that regulate chromatin accessibility. Previously, we developed AI-TAC, an S2F model that predicts chromatin accessibility across 81 immune cell types and identifies sequence patterns that control their differential ATAC-seq signals. While AI-TAC provided valuable insights into the regulatory patterns that govern immune cell differentiation, later research established that ATAC-seq profiles (the distribution of Tn5 cuts) contain additional information about the exact location and strength of TF binding. To make use of this additional information, we developed bpAI-TAC, a multi-task neural network that models ATAC-seq at base-pair resolution across 90 immune cell types. We show that adding ATAC-profile information consistently improves predictions of differential chromatin accessibility. We also demonstrate that simultaneous learning of related cell types through multi-task modeling leads to better predictions than single-task models. We then present a systematic framework for attributing differences in model performance to differences in what the models have learned. To understand what additional information bpAI-TAC gleans from ATAC-profiles, we use sequence attributions and identify motifs that have different effect sizes when the model is trained on profiles. We conclude that modeling ATAC-seq at base-pair resolution enables the model to learn a more sensitive representation of the regulatory syntax that drives differences between immunocytes, and therefore will improve predictions of variant effects.

Year of Publication
2025
Journal
bioRxiv : the preprint server for biology
Date Published
01/2025
ISSN
2692-8205
DOI
10.1101/2025.01.24.634804
PubMed ID
39975126