
How AlphaGenome Predicts Regulatory DNA Variants with 1‑bp Precision

AlphaGenome is a new AI system that ingests DNA sequences of up to 1 Mb and delivers functional predictions at up to single-base resolution across eleven regulatory modalities. It achieves state-of-the-art performance on dozens of benchmark tasks and offers practical insights in case studies of a cancer-related enhancer mutation and of splicing mutations.

PaperAgent

Jan 29, 2026


Overview

AlphaGenome is an artificial-intelligence model that accepts DNA sequences of up to 1 megabase and predicts, at up to single-base resolution, how individual variants or mutations affect a broad range of regulatory processes.
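To make "how a variant affects" concrete: sequence-to-function models of this kind are usually queried twice, once on the reference sequence and once with the alternate allele substituted, and the effect is read off the difference between the two predicted tracks. The sketch below is a minimal illustration under stated assumptions: it one-hot encodes DNA, replaces the real model with a hypothetical toy `toy_predict`, and scores the effect as a log2 fold change of summed signal in a window. None of these function names are AlphaGenome's API.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot array in A, C, G, T order.

    This sketch does not handle ambiguous bases such as N.
    """
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4, dtype=np.float32)[idx]

def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single-nucleotide variant after checking the reference allele."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref}")
    return seq[:pos] + alt + seq[pos + 1:]

def toy_predict(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the model: one per-base 'signal' track.

    Here it is a 3 bp moving sum of G/C content; a real model returns
    thousands of tracks.
    """
    gc = x[:, 1] + x[:, 2]
    return np.convolve(gc, np.ones(3), mode="same")

def variant_effect(seq: str, pos: int, ref: str, alt: str,
                   window: tuple, eps: float = 1e-3) -> float:
    """log2 fold change of summed predicted signal in [start, end), alt vs ref."""
    start, end = window
    ref_signal = toy_predict(one_hot(seq))[start:end].sum()
    alt_signal = toy_predict(one_hot(apply_snv(seq, pos, ref, alt)))[start:end].sum()
    return float(np.log2((alt_signal + eps) / (ref_signal + eps)))

# A T->G change adds G/C signal, so the score is positive.
print(variant_effect("ATATATAT", 3, "T", "G", (0, 8)))
```

The pseudocount `eps` keeps the ratio finite when the reference signal is zero; a real scoring pipeline would also average over strands and aggregate per gene or per track.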

Key Capabilities

Input: ultra‑long 1 Mb sequences.

Output: 5,930 human and 1,128 mouse functional tracks, at resolutions down to a single base.

Coverage: eleven regulatory modalities, including gene expression, splicing, chromatin accessibility, histone modifications, and 3D contact maps.

Performance: achieves state‑of‑the‑art results on 25 of 26 variant‑effect benchmarks.

Model Architecture

Figure 1a: U‑Net‑Transformer hybrid backbone; the 1 Mb sequence is split into eight parallel segments processed across eight TPUv3 devices.

Key components:

Encoder: four-stage down-sampling from 1 bp to 128 bp.

Transformer Tower: models long-range enhancer-promoter interactions at 128 bp resolution.

Decoder: up-samples back to 1 bp, with skip-connections to preserve fine-grained detail.

2D Pairwise Branch: generates an additional chromatin contact map at 2 kb resolution.
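One way to see how these resolutions fit together is to track array sizes for a single input. The sketch below assumes the "1 Mb" input is exactly 2**20 bp and that the 2 kb contact maps use 2,048 bp bins; both figures are assumptions for illustration, and `backbone_shapes` is a hypothetical helper, not part of any released code.

```python
# Shape bookkeeping for the backbone described above.
# Assumptions (not stated in the article): input = 2**20 bp, contact bins = 2,048 bp.
SEQ_LEN = 2**20      # "1 Mb" input, assumed to be a power of two
N_DEVICES = 8        # segments processed in parallel, one per TPUv3 device
TOKEN_RES = 128      # bp per position in the transformer tower
PAIR_RES = 2048      # bp per bin in the 2D contact-map branch

def backbone_shapes(seq_len: int = SEQ_LEN) -> dict:
    """Return the size of each representation for one input sequence."""
    for res in (N_DEVICES, TOKEN_RES, PAIR_RES):
        if seq_len % res:
            raise ValueError(f"sequence length must be divisible by {res}")
    return {
        "one_hot_input": (seq_len, 4),                  # 1 bp, 4 nucleotides
        "segment_per_device_bp": seq_len // N_DEVICES,  # sequence-parallel split
        "transformer_tokens": seq_len // TOKEN_RES,     # encoder output length
        "contact_map_bins": (seq_len // PAIR_RES, seq_len // PAIR_RES),
        "decoder_output_positions": seq_len,            # back at 1 bp
    }

for name, shape in backbone_shapes().items():
    print(f"{name:>26}: {shape}")
```

Under these assumptions the transformer works on 8,192 positions rather than about a million, which is what makes modeling interactions across the whole window tractable, while the decoder's skip-connections restore 1 bp detail.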

Training Strategy

Pre-training: four-fold cross-validation, with training/validation folds defined by genomic intervals to prevent information leakage between folds.

Distillation: an ensemble of teacher models is distilled into a single student model; during distillation, input sequences are randomly mutated and reverse-complemented to improve robustness.
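The two augmentations used during distillation are simple to express on one-hot input. The sketch below assumes a (length, 4) one-hot array in A, C, G, T order and an illustrative per-base mutation rate; it takes the distillation target to be the mean of the teacher predictions, which is one common choice rather than a confirmed detail of AlphaGenome's recipe. All function names are hypothetical.

```python
import numpy as np

def reverse_complement(x: np.ndarray) -> np.ndarray:
    """Reverse-complement a (length, 4) one-hot array in A, C, G, T order.

    Reversing the position axis reverses the sequence; reversing the base axis
    maps A<->T and C<->G because complements sit at mirrored indices.
    """
    return x[::-1, ::-1]

def random_mutate(x: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Replace each position with a different random base with probability `rate`."""
    x = x.copy()
    hits = np.flatnonzero(rng.random(x.shape[0]) < rate)
    for i in hits:
        current = int(np.argmax(x[i]))
        new = (current + rng.integers(1, 4)) % 4  # shift by 1-3, so always a different base
        x[i] = 0
        x[i, new] = 1
    return x

def distillation_target(teacher_preds: list) -> np.ndarray:
    """Average teacher outputs (assumed target); the student is trained to match it."""
    return np.mean(np.stack(teacher_preds), axis=0)

rng = np.random.default_rng(0)
seq = np.eye(4)[[0, 1, 2, 2]]  # ACGG
augmented = random_mutate(reverse_complement(seq), rate=0.1, rng=rng)
```

When a sequence is reverse-complemented, stranded target tracks also have to be reversed and swapped between strands so that inputs and labels stay aligned; that bookkeeping is omitted here.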

Benchmark Performance

Splicing variants (ClinVar deep-intronic): +3% auPRC over the best baseline.

Expression QTLs (GTEx eQTL direction prediction): +25.5% auROC.

Chromatin accessibility (caQTL causal-variant prediction): +8% average precision.

3D contact maps (Micro-C): +42% cell-type-specific correlation.
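The metrics above can be pinned down with a little code. Treating eQTL direction prediction as binary classification (label 1 if the variant increases expression), auROC is the probability that a randomly chosen positive outranks a randomly chosen negative, and average precision summarizes the precision-recall curve. The sketch below is plain Python on toy data; the benchmarks' exact protocols are not described in the article, and the percentages could be read as absolute or relative gains, so the last helper computes both.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision at the rank of each positive (scores sorted descending)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

def improvement(new, baseline):
    """Absolute difference and relative (percent) change of one metric."""
    return new - baseline, 100.0 * (new - baseline) / baseline

# Toy eQTL sign labels and predicted effect scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores), average_precision(labels, scores))
```

The pairwise auROC is quadratic in the number of variants, which is fine for a sketch; library implementations sort once instead.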

Real‑World Case Study ①: TAL1 Oncogenic Enhancer Mutation

After the mutation, predicted H3K27ac and H3K4me1 signals increase.

Predicted expression of TAL1, about 7.5 kb downstream of the mutation, rises.

In-silico mutagenesis reveals newly created MYB binding motifs, matching experimental observations.
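In-silico mutagenesis (ISM) means scoring every possible single-base substitution and recording how the prediction changes; positions where substitutions cause large changes mark candidate motifs. The sketch below replaces a real model output (such as predicted H3K27ac or TAL1 expression) with a hypothetical toy scorer that counts exact matches to an arbitrary motif, `TAACGG`; the motif is illustrative and is not claimed to be the MYB consensus.

```python
BASES = "ACGT"

def toy_score(seq: str, motif: str = "TAACGG") -> float:
    """Hypothetical stand-in for a model output: count (overlapping) motif matches."""
    k = len(motif)
    return float(sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif))

def in_silico_mutagenesis(seq: str, score_fn=toy_score) -> dict:
    """Map (position, alt_base) -> score(mutant) - score(reference) for all SNVs."""
    ref_score = score_fn(seq)
    effects = {}
    for i, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, alt)] = score_fn(mutant) - ref_score
    return effects

effects = in_silico_mutagenesis("ATAACTGA")
best = max(effects, key=effects.get)
print(best, effects[best])  # the single substitution that creates the motif
```

Here only a T-to-G change at position 5 completes the motif, so it is the only substitution with a nonzero effect; with a real model, the same scan over a mutated enhancer highlights which bases a newly created motif depends on.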

Real‑World Case Study ②: Splicing Mutation “One Variant, Three Effects”

The same variant can affect:

Splice‑site strength.

Usage of the site relative to competing splice sites.

Read counts for specific splice junctions.

AlphaGenome outputs a separate score for each of these three effects; combining them into a composite score yields:

GTEx samples with rare aberrant splicing: auPRC 0.66.

MFASS experimental validation: outperforms SpliceAI and DeltaSplice.
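One way to turn the three splicing predictions into per-variant scores is to take, for each output, the largest absolute change between reference and alternate predictions near the variant, and then combine the three. The sketch below uses that max-absolute-difference rule and an equal-weight sum as the composite; both the aggregation and the weights are assumptions for illustration, not the published scoring definition, and the function and key names are hypothetical.

```python
import numpy as np

def max_abs_delta(ref: np.ndarray, alt: np.ndarray) -> float:
    """Largest absolute ref-vs-alt difference across positions (and tracks)."""
    return float(np.max(np.abs(alt - ref)))

def splicing_scores(ref_preds: dict, alt_preds: dict, weights=None) -> dict:
    """Score each splicing output separately, then combine them into a composite.

    ref_preds / alt_preds map an output name ('splice_site', 'site_usage',
    'junctions') to an array of predictions over the same window.
    """
    names = ("splice_site", "site_usage", "junctions")
    weights = weights or {name: 1.0 for name in names}  # equal weights: an assumption
    scores = {name: max_abs_delta(ref_preds[name], alt_preds[name]) for name in names}
    scores["composite"] = sum(weights[name] * scores[name] for name in names)
    return scores

ref = {"splice_site": np.array([0.1, 0.9]),
       "site_usage": np.array([0.5, 0.5]),
       "junctions": np.array([10.0, 2.0])}
alt = {"splice_site": np.array([0.1, 0.3]),
       "site_usage": np.array([0.2, 0.5]),
       "junctions": np.array([4.0, 2.0])}
print(splicing_scores(ref, alt))
```

Because junction counts and site probabilities live on different scales, a practical composite would first normalize each score (for example, to quantiles over a background set of variants); the equal-weight sum here only shows the plumbing.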

Ablation Experiments

Resolution: training at 1 bp resolution significantly outperforms training at 32 bp or 128 bp resolution on splicing and accessibility tasks.

Sequence length: training on 1 Mb rather than 32 kb inputs improves eQTL sign-prediction accuracy by 12%.

Distillation: a student distilled from 64 teachers matches the performance of a 4-model ensemble while running inference about 4× faster.

Multimodal training: joint training on all functional tracks outperforms single-modality models, with the largest gains on eQTL tasks.
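The resolution result is intuitive once you see what binning does to a sharp signal: a splice site is a single-base feature, and averaging a 1 bp track into 128 bp bins spreads it across the bin so its exact position can no longer be read off. The sketch below bins a hypothetical 1 bp track containing one peak and reports the interval the peak can be localized to at each resolution; the track and helper names are illustrative.

```python
import numpy as np

def bin_track(track: np.ndarray, res: int) -> np.ndarray:
    """Average a 1 bp track into non-overlapping bins of `res` bp."""
    if len(track) % res:
        raise ValueError("track length must be a multiple of the bin size")
    return track.reshape(-1, res).mean(axis=1)

def localize_peak(track: np.ndarray, res: int) -> tuple:
    """Return the [start, end) bp interval the binned peak can be pinned to."""
    b = int(np.argmax(bin_track(track, res)))
    return b * res, (b + 1) * res

# Hypothetical 1 bp track with a single splice-site-like peak at position 300.
track = np.zeros(1024)
track[300] = 1.0
for res in (1, 32, 128):
    start, end = localize_peak(track, res)
    print(f"{res:>3} bp bins -> peak somewhere in [{start}, {end})")
```

At 128 bp, a variant at the splice site itself and one 50 bp away fall into the same bin, so a coarse model cannot tell them apart; this fits the observation that 1 bp training helps most on splicing.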

References

AlphaGenome: AI for better understanding the genome (Google DeepMind blog). https://deepmind.google/blog/alphagenome-ai-for-better-understanding-the-genome/

Advancing regulatory variant effect prediction with AlphaGenome (Nature). https://www.nature.com/articles/s41586-025-10014-0
