Ctrl‑DNA: A Constrained RL Framework for Cell‑Specific Gene Expression, Selected for NeurIPS 2025

The paper introduces Ctrl‑DNA, a constrained reinforcement‑learning framework that maximizes regulatory activity in target cells while limiting off‑target activity. It combines a pretrained DNA language model, Lagrangian relaxation, and TF‑binding‑site regularization, and it reports better performance than eight baselines on human promoter and enhancer datasets covering six cell types.

HyperAI Super Neural

Problem Motivation

Precise control of gene expression in specific cell types is essential for gene therapy and synthetic biology. This control relies on cis‑regulatory elements (CREs) such as promoters and enhancers, which act as genetic switches. However, the number of naturally effective CREs is limited, and the combinatorial space of possible DNA sequences (4^100, roughly 1.6 × 10^60, for 100‑base sequences) makes exhaustive experimental validation infeasible.

Limitations of Existing Methods

Current deep‑learning approaches improve experimental efficiency but face several challenges: (1) mutation‑based or random‑sequence optimization often gets trapped in local optima, reducing diversity; (2) autoregressive language‑model methods can only mimic known sequences and struggle to explore novel cell‑specific CREs; (3) reinforcement‑learning methods improve target‑cell activity but ignore off‑target effects; (4) many designs overlook biological plausibility, producing sequences that miss key transcription‑factor‑binding sites (TFBSs).

Ctrl‑DNA Framework

Ctrl‑DNA addresses these gaps with a constrained reinforcement‑learning (RL) framework that simultaneously optimizes two objectives: maximize CRE activity in the target cell and strictly limit activity in non‑target cells. The framework builds on a pretrained HyenaDNA autoregressive genome‑language model as the initial policy and trains a cell‑type‑specific reward model using the Enformer architecture. Large‑scale MPRA data provide sequence‑fitness pairs for both target and off‑target cells.
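The two objectives above can be written as a constrained problem and its standard Lagrangian relaxation. The notation here is an assumption made for illustration and is not taken from the paper: x is a generated sequence, π_θ the policy, R_0 the target‑cell reward, R_i the reward for the i‑th off‑target cell type, δ_i its activity threshold, and λ_i the matching Lagrange multiplier.

```latex
% Constrained objective: maximize target activity, cap off-target activity
\max_{\theta} \; \mathbb{E}_{x \sim \pi_\theta}\left[ R_0(x) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \pi_\theta}\left[ R_i(x) \right] \le \delta_i,
\quad i = 1, \dots, m

% Lagrangian relaxation: an unconstrained primal-dual problem
\min_{\lambda \ge 0} \; \max_{\theta} \;
\mathbb{E}_{x \sim \pi_\theta}\left[ R_0(x) \right]
- \sum_{i=1}^{m} \lambda_i
\left( \mathbb{E}_{x \sim \pi_\theta}\left[ R_i(x) \right] - \delta_i \right)
```

In this form, a multiplier λ_i grows while its constraint is violated and shrinks toward zero once the off‑target activity falls below δ_i.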

Design choices include:

Formulating DNA design as a constrained Markov decision process (CMDP).

Using Constrained Batch‑wise Relative Policy Optimization with Lagrangian relaxation to turn the constrained problem into an unconstrained primal‑dual optimization.

Updating the Lagrange multiplier to penalize sequences that exceed off‑target thresholds.

Computing normalized advantages directly from batch statistics, eliminating the need for a value network.

Combining a clipped surrogate objective with KL‑regularization to keep updates stable and maintain similarity to natural DNA patterns.

Introducing TFBS‑frequency regularization: TFBS frequency vectors are extracted with FIMO from real, highly cell‑type‑specific CREs, and the Pearson correlation between these reference vectors and the TFBS vectors of generated sequences is rewarded. The corresponding Lagrange multiplier is clipped to [0, λ_max] (λ_max ≤ 1).
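The design choices listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dual step size `lr`, the clip range `clip_eps`, and the choice to clip every multiplier to [0, `lam_max`] are all hypothetical, and the KL‑regularization term is omitted for brevity.

```python
import numpy as np

def batch_normalized_advantage(values, eps=1e-8):
    """Standardize per-sequence scores with the batch mean and std.

    This replaces a learned value network with batch statistics.
    """
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / (values.std() + eps)

def lagrangian_advantage(target_reward, off_target_rewards, lambdas):
    """Mix target and off-target advantages using Lagrange multipliers.

    target_reward:      shape (B,), reward in the target cell type
    off_target_rewards: shape (K, B), one row per off-target cell type
    lambdas:            shape (K,), one multiplier per constraint
    """
    adv_target = batch_normalized_advantage(target_reward)
    adv_off = np.stack([batch_normalized_advantage(r) for r in off_target_rewards])
    lambdas = np.asarray(lambdas, dtype=float)
    # Off-target advantage is subtracted, weighted by its multiplier.
    return adv_target - lambdas @ adv_off

def update_multipliers(lambdas, off_target_rewards, delta, lr=0.05, lam_max=1.0):
    """Dual step: raise lambda when mean off-target reward exceeds delta.

    lr and lam_max are illustrative values (assumptions).
    """
    violation = np.asarray(off_target_rewards, dtype=float).mean(axis=1) - delta
    updated = np.asarray(lambdas, dtype=float) + lr * violation
    return np.clip(updated, 0.0, lam_max)

def clipped_surrogate(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate objective (to be maximized)."""
    ratio = np.exp(np.asarray(log_prob_new) - np.asarray(log_prob_old))
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return float(np.minimum(unclipped, clipped).mean())
```

In a training loop one would compute rewards for a batch of sampled sequences, form the mixed advantage, take a policy step on the clipped surrogate, and then update the multipliers from the same batch.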

Datasets

Evaluation uses real human promoter and enhancer datasets measured by MPRA. The promoter set contains 250‑bp sequences from three leukemia‑derived cell lines (Jurkat, K562, THP1). The enhancer set contains 200‑bp sequences from HepG2 (liver), K562 (erythroid), and SK‑N‑SH (neuroblastoma). Notably, THP1 promoters show a right‑skewed activity distribution (25th percentile = 0.49), which makes the off‑target constraint for THP1 harder to satisfy.

Model Architecture

Ctrl‑DNA fine‑tunes the HyenaDNA language model as the policy network and trains an Enformer‑based reward model for cell‑type specificity. Rewards for target and off‑target cells are computed from MPRA‑derived fitness scores. The CMDP is solved with the constrained batch‑wise relative policy optimizer, and TFBS regularization is added as an auxiliary reward.
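The TFBS auxiliary reward can be illustrated as a Pearson correlation between motif‑frequency vectors. The sketch below assumes the per‑motif counts have already been obtained from a motif scanner (the article names FIMO), so no scanning call is shown. The function name and the handling of constant vectors are assumptions, not details from the paper.

```python
import numpy as np

def tfbs_correlation(generated_counts, reference_counts):
    """Pearson correlation between two motif-frequency vectors.

    generated_counts: per-motif counts for generated sequences
    reference_counts: per-motif counts from real cell-type-specific CREs
    Both vectors must list the same motifs in the same order.
    """
    g = np.asarray(generated_counts, dtype=float)
    r = np.asarray(reference_counts, dtype=float)
    g_centered = g - g.mean()
    r_centered = r - r.mean()
    denom = np.linalg.norm(g_centered) * np.linalg.norm(r_centered)
    if denom == 0.0:
        # Correlation is undefined for a constant vector; returning 0.0
        # (no reward) is an assumption made for this sketch.
        return 0.0
    return float(g_centered @ r_centered / denom)
```

A value near 1 means the generated sequences use motifs in roughly the same proportions as the reference CREs, which is the behaviour the regularizer is described as rewarding.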

Training hyper‑parameters: Adam optimizer, learning rate 1e‑4, batch size 256, 100 epochs on a single NVIDIA A100 (40 GB).

Experimental Results

Ctrl‑DNA was benchmarked against eight baselines: sequence‑optimization methods (AdaLead, Bayesian Optimization, CMA‑ES, PEX), a generative model (RegLM), and RL methods (TACO, PPO, PPO‑Lagrangian). Metrics included cell‑type specificity (target vs. off‑target fitness), biological plausibility (TFBS motif correlation), and sequence diversity.
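Two of these metrics can be sketched as follows. The article does not give exact definitions, so both are assumptions: specificity is summarized as mean target fitness plus a check that every mean off‑target fitness stays at or below a threshold (called `delta` here), and diversity is taken as the mean pairwise Hamming distance between equal‑length sequences.

```python
import numpy as np
from itertools import combinations

def specificity_summary(target_fitness, off_target_fitness, delta):
    """Summarize target fitness and off-target constraint satisfaction.

    target_fitness:     shape (B,), predicted activity in the target cell
    off_target_fitness: shape (K, B), one row per off-target cell type
    delta:              off-target activity threshold (assumed shared)
    """
    target = np.asarray(target_fitness, dtype=float)
    off_means = np.asarray(off_target_fitness, dtype=float).mean(axis=1)
    return {
        "target_mean": float(target.mean()),
        "off_target_means": off_means.tolist(),
        "constraints_met": bool(np.all(off_means <= delta)),
    }

def mean_pairwise_hamming(sequences):
    """Average number of mismatched positions over all sequence pairs.

    Assumes all sequences have the same length.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)
```

Reporting target fitness together with the constraint flag matters because a method can score well on the target cell while still violating the off‑target threshold.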

Key findings:

Ctrl‑DNA achieved the highest target‑cell fitness while satisfying off‑target constraints across all six cell types.

For enhancer design under constraints δ=0.3, 0.5, 0.6, Ctrl‑DNA consistently outperformed baselines, whereas methods like TACO and CMA‑ES achieved high target fitness but failed to suppress off‑target activity.

For promoter design, despite the similarity of the three hematopoietic cell lines, Ctrl‑DNA excelled at δ=0.5 and 0.6. No method could meet the strict δ=0.4 threshold for THP1, but Ctrl‑DNA came closest.

Biological plausibility scores (ΔR) were highest for Ctrl‑DNA on both promoter and enhancer datasets. Motif correlation improved to 0.60 for THP1 promoters under a stringent q‑value < 0.05 filter, surpassing all baselines.

TFBS analysis showed that generated HepG2 sequences were enriched for liver‑specific motifs (HNF4A, HNF4G) and generated K562 sequences were enriched for erythroid motifs (GATA1, GATA2).

Sequence diversity scores were comparable to or higher than most baselines, demonstrating that constraint enforcement did not sacrifice diversity.

Ablation studies confirmed the importance of the constrained policy optimizer and the TFBS regularization module.

Broader Impact

The work demonstrates that AI‑driven design of DNA switches can move gene‑therapy and synthetic‑biology applications from trial‑and‑error toward precise, cell‑type‑specific control. It also situates Ctrl‑DNA among recent AI‑for‑genomics efforts such as the Jackson Laboratory’s CODA platform and the RegLM framework, highlighting the growing convergence of deep learning and molecular biology.
