MIT’s Open‑Source BoltzGen Achieves nM‑Level Affinity for 66% of Targets Across Molecular Types

BoltzGen, an open‑source all‑atom generative model released by MIT and collaborators, unifies protein folding and binder design through a geometric continuous representation and a flexible design language. Trained on multimodal datasets, it was validated in the wet lab on 26 diverse targets and produced nM‑level binders for 66% of nine previously unseen targets, with design tasks covering proteins, nanobodies, peptides, and small molecules.

HyperAI Super Neural

Background

De novo binder design is a core method in drug discovery: it generates peptide or protein sequences that bind a specified target. Traditional approaches rely on physics‑based molecular dynamics and sequence optimization, which are computationally expensive, explore a limited design space, and struggle to cover multiple target types such as proteins, small molecules, and RNA.

BoltzGen: An All‑Atom Generative Model

BoltzGen replaces discrete residue labels with a geometric continuous representation and trains protein folding and binder design jointly in a single system. A flexible design specification language makes generation controllable across molecular types, including proteins, nanobodies, cyclic peptides, and small molecules.
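To make the idea of a design specification concrete, here is a minimal sketch of what such a specification might capture. Every name in it (`DesignSpec`, `BINDER_TYPES`, the field names) is hypothetical and chosen for illustration only; it is not BoltzGen's actual specification syntax, which is documented in the project repository.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical binder categories, taken from the modalities named in the article.
BINDER_TYPES = {"protein", "nanobody", "linear_peptide", "cyclic_peptide"}


@dataclass
class DesignSpec:
    """Illustrative container for a binder design request (not BoltzGen's format)."""
    target_id: str            # identifier of the target structure
    binder_type: str          # one of BINDER_TYPES
    min_length: int           # shortest binder to generate, in residues
    max_length: int           # longest binder to generate, in residues
    hotspot_residues: List[int] = field(default_factory=list)  # target residues to contact

    def validate(self) -> None:
        """Raise ValueError if the request is internally inconsistent."""
        if self.binder_type not in BINDER_TYPES:
            raise ValueError(f"unknown binder type: {self.binder_type!r}")
        if not 0 < self.min_length <= self.max_length:
            raise ValueError("need 0 < min_length <= max_length")
        if any(r < 1 for r in self.hotspot_residues):
            raise ValueError("hotspot residue indices must be positive")


# Switching molecular type only changes one field of the request.
nanobody_spec = DesignSpec("PD-L1", "nanobody", 110, 130, hotspot_residues=[54, 56])
peptide_spec = DesignSpec("PD-L1", "cyclic_peptide", 8, 16)
nanobody_spec.validate()
peptide_spec.validate()
```

The point of the sketch is only that one declarative request format can cover several binder modalities; the real language presumably expresses far richer constraints.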

Mixed Dataset and Multimodal Training Strategy

Training data are drawn from three sources: (1) high‑quality experimental structures from the Protein Data Bank, covering RNA, DNA, proteins, and small‑molecule complexes; (2) AlphaFold Database predictions produced by AlphaFold2, which provide reliable folding patterns; and (3) synthetic complexes generated by the Boltz‑1 model, which enrich the multimodal scenarios. Up‑sampled antibody and TCR data were removed to preserve diversity. All samples undergo random cropping and multitask processing, so each training iteration may involve folding prediction, binder design, or structure completion.
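The sampling procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the mixture weights in `SOURCES` are invented, the crop is a simple contiguous window (the real cropping strategy is not described in this article), and all function names are hypothetical.

```python
import random

# Assumed mixture weights; the article does not state the real proportions.
SOURCES = {"pdb": 0.5, "afdb": 0.3, "boltz1_synthetic": 0.2}
TASKS = ["folding", "binder_design", "structure_completion"]


def random_crop(tokens, max_tokens, rng):
    """Return a contiguous window of at most max_tokens tokens."""
    if len(tokens) <= max_tokens:
        return list(tokens)
    start = rng.randrange(len(tokens) - max_tokens + 1)
    return list(tokens[start:start + max_tokens])


def sample_training_example(datasets, max_tokens, rng):
    """Pick a data source, a structure, a crop, and a task for one iteration."""
    names = list(SOURCES)
    source = rng.choices(names, weights=[SOURCES[n] for n in names], k=1)[0]
    tokens = rng.choice(datasets[source])
    return {
        "source": source,
        "task": rng.choice(TASKS),  # each iteration trains one of the three tasks
        "tokens": random_crop(tokens, max_tokens, rng),
    }


# Toy data: each "structure" is just a list of token indices.
toy_datasets = {
    "pdb": [list(range(300)), list(range(40))],
    "afdb": [list(range(500))],
    "boltz1_synthetic": [list(range(120))],
}
rng = random.Random(0)
batch = [sample_training_example(toy_datasets, 64, rng) for _ in range(8)]
```

Mixing tasks per iteration is what lets a single set of weights serve both structure prediction and design.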

Model Architecture: From Noise to Structure

The architecture consists of a large Trunk network and a Diffusion Module. The Trunk tokenizes molecular structures and uses a PairFormer with triangle attention and geometric residue encoding to infer residue types and atomic coordinates in continuous space, eliminating dependence on discrete amino‑acid labels. The Diffusion Module receives noisy 3D atomic coordinates, iteratively denoises them using a standard Transformer operating at both atom and token levels, and enforces energy‑based constraints to avoid physical clashes.
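The iterative denoising idea can be illustrated with a toy sampler. The sketch below is a generic Euler‑style diffusion loop in pure Python, not BoltzGen's sampler: the noise schedule is an assumed linear one, `toy_denoiser` stands in for the learned Transformer by simply returning a fixed reference structure, and the energy‑based clash constraints are omitted.

```python
import random


def noise_schedule(sigma_max, n_steps):
    """Linearly decreasing noise levels that end at exactly 0 (assumed schedule)."""
    return [sigma_max * (1 - i / n_steps) for i in range(n_steps + 1)]


def toy_denoiser(coords, sigma, reference):
    """Stand-in for the learned network: returns its guess of the clean coordinates.

    A real model would predict this from the noisy input; here we return a
    fixed reference structure so the loop is easy to follow and to test.
    """
    return reference


def sample(reference, sigma_max=10.0, n_steps=20, seed=0):
    """Start from Gaussian noise and step toward the denoiser's prediction."""
    rng = random.Random(seed)
    sigmas = noise_schedule(sigma_max, n_steps)
    coords = [[rng.gauss(0.0, sigma_max) for _ in range(3)] for _ in reference]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = toy_denoiser(coords, sigma, reference)
        # Euler step: move along (x - denoised) / sigma by the change in noise level.
        coords = [
            [x + (sigma_next - sigma) * (x - d) / sigma for x, d in zip(atom, clean)]
            for atom, clean in zip(coords, denoised)
        ]
    return coords


reference = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
final_coords = sample(reference)
```

With this oracle denoiser the remaining distance to the reference shrinks in proportion to the noise level at every step, so the loop lands on the reference once the noise reaches zero. A trained network replaces the oracle with a prediction conditioned on the Trunk's output.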

Experimental Results: Universal Design Across 26 Targets

Eight independent wet‑lab validation projects tested designs against 26 targets, with binders spanning nanobodies, proteins, and linear and cyclic peptides. On nine completely unseen targets, designed proteins and nanobodies achieved nM‑level affinity for 66% of the targets, demonstrating strong generalization. Peptide designs bound diverse targets with affinities from nM to µM and neutralized antibacterial or hemolytic activity; a designed peptide targeting the disordered NPM1 protein co‑localized to nucleoli in cells. For the GTPase RagC and the RagA:RagC dimer, 7 of 29 candidate peptides bound RagC, with affinities reaching 3.5 µM, and several cyclic disulfide peptides showed stable binding.
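The percentages and counts quoted here can be cross‑checked with a few lines of arithmetic. The helper names below are hypothetical. Note that the article gives 66% for nine targets without stating the underlying count, so the count recovered by `consistent_hit_counts` is an inference, not a reported figure.

```python
def hit_rate(hits, total):
    """Fraction of successful candidates or targets."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


def consistent_hit_counts(percent, total):
    """Whole-number hit counts whose percentage truncates or rounds to `percent`."""
    return [
        k for k in range(total + 1)
        if int(100 * k / total) == percent or round(100 * k / total) == percent
    ]


ragc_rate = hit_rate(7, 29)                   # RagC peptide binders, as reported
novel_counts = consistent_hit_counts(66, 9)   # counts compatible with "66% of nine"
print(f"RagC peptide hit rate: {ragc_rate:.1%}")
print(f"Hit counts consistent with 66% of 9 targets: {novel_counts}")
```

Reading the results as counts rather than percentages is a useful reminder of how small these wet‑lab sample sizes are.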

BoltzGen also generated protein binders for two biomedical small molecules, achieving detectable binding in the 50–150 µM range without expert chemical guidance. In antibacterial peptide design against DNA gyrase GyrA, over 19% of candidates reduced bacterial growth four‑fold, with some peptides directly killing host cells.

Benchmark tests on five known‑structure targets (e.g., PD‑L1, TNFα, PDGFR) yielded an 80% hit rate of nM‑level binders, matching the performance of state‑of‑the‑art models.
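To put the nM and µM figures in this section on a common scale, a dissociation constant can be converted to a standard binding free energy with ΔG = RT ln(Kd / 1 M). The sketch below applies that textbook relation; the temperature of 298.15 K is an assumption, since the article does not report assay conditions, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from Kd in mol/L (1 M reference)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Affinity scales mentioned in the article (1 nM is a representative nM-level value).
for label, kd in [("1 nM", 1e-9), ("3.5 uM", 3.5e-6), ("50 uM", 50e-6), ("150 uM", 150e-6)]:
    print(f"Kd = {label:>7}: dG = {binding_free_energy(kd):6.2f} kcal/mol")
```

Each tenfold improvement in Kd corresponds to roughly 1.4 kcal/mol at room temperature, so the gap between a 100 µM small‑molecule binder and a 1 nM protein binder is large in energetic terms.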

Implications and Future Directions

BoltzGen’s unified all‑atom generation framework integrates design, prediction, and validation, offering an open, controllable, and extensible AI infrastructure for drug discovery and biomolecular engineering. The design specification language enables seamless switching among proteins, nanobodies, cyclic peptides, and small molecules, broadening the applicability of generative AI in molecular design.

GitHub repository: https://github.com/HannesStark/boltzgen

Paper: https://go.hyper.ai/3sx2K

Tags: diffusion model, Generative AI, Multimodal Training, protein binder design, BoltzGen, nM affinity
Written by

HyperAI Super Neural