
How BioEmu Generates Protein Conformational Ensembles Faster Than MD

Microsoft Research’s AI for Science team has released BioEmu, an open‑source generative diffusion model that combines AlphaFold’s Evoformer with extensive MD and protein‑stability data to sample protein conformational ensembles efficiently. It approaches MD accuracy in free‑energy and mutant‑stability predictions at a fraction of the computational cost.

Data Party THU

Aug 17, 2025

Background

Proteins exist as ensembles of conformations, and many biological functions depend on transitions between these structures. Traditional predictors such as AlphaFold provide a single static model and cannot capture the full conformational landscape.

BioEmu Model

BioEmu is an open‑source generative deep‑learning model built on the DiG (Distributional Graphormer) framework. It adopts a diffusion‑model architecture, incorporates AlphaFold’s Evoformer encoder, and uses second‑order integration sampling to efficiently draw samples from protein conformational distributions.
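To make "second‑order integration" concrete, here is a minimal sketch of Heun's method (a second‑order predictor‑corrector scheme) applied to a toy one‑dimensional probability‑flow ODE. This is not BioEmu's sampler: the Gaussian target, the noise schedule, and the names `score`, `drift`, and `integrate` are assumptions chosen so that the score is known in closed form and the result can be checked against the exact solution.

```python
import numpy as np

MU, S0 = 2.0, 0.5      # toy "data" distribution N(MU, S0^2) (assumed)
SIGMA_MAX = 10.0       # largest noise level (assumed)

def score(x, sigma):
    """Exact score of the noised marginal N(MU, S0^2 + sigma^2)."""
    return -(x - MU) / (S0**2 + sigma**2)

def drift(x, sigma):
    """Right-hand side of the probability-flow ODE dx/dsigma = -sigma * score."""
    return -sigma * score(x, sigma)

def integrate(x, sigmas, second_order=True):
    """Integrate from sigmas[0] down to sigmas[-1] with Euler or Heun steps."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        h = s_next - s_cur
        d_cur = drift(x, s_cur)
        x_pred = x + h * d_cur                    # Euler predictor
        if second_order:
            d_next = drift(x_pred, s_next)        # slope at the predicted point
            x = x + 0.5 * h * (d_cur + d_next)    # Heun corrector (trapezoid rule)
        else:
            x = x_pred
    return x

rng = np.random.default_rng(0)
# Start from the fully noised distribution.
x_start = MU + np.sqrt(S0**2 + SIGMA_MAX**2) * rng.standard_normal(5000)
sigmas = np.linspace(SIGMA_MAX, 0.0, 21)          # 20 integration steps

# For this linear toy problem the ODE has a closed-form solution.
exact = MU + (x_start - MU) * S0 / np.sqrt(S0**2 + SIGMA_MAX**2)
err_heun = np.abs(integrate(x_start, sigmas, True) - exact).max()
err_euler = np.abs(integrate(x_start, sigmas, False) - exact).max()
print(f"max error, Heun: {err_heun:.3f}  Euler: {err_euler:.3f}")
```

With the same number of steps, the Heun error on this toy problem is smaller than the Euler error, at the price of two drift evaluations per step. In a real diffusion model each drift evaluation is a network call, so the trade‑off is between fewer steps and more calls per step.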

Training Data and Efficiency

The training set combines static structures from the AlphaFold Protein Structure Database, more than 200 ms of molecular‑dynamics (MD) simulation trajectories, and roughly 500,000 experimental protein‑stability measurements. Once trained, BioEmu can generate thousands of distinct protein structures per hour on a single GPU, a speed‑up of several orders of magnitude over conventional MD simulations.

Performance

BioEmu accurately reproduces key structural changes such as cryptic (hidden) pockets, local unfolding, and domain rearrangements. In free‑energy prediction tasks it achieves an error of about 1 kcal/mol relative to millisecond‑scale MD results. For ΔΔG stability predictions of mutants, the model attains a mean absolute error below 1 kcal/mol and a Spearman correlation above 0.6.
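The metrics above can be illustrated with a short sketch. It assumes a two‑state picture in which every sampled structure is labeled folded or unfolded, so that the folding free energy follows from the sampled populations as ΔG = −RT ln(p_folded / p_unfolded), and it uses the sign convention ΔΔG = ΔG(mutant) − ΔG(wild type). The function names and all numbers are invented for illustration; they are not BioEmu outputs or values from the paper.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def folding_free_energy(is_folded, temperature=298.15):
    """Two-state folding free energy (kcal/mol) from per-sample folded labels."""
    p_folded = np.asarray(is_folded, dtype=bool).mean()
    if p_folded <= 0.0 or p_folded >= 1.0:
        raise ValueError("need both folded and unfolded samples")
    return -R_KCAL * temperature * np.log(p_folded / (1.0 - p_folded))

def mean_absolute_error(pred, ref):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))

def spearman(pred, ref):
    """Spearman rank correlation; this simple version assumes no tied values."""
    rank = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(rank(pred), rank(ref))[0, 1])

# Hypothetical ensembles: 900 of 1000 wild-type samples folded, 700 of 1000 mutant.
dg_wt = folding_free_energy([True] * 900 + [False] * 100)
dg_mut = folding_free_energy([True] * 700 + [False] * 300)
print(f"ddG = {dg_mut - dg_wt:+.2f} kcal/mol (positive = destabilizing)")

# Invented predicted vs. measured ddG values (kcal/mol) for five mutants.
ddg_pred = [0.3, 1.8, -0.4, 2.5, 0.9]
ddg_meas = [0.5, 1.2, 0.1, 3.1, 0.4]
print("MAE:", mean_absolute_error(ddg_pred, ddg_meas))
print("Spearman:", spearman(ddg_pred, ddg_meas))
```

Because the free energy depends on the logarithm of a population ratio, a state populated in only a few percent of samples needs many samples before its estimate stabilizes, which is one reason cheap sampling matters for this kind of prediction.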

Case Studies

Complexin II (134 residues) – an intrinsically disordered protein involved in neurotransmitter release. BioEmu efficiently sampled its flexible ensemble and recovered known secondary‑structure elements such as the central and accessory helices.

Four‑pass transmembrane protein CD9 (225 residues) – the pretrained model sampled both crystal reference structures (PDB 6rlo and 6rlr). After fine‑tuning with MD data, BioEmu retained only the 6rlo conformation, consistent with experimental evidence that 6rlr cannot exist as a folded monomeric state. The model also predicted open and closed conformations.
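One common way to check whether an ensemble "samples" a given reference structure is to superimpose each sample on each reference and count how many fall within an RMSD cutoff. The sketch below does this with the Kabsch algorithm on synthetic coordinates. It is an illustration under stated assumptions, not the paper's analysis protocol: the 2.0 Å cutoff, the labels `state_A` and `state_B`, and the random point clouds standing in for Cα coordinates are all hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # rotation mapping P onto Q
    return float(np.sqrt((((P @ R.T) - Q) ** 2).sum(axis=1).mean()))

def coverage(samples, references, cutoff=2.0):
    """Count samples whose nearest reference lies within `cutoff` RMSD."""
    counts = {name: 0 for name in references}
    for xyz in samples:
        rmsds = {name: kabsch_rmsd(xyz, ref) for name, ref in references.items()}
        best = min(rmsds, key=rmsds.get)
        if rmsds[best] <= cutoff:
            counts[best] += 1
    return counts

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic stand-ins for two reference conformations (50 atoms each).
rng = np.random.default_rng(1)
references = {"state_A": rng.normal(scale=10.0, size=(50, 3)),
              "state_B": rng.normal(scale=10.0, size=(50, 3))}

def noisy_copy(ref, theta):
    """Rotated, shifted, slightly perturbed copy of a reference."""
    return ref @ rot_z(theta).T + 5.0 + rng.normal(scale=0.3, size=ref.shape)

samples = [noisy_copy(references["state_A"], t) for t in (0.3, 1.1, 2.0)]
samples += [noisy_copy(references["state_B"], t) for t in (0.7, 2.5)]
print(coverage(samples, references))
```

Applied to real data, `samples` would hold matched atom coordinates from the generated structures and `references` the corresponding atoms from the experimental structures. A reference whose count drops to zero after fine‑tuning would correspond to a conformation the model no longer produces.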

Future Directions

Current work focuses on monomeric proteins. Planned extensions include modeling protein complexes, protein‑ligand interactions, and other biologically relevant systems, as well as integrating additional experimental data to improve generalization and interpretability.

Resources

Paper: https://www.science.org/doi/10.1126/science.adv9817

Code repository: https://github.com/microsoft/bioemu

Model checkpoint: https://huggingface.co/microsoft/bioemu

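Source: ScienceAI

Microsoft Research’s AI for Science team recently proposed and open‑sourced BioEmu, a generative deep‑learning model that simulates protein conformational changes with unprecedented efficiency and accuracy, opening new paths for understanding protein function and accelerating drug discovery.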

