How BioEmu Generates Protein Conformational Ensembles Faster Than MD
Microsoft Research’s AI for Science team released the open‑source BioEmu model, a generative diffusion architecture that leverages AlphaFold’s Evoformer and extensive MD and stability data to efficiently sample protein conformational ensembles, achieving near‑MD accuracy in free‑energy and mutation stability predictions while dramatically reducing computational cost.
Background
Proteins exist as ensembles of conformations, and many biological functions depend on transitions between these structures. Traditional predictors such as AlphaFold provide a single static model and cannot capture the full conformational landscape.
BioEmu Model
BioEmu is an open‑source generative deep‑learning model built on the DiG (Distributional Graphormer) framework. It adopts a diffusion‑model architecture, incorporates AlphaFold’s Evoformer encoder, and uses a second‑order integrator during sampling to draw samples efficiently from protein conformational distributions.
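To make "second‑order integration sampling" concrete, the sketch below integrates a diffusion model's probability‑flow ODE with Heun's method (an Euler predictor followed by a trapezoidal corrector). It is a toy, not BioEmu's sampler: it assumes a 1‑D variance‑exploding corruption x_sigma = x_0 + sigma * eps, and it replaces the learned score network with the exact score of a Gaussian "data" distribution N(mu, s^2). The names toy_score, ode_drift and heun_sample are hypothetical; BioEmu's real state space (protein structures), noise schedule and parameterization are not reproduced here.

```python
import numpy as np


def toy_score(x, sigma, mu=2.0, s=0.5):
    # Exact score (gradient of the log-density) of N(mu, s^2) convolved with N(0, sigma^2).
    # Hypothetical stand-in for a learned score network.
    return -(x - mu) / (s**2 + sigma**2)


def ode_drift(x, sigma, score_fn):
    # Probability-flow ODE for x_sigma = x_0 + sigma * eps (variance-exploding form):
    # dx/dsigma = -sigma * score(x, sigma)
    return -sigma * score_fn(x, sigma)


def heun_sample(x_init, sigmas, score_fn=toy_score):
    """Integrate the ODE from sigmas[0] down to sigmas[-1] with Heun's method."""
    x = np.asarray(x_init, dtype=float)
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        h = sigma_next - sigma_cur                    # negative step: noise decreases
        d_cur = ode_drift(x, sigma_cur, score_fn)
        x_pred = x + h * d_cur                        # first-order (Euler) predictor
        d_next = ode_drift(x_pred, sigma_next, score_fn)
        x = x + 0.5 * h * (d_cur + d_next)            # second-order trapezoidal corrector
    return x


# Usage: start from the heavily noised marginal and integrate down to sigma = 0.
rng = np.random.default_rng(0)
mu, s, sigma_max = 2.0, 0.5, 10.0
sigmas = np.linspace(sigma_max, 0.0, 101)
x_start = mu + np.sqrt(s**2 + sigma_max**2) * rng.standard_normal(50_000)
x_final = heun_sample(x_start, sigmas)
print(x_final.mean(), x_final.std())  # close to mu and s for this toy case
```

The appeal of a second‑order step is that it costs two score evaluations per step but tolerates much larger steps than plain Euler for the same accuracy, so fewer network calls are needed per generated sample.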
Training Data and Efficiency
The training set combines static structures from the AlphaFold Protein Structure Database, more than 200 ms of molecular‑dynamics (MD) simulation trajectories, and roughly 500,000 experimental protein‑stability measurements. Once trained, BioEmu can generate thousands of distinct protein structures per hour on a single GPU, a speed‑up of several orders of magnitude over conventional MD simulation.
Performance
BioEmu accurately reproduces key structural changes such as hidden pockets, local unfolding, and domain rearrangements. In free‑energy prediction tasks it achieves an error of about 1 kcal/mol, matching millisecond‑scale MD results. For ΔΔG stability predictions of mutants, the model attains a mean absolute error below 1 kcal/mol and a Spearman correlation above 0.6.
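The quantities in this paragraph can be illustrated with a short sketch. If generated structures are assumed to follow the equilibrium (Boltzmann) distribution and are assigned to discrete states, a free‑energy difference follows from state populations as ΔG = -kT ln(p_B / p_A); the ΔΔG benchmark numbers are a mean absolute error and a Spearman rank correlation between predicted and measured values. The helper names, the 300 K temperature and all numbers below are hypothetical and illustrative only; they are not data from the paper, and the paper's exact state definitions are not reproduced.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_difference(labels, state_a, state_b, temperature=300.0):
    """Delta G (A -> B) = -kT ln(p_B / p_A) from state populations in samples
    assumed to be drawn from the equilibrium distribution (hypothetical helper)."""
    labels = np.asarray(labels)
    p_a = np.mean(labels == state_a)
    p_b = np.mean(labels == state_b)
    return -KB_KCAL_PER_MOL_K * temperature * np.log(p_b / p_a)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def mae(pred, true):
    # Mean absolute error, in the same units as the inputs (kcal/mol here).
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float))))


def spearman(pred, true):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return float(np.corrcoef(average_ranks(pred), average_ranks(true))[0, 1])


# Illustrative usage: 75 % of samples folded, 25 % unfolded.
labels = ["folded"] * 75 + ["unfolded"] * 25
dg = free_energy_difference(labels, "folded", "unfolded")  # kT ln 3 at 300 K

ddg_pred = [0.8, 1.9, -0.3, 2.5]   # made-up predicted ddG values (kcal/mol)
ddg_exp = [1.0, 1.5, 0.2, 3.1]     # made-up measured ddG values (kcal/mol)
print(dg, mae(ddg_pred, ddg_exp), spearman(ddg_pred, ddg_exp))
```

Because ΔG depends on the logarithm of a population ratio, an error of about 1 kcal/mol at room temperature corresponds to misestimating that ratio by roughly a factor of five, which gives a feel for what the reported accuracy means.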
Case Studies
Complexin II (134 residues) – an intrinsically disordered protein involved in neurotransmitter release. BioEmu efficiently sampled its flexible ensemble and recovered known secondary‑structure elements such as the central helix and the accessory helix.
Four‑pass transmembrane protein CD9 (225 residues) – the pretrained model sampled both crystal reference structures (PDB 6rlo and 6rlr). After fine‑tuning with MD data, BioEmu retained only the 6rlo conformation, consistent with experimental evidence that 6rlr cannot exist as a folded monomeric state, and it also predicted open and closed conformations.
Future Directions
Current work focuses on monomeric proteins. Planned extensions include modeling protein complexes, protein‑ligand interactions, and other biologically relevant systems, as well as integrating additional experimental data to improve generalization and interpretability.
Resources
Paper: https://www.science.org/doi/10.1126/science.adv9817
Code repository: https://github.com/microsoft/bioemu
Model checkpoint: https://huggingface.co/microsoft/bioemu
Source: ScienceAI
Recently, Microsoft Research’s AI for Science team proposed and open‑sourced BioEmu, a generative deep‑learning model that simulates protein conformational changes with unprecedented efficiency and accuracy, opening a new path toward understanding protein function and accelerating drug discovery.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
