How the All-Atom Protein Generative Model (APM) Redefines Multi‑Chain Design
The All‑Atom Protein Generative Model (APM) introduced by Hunan University, UCAS and ByteDance Seed combines full‑atom representation, multi‑chain native modeling, and novel flow‑matching techniques to outperform existing SOTA methods on folding, reverse‑folding, antibody and peptide design tasks, backed by a curated multi‑source dataset and extensive benchmarks.
Background
Protein function often depends on multi‑chain complexes, but most AI‑driven protein modeling methods handle only single chains, limiting capture of inter‑chain atomic interactions.
APM Model Overview
APM (All‑Atom Protein Generative Model) is a three‑module framework that directly generates full‑atom multi‑chain protein complexes, supports folding, reverse‑folding, and downstream design tasks, and achieves state‑of‑the‑art performance on several benchmarks.
Seq&BB Module
Generates amino‑acid sequences and backbone coordinates jointly via flow‑matching. Key innovations:
Decoupled noise processes: Separate flow-matching noise processes for the discrete sequence and the continuous backbone preserve the bidirectional dependencies between the two modalities.
SE(3) flow matching: Handles translation and rotation of backbone geometry.
Multi‑task learning: Unconditional, conditional, folding, and reverse‑folding tasks are trained together with a loss combining flow‑matching and consistency terms.
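The decoupled interpolants can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's implementation: the function names, the MASK token id, and the list-based coordinates are all invented for the example; the backbone uses a linear interpolant and the sequence a masking interpolant, which are common choices in flow-matching models.

```python
import random

MASK = 20  # illustrative mask-token id (the 20 amino acids use ids 0..19)

def interpolate_backbone(x0, x1, t):
    """Continuous linear interpolant for backbone coordinates:
    x_t = (1 - t) * x0 + t * x1, moving from noise (t = 0) to data (t = 1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def interpolate_sequence(seq, t, rng):
    """Discrete masking interpolant for the sequence: each residue is
    revealed with probability t, otherwise replaced by the MASK token.
    Sampling the two processes independently is what keeps the
    sequence <-> structure dependencies bidirectional."""
    return [aa if rng.random() < t else MASK for aa in seq]

rng = random.Random(0)
coords_noise = [0.0, 0.0, 0.0]   # one atom's coordinates drawn from noise
coords_data = [2.0, 4.0, 6.0]    # the corresponding data coordinates
sequence = [3, 7, 12, 0, 19]     # amino-acid ids of a 5-residue chain

xt = interpolate_backbone(coords_noise, coords_data, 0.5)  # [1.0, 2.0, 3.0]
st = interpolate_sequence(sequence, 0.5, rng)              # partially masked
```

At t = 0 the sequence is fully masked and the coordinates are pure noise; at t = 1 both match the data, so a model trained on intermediate t learns to denoise both modalities jointly.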
Sidechain Module
Predicts side‑chain conformations from the generated sequence and backbone using torsion‑angle representations (up to four rotatable bonds) to balance computational cost and atomic detail.
Two‑stage training: first focuses on side‑chain packing, then refines side‑chains from predicted structures.
Lightweight design with fewer layers and smaller hidden dimensions compared to the Seq&BB module.
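A minimal sketch of the torsion-angle representation: each side chain is described by up to four chi angles, padded to a fixed width. The CHI_COUNT table lists the standard rotatable-bond counts per residue type; the packing helper is illustrative, not APM's code.

```python
# Number of rotatable chi (side-chain torsion) angles per residue type,
# following the standard rotamer conventions (ALA and GLY have none).
CHI_COUNT = {
    "ALA": 0, "GLY": 0, "SER": 1, "CYS": 1, "VAL": 1, "THR": 1,
    "PRO": 2, "ILE": 2, "LEU": 2, "ASP": 2, "ASN": 2, "HIS": 2,
    "PHE": 2, "TYR": 2, "TRP": 2, "MET": 3, "GLU": 3, "GLN": 3,
    "LYS": 4, "ARG": 4,
}

def pack_torsions(residue, chis, max_chi=4):
    """Pad a residue's chi angles to a fixed-size vector so that every
    residue shares one torsion representation; unused slots stay zero."""
    n = CHI_COUNT[residue]
    assert len(chis) == n, f"{residue} expects {n} chi angles"
    return list(chis) + [0.0] * (max_chi - n)
```

Predicting at most four scalars per residue instead of full atomic coordinates is what keeps the Sidechain module cheap while retaining all-atom detail.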
Refine Module
Final stage refines the combined output, reducing atomic clashes and improving structural realism by optimizing both sequence and backbone with full-atom information. It activates only in the late generation steps (t ≥ 0.8), once the intermediate output is of sufficient quality.
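The late-stage gating amounts to a simple time threshold in the sampling loop. The skeleton below is hypothetical (the step functions and scheduler are assumptions; the article only states the t ≥ 0.8 rule):

```python
REFINE_THRESHOLD = 0.8  # Refine module activates only for t >= 0.8

def generate(num_steps, seq_bb_step, refine_step, state):
    """Run the generation trajectory toward t = 1. The Refine module is
    applied only in the late steps, once the intermediate full-atom
    structure is reliable enough to be worth refining."""
    for i in range(num_steps):
        t = (i + 1) / num_steps
        state = seq_bb_step(state, t)   # Seq&BB (+ Sidechain) update
        if t >= REFINE_THRESHOLD:
            state = refine_step(state)  # full-atom refinement, late only
    return state
```

With 10 steps, refinement would run only at t = 0.8, 0.9, and 1.0, so its cost is paid on just the final fifth of the trajectory.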
Dataset Construction
APM is trained on a curated dataset that merges single‑chain and multi‑chain proteins.
Single‑chain: 187,494 samples from PDB (18,684), Swiss‑Prot (140,769, pLDDT > 85) and AFDB (28,041, pLDDT > 95).
Multi‑chain: 11,620 samples (2–6 chains) from PDB biological assemblies, filtered to exclude antibodies, peptides < 30 residues, and excessively long or unclustered chains.
For chains longer than 384 residues, a random crop centered on interface residues retains the 384 most relevant amino acids, keeping training efficient without running out of memory.
Both data types are mixed proportionally to improve intra‑chain modeling while providing inter‑chain interaction signals.
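One plausible reading of the interface-aware crop, as a sketch: pick an interface residue at random and take a contiguous window around it, clamped to the chain boundaries. The exact windowing rule is an assumption; the article only states that cropping is centered on the interface.

```python
import random

def interface_centered_crop(n_res, interface_idx, crop_len=384, rng=None):
    """Crop a long chain to a contiguous crop_len-residue window that
    contains a randomly chosen interface residue (sketch; APM's exact
    rule is not spelled out in the article)."""
    if n_res <= crop_len:
        return 0, n_res                      # short chains are kept whole
    rng = rng or random.Random()
    center = rng.choice(interface_idx)       # pick one interface residue
    # Center the window on it, clamped so it stays inside the chain.
    start = max(0, min(center - crop_len // 2, n_res - crop_len))
    return start, start + crop_len
```

The clamping guarantees a full 384-residue window even when the chosen interface residue sits near a terminus.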
Experimental Results
Single‑chain tasks
On the PDB folding benchmark, APM achieves RMSD = 4.83/2.64 and TM‑score = 0.86/0.91, comparable to ESM‑3 and MultiFlow. In reverse‑folding, amino‑acid recovery (AAR) reaches 50.44 %, surpassing ProteinMPNN (46.58 %). For unconditional generation of 100–300‑residue proteins, APM attains scTM = 0.96 (length 100) and scRMSD = 1.80, outperforming ESM‑3 and ProtPardelle.
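For reference, the AAR metric quoted above is plain per-position sequence identity between the designed and native sequences (a standard definition, not APM-specific code):

```python
def amino_acid_recovery(native, designed):
    """Amino-acid recovery (AAR): the fraction of positions at which the
    designed sequence reproduces the native residue."""
    assert len(native) == len(designed), "sequences must be aligned"
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)
```

APM's 50.44 % AAR thus means that roughly every second residue of a redesigned sequence matches the native one.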
Multi‑chain tasks
On 2–6‑chain complexes, APM's folding scores (12.6/13.67) trail Boltz‑1 when Boltz‑1 uses MSAs but surpass it in the MSA‑free setting. Reverse‑folding scTM reaches 0.85/0.95, close to Boltz‑1 with MSA. Generated complexes show strong binding affinity, with ΔG_RAA = ‑112.65/‑116.98 for 50–100‑residue chains, better than Chroma (‑83.96/‑86.66) and the backbone‑only variant APM_BB.
Downstream functional design
Antibody CDR‑H3 design on the RAbD benchmark yields AAR = 41.20 %, RMSD = 2.08 Å, and binding energy ΔG = 91.64, surpassing dyMEAN and DiffAb. Zero‑shot antibody generation, while sequence‑wise divergent, achieves lower ΔG (81.12). For peptide design on PepBench and LNR, APM (SFT) reaches ΔG = ‑19.90, with 69.34 % of samples having ΔG < 0 and DockQ ≥ 0.8 in 11.29 % of cases, outperforming PPFlow and PepGLAD.
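The peptide-design success rates quoted above are fractions over the generated set; a sketch of how such rates would be computed (the (ΔG, DockQ) pair format is illustrative, not the benchmark's actual data layout):

```python
def design_success_rates(results):
    """Summarize peptide-design outcomes as the benchmark reports them:
    the fraction of designs with favorable binding energy (dG < 0) and
    the fraction with high-quality docking (DockQ >= 0.8).
    `results` is a list of (dG, dockq) pairs -- an illustrative format."""
    n = len(results)
    frac_binding = sum(dg < 0 for dg, _ in results) / n
    frac_dockq = sum(q >= 0.8 for _, q in results) / n
    return frac_binding, frac_dockq
```

Under this reading, APM (SFT)'s 69.34 % and 11.29 % are the two fractions returned here, computed over all generated peptides.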
References
Paper: https://go.hyper.ai/TVp4i
APM dataset: https://go.hyper.ai/xHwbw
Hunan University and collaborators propose APM, a model that generates multi-chain proteins at full-atom resolution and surpasses existing SOTA methods; the work was accepted to ICML 2025.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.