How the All-Atom Protein Generative Model (APM) Redefines Multi‑Chain Design
The All‑Atom Protein Generative Model (APM) introduced by Hunan University, UCAS and ByteDance Seed combines full‑atom representation, multi‑chain native modeling, and novel flow‑matching techniques to outperform existing SOTA methods on folding, reverse‑folding, antibody and peptide design tasks, backed by a curated multi‑source dataset and extensive benchmarks.
Background
Protein function often depends on multi‑chain complexes, but most AI‑driven protein modeling methods handle only single chains, limiting capture of inter‑chain atomic interactions.
APM Model Overview
APM (All‑Atom Protein Generative Model) is a three‑module framework that directly generates full‑atom multi‑chain protein complexes, supports folding, reverse‑folding, and downstream design tasks, and achieves state‑of‑the‑art performance on several benchmarks.
Seq&BB Module
Generates amino‑acid sequences and backbone coordinates jointly via flow‑matching. Key innovations:
Decoupled noise processes: Separate flow-matching noise processes for the discrete sequence and the continuous backbone preserve the bidirectional dependencies between the two modalities.
SE(3) flow matching: Handles translation and rotation of backbone geometry.
Multi‑task learning: Unconditional, conditional, folding, and reverse‑folding tasks are trained together with a loss combining flow‑matching and consistency terms.
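The decoupled interpolants can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's implementation: the function names, the MASK token id, and the list-based coordinates are all invented for the example; the backbone uses a linear interpolant and the sequence a masking interpolant, which are common choices in flow-matching models.

```python
import random

MASK = 20  # illustrative mask-token id (the 20 amino acids use ids 0..19)

def interpolate_backbone(x0, x1, t):
    """Continuous linear interpolant for backbone coordinates:
    x_t = (1 - t) * x0 + t * x1, moving from noise (t = 0) to data (t = 1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def interpolate_sequence(seq, t, rng):
    """Discrete masking interpolant for the sequence: each residue is
    revealed with probability t, otherwise replaced by the MASK token.
    Sampling the two processes independently is what keeps the
    sequence <-> structure dependencies bidirectional."""
    return [aa if rng.random() < t else MASK for aa in seq]

rng = random.Random(0)
coords_noise = [0.0, 0.0, 0.0]   # one atom's coordinates drawn from noise
coords_data = [2.0, 4.0, 6.0]    # the corresponding data coordinates
sequence = [3, 7, 12, 0, 19]     # amino-acid ids of a 5-residue chain

xt = interpolate_backbone(coords_noise, coords_data, 0.5)  # [1.0, 2.0, 3.0]
st = interpolate_sequence(sequence, 0.5, rng)              # partially masked
```

At t = 0 the sequence is fully masked and the coordinates are pure noise; at t = 1 both match the data, so a model trained on intermediate t learns to denoise both modalities jointly.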
Sidechain Module
Predicts side‑chain conformations from the generated sequence and backbone using torsion‑angle representations (up to four rotatable bonds) to balance computational cost and atomic detail.
Two‑stage training: first focuses on side‑chain packing, then refines side‑chains from predicted structures.
Lightweight design with fewer layers and smaller hidden dimensions compared to the Seq&BB module.
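A minimal sketch of the torsion-angle representation: each side chain is described by up to four chi angles, padded to a fixed width. The CHI_COUNT table lists the standard rotatable-bond counts per residue type; the packing helper is illustrative, not APM's code.

```python
# Number of rotatable chi (side-chain torsion) angles per residue type,
# following the standard rotamer conventions (ALA and GLY have none).
CHI_COUNT = {
    "ALA": 0, "GLY": 0, "SER": 1, "CYS": 1, "VAL": 1, "THR": 1,
    "PRO": 2, "ILE": 2, "LEU": 2, "ASP": 2, "ASN": 2, "HIS": 2,
    "PHE": 2, "TYR": 2, "TRP": 2, "MET": 3, "GLU": 3, "GLN": 3,
    "LYS": 4, "ARG": 4,
}

def pack_torsions(residue, chis, max_chi=4):
    """Pad a residue's chi angles to a fixed-size vector so that every
    residue shares one torsion representation; unused slots stay zero."""
    n = CHI_COUNT[residue]
    assert len(chis) == n, f"{residue} expects {n} chi angles"
    return list(chis) + [0.0] * (max_chi - n)
```

Predicting at most four scalars per residue instead of full atomic coordinates is what keeps the Sidechain module cheap while retaining all-atom detail.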
Refine Module
Final stage refines the combined output, reducing atomic clashes and improving structural realism by optimizing both sequence and backbone with full-atom information. It activates only in the late generation steps (t ≥ 0.8), once the intermediate output is of sufficient quality.
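The late-stage gating amounts to a simple time threshold in the sampling loop. The skeleton below is hypothetical (the step functions and scheduler are assumptions; the article only states the t ≥ 0.8 rule):

```python
REFINE_THRESHOLD = 0.8  # Refine module activates only for t >= 0.8

def generate(num_steps, seq_bb_step, refine_step, state):
    """Run the generation trajectory toward t = 1. The Refine module is
    applied only in the late steps, once the intermediate full-atom
    structure is reliable enough to be worth refining."""
    for i in range(num_steps):
        t = (i + 1) / num_steps
        state = seq_bb_step(state, t)   # Seq&BB (+ Sidechain) update
        if t >= REFINE_THRESHOLD:
            state = refine_step(state)  # full-atom refinement, late only
    return state
```

With 10 steps, refinement would run only at t = 0.8, 0.9, and 1.0, so its cost is paid on just the final fifth of the trajectory.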
Dataset Construction
APM is trained on a curated dataset that merges single‑chain and multi‑chain proteins.
Single‑chain: 187,494 samples from PDB (18,684), Swiss‑Prot (140,769, pLDDT > 85) and AFDB (28,041, pLDDT > 95).
Multi‑chain: 11,620 samples (2–6 chains) from PDB biological assemblies, filtered to exclude antibodies, peptides < 30 residues, and excessively long or unclustered chains.
For chains longer than 384 residues, a random crop centered on interface residues retains the 384 most relevant amino acids, keeping training efficient without running out of memory.
Both data types are mixed proportionally to improve intra‑chain modeling while providing inter‑chain interaction signals.
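One plausible reading of the interface-aware crop, as a sketch: pick an interface residue at random and take a contiguous window around it, clamped to the chain boundaries. The exact windowing rule is an assumption; the article only states that cropping is centered on the interface.

```python
import random

def interface_centered_crop(n_res, interface_idx, crop_len=384, rng=None):
    """Crop a long chain to a contiguous crop_len-residue window that
    contains a randomly chosen interface residue (sketch; APM's exact
    rule is not spelled out in the article)."""
    if n_res <= crop_len:
        return 0, n_res                      # short chains are kept whole
    rng = rng or random.Random()
    center = rng.choice(interface_idx)       # pick one interface residue
    # Center the window on it, clamped so it stays inside the chain.
    start = max(0, min(center - crop_len // 2, n_res - crop_len))
    return start, start + crop_len
```

The clamping guarantees a full 384-residue window even when the chosen interface residue sits near a terminus.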
Experimental Results
Single‑chain tasks
On the PDB folding benchmark, APM achieves RMSD = 4.83/2.64 and TM‑score = 0.86/0.91, comparable to ESM‑3 and MultiFlow. In reverse‑folding, amino‑acid recovery (AAR) reaches 50.44 %, surpassing ProteinMPNN (46.58 %). For unconditional generation of 100–300‑residue proteins, APM attains scTM = 0.96 (length 100) and scRMSD = 1.80, outperforming ESM‑3 and ProtPardelle.
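For reference, the AAR metric quoted above is plain per-position sequence identity between the designed and native sequences (a standard definition, not APM-specific code):

```python
def amino_acid_recovery(native, designed):
    """Amino-acid recovery (AAR): the fraction of positions at which the
    designed sequence reproduces the native residue."""
    assert len(native) == len(designed), "sequences must be aligned"
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)
```

APM's 50.44 % AAR thus means that roughly every second residue of a redesigned sequence matches the native one.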
Multi‑chain tasks
On 2–6‑chain complexes, APM's folding scores (12.6/13.67) trail Boltz‑1 when Boltz‑1 uses MSAs but surpass it in the MSA‑free setting. Reverse‑folding scTM reaches 0.85/0.95, close to Boltz‑1 with MSA. Generated complexes show strong binding affinity, with ΔG_RAA = ‑112.65/‑116.98 for 50–100‑residue chains, better than Chroma (‑83.96/‑86.66) and the backbone‑only variant APM_BB.
Downstream functional design
Antibody CDR‑H3 design on the RAbD benchmark yields AAR = 41.20 %, RMSD = 2.08 Å, and binding energy ΔG = 91.64, surpassing dyMEAN and DiffAb. Zero‑shot antibody generation, while sequence‑wise divergent, achieves lower ΔG (81.12). For peptide design on PepBench and LNR, APM (SFT) reaches ΔG = ‑19.90, with 69.34 % of samples having ΔG < 0 and DockQ ≥ 0.8 in 11.29 % of cases, outperforming PPFlow and PepGLAD.
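The peptide-design success rates quoted above are fractions over the generated set; a sketch of how such rates would be computed (the (ΔG, DockQ) pair format is illustrative, not the benchmark's actual data layout):

```python
def design_success_rates(results):
    """Summarize peptide-design outcomes as the benchmark reports them:
    the fraction of designs with favorable binding energy (dG < 0) and
    the fraction with high-quality docking (DockQ >= 0.8).
    `results` is a list of (dG, dockq) pairs -- an illustrative format."""
    n = len(results)
    frac_binding = sum(dg < 0 for dg, _ in results) / n
    frac_dockq = sum(q >= 0.8 for _, q in results) / n
    return frac_binding, frac_dockq
```

Under this reading, APM (SFT)'s 69.34 % and 11.29 % are the two fractions returned here, computed over all generated peptides.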
References
Paper: https://go.hyper.ai/TVp4i
APM dataset: https://go.hyper.ai/xHwbw
Hunan University and collaborators propose APM, a model that generates multi-chain proteins at full-atom resolution and surpasses existing SOTA methods; the work was accepted to ICML 2025.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.