Bi-TEAM Raises Hemolysis Prediction Accuracy 350% with Unified Biological‑Semantic and Chemical‑Precision Framework
Bi-TEAM, a cross‑scale representation learning framework that injects local chemical variations into a global protein context, outperforms state‑of‑the‑art baselines on ten biochemical datasets, achieving a 350% boost in hemolysis prediction accuracy and a 66% MCC increase under strict scaffold splits.
In biochemical and molecular‑engineering research, representation learning has become essential for deciphering molecular function and accelerating therapeutic molecule discovery. The quality of embedded features determines the performance ceiling of peptide property prediction and de‑novo design tasks.
Current peptide modeling follows two technical paths. Protein language models (e.g., ESM, ProtT5) capture biological context and evolutionary information through large‑scale sequence pre‑training, while chemical language models tokenize molecules at the atom level, which lets them represent non‑canonical amino‑acid modifications. Both approaches have inherent limitations: protein models cannot handle non‑canonical residues without bias, and chemical models ignore global biological context and struggle with long sequences.
To overcome these issues, researchers from The Chinese University of Hong Kong, Macau University of Science and Technology, Zhejiang University, the Second Xiangya Hospital of Central South University, and the University of Electronic Science and Technology of China propose a selective‑fusion modeling paradigm called Bi-TEAM. The framework is built on the insight that “chemical variation is a local perturbation of the biological semantic space,” so it adaptively injects chemical signals into a global protein background.
The architecture consists of two complementary streams. The biological sequence stream maps modified amino acids to the most similar natural residues, preserving evolutionary semantics without expanding the token vocabulary. The SELFIES‑like representation stream encodes atomic‑level modifications, providing stable structural information for the chemical language model. After dual‑stream encoding, a position‑aware, gated residual mechanism merges the streams, using the biological representation as the semantic backbone while selectively injecting key chemical cues.
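The gated residual merge described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gated_residual_fusion`, the single scalar gate per residue position, and the toy embeddings and weights are all assumptions made for illustration. The idea shown is only that the biological embedding passes through unchanged as the backbone, while a learned gate between 0 and 1 decides how much of the chemical embedding to add at each position.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_residual_fusion(bio, chem, w_gate, b_gate=0.0):
    """Fuse per-residue embeddings (hypothetical sketch, not the paper's code).

    bio, chem: lists of equal-length vectors, aligned by residue position.
    w_gate: weights over the concatenated [bio; chem] vector at one position.
    """
    fused = []
    for b_vec, c_vec in zip(bio, chem):
        features = b_vec + c_vec  # list concatenation: [bio; chem]
        # One gate per position, computed from both streams at that position.
        gate = sigmoid(sum(w * x for w, x in zip(w_gate, features)) + b_gate)
        # Residual form: biological backbone plus a gated chemical correction.
        fused.append([b + gate * c for b, c in zip(b_vec, c_vec)])
    return fused

# Toy input: three residues, two-dimensional embeddings.
# Only position 1 carries a chemical modification (non-zero chem vector).
bio = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]]
chem = [[0.0, 0.0], [1.0, -2.0], [0.0, 0.0]]
w_gate = [0.1, 0.1, 0.5, 0.5]  # assumed values; learned in a real model

fused = gated_residual_fusion(bio, chem, w_gate)
```

In this toy case the unmodified positions keep their biological embedding exactly, and only the modified position is shifted, which is the "local perturbation" behavior the framework is built around.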
Training follows a two‑stage strategy: pre‑train, then fine‑tune. First, each encoder is domain‑adapted on large corpora of natural protein sequences and small‑molecule chemical data. Then, multitask joint fine‑tuning teaches the model to fuse biological and chemical features across ten diverse datasets spanning three biochemical domains: modified peptides, post‑translational modifications (PTMs), and natural proteins.
Key evaluation results include:
On a strict scaffold‑similarity split, Matthews correlation coefficient (MCC) improves by up to 66% over the best baselines.
Hemolysis prediction accuracy increases by 350% compared with prior models.
Across ten datasets and seven core prediction tasks, Bi-TEAM achieves state‑of‑the‑art performance.
For cell‑penetrating peptide (CPP) prediction, Bi-TEAM outperforms SeqVec, ESM2, ProtT5 and other embeddings, raising accuracy (ACC) by 5.52%, balanced accuracy (BACC) by 5.88%, sensitivity (Sn) by 12.58%, specificity (Sp) by 1.45%, MCC by 14.68% and AUC by 8.45%.
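For readers unfamiliar with the metrics quoted above, the sketch below computes ACC, BACC, Sn, Sp and MCC from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts in the example are made up for illustration and are not taken from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    bacc = (sn + sp) / 2                   # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BACC": bacc, "Sn": sn, "Sp": sp, "MCC": mcc}

# Illustrative counts only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced data where accuracy alone can look deceptively high.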
In a downstream design experiment targeting non‑invasive treatment of neovascular age‑related macular degeneration (nAMD), Bi-TEAM guides the generation of cyclic peptides. Using BoltzDesign1 as a baseline, 1,000 candidate peptides are generated under two conditions: without guidance and with Bi-TEAM guidance. Success rates (log‑odds > 0.5) rise from 6.7% to 30.7%, a 4.6‑fold increase, while the average pLDDT of peptide‑Aflibercept complexes remains above 0.82, indicating maintained structural confidence.
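The reported 4.6‑fold figure can be checked directly from the two success rates. The short sketch below, with variable names chosen here for illustration, also separates three ways of stating the same change, since fold change, relative increase and percentage‑point gain are easy to confuse when reading headline numbers.

```python
baseline_rate = 6.7   # success rate without guidance, in percent (from the article)
guided_rate = 30.7    # success rate with Bi-TEAM guidance, in percent (from the article)

fold_change = guided_rate / baseline_rate                             # ratio of the two rates
relative_increase = (guided_rate - baseline_rate) / baseline_rate * 100  # percent over baseline
point_gain = guided_rate - baseline_rate                              # percentage points

print(f"fold change: {fold_change:.1f}x")
print(f"relative increase: {relative_increase:.0f}%")
print(f"absolute gain: {point_gain:.1f} percentage points")
```

The ratio rounds to 4.6, matching the article, while the same change expressed as a relative increase is roughly 358%.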
Residue‑pattern analysis reveals that Bi‑TEAM‑guided sequences significantly enrich the hydrophobic triad (W, F, Y) and two positively charged residues (R, K), matching known CPP motifs. Length distribution analysis confirms that this enrichment is independent of peptide length (10–20 residues).
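A residue‑pattern analysis of this kind can be sketched as a comparison of amino‑acid frequencies between guided and unguided sequence sets. The example below is an assumption about how such an analysis might be done, not the authors' procedure: the function names, the pseudocount, and the four toy sequences are all hypothetical, and no significance test is included.

```python
import math
from collections import Counter

def residue_frequencies(sequences):
    """Fraction of each amino acid across all sequences in the set."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def log2_enrichment(guided, baseline, residues, pseudo=1e-3):
    """log2 ratio of guided to baseline frequency for each residue.

    A small pseudocount (assumed value) avoids division by zero when a
    residue is absent from one of the sets.
    """
    f_g = residue_frequencies(guided)
    f_b = residue_frequencies(baseline)
    return {
        aa: math.log2((f_g.get(aa, 0.0) + pseudo) / (f_b.get(aa, 0.0) + pseudo))
        for aa in residues
    }

# Toy sequences invented for illustration; not designs from the study.
guided = ["WRKFYRWKLA", "RWFKYGRKWA"]
baseline = ["GASTLDENQA", "AGSWDTEKLN"]

enrichment = log2_enrichment(guided, baseline, residues="WFYRK")
```

Positive values indicate residues that are more frequent in the guided set. Because frequencies are normalized by the total residue count, the comparison does not depend directly on how many residues each set contains, though a length‑stratified analysis like the one in the article would still be needed to rule out length effects.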
Structural validation using AlphaFold3 predicts high‑confidence complexes between the designed cyclic peptides and Aflibercept, identifying a hydrophobic cavity and a β‑sheet pocket as potential binding sites.
Overall, Bi‑TEAM provides a unified cross‑scale representation learning framework that effectively merges evolutionary biological semantics with fine‑grained chemical precision, enabling superior prediction and generative design of chemically modified biomolecules.
HyperAI Super Neural
Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
