Bi-TEAM Raises Hemolysis Prediction Accuracy 350% with Unified Biological‑Semantic and Chemical‑Precision Framework

Bi-TEAM, a cross‑scale representation learning framework that injects local chemical variations into a global protein context, outperforms state‑of‑the‑art baselines on ten biochemical datasets, achieving a 350% boost in hemolysis prediction accuracy and a 66% MCC increase under strict scaffold splits.

HyperAI Super Neural

Mar 11, 2026

In biochemical and molecular‑engineering research, representation learning has become essential for deciphering molecular function and accelerating therapeutic molecule discovery. The quality of embedded features determines the performance ceiling of peptide property prediction and de‑novo design tasks.

Current peptide modeling follows two technical paths. Protein language models (e.g., ESM, ProtT5) capture biological context and evolutionary information through large‑scale sequence pre‑training, while chemical language models tokenize molecules at the atom level, which lets them represent non‑canonical amino‑acid modifications. Both approaches have inherent limitations: protein models cannot handle non‑canonical residues without bias, and chemical models ignore global biological context and struggle with long sequences.

To overcome these issues, researchers from The Chinese University of Hong Kong, Macau University of Science and Technology, Zhejiang University, the Second Xiangya Hospital of Central South University, and University of Electronic Science and Technology of China propose a selective‑fusion modeling paradigm called Bi-TEAM. The framework is built on the insight that “chemical variation is a local perturbation of the biological semantic space.” It adaptively injects chemical signals into a global protein background.

The architecture consists of two complementary streams. The biological sequence stream maps modified amino acids to the most similar natural residues, preserving evolutionary semantics without expanding the token vocabulary. The SELFIES‑like representation stream encodes atomic‑level modifications, providing stable structural information for the chemical language model. After dual‑stream encoding, a position‑aware, gated residual mechanism merges the streams, using the biological representation as the semantic backbone while selectively injecting key chemical cues.
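The two ideas in this paragraph, mapping modified residues to a canonical parent and fusing the streams through a gated residual, can be illustrated with a small sketch. This is not the authors' implementation: the lookup table, the toy two‑dimensional vectors, the scalar sigmoid gate and every function name are assumptions chosen for illustration, and the positional encoding a real position‑aware model would use is omitted.

```python
import math

# Hypothetical lookup from a modified residue to its closest canonical residue.
# The entries are illustrative; the actual mapping used by Bi-TEAM is not given here.
PARENT_RESIDUE = {
    "pS": "S",   # phosphoserine -> serine
    "pT": "T",   # phosphothreonine -> threonine
    "Orn": "K",  # ornithine -> lysine
}

def to_canonical(residues):
    """Replace each modified residue with its assumed canonical parent."""
    return [PARENT_RESIDUE.get(r, r) for r in residues]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_residual_fusion(bio, chem, gate_w, gate_b):
    """Per position: fused = bio + gate * chem, with a scalar gate in (0, 1).

    bio, chem: lists of per-position feature vectors (lists of floats).
    gate_w: weights over the concatenated [bio_i, chem_i] vector (assumed form).
    gate_b: gate bias.
    """
    fused = []
    for b, c in zip(bio, chem):
        score = sum(w * x for w, x in zip(gate_w, b + c)) + gate_b
        gate = sigmoid(score)  # how much chemical signal to inject here
        fused.append([bi + gate * ci for bi, ci in zip(b, c)])
    return fused

print(to_canonical(["A", "pS", "K", "Orn"]))  # ['A', 'S', 'K', 'K']

# Position 0 is unmodified (zero chemical signal); position 1 carries a modification.
bio = [[1.0, 0.0], [0.5, 0.5]]
chem = [[0.0, 0.0], [0.2, -0.4]]
fused = gated_residual_fusion(bio, chem, gate_w=[0.0, 0.0, 5.0, -5.0], gate_b=-2.0)
print(fused[0])  # [1.0, 0.0]
```

Because the chemical vector is added as a residual, an unmodified position keeps its biological representation unchanged, which matches the description of the biological stream as the semantic backbone.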

Training follows a two‑stage “pre‑train‑fine‑tune” strategy. First, each encoder is domain‑adapted on large corpora of natural protein sequences and small‑molecule chemical data. Then, multitask joint fine‑tuning teaches the model to fuse biological and chemical features across ten diverse datasets spanning three biochemical domains (modified peptides, PTMs, and natural proteins).

Key evaluation results include:

On a strict scaffold‑similarity split, Matthews correlation coefficient (MCC) improves by up to 66% over the best baselines.

Hemolysis prediction accuracy increases by 350% compared with prior models.

Across ten datasets and seven core prediction tasks, Bi-TEAM achieves state‑of‑the‑art performance.

For cell‑penetrating peptide (CPP) prediction, Bi-TEAM outperforms SeqVec, ESM2, ProtT5 and other embeddings, raising accuracy (ACC) by 5.52%, balanced accuracy (BACC) by 5.88%, sensitivity (Sn) by 12.58%, specificity (Sp) by 1.45%, MCC by 14.68% and AUC by 8.45%.
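For readers less familiar with these metrics, the sketch below computes ACC, BACC, Sn, Sp and MCC from a binary confusion matrix using their standard definitions. The confusion‑matrix counts and the `relative_gain` inputs are made‑up illustrative numbers, not values from the paper, and treating the reported percentages as relative improvements is an assumption, since the article does not say whether they are relative or absolute.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    bacc = (sn + sp) / 2                     # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BACC": bacc, "Sn": sn, "Sp": sp, "MCC": mcc}

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# Illustrative counts only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 3) for name, value in metrics.items()})

# Illustrative: a metric moving from 0.4 to 0.6 is a 50% relative gain.
print(round(relative_gain(0.6, 0.4), 1))
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy on imbalanced datasets.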

In a downstream design experiment targeting non‑invasive treatment of neovascular age‑related macular degeneration (nAMD), Bi-TEAM guides the generation of cyclic peptides. Using BoltzDesign1 as a baseline, 1,000 candidate peptides are generated under two conditions: without guidance and with Bi-TEAM guidance. Success rates (log‑odds > 0.5) rise from 6.7% to 30.7%, a 4.6‑fold increase, while the average pLDDT of peptide‑Aflibercept complexes remains above 0.82, indicating maintained structural confidence.
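The fold increase quoted above follows directly from the two success rates. The short sketch below reproduces the arithmetic; the function names and the toy score list are assumptions for illustration, and only the 6.7% and 30.7% rates come from the article.

```python
def success_rate(scores, threshold=0.5):
    """Fraction of candidates whose score exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

def fold_increase(guided_rate, baseline_rate):
    """Ratio of the guided success rate to the unguided one."""
    return guided_rate / baseline_rate

# Toy scores, only to show how a rate is computed from per-candidate scores.
toy_scores = [0.9, 0.2, 0.6, 0.1]
print(success_rate(toy_scores))  # 0.5

# Rates reported in the article: 6.7% unguided, 30.7% guided.
print(round(fold_increase(0.307, 0.067), 2))  # 4.58
```

The ratio of about 4.58 is consistent with the 4.6‑fold figure once rounded to one decimal place.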

Residue‑pattern analysis reveals that Bi‑TEAM‑guided sequences significantly enrich the hydrophobic triad (W, F, Y) and two positively charged residues (R, K), matching known CPP motifs. Length distribution analysis confirms that this enrichment is independent of peptide length (10–20 residues).
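A residue‑enrichment analysis of this kind can be approximated by comparing per‑residue frequencies between guided and unguided sequence sets. The sketch below is one simple way to do it, not the paper's procedure: the function names and the toy sequences are hypothetical, and a real analysis would also need a statistical test and a breakdown by peptide length.

```python
import math
from collections import Counter

def residue_frequencies(sequences):
    """Fraction of each amino-acid letter across all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def enrichment(guided, baseline, residues="WFYRK"):
    """Frequency ratio guided/baseline for the residues of interest."""
    fg = residue_frequencies(guided)
    fb = residue_frequencies(baseline)
    ratios = {}
    for aa in residues:
        base = fb.get(aa, 0.0)
        # Report infinity when the residue never appears in the baseline set.
        ratios[aa] = fg.get(aa, 0.0) / base if base else math.inf
    return ratios

# Toy sequences, invented for illustration only.
guided = ["RRWKF", "KWRYR"]
baseline = ["AGFRL", "TYEKW"]
ratios = enrichment(guided, baseline)
print({aa: round(r, 2) for aa, r in ratios.items()})
```

A ratio above 1 means the residue is more frequent in the guided set; in this toy data R, W and K are enriched while F and Y are not.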

Structural validation using AlphaFold3 predicts high‑confidence complexes between the designed cyclic peptides and Aflibercept, identifying a hydrophobic cavity and a β‑sheet pocket as potential binding sites.

Overall, Bi‑TEAM provides a unified cross‑scale representation learning framework that effectively merges evolutionary biological semantics with fine‑grained chemical precision, enabling superior prediction and generative design of chemically modified biomolecules.
