Joint Modeling Predicts New Drugs in Nature: AI Maps a Billion‑Molecule Chemical Frontier

This article summarizes a joint molecular model that combines property prediction with molecule reconstruction, introduces a "strangeness" metric for detecting distribution shift, validates the metric on 33 datasets, applies it to a virtual screen of a 140k-compound library, and reports seven experimentally confirmed, highly novel kinase inhibitors.

SuanNi

May 2, 2026

Finding entirely new drugs is often likened to sailing an uncharted ocean. Artificial intelligence can serve as the compass, but once a prediction model is asked about molecules far from its training data, it tends to lose its bearings.

To give models a sense of their own knowledge boundaries, the researchers propose a "strangeness" metric. They build a Joint Molecular Model (JMM) based on a semi‑supervised auto‑encoder that encodes SMILES strings into compressed latent vectors and decodes them back to the original molecules. While simultaneously predicting molecular bioactivity, the model also reconstructs the input; the reconstruction loss is transformed into the strangeness score, quantifying how far a molecule lies from the training distribution.
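The summary does not say how the reconstruction loss is transformed into a strangeness score. As a minimal sketch, assume the loss is the mean negative log-likelihood of the true SMILES tokens under the decoder, and that strangeness is that loss standardized against the losses seen on the training set. The function names, the z-score transform, and the token probabilities below are all illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, pstdev

def reconstruction_loss(token_probs):
    """Mean negative log-likelihood of the true SMILES tokens.

    token_probs: probabilities the decoder assigned to each correct token
    (hypothetical values here; a real model would produce them).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def make_strangeness(train_losses):
    """Return a scorer that standardizes a loss against training losses.

    Assumption: a z-score is used as the transform; the source only says
    the loss is 'transformed' into the strangeness score.
    """
    mu = mean(train_losses)
    sigma = pstdev(train_losses)

    def strangeness(loss):
        return (loss - mu) / sigma

    return strangeness

# Hypothetical decoder probabilities for three training molecules.
train_probs = [[0.90, 0.95, 0.85], [0.80, 0.90, 0.92], [0.88, 0.93, 0.90]]
train_losses = [reconstruction_loss(p) for p in train_probs]
strangeness = make_strangeness(train_losses)

# A molecule the decoder reconstructs well vs. one it reconstructs poorly.
familiar = strangeness(reconstruction_loss([0.90, 0.90, 0.90]))
unfamiliar = strangeness(reconstruction_loss([0.30, 0.20, 0.50]))
print(f"familiar: {familiar:.2f}, unfamiliar: {unfamiliar:.2f}")
```

Under these assumptions, a molecule that the decoder struggles to reproduce ends up many standard deviations above the training mean, which is the behavior the strangeness score is meant to capture.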

The team assembled 33 experimentally annotated datasets covering diverse biological properties. Each dataset was split into training, in‑distribution test, and out‑of‑distribution (OOD) test sets. Baseline models suffered a marked drop in accuracy on OOD data, whereas JMM retained comparable classification performance and assigned markedly higher strangeness scores to OOD molecules. Analysis showed that high strangeness correlates with distribution distance rather than molecular complexity.
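The summary does not describe how the out-of-distribution test sets were constructed. One common way to build such a split, shown here purely as an assumed illustration, is to group molecules by a structural key (for example a scaffold or cluster label), hold out the rarest groups as OOD, and split the remainder randomly into training and in-distribution test sets. The `group_of` function, the fractions, and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def three_way_split(molecules, group_of, ood_fraction=0.2,
                    id_test_fraction=0.2, seed=0):
    """Split molecules into (train, in-distribution test, OOD test).

    Whole groups are held out as OOD, smallest groups first, so no OOD
    group is ever seen during training. This is an assumed procedure,
    not the one used in the paper.
    """
    groups = defaultdict(list)
    for mol in molecules:
        groups[group_of(mol)].append(mol)

    ood, in_dist = [], []
    ood_target = ood_fraction * len(molecules)
    for members in sorted(groups.values(), key=len):
        if len(ood) < ood_target:
            ood.extend(members)      # rare chemotype: hold out entirely
        else:
            in_dist.extend(members)  # common chemotype: keep in-distribution

    rng = random.Random(seed)
    rng.shuffle(in_dist)
    n_test = int(id_test_fraction * len(in_dist))
    return in_dist[n_test:], in_dist[:n_test], ood

# Toy data: (molecule id, hypothetical scaffold label).
molecules = [(f"mol{i}", g) for i, g in enumerate("AAAAABBBCD")]
train, id_test, ood_test = three_way_split(molecules, group_of=lambda m: m[1])

seen_groups = {g for _, g in train + id_test}
ood_groups = {g for _, g in ood_test}
print(len(train), len(id_test), len(ood_test), sorted(ood_groups))
```

Holding out whole groups rather than random molecules is what makes the third set a test of extrapolation rather than interpolation.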

To test scalability, JMM was applied to a commercial library of 140 000 compounds with minimal structural overlap with the training set. Traditional uncertainty estimates failed to register the shift, producing curves indistinguishable from in-distribution results, while the strangeness metric rose sharply for the majority of library molecules. High-strangeness compounds tended to have atypical scaffolds, whereas low-strangeness compounds preserved classic steroid-like features. Because each signal captures what the other misses, uncertainty and strangeness proved complementary for assessing reliability.
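To illustrate why the two signals are complementary, the sketch below sorts a prediction into one of four reliability categories. The thresholds and category names are hypothetical and chosen only for illustration; the source does not give a decision rule.

```python
def reliability_flag(uncertainty, strangeness, u_max=0.3, s_max=2.0):
    """Combine predictive uncertainty and strangeness into a coarse label.

    u_max and s_max are illustrative cutoffs, not values from the paper.
    """
    uncertain = uncertainty > u_max
    strange = strangeness > s_max
    if not uncertain and not strange:
        return "reliable"
    if uncertain and not strange:
        return "familiar but ambiguous"
    if not uncertain and strange:
        # The case uncertainty alone would miss: the model is confident
        # about a molecule that lies far from its training data.
        return "confident but unfamiliar"
    return "unreliable"

examples = {
    "in-distribution hit": (0.10, 0.4),
    "borderline analog": (0.45, 0.9),
    "atypical scaffold": (0.12, 5.3),
    "noisy outlier": (0.60, 4.1),
}
flags = {name: reliability_flag(u, s) for name, (u, s) in examples.items()}
for name, flag in flags.items():
    print(f"{name}: {flag}")
```

The third category is the one that matters most for a screen of unfamiliar compounds, since a low uncertainty value by itself would suggest the prediction can be trusted.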

For experimental validation, the researchers ran blind tests on two clinically relevant kinase targets, PIM1 and CDK1. From 180 000 candidates they selected 60 molecules based on predicted activity, uncertainty, and strangeness, capping each molecule's similarity to the training set at 38%. Cell-based assays at a single 10 µM concentration yielded multiple initial hits, and subsequent dose-response curves confirmed seven compounds with sub-micromolar inhibition. The hit rates reached 17% for PIM1 and 7% for CDK1, far exceeding the 0.1%–5% range typical of conventional kinase screens.
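The selection step can be sketched as a filter followed by a ranking. The source gives the 38% similarity ceiling but not the similarity measure or how the three scores were combined, so the code below assumes Tanimoto similarity on set-based fingerprints, simple upper bounds on uncertainty and strangeness, and ranking by predicted activity. All names, cutoffs, and toy fingerprints are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    features: frozenset  # stand-in for a structural fingerprint
    activity: float      # predicted probability of being active
    uncertainty: float
    strangeness: float

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def max_train_similarity(candidate, train_fps):
    """Highest similarity between a candidate and any training molecule."""
    return max((tanimoto(candidate.features, fp) for fp in train_fps),
               default=0.0)

def select(candidates, train_fps, n, sim_ceiling=0.38,
           max_uncertainty=0.5, max_strangeness=3.0):
    """Keep novel, trustworthy candidates and return the top n by activity.

    Only sim_ceiling comes from the article; the other cutoffs are
    assumptions made for this sketch.
    """
    eligible = [
        c for c in candidates
        if max_train_similarity(c, train_fps) <= sim_ceiling
        and c.uncertainty <= max_uncertainty
        and c.strangeness <= max_strangeness
    ]
    return sorted(eligible, key=lambda c: c.activity, reverse=True)[:n]

train_fps = [frozenset({1, 2, 3, 4})]
pool = [
    Candidate("a", frozenset({1, 2, 3, 5}), 0.99, 0.1, 0.5),  # too similar
    Candidate("b", frozenset({1, 7, 8, 9}), 0.90, 0.2, 1.0),
    Candidate("c", frozenset({10, 11}), 0.70, 0.1, 2.0),
    Candidate("d", frozenset({12, 13}), 0.95, 0.8, 1.5),      # too uncertain
]
picked = [c.name for c in select(pool, train_fps, n=2)]
print(picked)
```

Applying the similarity ceiling before ranking means that a high predicted activity cannot compensate for a molecule being a close analog of something already in the training set.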

In summary, the strangeness metric gives machine-learning-driven drug discovery a way to tell when a prediction concerns a molecule far from the training data. By integrating prediction and reconstruction, the Joint Molecular Model makes it safer and more effective to venture beyond known molecular domains into vast, unexplored chemical space.

Reference: Nature Machine Intelligence (2026) – https://www.nature.com/articles/s42256-026-01216-w



