
Entity Contrastive Learning via Multi-Token Parallel Prediction for Knowledge Graph Completion

Researchers from Ant Group and Zhejiang University propose K-ON, a multi-token parallel prediction method that enables large language models to perceive knowledge graph entities through entity-level contrastive learning, achieving superior performance, lower cost, and higher efficiency on KG completion benchmarks.


The rapid development of large language models (LLMs) has broken many barriers in natural language processing, but their token-wise prediction objective is mismatched with knowledge-graph entities, whose names typically span multiple tokens. To bridge this gap, researchers from Ant Group and Zhejiang University introduce K-ON, a multi-token parallel prediction approach that lets LLMs learn entity-level representations via contrastive learning.

K-ON treats knowledge-graph completion as a textual instruction fed to the LLM. After the LLM's Transformer backbone processes the input, the hidden states are passed to a dedicated K-ON module: K MLP heads, one for each position of an entity's tokens, followed by a Conditional Transformer that mixes positional information while respecting token-order dependencies.
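The head structure described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's code: the class name `KONHead`, the MLP shapes, and the use of a standard (non-causal) transformer encoder as the "Conditional Transformer" are all assumptions.

```python
import torch
import torch.nn as nn

class KONHead(nn.Module):
    """Sketch of a K-ON-style head: K position-specific MLPs over the LLM's
    final hidden state, followed by a small transformer that mixes
    information across the K positional slots. Sizes are illustrative."""
    def __init__(self, hidden_dim=768, k=8, n_layers=2, n_heads=4):
        super().__init__()
        # One MLP per entity-token position
        self.position_mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(k)
        )
        # Small transformer letting the K slots attend to each other
        layer = nn.TransformerEncoderLayer(hidden_dim, n_heads,
                                           batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, last_hidden):          # (batch, hidden_dim)
        # One representation per token position, then positional mixing
        slots = torch.stack([mlp(last_hidden) for mlp in self.position_mlps],
                            dim=1)           # (batch, K, hidden_dim)
        return self.mixer(slots)             # (batch, K, hidden_dim)
```

In this sketch each head sees the same final hidden state, and ordering constraints would be enforced inside the mixer (e.g. with an attention mask), a detail this illustration omits.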

Low‑rank adaptation (LoRA) expands the original LLM scoring layer into K new scoring heads, producing probability distributions for each token position of every candidate entity in parallel. These distributions are then aligned with conventional single‑step token predictions, and a contrastive loss (positive vs. negative entity scores) is applied to embed knowledge‑graph structure into the model.
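One simple way to realize the entity-level contrastive loss is to score each candidate entity as the sum of its per-position token log-probabilities under the K heads, then apply cross-entropy against the positive entity. A minimal sketch, assuming every entity is padded or truncated to K tokens and the positive sits at row 0; the paper's exact scoring and loss may differ.

```python
import torch
import torch.nn.functional as F

def entity_contrastive_loss(head_logits, entity_token_ids):
    """head_logits:      (K, vocab) distributions from the K parallel heads.
    entity_token_ids:    (n_entities, K) token ids of one positive entity
                         (row 0) followed by negative entities.
    Illustrative sketch of an entity-level contrastive objective."""
    log_probs = F.log_softmax(head_logits, dim=-1)         # (K, vocab)
    # Per-position log-prob of each entity's token at that position
    per_token = log_probs.gather(1, entity_token_ids.t())  # (K, n_entities)
    scores = per_token.sum(dim=0)                          # (n_entities,)
    # Contrastive objective: positive (index 0) vs. negative entity scores
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(scores.unsqueeze(0), target)
```

Because all candidates are scored from the same K distributions, adding more negatives only grows the `gather`, not the LLM forward pass.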

The training pipeline consists of five steps: (1) format KG completion as a text instruction; (2) feed the encoded representation to K‑ON’s multiple heads; (3) aggregate positional information via Conditional Transformer; (4) use LoRA to generate K parallel scoring layers; (5) extract token‑wise probabilities to score all candidate entities in one pass.
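Step (4), expanding the LLM's single output projection into K scoring layers via LoRA, can be sketched as a shared frozen base weight plus a per-head low-rank update. The class name, rank, and initialization below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRAScoringHeads(nn.Module):
    """Sketch: the LLM's output projection (hidden -> vocab) expanded into
    K scoring heads, each a frozen shared base plus its own low-rank
    (LoRA-style) update delta_W = B @ A. Shapes and rank are illustrative."""
    def __init__(self, hidden_dim, vocab_size, k=8, rank=8):
        super().__init__()
        self.base = nn.Linear(hidden_dim, vocab_size, bias=False)
        self.base.weight.requires_grad_(False)   # shared, frozen base
        # Per-head low-rank factors; B starts at zero so each head
        # initially matches the original scoring layer
        self.A = nn.Parameter(torch.randn(k, rank, hidden_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(k, vocab_size, rank))

    def forward(self, slots):                    # (batch, K, hidden_dim)
        base_logits = self.base(slots)           # (batch, K, vocab)
        low = torch.einsum('bkh,krh->bkr', slots, self.A)
        delta = torch.einsum('bkr,kvr->bkv', low, self.B)
        return base_logits + delta               # (batch, K, vocab)
```

The output is one vocabulary distribution per token position, which is what lets every candidate entity be scored in a single pass.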

Experimental results on several KG completion datasets show that K‑ON consistently outperforms traditional methods, other LLM‑based approaches, and even multimodal models that use additional image data. Increasing the token count K improves performance up to K≈8, after which gains plateau while model size continues to grow. Inference time remains largely unaffected by K, demonstrating high efficiency.

Further analysis reveals that K-ON's entity-level contrastive learning can handle thousands of negative samples with minimal training overhead, with roughly 128 negatives yielding the best results.
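Drawing that negative set can be as simple as uniform sampling over the entity vocabulary while excluding the positive. A hedged sketch (the function name and sampling scheme are assumptions; the paper may sample negatives differently):

```python
import torch

def sample_negatives(num_entities, positive_id, n_neg=128, generator=None):
    """Uniformly sample n_neg negative entity ids, never drawing the
    positive. 128 negatives is the setting the article reports as best."""
    # Draw from a range one smaller than the vocabulary...
    ids = torch.randint(0, num_entities - 1, (n_neg,), generator=generator)
    # ...then shift ids at or above the positive up by one, so the
    # positive id is skipped while keeping the distribution uniform.
    return ids + (ids >= positive_id).long()
```

Sampling with replacement keeps the overhead constant regardless of graph size.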

Overall, K‑ON provides a more efficient, cost‑effective, and higher‑performing solution for knowledge‑graph completion, enabling LLMs to directly perceive and reason over KG knowledge.

Paper: https://arxiv.org/pdf/2502.06257

Tags: large language model · AI research · knowledge graph · entity contrastive learning · K-ON · Multi-Token Prediction
Written by

AntTech

Technology is the core driver of Ant's future creation.
