Entity Contrastive Learning via Multi-Token Parallel Prediction for Knowledge Graph Completion
Researchers from Ant Group and Zhejiang University propose K-ON, a multi-token parallel prediction method that enables large language models to perceive knowledge graph entities through entity-level contrastive learning, achieving superior performance, lower cost, and higher efficiency on KG completion benchmarks.
The rapid development of large language models (LLMs) has broken many barriers in natural language processing, but their token-by-token prediction objective is mismatched with the multi-token nature of knowledge-graph entities. To bridge this gap, Ant Group and Zhejiang University introduce K-ON, a multi-token parallel prediction approach that lets LLMs learn entity-level representations via contrastive learning.
K-ON treats knowledge-graph completion as a textual instruction fed to the LLM. After the LLM's Transformer layers process the input, a dedicated K-ON module, comprising multiple MLP heads (one per position of an entity's token sequence), receives the final hidden states. A conditional Transformer then mixes information across positions while respecting token-order dependencies.
Low‑rank adaptation (LoRA) expands the original LLM scoring layer into K new scoring heads, producing probability distributions for each token position of every candidate entity in parallel. These distributions are then aligned with conventional single‑step token predictions, and a contrastive loss (positive vs. negative entity scores) is applied to embed knowledge‑graph structure into the model.
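The entity-level contrastive objective can be sketched roughly as follows. Here each candidate entity is represented by its K token ids, an entity's score is taken as the sum of the log-probabilities its tokens receive from the K position-wise heads, and an InfoNCE-style loss makes the positive entity compete against sampled negatives. The score decomposition and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def entity_scores(head_logprobs, entity_token_ids):
    """Score each candidate entity by summing the log-probabilities
    its tokens receive from the K position-wise heads.

    head_logprobs:    (K, vocab) log-probabilities, one row per head.
    entity_token_ids: (num_entities, K) token ids of each candidate.
    Returns:          (num_entities,) one scalar score per entity.
    """
    K = head_logprobs.shape[0]
    # Gather log p(token at position k) for every entity, then sum over k.
    return head_logprobs[np.arange(K), entity_token_ids].sum(axis=1)

def contrastive_loss(scores, positive_idx):
    """InfoNCE-style loss: cross-entropy of the positive entity's score
    against the scores of all sampled negative entities."""
    logits = scores - scores.max()  # shift for numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum())
    return -log_softmax[positive_idx]
```

Because the negatives enter only through one softmax over precomputed entity scores, scaling to hundreds or thousands of negatives adds little training cost, which is consistent with the efficiency analysis reported below.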
The training pipeline consists of five steps: (1) format KG completion as a text instruction; (2) feed the encoded representation to K‑ON’s multiple heads; (3) aggregate positional information via Conditional Transformer; (4) use LoRA to generate K parallel scoring layers; (5) extract token‑wise probabilities to score all candidate entities in one pass.
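Steps (2)-(5) above amount to fanning one final hidden state out through K parallel scoring layers, so every token position of every candidate entity is predicted in a single forward pass. A minimal sketch, using plain weight matrices as stand-ins for the LoRA-expanded heads (the tensor shapes are assumptions for illustration):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def k_on_heads(hidden, head_weights):
    """Apply K independent scoring heads to the same hidden state.

    hidden:       (d,) final hidden state of the encoded instruction.
    head_weights: (K, vocab, d) one linear scoring layer per entity-token
                  position (stand-ins for the LoRA-derived heads).
    Returns:      (K, vocab) one probability distribution per position.
    """
    logits = head_weights @ hidden  # batched matmul -> (K, vocab)
    return softmax(logits)
```

All K distributions come from one pass over the input, which is why inference time stays nearly flat as K grows: only the small head matrices multiply, not the LLM backbone.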
Experimental results on several KG completion datasets show that K‑ON consistently outperforms traditional methods, other LLM‑based approaches, and even multimodal models that use additional image data. Increasing the token count K improves performance up to K≈8, after which gains plateau while model size continues to grow. Inference time remains largely unaffected by K, demonstrating high efficiency.
Further analysis reveals that K‑ON’s entity‑level contrastive learning can handle thousands of negative samples with minimal training overhead; setting around 128 negatives yields optimal results.
Overall, K‑ON provides a more efficient, cost‑effective, and higher‑performing solution for knowledge‑graph completion, enabling LLMs to directly perceive and reason over KG knowledge.
Paper: https://arxiv.org/pdf/2502.06257
AntTech