Tag

Multi-Token Prediction


AntTech
Feb 27, 2025 · Artificial Intelligence

Entity Contrastive Learning via Multi-Token Parallel Prediction for Knowledge Graph Completion

Researchers from Ant Group and Zhejiang University propose K-ON, a multi-token parallel prediction method that enables large language models to perceive knowledge graph entities through entity-level contrastive learning, achieving superior performance, lower cost, and higher efficiency on KG completion benchmarks.

AI research · K-ON · Multi-Token Prediction
8 min read
IT Architects Alliance
Feb 15, 2025 · Artificial Intelligence

DeepSeek: Architecture, Core Technologies, Training Strategies, and Comparative Analysis

The article provides an in-depth overview of DeepSeek's transformer-based foundation, Mixture-of-Experts architecture, novel attention mechanisms, multi-token prediction, FP8 mixed-precision training, knowledge distillation, and reinforcement-learning approaches, and compares its performance and cost advantages against leading models such as GPT and Gemini.

AI Model Architecture · DeepSeek · FP8 Training
29 min read