Machine Learning Algorithms & Natural Language Processing
Jun 18, 2026 · Artificial Intelligence
Can a 3B Model Rival Claude Opus 4.5? Benchmark Gaps or Aggressive Post‑Training?
VibeThinker‑3B, a 3‑billion‑parameter language model built on Qwen2.5‑Coder‑3B, scores within the range of 671B‑parameter models on benchmarks such as LiveCodeBench, AIME26, IMO‑AnswerBench, and GPQA. Its training pipeline combines two‑stage SFT, multi‑domain reinforcement learning, offline self‑distillation, and a claim‑reliability (CLR) evaluator, which together push its reasoning ability to the frontier.
VibeThinker-3B · benchmark performance · large language models
