Can a 3B Model Rival Claude Opus 4.5? Benchmark Gaps or Aggressive Post‑Training?

VibeThinker‑3B, a 3‑billion‑parameter language model built on Qwen2.5‑Coder‑3B, scores in the same range as 671B‑parameter models on benchmarks such as LiveCodeBench, AIME26, IMO‑AnswerBench and GPQA. Its post‑training combines a two‑stage SFT, multi‑domain reinforcement learning, offline self‑distillation and a Claim‑Level Reliability (CLR) evaluator, which together push its reasoning ability to the frontier.

VibeThinker-3B · benchmark performance · large language models

Machine Heart

Jun 17, 2026 · Artificial Intelligence

Can a 3B Model Rival Opus 4.5 in Programming? Inside the Chinese‑Built VibeThinker‑3B

VibeThinker‑3B, a 3‑billion‑parameter Chinese‑built model, achieves benchmark scores comparable to top‑tier models like Opus 4.5 on math and programming tests, including AIME, HMMT, LiveCodeBench and LeetCode contests. The results are credited to its Spectrum‑to‑Signal training pipeline, Claim‑Level Reliability evaluation, and multi‑stage SFT and RL refinements.

AI research · Claim-Level Reliability · Spectrum-to-Signal
