Tagged articles
2 articles
Machine Learning Algorithms & Natural Language Processing
Jun 18, 2026 · Artificial Intelligence

Can a 3B Model Rival Claude Opus 4.5? Benchmark Gaps or Aggressive Post‑Training?

VibeThinker‑3B, a 3‑billion‑parameter language model built on Qwen2.5‑Coder‑3B, scores in the same range as 671B‑parameter models on benchmarks such as LiveCodeBench, AIME26, IMO‑AnswerBench and GPQA. The article attributes this to its post‑training recipe: two‑stage supervised fine‑tuning (SFT), multi‑domain reinforcement learning, offline self‑distillation and a Claim‑Level Reliability (CLR) evaluator, which together bring its reasoning close to frontier models.

VibeThinker-3B · benchmark performance · large language models
9 min read
Machine Heart
Jun 17, 2026 · Artificial Intelligence

Can a 3B Model Rival Opus 4.5 in Programming? Inside China's VibeThinker‑3B

VibeThinker‑3B, a 3‑billion‑parameter Chinese‑built model, achieves benchmark scores comparable to top‑tier models such as Claude Opus 4.5, performing strongly on math competitions (AIME, HMMT) as well as coding benchmarks (LiveCodeBench and LeetCode contests). The article credits its Spectrum‑to‑Signal training pipeline, Claim‑Level Reliability evaluation, and multi‑stage SFT and RL refinements.

AI research · Claim-Level Reliability · Spectrum-to-Signal
7 min read