Tagged articles

training pipeline

6 articles

Jun 17, 2026 · Artificial Intelligence

Can a 3B Small Model Match Top Closed‑Source LLMs? VibeThinker-3B’s Limits

VibeThinker-3B, a newly open‑sourced 3‑billion‑parameter model, posts near‑state‑of‑the‑art scores on math‑competition (AIME, IMO‑AnswerBench), coding (LiveCodeBench), and verification benchmarks, rivaling trillion‑parameter closed models. The article credits a Spectrum‑to‑Signal training pipeline with multi‑stage SFT, RL, and offline distillation, and presents the results as support for a new parametric compression‑coverage hypothesis.

AI research · Benchmarking · Parameter Efficiency

8 min read

SuanNi

Jun 5, 2026 · Artificial Intelligence

How PaddleOCR‑VL‑1.6’s 0.9B Model Achieved 96.33% SOTA on OmniDocBench v1.6

PaddleOCR‑VL‑1.6 is a compact 0.9B vision‑language model. Its developers diagnose three types of weak regions, add targeted data for them, and apply a three‑stage CPT‑SFT‑RL training pipeline, reaching a 96.33% overall score on OmniDocBench v1.6 and surpassing much larger models across all document‑parsing tasks.

OmniDocBench · PaddleOCR-VL-1.6 · SOTA

10 min read

Wu Shixiong's Large Model Academy

Oct 22, 2025 · Artificial Intelligence

Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment

This guide walks through the complete end‑to‑end process of training a large language model from scratch, covering data collection, cleaning, tokenization, pre‑training objectives and engineering, post‑training alignment methods, scaling laws, over‑fitting mitigation, and gradient‑stability techniques.

LLM · Scaling Laws · alignment

9 min read

DevOps

Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series, including Scout with a 10 million‑token context window, Maverick with 400 billion total parameters, and the upcoming Behemoth teacher model. It details their mixture‑of‑experts (MoE) architecture, the NoPE approach of removing positional encoding, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI research · Large Language Model · Llama 4

8 min read

NewBeeNLP

Mar 20, 2024 · Artificial Intelligence

How Open‑Sora 1.0 Replicates Sora: Architecture, Training Pipeline & Performance Insights

This article provides a comprehensive technical walkthrough of Open‑Sora 1.0, covering its Diffusion‑Transformer architecture, three‑stage training strategy, data‑preprocessing scripts, generation quality, and the Colossal‑AI acceleration that together make Sora‑level video synthesis openly reproducible.

AI video · Open-Sora · diffusion transformer

12 min read

Tencent Architect

Jul 29, 2021 · Artificial Intelligence

Performance Optimization of Advertising Coarse‑Ranking Training on the Light Framework

This article analyzes the bottlenecks of advertising coarse‑ranking training on the Light framework and presents a series of optimizations—including parallel data download, thread‑queue buffering, integer‑to‑string conversion with fmt, and zlib replacement with czlib—that together achieve up to 58% QPS improvement and notable CPU efficiency gains.

Advertising · CPU/GPU efficiency · Performance Optimization

11 min read
