Tagged articles
18 articles
Machine Heart
May 8, 2026 · Artificial Intelligence

How an 8B Video‑Language Model Beats GPT‑5 and Gemini‑3.1‑Pro at Cinematic Understanding

The CHAI framework, introduced by CMU and Harvard, defines a structured video‑language annotation scheme, a scalable human‑AI oversight process, and a post‑training pipeline. Together these enable an 8B open‑source model to outperform the closed‑source GPT‑5 and Gemini‑3.1‑Pro at understanding professional cinematic techniques.

Multimodal AI · Qwen3-VL · Video Generation
11 min read
Architect
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4: 1M‑Token Context’s Impact on Model, Inference, Cache & Agents

The DeepSeek V4 technical report shows how supporting a 1 million‑token context forces a redesign of attention, the KV cache, the optimizer, quantization, and inference budgeting. The result turns long‑context capability from a costly showcase into a production‑ready feature for agents, search, and professional Chinese‑language tasks.

1M context · Attention optimization · DeepSeek
28 min read
AIWalker
Apr 20, 2026 · Artificial Intelligence

How VA‑π Bridges Tokenizers and Autoregressive Generators for Pixel‑Perfect Images

VA‑π is a lightweight post‑training framework that uses variational inference and reinforcement learning to align tokenizers with visual autoregressive generators. It reports large quality gains, high training efficiency, and robust pixel‑level reconstruction across diverse image generation tasks.

Autoregressive Models · Pixel Alignment · Reinforcement Learning
14 min read
Machine Heart
Apr 19, 2026 · Artificial Intelligence

World Engine: How Post‑Training Is Launching a New Era of Physical AGI

World Engine is a post‑training pipeline that combines high‑fidelity 3D Gaussian Splatting (3DGS) simulation, hard‑case mining with diffusion‑based generation, and reinforcement‑learning optimization to give autonomous‑driving models genuine decision‑making ability. It moves beyond the limits of data scaling and reports significant safety gains in both industrial simulations and real‑world tests.

Physical AI · Reinforcement Learning · autonomous driving
11 min read
Data Party THU
Apr 12, 2026 · Artificial Intelligence

What’s Driving the Next Wave of LLM Post‑Training? A Deep Dive into SFT, RLHF, GRPO and Emerging Trends

This article systematically reviews the core post‑training techniques for large language models (supervised fine‑tuning, RLHF, PPO, GRPO, DPO, RLVR, and agentic RL), traces their evolution, compares their trade‑offs, and highlights the most promising research directions for 2025‑2026.

AI Alignment · GRPO · LLM
20 min read
Machine Learning Algorithms & Natural Language Processing
Mar 15, 2026 · Artificial Intelligence

Is RL Dead in LLM Post-Training? MIT’s RandOpt Challenges Traditional Methods

A paper from MIT CSAIL introduces RandOpt, a single‑step, gradient‑free, fully parallel post‑training algorithm that adds Gaussian noise to a pretrained LLM's weights and ensembles the resulting models. It matches or surpasses PPO/GRPO performance by exploiting the dense "neural thickets" that emerge as model scale grows.

LLM · RandOpt · Reinforcement Learning
12 min read
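The RandOpt summary describes a concrete mechanism: perturb pretrained weights with Gaussian noise, skip gradients entirely, and ensemble the results. The following is a minimal toy sketch of that idea on a two‑weight linear classifier, not the MIT CSAIL implementation. The selection rule (keep the `top_k` perturbations by accuracy), majority‑vote ensembling, keeping the base model in the pool, and every name here (`randopt_sketch`, `sigma`, `top_k`) are hypothetical assumptions made for illustration.

```python
import random


def predict(weights, x):
    """Toy binary classifier: 1 if the dot product is non-negative, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0 else 0


def accuracy(weights, data):
    """Fraction of (x, y) pairs the toy classifier labels correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def randopt_sketch(base_weights, data, num_samples=64, sigma=0.5, top_k=5, seed=0):
    """Hypothetical gradient-free sketch: Gaussian weight perturbations plus an ensemble.

    Each perturbation is sampled and scored independently, so the loop could
    run fully in parallel; no gradients are computed anywhere.
    """
    rng = random.Random(seed)
    # Assumption: keep the unperturbed base model in the candidate pool.
    candidates = [list(base_weights)]
    for _ in range(num_samples):
        candidates.append([w + rng.gauss(0.0, sigma) for w in base_weights])

    # Assumption: select the top_k candidates by accuracy on the tuning data.
    ranked = sorted(candidates, key=lambda ws: accuracy(ws, data), reverse=True)
    members = ranked[:top_k]

    def ensemble_predict(x):
        # Assumption: majority vote across the selected perturbed models.
        votes = sum(predict(ws, x) for ws in members)
        return 1 if 2 * votes > len(members) else 0

    return members, ensemble_predict


# Toy task: the label is 1 when the first coordinate exceeds the second.
data = [((a, b), 1 if a > b else 0) for a in range(-3, 4) for b in range(-3, 4) if a != b]
base = [0.1, 0.3]  # a deliberately poor stand-in for "pretrained" weights
members, ensemble = randopt_sketch(base, data)
print("base accuracy:", accuracy(base, data))
print("best member accuracy:", accuracy(members[0], data))
print("ensemble accuracy:", sum(ensemble(x) == y for x, y in data) / len(data))
```

Because the base weights stay in the candidate pool, the best selected member can never score below the starting model on the tuning data; an odd `top_k` avoids tied votes.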
PaperAgent
Jan 8, 2026 · Artificial Intelligence

How SOP Enables Scalable Online Post-Training for Real‑World Robots

The SOP (Scalable Online Post‑training) framework redesigns post‑training for vision‑language‑action (VLA) models, moving from offline, single‑machine, sequential processing to a distributed, parallel online system. This lets robot fleets learn continuously and share experience, scaling their capabilities while maintaining stability and generalization in complex real‑world environments.

Distributed Training · Online Learning · Robotics
11 min read
Baobao Algorithm Notes
Nov 11, 2025 · Artificial Intelligence

Why Redesign the Training Stack? Inside Olmo‑Thinking’s Open‑Source RL Journey

This article gives a detailed technical analysis of the Olmo‑Thinking project: why a new open‑source LLM was built, the challenges of reinforcement learning at scale, data‑mix optimization, architectural bottlenecks such as the absence of grouped‑query attention (GQA) and QK‑Norm, and the post‑training techniques used to improve reasoning and long‑context capabilities.

Open-source models · RLVR · data selection
20 min read
Alibaba Cloud Developer
Jul 31, 2025 · Artificial Intelligence

Why Post‑Training Matters: Scaling Laws, Fine‑Tuning, and RL Strategies for LLMs

This article explains why post‑training matters for large language models, covers scaling laws for both pre‑training and post‑training, and details common fine‑tuning methods (full fine‑tuning, PEFT, and LoRA). It also outlines alignment techniques such as RLHF, PPO, and DPO, presents practical workflows using Llama 3 and DeepSeek‑R1, and discusses test‑time reasoning optimizations.

Alignment · Fine-tuning · LLM
19 min read
Alibaba Cloud Big Data AI Platform
Jun 25, 2025 · Artificial Intelligence

Boost Post‑Training Efficiency with Cosmos‑RL, Ray, and VeRL on Alibaba PAI

This article introduces Alibaba Cloud's PAI platform and shows how open‑source reinforcement‑learning frameworks such as Cosmos‑RL and VeRL, together with the Ray distributed computing framework, accelerate post‑training for large language models, offering higher throughput, fault tolerance, and seamless integration for AI developers.

AI Platform · Distributed Training · Open Source Frameworks
9 min read
Baobao Algorithm Notes
Mar 21, 2025 · Artificial Intelligence

Unlocking LLM Reasoning: A Deep Dive into Post‑Training Techniques

This article gives a comprehensive technical overview of large language model post‑training. It covers fine‑tuning methods (full fine‑tuning, parameter‑efficient methods, the LoRA family, and prompt tuning), domain‑adaptive tuning, reward modeling for reinforcement learning, process versus outcome rewards, inference‑enhancement strategies, dynamic compute allocation, and verifier‑augmented reasoning. It also discusses current challenges and emerging research directions such as meta‑cognition, physical reasoning, and swarm intelligence.

LLM · meta-cognition · post-training
21 min read
Baobao Algorithm Notes
Sep 29, 2024 · Artificial Intelligence

Decoding OpenAI o1: Test‑Time Scaling, PRM Search & Inference Strategies

This article analyzes the training techniques behind OpenAI's o1 model, explaining test‑time (inference‑time) scaling laws, post‑training techniques, process‑supervised reward models (PRMs), various inference‑time search methods, data‑collection pipelines, and the trade‑offs of allocating compute to pre‑training versus inference.

LLM inference · OpenAI o1 · Reward model
34 min read