DataFunSummit
Apr 15, 2026 · Artificial Intelligence

How Relax Powers Scalable Multi‑Modal RL Training with Full Asynchrony

Relax, an open‑source RL training engine built on Megatron‑LM and SGLang, tackles data heterogeneity, system fragility, and role coupling by using a service‑oriented fault‑tolerant architecture, asynchronous pipelines, and multimodal‑native support, achieving up to 76% end‑to‑end speedup over veRL.

AI infrastructure · RL training · asynchronous pipelines
11 min read
CodeTrend
Apr 11, 2026 · Artificial Intelligence

Inside Hermes Agent: How Its Closed‑Loop Learning Architecture Transforms AI Assistants

Hermes Agent introduces a closed‑loop learning architecture that adds result evaluation, pattern extraction, and persistent user modeling to the traditional receive‑plan‑execute‑return cycle, offering searchable FTS5‑based memory, autonomous skill creation, multi‑platform messaging, provider‑agnostic model switching, and built‑in research tools for AI developers.
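The "searchable FTS5‑based memory" mentioned here comes down to SQLite's built‑in full‑text engine. A minimal sketch of the idea follows; the `memory` table layout is illustrative, not Hermes Agent's actual schema:

```python
import sqlite3

# Agent memory backed by SQLite FTS5 (a sketch, not Hermes Agent's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memory USING fts5(role, content)")
conn.executemany(
    "INSERT INTO memory (role, content) VALUES (?, ?)",
    [
        ("user", "deploy the staging cluster with terraform"),
        ("assistant", "staging cluster deployed; terraform plan archived"),
        ("user", "summarize last week's incident reports"),
    ],
)
# Full-text query: rows mentioning "terraform", best match first.
rows = conn.execute(
    "SELECT role, content FROM memory WHERE memory MATCH ? ORDER BY rank",
    ("terraform",),
).fetchall()
for role, content in rows:
    print(role, content)
```

FTS5 ships with the SQLite bundled in standard CPython builds, so no external dependency is needed for this kind of searchable memory.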

FTS5 · Hermes Agent · LLM summarization
8 min read
PaperAgent
Mar 5, 2026 · Artificial Intelligence

Bridging Agent Runtime and RL: Inside the Claw‑R1 Training Framework

Claw‑R1, a new reinforcement‑learning framework from the USTC Cognitive Intelligence Lab, integrates the OpenClaw Agent Runtime with RL training to enable agents to learn directly in real environments, addressing the gap between simulated tasks and true tool‑calling, multi‑step reasoning, and stable long‑task execution.

AI infrastructure · Claw-R1 · OpenClaw
10 min read
AI Engineering
Jan 7, 2026 · Artificial Intelligence

Unsloth-MLX: Fine‑Tune LLMs on Mac and Seamlessly Move Code to Cloud GPUs

Unsloth‑MLX leverages Apple’s MLX framework to let Mac users with Apple Silicon fine‑tune large language models locally with a single import change, offering zero‑cost migration to cloud GPUs, supporting SFT, DPO, ORPO, GRPO training, and export to HuggingFace or GGUF formats.
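The "single import change" workflow can be illustrated in the abstract. The sketch below uses made‑up module names (`local_backend`, `cloud_backend`), not Unsloth‑MLX's real API, to show the drop‑in pattern of moving between backends by swapping one import line:

```python
import sys
import types

# Build two stand-in training backends with the same interface
# (illustrative only; real code would use the actual library modules).
def make_backend(name, device):
    mod = types.ModuleType(name)
    mod.device = device
    mod.finetune = lambda model, d=device: f"finetuned {model} on {d}"
    return mod

# Register a local (Apple Silicon, MLX-style) and a cloud (GPU) backend.
sys.modules["local_backend"] = make_backend("local_backend", "mps")
sys.modules["cloud_backend"] = make_backend("cloud_backend", "cuda")

# Training script: swapping which module is imported is the only change
# needed to move from a Mac to a cloud GPU.
import local_backend as tuner   # change to `import cloud_backend as tuner`

result = tuner.finetune("llama-3-8b")
print(result)
```

Because both backends expose the same interface, the rest of the training script stays untouched — which is the migration property the article's title promises.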

Apple Silicon · GPU cloud · LLM fine-tuning
4 min read
Baobao Algorithm Notes
Sep 23, 2025 · Artificial Intelligence

How LongCat-Flash-Thinking Sets New SOTA in Open‑Source AI Inference

LongCat-Flash-Thinking, the latest open‑source model from Meituan, introduces domain‑parallel RL training, a high‑throughput DORA infrastructure, and a dual‑path inference framework that together achieve state‑of‑the‑art performance on logical, mathematical, coding, and agentic tasks while maintaining top‑tier speed.

Inference · LongCat · RL training
10 min read
AntTech
Aug 6, 2025 · Artificial Intelligence

Ring-lite-2507: Boosted Deep Reasoning and Balanced General Capabilities

The AntBailing team releases Ring-lite-2507, enhancing deep reasoning through a two‑stage RL pipeline while balancing the model's general capabilities, showing notable gains on benchmarks such as ARC‑AGI‑v1 and offering the model as an open‑source release across major platforms.

Large Language Model · RL training · Ring-lite
5 min read
Baobao Algorithm Notes
Jun 3, 2025 · Artificial Intelligence

How to Train a 671B‑Scale Model with RL: Insights from a verl Internship

This article shares a detailed, first‑hand analysis of the technical challenges, framework choices, memory management, weight conversion, precision alignment, and efficiency optimizations encountered while building reinforcement‑learning pipelines for a 671‑billion‑parameter model using the verl ecosystem.

GPU memory management · Megatron · Model Parallelism
16 min read
AI Algorithm Path
Apr 20, 2025 · Artificial Intelligence

Boosting Visual Reasoning in VLMs with Reinforcement Learning

The article analyzes how reinforcement learning, which transformed LLM reasoning in DeepSeek, can be applied to vision‑language models to overcome the limitations of traditional chain‑of‑thought prompting and supervised fine‑tuning, presenting concrete reward designs, training pipelines, and a critical assessment of their strengths and weaknesses.

LLM · RL training · chain of thought
10 min read