Tagged articles

LLM alignment

9 articles

Jul 17, 2026 · Artificial Intelligence

Anthropic Unveils Two Groundbreaking LLM Alignment Reports

Anthropic’s two July reports cover separate ground. The first presents a taxonomy of four new autonomous‑agent failure modes, backed by large‑scale red‑team experiments. The second introduces GRAM, a modular pre‑training framework for fine‑grained capability access control that performs comparably to multiple filtered models at far lower training cost.

AI safety · Agentic Misalignment · Capability Access Control

14 min read

DeepHub IMBA

May 19, 2026 · Artificial Intelligence

A 2026 Survey of LLM‑Focused RL: From PPO to DPO, GRPO, and Multi‑Agent RL

The article reviews five years of LLM‑centric reinforcement learning, tracing the evolution from early Q‑learning to PPO, then to Direct Preference Optimization, Group Relative Policy Optimization, and finally multi‑agent RL, detailing each method’s mechanics, strengths, failure modes, practical considerations, and emerging open‑source toolchains.

DPO · GRPO · LLM alignment

33 min read

PaperAgent

May 14, 2026 · Artificial Intelligence

New Paradigm for LLM Alignment: Insights from Two Recent Anthropic Papers

Anthropic's two May papers reveal that simple SFT/RLHF is insufficient for safe LLMs; inserting a model‑spec mid‑training stage and synthetic‑document fine‑tuning dramatically reduces agentic misalignment, improves data efficiency, and enables models to reason about values before acting.

Agentic Misalignment · Anthropic · LLM alignment

13 min read

DataFunTalk

Apr 8, 2026 · Artificial Intelligence

Claude Mythos Preview Crushes Benchmarks and Reveals 27‑Year‑Old Zero‑Day

Anthropic's Claude Mythos Preview outperforms GPT‑5.4, Gemini 3.1 Pro and Opus 4.6 across dozens of AI benchmarks, autonomously discovers thousands of software vulnerabilities, exploits them without human guidance, and raises serious alignment and security concerns for the industry.

AI benchmarks · Anthropic · Claude Mythos

15 min read

Baobao Algorithm Notes

Aug 14, 2025 · Artificial Intelligence

Why Standard SFT Fails to Generalize and How One‑Line Dynamic Fine‑Tuning Fixes It

The article analyzes why supervised fine‑tuning (SFT) generalizes poorly for large language models, shows that the SFT gradient is a policy gradient with a high‑variance inverse‑probability weight, proposes a one‑line Dynamic Fine‑Tuning correction, and reports substantial gains on challenging math and offline RL benchmarks.

Dynamic Fine-Tuning · Generalization · LLM alignment

7 min read

Volcano Engine Developer Services

Jun 18, 2025 · Artificial Intelligence

ChatTS: A Synthetic Data‑Driven Multimodal LLM that Natively Understands Time Series

ChatTS is a time‑series‑native multimodal large language model trained purely on synthetic data. It understands and reasons over both real and synthetic time‑series datasets, and outperforms existing LLM baselines on alignment and reasoning tasks.

AI · LLM alignment · Multimodal LLM

18 min read

Baobao Algorithm Notes

Sep 10, 2024 · Artificial Intelligence

How Direct Preference Optimization Simplifies LLM Alignment Without Reward Models

This article breaks down the mathematical derivation of Direct Preference Optimization (DPO), showing how it replaces the traditional RLHF‑PPO pipeline by directly training an alignment model from human preference data, eliminating the need for a separate reward model and simplifying the overall training process.

DPO · LLM alignment · Preference Optimization

17 min read

Alibaba Cloud Big Data AI Platform

Aug 29, 2024 · Artificial Intelligence

How PAI-ChatLearn Accelerates Large‑Scale LLM Alignment Training

PAI-ChatLearn is an open‑source framework that abstracts and decouples alignment training for large language models. It offers flexible resource scheduling, multi‑backend support, and speedups of up to 208% for 70B models, and it supports RLHF, DPO, and custom training flows.

AI performance · ChatLearn · LLM alignment

11 min read

Baobao Algorithm Notes

Jul 9, 2024 · Artificial Intelligence

Why Step-Level DPO Is Revolutionizing LLM Math Reasoning

This article reviews recent step‑level DPO research, compares it with instance‑level DPO, explains the underlying Monte Carlo Tree Search formulation, and presents the author’s own replication experiments that demonstrate consistent performance gains across multiple LLM sizes on GSM8K and MATH benchmarks.

AI research · LLM alignment · MCTS

10 min read