Tagged articles

LLM reasoning

7 articles · Page 1 of 1

Jun 29, 2026 · Artificial Intelligence

Mapping LLM Reasoning: Paradigms, Methods, and Failure Modes in a Periodic Table

This 103‑page survey of over 300 recent papers organizes large language model reasoning into a periodic‑table framework, explains where reasoning emerges, categorizes 36 method families across six dimensions, critiques accuracy‑only evaluation, and outlines key open challenges such as fidelity, robustness, calibration, generalization, efficiency, and safety.

AI safety · Chain-of-Thought · Evaluation

0 likes · 13 min read

Machine Heart

May 21, 2026 · Artificial Intelligence

Can Small Models Overthink? TaH Skips 93% Redundant Iterations and Boosts Accuracy

TaH, a selective latent‑iteration method for small language models, identifies and skips unnecessary token‑level iterations, cutting about 93% of extra iterations while delivering a stable accuracy gain of 3.0% to 6.8% across nine math, QA, and code benchmarks.

LLM reasoning · Looped Transformer · TaH

0 likes · 14 min read

Alibaba Cloud Developer

Nov 18, 2025 · Artificial Intelligence

How ReAct and Reflexion Boost Large Language Models for Complex, Real‑World Tasks

The article explains the limitations of large language models on multi‑step reasoning, real‑time information retrieval, and planning, then introduces the ReAct (Reasoning + Acting) framework and its Reflexion extension, detailing their mechanisms, examples, performance gains, practical applications, and future research directions.

LLM reasoning · Large Language Models · Prompt Engineering

0 likes · 16 min read

HyperAI Super Neural

Sep 30, 2025 · Artificial Intelligence

OnePiece: Applying LLM‑Style Reasoning to Item‑ID Sequences for Generative Recommendation

The article presents the OnePiece framework, which injects LLM‑style context engineering and latent reasoning into item‑ID based search‑and‑recommendation models. It details the design choices, training tricks, and attention analysis, and reports online gains of around 1% in GMV and ad revenue, offering a thorough technical dissection of generative recommendation in industrial settings.

LLM reasoning · OnePiece · Recommendation Systems

0 likes · 31 min read

Tencent Technical Engineering

Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek‑R1 demonstrates that large‑scale reinforcement learning, especially with Group Relative Policy Optimization (GRPO) and a rule‑based reward scheme, can markedly boost reasoning in LLMs without heavy supervised fine‑tuning. A brief cold‑start SFT phase, two‑stage alignment, and knowledge distillation further improve performance and efficiency, although challenges such as language mixing remain.

DeepSeek-R1 · GRPO · LLM reasoning

0 likes · 21 min read

Architect

Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1, detailing its reinforcement‑learning‑based training pipeline that uses minimal supervised data, cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation to achieve reasoning performance comparable to OpenAI‑o1‑1217, while also discussing successful contributions and failed experiments.

AI research · DeepSeek-R1 · LLM reasoning

0 likes · 11 min read

Baobao Algorithm Notes

Oct 11, 2024 · Artificial Intelligence

How Does OpenAI’s o1 Achieve Self‑Correction? A Deep Dive into MCTS and SCoRe

This article examines the self‑correction capability of OpenAI’s o1 model by linking test‑time scaling, MCTS‑style reasoning, and DeepMind’s SCoRe reinforcement‑learning framework. It walks through step‑by‑step demos, hypothesizes about internal judgment mechanisms, and proposes training pipelines that combine self‑generated data with post‑training RL.

LLM reasoning · MCTS · OpenAI

0 likes · 12 min read
