Tagged articles
5 articles
Alibaba Cloud Developer
Nov 18, 2025 · Artificial Intelligence

How ReAct and Reflexion Boost Large Language Models for Complex, Real‑World Tasks

The article explains where large language models fall short in multi‑step reasoning, real‑time information retrieval, and planning. It then introduces the ReAct (Reasoning + Acting) framework and its Reflexion extension, covering their mechanisms, examples, performance gains, practical applications, and future research directions.

Agentic AI · LLM Reasoning · Prompt engineering
16 min read
HyperAI Super Neural
Sep 30, 2025 · Artificial Intelligence

OnePiece: Applying LLM‑Style Reasoning to Item‑ID Sequences for Generative Recommendation

The article presents the OnePiece framework, which brings LLM‑style context engineering and latent reasoning to item‑ID‑based search and recommendation models. It details the design choices, training tricks, and attention analysis, and reports online gains of roughly 1% in GMV and ad revenue, offering a thorough technical dissection of generative recommendation in industrial settings.

Context Engineering · Generative Recommendation · LLM Reasoning
31 min read
Tencent Technical Engineering
Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek‑R1 demonstrates that large‑scale reinforcement learning, notably with the Group Relative Policy Optimization (GRPO) algorithm and a rule‑based reward scheme, can markedly boost reasoning in LLMs without heavy supervised fine‑tuning. A brief cold‑start SFT phase, two‑stage alignment, and knowledge distillation further improve performance and efficiency, though challenges such as language mixing remain.

DeepSeek-R1 · GRPO · LLM Reasoning
21 min read
Architect
Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1 and its reinforcement‑learning‑based training pipeline, which relies on minimal supervised data: cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation. Together these steps reach reasoning performance comparable to OpenAI‑o1‑1217. The article also discusses the work's key contributions alongside its unsuccessful experiments.

AI research · DeepSeek-R1 · LLM Reasoning
11 min read
Baobao Algorithm Notes
Oct 11, 2024 · Artificial Intelligence

How Does OpenAI’s o1 Achieve Self‑Correction? A Deep Dive into MCTS and SCoRe

This article examines the self‑correction capability of OpenAI’s o1 model by connecting test‑time scaling, MCTS‑style reasoning, and DeepMind’s SCoRe reinforcement‑learning framework. It walks through step‑by‑step demos, hypothesizes how o1's internal judgment might work, and proposes training pipelines that combine self‑generated data with post‑training RL.

LLM Reasoning · MCTS · OpenAI
12 min read