Tagged articles
6 articles
Baobao Algorithm Notes
Oct 11, 2024 · Artificial Intelligence

How Does OpenAI’s o1 Achieve Self‑Correction? A Deep Dive into MCTS and SCoRe

This article examines the self‑correction capability of OpenAI’s o1 model, linking test‑time scaling, MCTS‑style reasoning, and DeepMind’s SCoRe reinforcement‑learning framework. It walks through step‑by‑step demos, hypothesizes about the model’s internal judgment mechanisms, and proposes training pipelines that combine self‑generated data with post‑training RL.

LLM reasoning · MCTS · OpenAI
12 min read
Baobao Algorithm Notes
Oct 10, 2024 · Artificial Intelligence

How MCTS Powers Inference in OpenAI’s o1: A Deep Dive with rStar

This article explains how the inference component of OpenAI’s o1 model can be implemented using Monte‑Carlo Tree Search, detailing the action space, rollout process, UCT scoring, and best‑path selection, with a concrete walkthrough of Microsoft’s open‑source rStar code.

Inference · Large Language Models · MCTS
26 min read
Model Perspective
Jul 31, 2024 · Artificial Intelligence

How Monte Carlo Tree Search Powers AlphaGo and Beyond: A Deep Dive

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that grows a search tree through repeated selection, expansion, simulation, and backpropagation. It underpinned breakthroughs such as AlphaGo’s victory and has found applications in robotics, autonomous driving, finance, and bioinformatics.

AI applications · AlphaGo · MCTS
7 min read
Baobao Algorithm Notes
Jul 9, 2024 · Artificial Intelligence

Why Step-Level DPO Is Revolutionizing LLM Math Reasoning

This article reviews recent step‑level DPO research, compares it with instance‑level DPO, explains the underlying Monte Carlo Tree Search formulation, and presents the author’s own replication experiments that demonstrate consistent performance gains across multiple LLM sizes on GSM8K and MATH benchmarks.

AI research · LLM alignment · MCTS
10 min read
Tencent Cloud Developer
Jun 27, 2018 · Artificial Intelligence

Search and Optimization Algorithms in Game AI

Game AI relies on a variety of search techniques, ranging from uninformed breadth‑first and depth‑first methods to heuristic‑driven A*, minimax with alpha‑beta pruning, and Monte Carlo Tree Search. It also draws on optimization approaches such as hill climbing, simulated annealing, genetic algorithms, evolution strategies, multi‑objective evolutionary algorithms, and neuroevolutionary methods like NEAT to generate intelligent, balanced, and adaptable game behavior.

A* algorithm · MCTS · MiniMax
20 min read
Architect
Mar 10, 2016 · Artificial Intelligence

Monte Carlo Tree Search (MCTS): Principles, Algorithms, Advantages, and Applications

This article explains Monte Carlo Tree Search (MCTS), covering its role in AlphaGo, the fundamental algorithm steps, node‑selection strategies such as UCB, strengths and weaknesses, enhancements, historical background, and recent research developments in artificial intelligence.

Artificial Intelligence · MCTS · Monte Carlo Tree Search
12 min read