Tagged articles
13 articles
DataFunTalk
May 10, 2026 · Artificial Intelligence

DeepSeek vs MCTS: Decoding the ‘Chicken & Liquor’ Dilemma in LLM Training

The article analyzes why DeepSeek’s large‑model training struggles with Monte‑Carlo Tree Search, explains its use of Chain‑of‑Thought prompting, GRPO entropy‑boosting and rejection‑sampling fine‑tuning, compares these methods with Google’s OmegaPRM and PRM approaches, and proposes a concrete MCTS‑driven data‑generation pipeline to overcome the “chicken and liquor” trade‑off.

Chain-of-Thought, DeepSeek, GRPO
14 min read
DataFunSummit
May 4, 2026 · Artificial Intelligence

DeepSeek’s MCTS Failure: The ‘Roast Chicken and Baijiu’ Dilemma in LLM Training

The article examines why DeepSeek’s large‑model training cannot yet leverage Monte‑Carlo Tree Search, detailing its reliance on SFT, GRPO‑driven CoT activation and rejection‑sampling, contrasting this with Google’s PRM‑based approaches, and proposing an MCTS‑powered data‑generation pipeline to overcome the “roast chicken and baijiu” training dilemma.

Chain-of-Thought, GRPO, Large Language Models
14 min read
AntTech
Apr 22, 2026 · Artificial Intelligence

How Multi‑Agent MCTS and Information‑Gain Rewards Are Transforming Mobile GUI and Search Agents

This article reviews two recent ICLR 2026 papers—M²‑Miner, a multi‑agent Monte‑Carlo Tree Search framework for low‑cost mobile GUI data mining, and IGPO, an information‑gain‑based reinforcement‑learning method that provides dense rewards for multi‑turn search agents—detailing their designs, experiments, and open‑source releases.

GUI Data Mining, Information Gain, LLM agents
8 min read
Baobao Algorithm Notes
Nov 24, 2024 · Artificial Intelligence

How Marco‑o1 Merges Chain‑of‑Thought Fine‑Tuning with Monte‑Carlo Tree Search for Superior Reasoning

The article introduces Marco‑o1, an open‑source LLM that enhances complex reasoning by fine‑tuning on Chain‑of‑Thought data, integrating Monte‑Carlo Tree Search, introducing mini‑step actions and a reflection mechanism, and evaluates its performance on multilingual math and translation benchmarks.

Artificial Intelligence, Chain-of-Thought, LLM
15 min read
Model Perspective
Jul 31, 2024 · Artificial Intelligence

How Monte Carlo Tree Search Powers AlphaGo and Beyond: A Deep Dive

Monte Carlo Tree Search (MCTS) is a statistical heuristic algorithm that builds decision trees through selection, expansion, simulation, and backpropagation, enabling breakthroughs like AlphaGo’s victory and finding applications in robotics, autonomous driving, finance, and bioinformatics.

AI applications, AlphaGo, MCTS
7 min read
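
The four phases named in the summary above (selection, expansion, simulation, backpropagation) can be sketched on a toy game. This is a minimal illustration, not code from the listed article. It assumes a hypothetical single-pile Nim variant in which players alternately take 1 to 3 stones and whoever takes the last stone wins, and it uses UCB1 for selection. The names `Node`, `ucb1`, `rollout`, and `mcts` are invented for this example.

```python
import math
import random


class Node:
    """One game state in the search tree (toy Nim: take 1-3 stones per turn)."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player about to move
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who just moved into this node


def ucb1(child, parent_visits, c=1.4):
    """UCB1 score: average reward plus an exploration bonus."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, rng):
    """Play uniformly random moves; True if the player to move at the start wins."""
    turn = 0                      # 0 = the player to move at the start
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return turn == 0      # whoever takes the last stone wins
        turn ^= 1
    return False                  # no stones left: the previous player already won


def mcts(stones, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the best UCB1 child while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new (or terminal) node.
        previous_player_won = not rollout(node.stones, rng)
        # 4. Backpropagation: update statistics, flipping perspective each level.
        while node is not None:
            node.visits += 1
            if previous_player_won:
                node.wins += 1
            previous_player_won = not previous_player_won
            node = node.parent
    # Recommend the most-visited move at the root.
    best = max(root.children, key=lambda ch: ch.visits)
    return best.move, root


if __name__ == "__main__":
    move, root = mcts(3)
    print("recommended move:", move, "after", root.visits, "iterations")
```

Returning the most-visited root child, rather than the one with the highest win rate, is a common choice because visit counts are less noisy than averages over few samples.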
ITPUB
Mar 13, 2024 · Artificial Intelligence

From AlphaGo to ChatGPT: Unraveling the Secrets Behind Modern AI Breakthroughs

This article walks readers through the evolution of artificial intelligence—from early expert systems and machine learning basics to convolutional neural networks, the AlphaGo series, MuZero's rule‑free learning, and the generative power of large language models like ChatGPT—highlighting how deep learning, Monte Carlo tree search, and self‑play collaborate to achieve unprecedented performance across games, science, and language.

AI, AlphaGo, ChatGPT
39 min read
Tencent Cloud Developer
Aug 24, 2021 · Artificial Intelligence

Design and Implementation of a High‑Scoring Tetris AI for the Tencent Geek Challenge

Zheng Lin‑kai’s record‑breaking Tetris AI for the Tencent Geek Challenge combines a two‑layer breadth‑first search—stage‑level pruning in Python and fast round‑level BFS in C++—with a heuristic that rewards high score, low occupied cells, and smooth board transitions, enabling a 1,413,876‑point performance.

C++, Monte Carlo Tree Search, Python
10 min read
DataFunTalk
Oct 18, 2019 · Artificial Intelligence

Reinforcement Learning Based Neural Architecture Search: Methods and Advances

This article reviews reinforcement‑learning‑driven neural architecture search, covering layer‑based, block‑based, and connection‑based strategies, as well as advanced techniques such as inverse reinforcement learning, graph hyper‑networks, Monte‑Carlo tree search, and knowledge‑distillation‑based model compression.

AutoML, Monte Carlo Tree Search, Neural Architecture Search
23 min read
Tencent Cloud Developer
Aug 14, 2019 · Artificial Intelligence

From Atari to AI: The Evolution of Video Games and Artificial Intelligence

From Steve Jobs’s early work at Atari to modern DeepMind breakthroughs, the article traces how video games have grown into a multibillion‑dollar industry that serves as a testbed for AI research, while highlighting current AI techniques for smarter agents, procedural content generation, and the collaborative challenges shaping the future of game development.

Artificial Intelligence, Game Development, Monte Carlo Tree Search
25 min read
21CTO
Mar 13, 2016 · Artificial Intelligence

How AlphaGo’s Four‑Component Architecture Powers Master‑Level Go Play

This article breaks down AlphaGo’s four‑part system—policy network, fast rollout, value network, and Monte Carlo Tree Search—explaining their functions, training methods, and how they combine to achieve professional‑grade Go performance, while comparing them with the DarkForest implementation.

AlphaGo, Deep Learning, Monte Carlo Tree Search
13 min read
Architect
Mar 10, 2016 · Artificial Intelligence

Monte Carlo Tree Search (MCTS): Principles, Algorithms, Advantages, and Applications

This article explains Monte Carlo Tree Search (MCTS), covering its role in AlphaGo, fundamental algorithm steps, node‑selection strategies such as UCB, strengths and weaknesses, enhancements, historical background, and recent research developments in artificial intelligence.

Artificial Intelligence, MCTS, Monte Carlo Tree Search
12 min read
dbaplus Community
Mar 9, 2016 · Artificial Intelligence

How AlphaGo’s Deep Neural Networks Achieve Human‑Level Go Mastery

This article breaks down AlphaGo’s breakthrough architecture—four specialized neural‑network modules, Monte‑Carlo Tree Search, and deep reinforcement learning—to explain how the system moved from imitation learning to self‑improvement and ultimately defeated top human Go players.

AlphaGo, Deep Learning, Go AI
15 min read
Baidu Tech Salon
Sep 22, 2014 · Artificial Intelligence

How Baidu’s Bingo AI Cracked the Go Challenge with Novel Algorithms

Computer Go was long deemed a 'century‑long' AI challenge. Baidu’s Bingo system reached amateur‑to‑professional‑level play by introducing optimized Monte‑Carlo tree search, a weakened Alpha‑Beta hybrid, and massive supervised learning, demonstrating how breakthroughs in game AI can ripple into broader Baidu products.

Artificial Intelligence, Baidu, Go AI
8 min read