AI Large-Model Wave and Transformation Guide
Mar 28, 2026 · Artificial Intelligence

How a 17‑Year‑Old Prompt Turned Claude 3.5 into a Free O1‑Level AI

A teenage prodigy engineered a "Thinking Claude" prompt that adds a human‑like chain‑of‑thought protocol to Claude 3.5, enabling free O1‑level reasoning; the prompt has produced impressive outputs such as a functional calculator, a sci‑fi story, and playable games, and the article walks through its design process and usage.

AI reasoning · Artificial Intelligence · Claude 3.5
8 min read
Architect
Feb 3, 2025 · Artificial Intelligence

How DeepSeek‑R1 Uses Pure Reinforcement Learning to Match OpenAI’s o1

This article presents DeepSeek‑R1 and DeepSeek‑R1‑Zero, two next‑generation LLMs trained with pure reinforcement learning and multi‑stage fine‑tuning, details their GRPO training framework, model‑distillation pipeline, open‑source release, and evaluation results that rival OpenAI’s o1‑1217 across reasoning, knowledge, and coding benchmarks.

DeepSeek · LLM evaluation · OpenAI o1
10 min read
Baobao Algorithm Notes
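The core idea of the GRPO framework mentioned above can be sketched in a few lines: instead of training a separate value (critic) model, each sampled response's reward is normalized against its own sampling group. A minimal illustration, not code from the article or from DeepSeek's release:

```python
# Hypothetical sketch of GRPO's group-relative advantage: for a group of
# responses sampled from one prompt, each reward is standardized against
# the group mean and standard deviation, so no critic model is needed.
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Return the group-relative advantage for each sampled response."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampling group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four responses for one prompt, scored by a rule-based reward (1 = correct):
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])  # roughly [1, -1, -1, 1]
```

Correct answers get positive advantage and incorrect ones negative, purely relative to the group — which is what lets pure RL training scale without a learned value function.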
Oct 10, 2024 · Artificial Intelligence

How MCTS Powers Inference in OpenAI’s o1: A Deep Dive with rStar

This article explains how the inference component of OpenAI’s o1 model can be implemented using Monte‑Carlo Tree Search, detailing the action space, rollout process, UCT scoring, and best‑path selection, with a concrete walkthrough of Microsoft’s open‑source rStar code.

Inference · MCTS · OpenAI o1
26 min read
Baobao Algorithm Notes
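The UCT scoring the article details is the standard selection rule in MCTS: trade off a node's average reward (exploitation) against how rarely it has been visited (exploration). A minimal sketch of that rule — the exploration constant `c` is an assumed value, not taken from rStar:

```python
# Hypothetical sketch of the UCT score used during MCTS node selection:
# higher average reward and lower visit count both raise the score.
import math

def uct_score(total_reward, visits, parent_visits, c=1.41):
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

During each selection step the child with the highest UCT score is descended into; after rollouts, the best reasoning path is read off the accumulated statistics.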
Sep 29, 2024 · Artificial Intelligence

Decoding OpenAI o1: Test‑Time Scaling, PRM Search & Inference Strategies

This article analyses the training tricks behind OpenAI's o1 model, explaining test/inference‑time scaling laws, post‑training techniques, process‑supervised reward models (PRM), various inference‑time search methods, data‑collection pipelines, and the trade‑offs between allocating compute to pre‑training versus inference.

LLM inference · OpenAI o1 · Reward Model
34 min read
Architect
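One of the simplest inference-time search methods the article covers is best-of-N sampling reranked by a process-supervised reward model (PRM). A minimal sketch under stated assumptions — `sample_answer` and `prm_score` are hypothetical stand-ins for a generator call and a step-level reward model:

```python
# Hypothetical sketch of best-of-N with a PRM reranker: spend extra
# inference compute by sampling N candidate solutions, then keep the
# one the reward model scores highest.
def best_of_n(prompt, sample_answer, prm_score, n=8):
    candidates = [sample_answer(prompt) for _ in range(n)]
    return max(candidates, key=prm_score)

# Toy usage with deterministic stand-ins:
answers = iter(["4", "5", "4"])
best = best_of_n("2+2?", lambda p: next(answers),
                 lambda a: 1.0 if a == "4" else 0.0, n=3)
```

Raising `n` is exactly the test-time scaling knob the article discusses: more candidates cost more inference compute but give the PRM more chances to surface a correct solution.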
Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safety · LLM · OpenAI o1
43 min read
Baobao Algorithm Notes
Sep 18, 2024 · Artificial Intelligence

How OpenAI’s o1 Uses Self‑Play RL to Achieve Breakthrough Reasoning

This article provides an in‑depth technical analysis of OpenAI’s new multimodal model o1, explaining its self‑play reinforcement‑learning pipeline, novel train‑time and test‑time scaling laws, inference‑time thinking process, and possible architectural variants, while also discussing broader implications for large‑language‑model research.

OpenAI o1 · Reward Model · inference thinking
37 min read
Baobao Algorithm Notes