
Deep Research Series: 12 Articles From the Basic Loop to the First Training Review

This article reorganizes a 12-part Deep Research Agent series into a logical learning path. For each part it summarizes the problem, the key solutions, and the practical takeaways, from building a runnable loop and handling tool failures to data construction, context management, and training evaluation.

Wu Shixiong's Large Model Academy

Jul 3, 2026


Stage One: Get It Running (Articles 1‑4)

This stage builds a functional Deep Research Agent executor that can handle real traffic. It is purely engineering work; no training is involved.

Article 1: What’s the Difference Between This Agent and a Search-Enabled ChatGPT? Distinguishes Deep Research from RAG and from search-augmented dialogue products, then presents a minimal ReAct loop: a while loop in which the model decides what to search, the tool returns results, and the model decides the next step. The skeleton of Deep Research is essentially this loop; the real gap lies in the implementation details covered in later articles.
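
The loop described above can be sketched in a few lines. This is a minimal illustration, not the series' actual code: `react_loop`, `scripted_decide`, and `fake_search` are hypothetical names, and the scripted decision function stands in for a real model call.

```python
from typing import Callable


def react_loop(question: str,
               decide: Callable[[str, list[dict]], dict],
               search: Callable[[str], str],
               max_steps: int = 10) -> tuple[str, list[dict]]:
    """Minimal ReAct loop: decide, act, observe, repeat until an answer."""
    history: list[dict] = []
    step = 0
    while step < max_steps:
        action = decide(question, history)        # model picks the next move
        if action["type"] == "answer":
            return action["text"], history
        observation = search(action["query"])     # tool returns results
        history.append({"query": action["query"], "observation": observation})
        step += 1
    return "No answer within the step budget.", history


# Hypothetical stand-ins so the sketch runs without a real model or search API.
def scripted_decide(question: str, history: list[dict]) -> dict:
    if len(history) < 2:
        return {"type": "search", "query": f"{question} (angle {len(history) + 1})"}
    return {"type": "answer", "text": f"Answer based on {len(history)} searches"}


def fake_search(query: str) -> str:
    return f"results for: {query}"


answer, trace = react_loop("What is Deep Research?", scripted_decide, fake_search)
print(answer)
```

Everything the later articles add (failure handling, deduplication, budgets, context management) hangs off this skeleton.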

Article 2: After 17 Steps, What Happens Between Steps 7 and 14? Upgrades the minimal loop to a production-grade executor. Four core improvements are added: hierarchical tool-failure handling, embedding-based duplicate-search detection (which catches the repeated searches that occur between steps 7 and 14), three-tier token budgeting, and structured logging. The logging design is reused throughout the series.
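
As a rough illustration of duplicate-search detection, the sketch below compares each new query against earlier ones by cosine similarity. It assumes a toy bag-of-words vector (`toy_embed`) in place of a real embedding model, and the 0.8 threshold is an arbitrary placeholder; neither comes from the series.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DuplicateSearchDetector:
    """Flags a query that is too similar to one already issued."""

    def __init__(self, threshold: float = 0.8):   # threshold is an assumed value
        self.threshold = threshold
        self.seen: list[Counter] = []

    def is_duplicate(self, query: str) -> bool:
        vec = toy_embed(query)
        if any(cosine(vec, old) >= self.threshold for old in self.seen):
            return True
        self.seen.append(vec)                      # remember only new queries
        return False


detector = DuplicateSearchDetector()
for q in ["transformer attention complexity",
          "attention complexity transformer",
          "GDP of France 2020"]:
    print(q, "->", "duplicate" if detector.is_duplicate(q) else "new")
```

With a real embedding model, paraphrased queries that share few words would also be caught, which word counts alone cannot do.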

Article 3: When You Grab a Webpage, What Exactly Do You Feed the Model? Implements the four tools—Search, Visit, Scholar, Python—and emphasizes post‑processing to filter out navigation bars, comments, and recommendations, keeping only the main content for the model.
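
One simple way to approximate this post-processing is to drop everything inside structural tags that rarely hold main content. The sketch below uses only the standard library's `html.parser`; the tag list in `SKIP_TAGS` is an assumption for illustration, and real pages usually need more than tag-based rules (for example, class-name heuristics for comment sections).

```python
from html.parser import HTMLParser

# Assumed set of tags that typically wrap navigation, sidebars, and scripts.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any tag in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = ("<html><body><nav>Home | About</nav>"
        "<article><h1>Title</h1><p>Body text.</p></article>"
        "<aside>Recommended posts</aside><footer>Copyright</footer></body></html>")
print(extract_main_text(page))
```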

Article 4: A 12% Tool‑Call Failure Rate – Is This a Lab Demo? Discusses failure handling strategies for timeouts, 403 errors, and code‑execution crashes, explaining when to retry, downgrade, or abandon a call. Highlights that simply skipping on error is a harmful pattern because errors propagate downstream.
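
The retry, downgrade, or abandon decision can be sketched as a small wrapper. The mapping used here (retry timeouts, go straight to a fallback on a 403, and leave an explicit note when abandoning) is an assumption for illustration; the exception classes and `call_with_policy` are hypothetical names.

```python
import time
from typing import Any, Callable, Optional


class ToolTimeout(Exception):
    """Transient failure (hypothetical): worth retrying."""


class ToolForbidden(Exception):
    """Persistent failure such as HTTP 403 (hypothetical): retrying will not help."""


def call_with_policy(primary: Callable[[], Any],
                     fallback: Optional[Callable[[], Any]] = None,
                     max_retries: int = 2,
                     backoff_s: float = 0.0) -> dict:
    """Retry transient errors, downgrade to a fallback, otherwise abandon loudly."""
    attempts = 0
    reason = ""
    while attempts <= max_retries:
        attempts += 1
        try:
            return {"status": "ok", "result": primary(), "attempts": attempts}
        except ToolTimeout:
            reason = "timeout"
            time.sleep(backoff_s * attempts)    # linear backoff after each timeout
        except ToolForbidden:
            reason = "forbidden"
            break                               # no point retrying a 403
        except Exception as exc:                # e.g. a code-execution crash
            reason = f"crash: {exc}"
            break
    if fallback is not None:
        try:
            return {"status": "degraded", "result": fallback(),
                    "attempts": attempts, "reason": reason}
        except Exception as exc:
            reason = f"{reason}; fallback failed: {exc}"
    # Abandon, but tell the model what happened instead of skipping silently.
    return {"status": "failed", "result": None, "attempts": attempts,
            "note": f"Tool call abandoned ({reason}). Treat this source as unavailable."}
```

The `note` field is the point of the last branch: the failure is surfaced to the model as an observation, so a missing source does not quietly distort later steps.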

Stage Two: Keep It Running Without Crashing (Articles 5‑7)

Once the executor can run ten steps, new challenges appear: context overflow, uneven search-result quality, and the system architecture needed for production deployment.

Article 5: After 20 Steps, the Model Can’t Remember the Conclusion from Step 3. Explains that linear ReAct histories break down after about 15 steps. The solution is to replace the linear history with an evolving research report that carries forward only the current conclusions, reducing cost and improving stability.
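
The idea of an evolving report can be illustrated with a small data structure that is overwritten rather than appended to. This is a sketch under assumptions: `ResearchReport` and `build_prompt` are hypothetical names, and in a real agent the model itself would write each update instead of the scripted loop used here.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchReport:
    """Running report that replaces the step-by-step transcript (illustrative)."""
    question: str
    conclusions: dict[str, str] = field(default_factory=dict)

    def update(self, topic: str, conclusion: str) -> None:
        # Overwrite rather than append: only the current conclusion is kept.
        self.conclusions[topic] = conclusion

    def render(self) -> str:
        lines = [f"Question: {self.question}", "Current conclusions:"]
        lines += [f"- {topic}: {text}" for topic, text in self.conclusions.items()]
        return "\n".join(lines)


def build_prompt(report: ResearchReport, latest_observation: str) -> str:
    """The prompt carries the report plus the newest observation, not the full history."""
    return f"{report.render()}\n\nLatest observation:\n{latest_observation}"


report = ResearchReport("How do solid-state batteries compare to lithium-ion?")
linear_history = []
for step in range(1, 21):
    observation = f"observation from step {step} " * 20
    linear_history.append(observation)
    # Scripted update; a real agent would ask the model to revise the report.
    report.update(topic=f"topic {step % 3}", conclusion=f"conclusion as of step {step}")

prompt = build_prompt(report, linear_history[-1])
print(len(prompt), "characters vs", len("".join(linear_history)), "for the linear history")
```

The prompt size depends on the number of open topics, not on the number of steps taken, which is where the cost and stability gains come from.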

Article 6: 150 Search Results, 8 of Which Are Duplicated Reposts. Addresses source-quality issues. Describes how to detect that multiple results are actually the same article and how to assign credibility scores to domains and content, a crucial step for any research-oriented agent.
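
A common way to spot reposts is to compare word shingles with Jaccard similarity and keep the copy from the most credible domain. The sketch below swaps in that technique as an assumption; the article may use a different method. The score table, the 0.3 default, and the 0.7 threshold are made-up placeholder values.

```python
from urllib.parse import urlparse

# Hypothetical per-domain credibility scores; a real table would be curated or learned.
DOMAIN_SCORES = {"journal.example.org": 0.9, "blog.example.com": 0.5, "repost.example.net": 0.2}


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word windows; short texts yield a single shingle."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def credibility(url: str) -> float:
    return DOMAIN_SCORES.get(urlparse(url).netloc, 0.3)   # 0.3 is an assumed default


def dedupe(results: list[dict], threshold: float = 0.7) -> list[dict]:
    """Visit results from most to least credible; drop near-copies of kept ones."""
    kept: list[dict] = []
    for item in sorted(results, key=lambda r: credibility(r["url"]), reverse=True):
        signature = shingles(item["text"])
        if all(jaccard(signature, shingles(other["text"])) < threshold for other in kept):
            kept.append(item)
    return kept


article = "the new model reaches state of the art results on the benchmark"
results = [
    {"url": "https://repost.example.net/a", "text": article + " source repost"},
    {"url": "https://journal.example.org/paper", "text": article},
    {"url": "https://blog.example.com/rates",
     "text": "central bank raises interest rates by a quarter point"},
]
for item in dedupe(results):
    print(item["url"])
```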

Article 7: Crashing After Fifty Steps – It’s Just a Toy. Integrates the executor, tool layer, context manager, and failure-handling components into a single architecture. Readers who only want to build a system without training can stop here.

Stage Three: Strengthen the Model Through Training (Articles 8‑12)

The first seven articles use off‑the‑shelf models. The next five focus on training a custom Deep Research model, a topic rarely covered in Chinese resources.

Article 8: Where Does Training Data Come From? “Web-Crawled” Won’t Get You an Offer. Details four question-generation methods (knowledge-graph random walks, seed-question iterative upgrades, formal reasoning chains, and entity substitution) and how to balance their proportions. This forms the foundation for training.
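
Of the four methods, the knowledge-graph random walk is the easiest to show in miniature: walk a few hops, then ask for the entity at the end of the chain. The graph, the question template, and the function names below are illustrative assumptions, not the series' data or code.

```python
import random

# Toy knowledge graph: entity -> list of (relation, entity). Illustrative only.
GRAPH = {
    "Marie Curie": [("was born in", "Warsaw"), ("won", "Nobel Prize in Chemistry")],
    "Warsaw": [("is the capital of", "Poland")],
    "Poland": [("is a member of", "European Union")],
    "Nobel Prize in Chemistry": [],
    "European Union": [],
}


def random_walk(graph: dict, start: str, hops: int,
                rng: random.Random) -> list[tuple[str, str, str]]:
    """Follow up to `hops` random edges; stop early at a dead end."""
    path = []
    node = start
    for _ in range(hops):
        edges = graph.get(node, [])
        if not edges:
            break
        relation, nxt = rng.choice(edges)
        path.append((node, relation, nxt))
        node = nxt
    return path


def to_question(path: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Turn a walk into a multi-hop question whose answer is the final entity."""
    relations = " -> ".join(relation for _, relation, _ in path)
    question = (f"Start at {path[0][0]} and follow these relations in order: "
                f"{relations}. Which entity do you reach?")
    return question, path[-1][2]


rng = random.Random(0)
walk = random_walk(GRAPH, "Marie Curie", hops=3, rng=rng)
question, answer = to_question(walk)
print(question)
print("Answer:", answer)
```

Longer walks give harder questions, which is one lever for controlling difficulty when mixing this method with the other three.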

Article 9: Teacher Got It Wrong, Should You Learn That? Describes a three‑stage filtering funnel for teacher‑generated search trajectories, discarding noisy or malformed paths. Only about 30% of the raw trajectories survive, and this filtering often consumes more time than the actual training.
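
A filtering funnel of this kind can be sketched as a list of predicates applied in order, with survivor counts recorded after each stage. The three stage definitions below (well-formed steps, no repeated queries, correct final answer) are assumptions chosen for illustration; the summary does not say what the article's three stages actually check.

```python
from typing import Callable

Trajectory = dict


def is_well_formed(t: Trajectory) -> bool:
    steps = t.get("steps", [])
    return bool(steps) and bool(t.get("answer")) and all(
        "tool" in s and "query" in s for s in steps)


def no_repeated_queries(t: Trajectory) -> bool:
    queries = [s["query"] for s in t["steps"]]
    return len(set(queries)) == len(queries)


def answer_is_correct(t: Trajectory) -> bool:
    return t["answer"].strip().lower() == t["gold"].strip().lower()


def run_funnel(trajectories: list[Trajectory],
               stages: list[tuple[str, Callable[[Trajectory], bool]]]):
    """Apply each stage to the survivors of the previous one and log the counts."""
    counts = [("raw", len(trajectories))]
    survivors = list(trajectories)
    for name, keep in stages:
        survivors = [t for t in survivors if keep(t)]
        counts.append((name, len(survivors)))
    return survivors, counts


STAGES = [("well_formed", is_well_formed),
          ("no_repeated_queries", no_repeated_queries),
          ("correct_answer", answer_is_correct)]

raw = [
    {"steps": [{"tool": "search", "query": "a"}, {"tool": "visit", "query": "b"}],
     "answer": "Paris", "gold": "paris"},
    {"steps": [{"tool": "search"}], "answer": "x", "gold": "x"},            # malformed step
    {"steps": [{"tool": "search", "query": "a"}, {"tool": "search", "query": "a"}],
     "answer": "Rome", "gold": "Rome"},                                     # repeated query
    {"steps": [{"tool": "search", "query": "a"}], "answer": "Berlin", "gold": "Vienna"},
]
survivors, counts = run_funnel(raw, STAGES)
print(counts)
```

Logging the count after every stage makes it visible which filter removes the most data, which is usually the first thing to inspect when the survival rate looks too low.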

Article 10: From 71% to 94% Accuracy – How Is This Calculated? Covers SFT cold‑start: masking irrelevant observations in loss computation, setting training hyper‑parameters, and building an evaluation pipeline that explains how tool‑call accuracy numbers are derived.
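
Observation masking is usually done by setting the label of every token the model did not write to an ignore value, so that only the model's own tokens contribute to the loss. The sketch below assumes a trajectory already split into `(role, token_ids)` segments and uses -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`; the segment format and role names are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) where only assistant tokens carry a training signal."""
    input_ids: list[int] = []
    labels: list[int] = []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)                          # learn these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))    # mask prompt and observations
    return input_ids, labels


# Made-up token ids for a short trajectory.
segments = [
    ("user", [1, 2, 3]),
    ("assistant", [4, 5]),            # tool call written by the model
    ("observation", [6, 7, 8, 9]),    # tool output, not written by the model
    ("assistant", [10, 11]),          # final answer
]
input_ids, labels = build_labels(segments)
trained = sum(label != IGNORE_INDEX for label in labels)
print(labels)
print(f"{trained} of {len(labels)} tokens contribute to the loss")
```

Without the mask, the model would be trained to reproduce tool outputs it has no way of predicting, which wastes capacity and can encourage it to hallucinate observations.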

Article 11: One Inference Run, One Failed Exam – Is 22% Error Acceptable? Explores inference-time extensions such as Best-of-N and parallel sampling with a judge model, along with their cost multipliers and when they are worthwhile in budget-sensitive scenarios.
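
Best-of-N with a judge, and the cost it implies, can be sketched as follows. The generator and judge here are scripted stand-ins with hypothetical names and made-up scores, and the cost model (N generations plus N judge calls, with the judge priced as a fraction of a generation) is a simplifying assumption.

```python
from typing import Callable


def best_of_n(question: str,
              generate: Callable[[str, int], str],
              judge: Callable[[str, str], float],
              n: int) -> tuple[str, float]:
    """Sample n candidates, score each with the judge, and return the top one."""
    candidates = [generate(question, i) for i in range(n)]
    scored = [(judge(question, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def cost_multiplier(n: int, judge_cost_ratio: float) -> float:
    """Cost relative to one run, assuming n generations plus n judge calls."""
    return n * (1.0 + judge_cost_ratio)


# Scripted stand-ins; a real setup would call a model for both roles.
def fake_generate(question: str, i: int) -> str:
    return f"answer v{i}"


FAKE_SCORES = {"answer v0": 0.4, "answer v1": 0.6, "answer v2": 0.9, "answer v3": 0.5}


def fake_judge(question: str, candidate: str) -> float:
    return FAKE_SCORES[candidate]


best, score = best_of_n("q", fake_generate, fake_judge, n=4)
print(best, score, cost_multiplier(4, judge_cost_ratio=0.25))
```

As a back-of-the-envelope bound only: if runs failed independently and the judge always recognized a correct candidate, a 22% single-run error would fall to 0.22^N, about 1.1% at N = 3. Real runs are correlated and judges make mistakes, so the actual gain is smaller while the cost multiplier is paid in full.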

Article 12: Formatting Score Jumped from 4 to Full Marks – Does That Mean Training Succeeded? Reviews the first‑round training run of 187 data points, where formatting scores look perfect but tool‑selection accuracy is poor. Provides a thorough failure analysis, distinguishing superficial metrics from genuine learning.
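
The gap between a superficial metric and a substantive one is easy to reproduce on toy data. The sketch below assumes tool calls are emitted as JSON objects with `tool` and `args` fields (an assumed format, not the series' schema) and scores the same outputs two ways.

```python
import json


def format_ok(raw: str) -> bool:
    """True if the output parses as a JSON object with a tool name and an args dict."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(call, dict)
            and isinstance(call.get("tool"), str)
            and isinstance(call.get("args"), dict))


def evaluate(outputs: list[str], expected_tools: list[str]) -> dict:
    """Score format validity and tool selection separately on the same outputs."""
    n = len(outputs)
    well_formed = sum(format_ok(o) for o in outputs)
    right_tool = sum(format_ok(o) and json.loads(o)["tool"] == expected
                     for o, expected in zip(outputs, expected_tools))
    return {"format_rate": well_formed / n, "tool_selection_accuracy": right_tool / n}


# Toy outputs: every call is well formed, but the model always picks "search".
outputs = [json.dumps({"tool": "search", "args": {"q": f"query {i}"}}) for i in range(4)]
expected = ["search", "visit", "python", "scholar"]
print(evaluate(outputs, expected))
```

A model that has only learned the output template scores perfectly on the first metric and poorly on the second, which is the pattern the review describes.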

Four Reading Strategies

Quick Overview: Read Articles 1, 5, and 8 to grasp the overall framework.

Build a Deployable System: Read Articles 1‑7 in order; training material can be deferred.

Interview Preparation: Focus on Articles 2, 4, 9, 10, and 12, which contain the details interviewers love to probe.

Training-Focused: Read Articles 8-12 sequentially, with Article 5 as background on context management.

Regardless of the chosen path, the twelfth article’s failure recap is essential—learning from others’ mistakes is more valuable than additional success stories.

Future Plans for the Series

The series is planned to have 15 articles; three remain:

Stopping criteria – when the agent should cease searching.

Comprehensive reward‑function design.

Cost accounting – detailed breakdown from inference to a full training run.

These three articles will close out the series. The author notes that reader comments have pushed the content deeper, which confirms strong interest in continuing the practical exploration.

Figure: Deep Research Agent from 0 to 1, a 12-article learning map

The key takeaway: the real difficulty of Deep Research lies outside the simple loop; every layer—tool input, context retention, failure handling, data creation, and error analysis—must be engineered carefully.


