RenderFlow: Agentic Code Delivery for Baidu’s Vertical Search Rendering Service

This article presents RenderFlow, a system that integrates LLM‑generated code into Baidu’s search result rendering pipeline through a generate‑execute‑feedback‑repair‑publish loop. It covers the architecture, the multi‑round repair mechanism, the quality safeguards, and the resulting reduction of delivery cycles from days to minutes across nearly a thousand scenarios.

Baidu Geek Talk

Background and Current Situation

In the traditional workflow, a search result rendering request passes through requirement review, development, testing, and deployment. It involves multiple roles, long cycles, and high collaboration costs. As result types and traffic grow, this manual pipeline faces three major challenges:

High logic production and verification cost – adding a new category takes four days on average, complex industry data integration can take a week, and effort grows linearly when scaling to hundreds or thousands of scenarios.

Long activation path for a single scenario – logic changes require a full service‑level change, making rapid verification and low‑cost rollback difficult.

Insufficient generalization and extensibility – existing solutions handle fixed scenarios but need repetitive adaptation for new categories.

From LLM Generation to Agentic Delivery Loop

The introduction of large language models (LLMs) targets the most labor‑intensive steps: logic production, verification, and iterative refinement. RenderFlow constructs a closed‑loop delivery process consisting of Prompt Adapter, Executable Engine, Multi‑Round Repair, and Publishing Governance, allowing LLM‑generated code to be validated, iterated, and released within a single pipeline.

Design and Implementation of RenderFlow

Overall Architecture and Execution Flow

RenderFlow decomposes code production, execution verification, feedback repair, and publishing governance into inter‑connected modules. The architecture is visualized as four layers – Input Preparation, Generation & Repair, Executable Engine, Result Output – with an independent publishing & governance lane on the right.

Input Preparation: Each business scenario has a dedicated adapter that encapsulates a Prompt template, configuration format, and test strategy. Users fill a dynamic form; the system selects the appropriate adapter and assembles a Prompt containing requirement description, API/field specs, target schema, output constraints, and test examples.

Generation & Repair: A Coder generates conversion code from the Prompt; the code is sent to the Executable Engine for validation. If execution fails, the error is structured as feedback, a Reviewer creates repair constraints, and these constraints are injected into the next generation round, forming a "generate → execute → feedback → regenerate" loop.

Executable Engine: The engine decouples business logic from service code by storing generated code as configuration and interpreting it at runtime using the Yaegi Go interpreter. The same engine is reused for preview execution, automated testing, and online execution, ensuring consistent behavior across environments.

Publishing & Governance: After code generation, the system runs static analysis, preview execution, and automated regression tests. Publishing proceeds through staged rollout with health‑check gating (SLA, panic rate, CPU/memory, core‑scenario functionality). Post‑deployment, minute‑level effect checks and daily full‑scenario scans monitor data consistency and stability, with version snapshots enabling rapid rollback.

Result Output: The execution service dynamically loads the conversion code, transforms raw data into a structured template consumable by the frontend, completing the end‑to‑end delivery loop from scenario configuration to online rendering.
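The Input Preparation step above can be sketched in Go with the standard `text/template` package. This is a minimal illustration, not RenderFlow's implementation: `ScenarioAdapter`, `PromptInput`, `SelectAdapter`, and the template text are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// ScenarioAdapter is a hypothetical per-scenario adapter holding a prompt template.
type ScenarioAdapter struct {
	Scenario string
	Template string
}

// PromptInput mirrors the prompt sections listed in the article.
type PromptInput struct {
	Requirement  string
	FieldSpecs   []string
	TargetSchema string
	Constraints  []string
	Examples     []string
}

// BuildPrompt renders the adapter's template with the submitted form values.
func (a ScenarioAdapter) BuildPrompt(in PromptInput) (string, error) {
	tmpl, err := template.New(a.Scenario).Parse(a.Template)
	if err != nil {
		return "", fmt.Errorf("parse template for %q: %w", a.Scenario, err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, in); err != nil {
		return "", fmt.Errorf("render prompt for %q: %w", a.Scenario, err)
	}
	return sb.String(), nil
}

// SelectAdapter looks up the adapter registered for a scenario.
func SelectAdapter(registry map[string]ScenarioAdapter, scenario string) (ScenarioAdapter, error) {
	a, ok := registry[scenario]
	if !ok {
		return ScenarioAdapter{}, fmt.Errorf("no adapter for scenario %q", scenario)
	}
	return a, nil
}

func main() {
	registry := map[string]ScenarioAdapter{
		"weather": {
			Scenario: "weather",
			Template: "Requirement: {{.Requirement}}\n" +
				"Fields: {{range .FieldSpecs}}{{.}}; {{end}}\n" +
				"Target schema: {{.TargetSchema}}\n" +
				"Constraints: {{range .Constraints}}{{.}}; {{end}}\n" +
				"Examples: {{range .Examples}}{{.}}; {{end}}\n",
		},
	}
	adapter, err := SelectAdapter(registry, "weather")
	if err != nil {
		fmt.Println(err)
		return
	}
	prompt, err := adapter.BuildPrompt(PromptInput{
		Requirement:  "Map the raw forecast into the weather card",
		FieldSpecs:   []string{"temp: number", "city: string"},
		TargetSchema: "{title, subtitle}",
		Constraints:  []string{"output valid Go only"},
		Examples:     []string{`{"temp": 21, "city": "Beijing"}`},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(prompt)
}
```

Keeping the template inside the adapter means a new scenario only needs a new registry entry, which matches the article's point that each scenario carries its own prompt, configuration format, and test strategy.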

Executable Engine

The engine uses Yaegi as a Go code interpreter, so each logic update is a configuration change rather than a full service redeployment. This reduces update latency to minutes and decouples logic updates from the service release cycle.

Execution Flow: Upon request, the engine routes based on scenario ID, loads the corresponding conversion code from the configuration system, creates an isolated interpreter instance, loads standard library and registered symbols, injects the code, calls the agreed entry point, and returns structured template data.
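The request path described above (route by scenario ID, load code, create a fresh interpreter, call the entry point) can be sketched as follows. The `Interpreter` interface here is a hypothetical abstraction invented for this example and is not the Yaegi API; `Engine`, `toyInterp`, and the `Convert` entry name are likewise assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Interpreter is a hypothetical minimal view of an embedded interpreter.
// It is NOT the Yaegi API; a real engine would wrap its interpreter behind
// something with a similar shape.
type Interpreter interface {
	Load(code string) error
	Call(entry, raw string) (string, error)
}

// ErrUnknownScenario is returned when no conversion code is configured.
var ErrUnknownScenario = errors.New("unknown scenario")

// Engine wires the steps together. Configs stands in for the configuration
// system and NewInterp builds one isolated instance per request.
type Engine struct {
	Configs   map[string]string
	NewInterp func() Interpreter
	Entry     string
}

// Handle routes by scenario ID, loads the code into a fresh interpreter
// and calls the agreed entry point.
func (e *Engine) Handle(scenarioID, raw string) (string, error) {
	code, ok := e.Configs[scenarioID]
	if !ok {
		return "", fmt.Errorf("%w: %s", ErrUnknownScenario, scenarioID)
	}
	in := e.NewInterp() // no state is shared between requests
	if err := in.Load(code); err != nil {
		return "", fmt.Errorf("load code for %s: %w", scenarioID, err)
	}
	return in.Call(e.Entry, raw)
}

// toyInterp is a stand-in used only to exercise the flow.
type toyInterp struct{ code string }

func (t *toyInterp) Load(code string) error { t.code = code; return nil }

func (t *toyInterp) Call(entry, raw string) (string, error) {
	return fmt.Sprintf("%s(%s) via %s", entry, raw, t.code), nil
}

func main() {
	created := 0
	engine := &Engine{
		Configs: map[string]string{"weather": "weather-code-v1"},
		NewInterp: func() Interpreter {
			created++
			return &toyInterp{}
		},
		Entry: "Convert",
	}
	out, err := engine.Handle("weather", `{"temp": 21}`)
	fmt.Println(out, err)
	_, err = engine.Handle("unknown", "{}")
	fmt.Println(err, "interpreters created:", created)
}
```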

Isolation & Fault Tolerance: Each request runs in its own interpreter instance with no shared state, preventing a single scenario’s exception from affecting others. The engine converts runtime errors into structured error results rather than triggering service‑level failures. When multiple conversion logics are associated with a scenario, the engine executes them concurrently and returns the most complete result.
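A minimal sketch of the fault-tolerance idea above: each piece of conversion logic runs in its own goroutine, a panic is recovered into a structured error, and the fullest successful result wins. The article does not define "most complete", so counting non-empty fields is an assumption, as are the names `Result`, `Converter`, `safeRun`, and `RunAll`.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is a hypothetical structured outcome: template fields or an error text.
type Result struct {
	Fields map[string]string
	Err    string
}

// Converter stands in for one piece of interpreted conversion logic.
type Converter func(raw map[string]string) map[string]string

// safeRun executes one converter and turns a panic into a structured error.
func safeRun(c Converter, raw map[string]string) (res Result) {
	defer func() {
		if r := recover(); r != nil {
			res = Result{Err: fmt.Sprintf("conversion failed: %v", r)}
		}
	}()
	return Result{Fields: c(raw)}
}

// completeness counts non-empty fields (an assumed scoring rule).
func completeness(r Result) int {
	n := 0
	for _, v := range r.Fields {
		if v != "" {
			n++
		}
	}
	return n
}

// RunAll runs every converter concurrently and keeps the most complete success.
func RunAll(converters []Converter, raw map[string]string) Result {
	results := make([]Result, len(converters))
	var wg sync.WaitGroup
	for i, c := range converters {
		wg.Add(1)
		go func(i int, c Converter) {
			defer wg.Done()
			results[i] = safeRun(c, raw) // each goroutine writes its own slot
		}(i, c)
	}
	wg.Wait()

	best, bestScore := Result{Err: "all converters failed"}, -1
	for _, r := range results {
		if r.Err != "" {
			continue
		}
		if s := completeness(r); s > bestScore {
			best, bestScore = r, s
		}
	}
	return best
}

// panicky indexes past the end of an empty slice, like a typical generated bug.
func panicky(raw map[string]string) map[string]string {
	var parts []string
	return map[string]string{"title": parts[3]}
}

func partial(raw map[string]string) map[string]string {
	return map[string]string{"title": raw["title"]}
}

func full(raw map[string]string) map[string]string {
	return map[string]string{"title": raw["title"], "url": raw["url"]}
}

func main() {
	raw := map[string]string{"title": "Forecast", "url": "https://example.com"}
	fmt.Printf("%+v\n", RunAll([]Converter{panicky, partial, full}, raw))
	fmt.Printf("%+v\n", RunAll([]Converter{panicky}, raw))
}
```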

Performance & Stability: Dynamic interpretation adds overhead in configuration fetching, interpreter initialization, symbol loading, and code evaluation. RenderFlow mitigates this by caching configurations, cold‑start optimization, pre‑warming high‑frequency configs, limiting interpreter scope, and avoiding complex calculations or stateful logic in the interpreted path. If the configuration store is unavailable, the engine falls back to the last known code version to maintain service continuity.
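The last-known-version fallback mentioned above can be illustrated with a small cache in front of the configuration store. This is a sketch under assumptions: `Fetcher`, `CodeCache`, and `ErrStoreDown` are hypothetical, and it shows only the fallback path, not TTL-based caching or pre-warming.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStoreDown simulates an unavailable configuration store.
var ErrStoreDown = errors.New("configuration store unavailable")

// Fetcher is a hypothetical client call into the configuration store.
type Fetcher func(scenarioID string) (string, error)

// CodeCache remembers the last successfully fetched code per scenario.
type CodeCache struct {
	mu        sync.RWMutex
	lastKnown map[string]string
	fetch     Fetcher
}

// NewCodeCache wraps a fetcher with last-known-version tracking.
func NewCodeCache(f Fetcher) *CodeCache {
	return &CodeCache{lastKnown: make(map[string]string), fetch: f}
}

// Load returns fresh code when the store answers. When the store fails it
// falls back to the last known version and reports stale=true.
func (c *CodeCache) Load(id string) (code string, stale bool, err error) {
	code, err = c.fetch(id)
	if err == nil {
		c.mu.Lock()
		c.lastKnown[id] = code
		c.mu.Unlock()
		return code, false, nil
	}
	c.mu.RLock()
	cached, ok := c.lastKnown[id]
	c.mu.RUnlock()
	if !ok {
		return "", false, fmt.Errorf("scenario %s has no cached version: %w", id, err)
	}
	return cached, true, nil
}

func main() {
	storeUp := true
	cache := NewCodeCache(func(id string) (string, error) {
		if !storeUp {
			return "", ErrStoreDown
		}
		return "code-v2", nil
	})
	fmt.Println(cache.Load("weather")) // fresh fetch
	storeUp = false
	fmt.Println(cache.Load("weather")) // served from the last known version
	fmt.Println(cache.Load("stocks"))  // never fetched, so the error surfaces
}
```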

Quality Assurance

RenderFlow embeds quality safeguards at three stages:

Pre‑publish interception: Static analysis filters structural risks (out‑of‑bounds, nil pointers), preview execution validates runtime behavior with sample data, and automated regression testing deploys the pending configuration in a mirror environment to compare new vs. old versions and benchmark performance. Any failure blocks the publish request.

Publish‑time interception: During staged rollout, each data center runs health checks (SLA, panic rate, CPU/memory, core‑scenario functionality). Detected anomalies abort further rollout. For conversion code changes, the system selects relevant test cases based on the affected scenarios, avoiding costly full‑scale regression.

Post‑publish monitoring: Minute‑level effect checks run on core scenarios, and daily checks cover all scenarios across data centers, verifying non‑null fields and data consistency. Stability monitoring and traffic anomaly alerts run continuously; detected issues trigger rapid rollback using version snapshots.
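The publish-time gate described above can be sketched as a loop over data centers that stops at the first failed health check. The function names, the data center labels, and the choice to roll back already-deployed data centers in reverse order are assumptions for illustration; the article states only that anomalies abort further rollout and that version snapshots enable rollback.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnhealthy is a placeholder for any failed gate (SLA, panic rate, CPU/memory...).
var ErrUnhealthy = errors.New("health check failed")

// rollbackAll undoes completed stages in reverse order.
func rollbackAll(done []string, rollback func(dc string)) {
	for i := len(done) - 1; i >= 0; i-- {
		rollback(done[i])
	}
}

// StagedRollout deploys to one data center at a time and gates each stage on
// a health check. On any failure it stops and rolls back what was deployed.
func StagedRollout(dcs []string, deploy, healthy func(dc string) error, rollback func(dc string)) ([]string, error) {
	var done []string
	for _, dc := range dcs {
		if err := deploy(dc); err != nil {
			rollbackAll(done, rollback)
			return nil, fmt.Errorf("deploy to %s: %w", dc, err)
		}
		done = append(done, dc)
		if err := healthy(dc); err != nil {
			rollbackAll(done, rollback)
			return nil, fmt.Errorf("gate at %s: %w", dc, err)
		}
	}
	return done, nil
}

func main() {
	dcs := []string{"dc-a", "dc-b", "dc-c"}
	deploy := func(dc string) error {
		fmt.Println("deploy", dc)
		return nil
	}
	healthy := func(dc string) error {
		if dc == "dc-b" { // simulate an anomaly in the second stage
			return ErrUnhealthy
		}
		return nil
	}
	rollback := func(dc string) { fmt.Println("rollback", dc) }

	done, err := StagedRollout(dcs, deploy, healthy, rollback)
	fmt.Println(done, err)
}
```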

Multi‑Round Repair Mechanism

Because a single LLM generation can produce runtime errors (e.g., index out‑of‑bounds, nil pointer) or reflect an incomplete understanding of the scenario (missing fields, incorrect output structure), RenderFlow introduces an independent multi‑round repair loop. The first‑round acceptance rate is 82%; the remaining errors are handled automatically to keep manual intervention below 5%.

The loop involves three roles:

Coder Prompt: Includes requirement & context, historical repair suggestions, current code, and an instruction to regenerate complete executable code.

Reviewer Prompt: Receives execution feedback (syntax errors, runtime exceptions, logs, diff) and is instructed to output only repair constraints, not code.

Memory: Stores accumulated repair constraints for the current task and aggregates common patterns across tasks (e.g., “check slice length before access”). Constraints are de‑duplicated and merged to prevent context bloat.

Coder Prompt:
 - Requirement and context: {{requirement description / field descriptions / target schema}}
 - Historical repair suggestions: {{constraint_1..constraint_n}}
 - Current code: {{current_code}}
 - Output requirement: regenerate complete, executable code

Reviewer Prompt:
 - Execution feedback: {{syntax errors / runtime exceptions / logs / result diff}}
 - Output requirement: generate only repair constraints; do not output code
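The interplay of the two prompts can be sketched as a bounded loop. This is an illustration under assumptions: `Coder`, `Executor`, and `Reviewer` are hypothetical function types standing in for the LLM calls and the Executable Engine, and the `fake*` functions exist only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the LLM-backed Coder, the Executable Engine
// and the LLM-backed Reviewer.
type (
	Coder    func(prompt string, constraints []string, currentCode string) (string, error)
	Executor func(code string) error
	Reviewer func(execErr error) string
)

// ErrNotConverged signals that the round budget was exhausted.
var ErrNotConverged = errors.New("repair loop did not converge")

// RunRepairLoop regenerates complete code each round, feeding back every
// constraint gathered so far, until execution succeeds or maxRounds is hit.
// It returns the accepted code and the round in which it was accepted.
func RunRepairLoop(prompt string, gen Coder, exec Executor, review Reviewer, maxRounds int) (string, int, error) {
	var constraints []string
	code := ""
	for round := 1; round <= maxRounds; round++ {
		next, err := gen(prompt, constraints, code)
		if err != nil {
			return "", round, fmt.Errorf("round %d: generate: %w", round, err)
		}
		code = next
		execErr := exec(code)
		if execErr == nil {
			return code, round, nil
		}
		// The Reviewer emits a constraint, never code.
		constraints = append(constraints, review(execErr))
	}
	return "", maxRounds, ErrNotConverged
}

// fakeCoder produces "unsafe" code until it has at least one constraint.
func fakeCoder(prompt string, constraints []string, currentCode string) (string, error) {
	if len(constraints) == 0 {
		return "unsafe", nil
	}
	return "safe", nil
}

// fakeExec fails on anything other than the "safe" program.
func fakeExec(code string) error {
	if code != "safe" {
		return errors.New("runtime error: index out of range [3] with length 0")
	}
	return nil
}

func fakeReviewer(execErr error) string {
	return "check slice length before access"
}

func main() {
	code, rounds, err := RunRepairLoop("convert weather data", fakeCoder, fakeExec, fakeReviewer, 3)
	fmt.Println(code, rounds, err)
}
```

Capping the loop with `maxRounds` is one way to decide when a task should be handed to a human; the article reports that most issues converge within 2–3 rounds.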

The design avoids local patches by having the Reviewer emit only constraints. Constraints only accumulate and are never removed, which prevents a later round from regressing on an earlier fix, and reusable fixes are recorded in Memory for future scenarios.
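A minimal sketch of an append-only constraint store with de-duplication follows. The `Memory` type and its methods are hypothetical; duplicates are detected by case and whitespace normalization only, which is a simplification, since merging semantically similar constraints would need more than string comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// Memory is a hypothetical append-only store of repair constraints.
// It deliberately has no method for removing a constraint.
type Memory struct {
	seen        map[string]struct{}
	constraints []string
}

// NewMemory returns an empty store.
func NewMemory() *Memory {
	return &Memory{seen: make(map[string]struct{})}
}

// normalize lower-cases a constraint and collapses runs of whitespace.
func normalize(c string) string {
	return strings.ToLower(strings.Join(strings.Fields(c), " "))
}

// Add records a constraint and reports whether it was new.
func (m *Memory) Add(c string) bool {
	key := normalize(c)
	if key == "" {
		return false
	}
	if _, dup := m.seen[key]; dup {
		return false
	}
	m.seen[key] = struct{}{}
	m.constraints = append(m.constraints, strings.TrimSpace(c))
	return true
}

// Constraints returns a copy in insertion order, so callers cannot
// mutate or shrink the stored list.
func (m *Memory) Constraints() []string {
	out := make([]string, len(m.constraints))
	copy(out, m.constraints)
	return out
}

func main() {
	m := NewMemory()
	fmt.Println(m.Add("Check slice length before access"))
	fmt.Println(m.Add("  check slice   length before ACCESS "))
	fmt.Println(m.Add("Guard against nil pointers"))
	fmt.Println(m.Constraints())
}
```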

If the LLM service or Reviewer fails, the system degrades to a rule engine that matches error types to predefined repair hints, avoiding a single point of failure.
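The degradation path can be sketched as a reviewer call wrapped by a rule lookup. The rule table, `ConstraintFor`, and the error values are hypothetical; the matched substrings are taken from standard Go runtime panic messages, and the hints are examples rather than RenderFlow's actual rules.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Rule maps a substring of an error message to a predefined repair hint.
type Rule struct {
	Match string
	Hint  string
}

// defaultRules is an illustrative table keyed on Go runtime panic messages.
var defaultRules = []Rule{
	{"index out of range", "check slice length before access"},
	{"nil pointer dereference", "check pointers for nil before dereferencing"},
	{"nil map", "initialize maps before writing to them"},
}

var (
	ErrReviewerDown = errors.New("reviewer unavailable")
	ErrNoHint       = errors.New("no repair hint available")
)

// RuleHint returns the first hint whose pattern appears in the message.
func RuleHint(rules []Rule, errMsg string) (string, error) {
	for _, r := range rules {
		if strings.Contains(errMsg, r.Match) {
			return r.Hint, nil
		}
	}
	return "", ErrNoHint
}

// ReviewFunc is a hypothetical call to the LLM-backed Reviewer.
type ReviewFunc func(errMsg string) (string, error)

// ConstraintFor prefers the Reviewer and degrades to the rule table when
// the Reviewer fails. degraded reports which path produced the constraint.
func ConstraintFor(review ReviewFunc, rules []Rule, errMsg string) (constraint string, degraded bool, err error) {
	if c, rerr := review(errMsg); rerr == nil {
		return c, false, nil
	}
	hint, err := RuleHint(rules, errMsg)
	return hint, true, err
}

func main() {
	down := func(string) (string, error) { return "", ErrReviewerDown }
	msg := "runtime error: invalid memory address or nil pointer dereference"
	fmt.Println(ConstraintFor(down, defaultRules, msg))
	fmt.Println(ConstraintFor(down, defaultRules, "unexpected failure"))
}
```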

Deployment Results and Practical Summary

Impact

Delivery Cycle: Data‑transformation logic for a single scenario shrank from days/weeks to minutes; the end‑to‑end flow from input to publish can be completed within 30 minutes.

Code Quality: After introducing multi‑round repair, the share of code needing manual modification dropped below 5%; most issues converge within 2–3 rounds. The quality guardrails further reduce release risk.

Service Capability: The executable engine runs stably in Baidu’s online search scenarios, proving the feasibility of dynamic interpretation for result rendering.

Coverage Scale: The system now supports nearly a thousand scenarios across multiple business lines, handling everything from structured data extraction to complex semantic mapping.

Practical Reflections

RenderFlow is not intended as a universal code‑agent for any software task; it focuses on the well‑bounded domain of search result rendering, where the problem of “how to reliably generate, verify, and deploy logic” is most acute. Compared with generic agents like Devin or Copilot Workspace, RenderFlow emphasizes a single, deterministic pipeline with scenario adapters, a dynamic execution engine, and a rigorous quality assurance framework.

Limitations include occasional manual intervention for business‑rule‑driven field meanings or long‑tail data structures, and the performance ceiling of interpreted execution, which suits clear input‑output boundaries but not heavy computation or long‑chain external calls. Scaling the repair memory also requires ongoing governance (deduplication, aggregation) to keep context size manageable.

Conclusion

The core lessons are:

Dynamic execution engines provide a flexible base for LLM‑generated logic, enabling rapid configuration‑driven updates.

Comprehensive quality guardrails (static analysis, preview, regression, staged rollout, post‑deploy monitoring) turn uncertain model output into production‑grade code.

Multi‑round repair loops with Coder/Reviewer collaboration and monotonic constraint accumulation drive convergence and dramatically reduce manual effort.

By integrating these components, RenderFlow compresses per‑scenario delivery from days to minutes while maintaining stability across nearly a thousand online search rendering scenarios.

Architecture diagram
Executable engine diagram
Repair loop diagram
Engine execution flow
