Harness Engineering: Making Multi‑Agent Systems Safe and Trustworthy from Demo to Production

In a 90‑minute live technical session, three experts dissect ten core challenges of Agent engineering, including sandbox vs permission boundaries, checkpoints, rollback, tool‑call safety, human‑in‑the‑loop, multi‑agent coordination, observability, and memory. Their shared conclusion is that moving agents from "usable" to "trustworthy" requires fine‑grained execution controls rather than broader permissions.

DataFunSummit

Jun 5, 2026

01. From Demo to Production: Why the Hard Part Is Not Building the Agent

The discussion opens by contrasting a simple demo, which can be built with existing LLMs and workflow tools, with a production deployment where an Agent must execute real actions such as modifying configurations, invoking APIs, clicking UI elements, or triggering payments. This shift turns the problem from "capability construction" into "execution constraint".

02. First Guardrail: Sandbox vs Permission Boundary

When asked which guardrail to implement first, both guests agree that the choice depends on the scenario, but that in production both are indispensable. Qǔ Xiángmó, representing mobile GUI agents at OPPO, notes that mobile environments cannot be fully sandboxed because device APIs, account systems, and UI interactions are tightly bound to hardware. Instead, OPPO emphasizes layered permission checks: the client first detects sensitive screens, then the Agent validates intent, and finally a risk check runs before action generation.
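The layered checks can be pictured as a chain in which the first objection stops the step. The sketch below is only an illustration of that ordering: the screen names, intents, actions, and rule tables are hypothetical and do not come from OPPO's system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule data, for illustration only.
SENSITIVE_SCREENS = {"payment", "password_settings"}
ALLOWED_ACTIONS_BY_INTENT = {
    "order_coffee": {"tap_menu", "add_to_cart", "confirm_order"},
}
RISKY_ACTIONS = {"confirm_order", "delete_account"}


@dataclass
class ProposedStep:
    screen: str   # screen the client currently sees
    intent: str   # what the user asked for
    action: str   # what the Agent wants to do next


def check_screen(step: ProposedStep) -> Optional[str]:
    # Layer 1: the client flags sensitive screens.
    if step.screen in SENSITIVE_SCREENS:
        return f"sensitive screen: {step.screen}"
    return None


def check_intent(step: ProposedStep) -> Optional[str]:
    # Layer 2: the action must be consistent with the stated intent.
    allowed = ALLOWED_ACTIONS_BY_INTENT.get(step.intent, set())
    if step.action not in allowed:
        return f"action '{step.action}' does not match intent '{step.intent}'"
    return None


def check_risk(step: ProposedStep) -> Optional[str]:
    # Layer 3: a final risk check before the action is generated.
    if step.action in RISKY_ACTIONS:
        return f"risky action: {step.action}"
    return None


def first_objection(step: ProposedStep) -> Optional[str]:
    """Return the first layer's objection, or None if every layer passes."""
    for layer in (check_screen, check_intent, check_risk):
        reason = layer(step)
        if reason is not None:
            return reason
    return None
```

A step that returns None may proceed; any returned reason would be handed to the user or to a stricter review path.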

Yáo Bīnbīn, from Tencent Cloud, argues that in cloud environments sandboxing and permission boundaries are not an either/or choice. A sandbox isolates the execution environment, while permission boundaries constrain business outcomes; both are required to prevent accidental deletions, VM rebuilds, or network changes.

03. "Strict First, Lenient Later": Default Deny in Production Systems

Both guests advocate a conservative design philosophy: start with the tightest permissions and relax them only where real needs arise. Yáo likens this to firewall policies: default deny, then allow rule by rule. The premise is that a system performing irreversible actions cannot rely on the assumption that models rarely err.
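A default‑deny policy in the firewall style can be very small. The following sketch assumes a made‑up rule format of (verb, resource pattern) pairs; the verbs and resource names are illustrative, not taken from the talk.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Tuple

# Hypothetical allow list: anything not matched here is denied.
ALLOW_RULES = [
    ("read", "vm/*"),             # reading any VM's state is allowed
    ("restart", "vm/staging-*"),  # restarts only on staging machines
]


def is_allowed(verb: str, resource: str,
               rules: Iterable[Tuple[str, str]] = ALLOW_RULES) -> bool:
    """Default deny: permit only when some allow rule matches exactly."""
    return any(
        verb == rule_verb and fnmatchcase(resource, pattern)
        for rule_verb, pattern in rules
    )
```

Relaxing the policy then means adding one explicit rule at a time, which keeps every new permission visible in review.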

04. Checkpoint: When to Interrupt and When Not To

Checkpoints address the question "at which step must a human be notified?" Over‑interrupting is as risky as under‑interrupting. Qǔ proposes three checkpoint categories: (1) irreversible operations (e.g., payment, deletion, authorization); (2) incomplete intents (e.g., ordering a drink without specifying size); (3) execution path conflicts (e.g., booking a sold‑out flight). Yáo adds that checkpoints should be risk‑aware: high‑risk actions require manual confirmation, low‑risk actions can auto‑retry, and medium‑risk actions are judged based on context, confidence, and business constraints.
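One way to combine Qǔ's three categories with Yáo's risk tiers is a single decision function. This is a rough sketch under stated assumptions: the risk labels, the confidence score, and the 0.8 threshold are invented for illustration, and the medium tier here looks only at confidence, whereas the speakers also mention context and business constraints.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    PROCEED = "proceed"
    ASK_HUMAN = "ask_human"


def checkpoint(*, irreversible: bool, missing_details: Sequence[str],
               path_conflict: bool, risk: str, confidence: float,
               min_confidence: float = 0.8) -> Decision:
    """Decide whether the Agent must stop and ask before this step."""
    # Qǔ's categories: irreversible operation, incomplete intent, path conflict.
    if irreversible or missing_details or path_conflict:
        return Decision.ASK_HUMAN
    # Yáo's tiers: high risk always needs manual confirmation.
    if risk == "high":
        return Decision.ASK_HUMAN
    # Medium risk is judged case by case; only confidence is modeled here.
    if risk == "medium" and confidence < min_confidence:
        return Decision.ASK_HUMAN
    # Low risk (and confident medium risk) continues without interruption.
    return Decision.PROCEED
```

Keeping the rule in one place makes it easier to tune how often the Agent interrupts, since both over‑ and under‑interruption show up as changes to this function.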

05. Rollback: Why It Is Harder for GUI Than for API

Both guests agree that rollback is one of the toughest engineering problems for Agents. Yáo explains that rollback depends on the underlying system’s maturity: declarative platforms like Kubernetes can revert by re‑applying the desired state, whereas stateful services require snapshots, backups, compensating transactions, or manual intervention.

Qǔ points out that GUI actions are not transactional; once a button is clicked, the front‑end and back‑end states change jointly. Therefore, GUI rollback often relies on step‑level reversal and compensatory actions rather than a simple "undo".
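Step‑level reversal with compensating actions can be sketched as a journal of "undo" callbacks that are replayed in reverse order when a later step fails. The example is a toy under obvious assumptions: a Python list stands in for application state, and real compensations (cancelling an order, restoring a snapshot) may themselves fail or be unavailable.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that reverses it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for do, compensate in steps:
        try:
            do()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Toy usage: the second step fails, so the first is compensated.
cart: List[str] = []


def book_sold_out_item() -> None:
    raise RuntimeError("sold out")


ok = run_with_compensation([
    (lambda: cart.append("latte"), lambda: cart.remove("latte")),
    (book_sold_out_item, lambda: None),
])
# ok is False and cart is empty again
```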

06. "Legal" Calls Are Not Safe: Tool‑Call Composition Risks

The panel notes that a sequence of individually legal tool calls can still produce a dangerous outcome. Yáo stresses that permission control must move from per‑API checks to task‑level audit: read‑only queries can be permitted liberally, but configuration changes, bulk deletions, or permission grants must trigger stricter validation, secondary confirmation, or human approval.
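A task‑level audit looks at the whole planned call sequence rather than at each call alone. The sketch below assumes a hypothetical plan format (a list of dicts with a "tool" key) and two illustrative rules, a deletion budget and a "sensitive read followed by external send" pattern; neither rule is from the talk.

```python
from typing import Dict, List


def audit_plan(calls: List[Dict], max_deletes: int = 3) -> List[str]:
    """Return findings about the sequence as a whole; empty means no objection."""
    findings: List[str] = []

    # Rule 1: each delete may be legal alone, but many together is a bulk deletion.
    deletes = sum(1 for call in calls if call["tool"] == "delete")
    if deletes > max_deletes:
        findings.append(f"{deletes} deletions exceed budget of {max_deletes}")

    # Rule 2: a sensitive read followed later by an external send.
    saw_sensitive_read = False
    for call in calls:
        if call["tool"] == "read" and call.get("sensitive", False):
            saw_sensitive_read = True
        elif call["tool"] == "send_external" and saw_sensitive_read:
            findings.append("sensitive data read before external send")
            break

    return findings
```

Any non‑empty result would route the task to stricter validation, secondary confirmation, or human approval.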

Qǔ adds that GUI Agents must maintain intent consistency across a chain of actions, not just the immediate step.

07. Human‑in‑the‑Loop: Let Users Grab the Steering Wheel

Both guests describe HITL as a regular capability, not an exception. They use the metaphor of a driver pressing the brake: users should be able to pause the Agent, intervene, and then resume. Qǔ notes that mobile agents must handle dynamic changes such as UI redesigns, missing buttons, stockouts, or CAPTCHA challenges by yielding control when confidence drops.
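The pause, intervene, resume cycle can be sketched as a run object that keeps its position between calls. Everything here is hypothetical: the step names, the confidence function, and the 0.6 threshold are placeholders, and a user‑initiated pause at an arbitrary moment is not modeled.

```python
from typing import Callable, List, Optional


class PausableRun:
    """Executes steps in order, yielding control when confidence is low."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = list(steps)
        self.index = 0               # next step to execute
        self.log: List[str] = []     # steps actually carried out
        self._approved = False       # set when a human has taken over

    def run(self, confidence_of: Callable[[str], float],
            min_confidence: float = 0.6) -> str:
        while self.index < len(self.steps):
            step = self.steps[self.index]
            if not self._approved and confidence_of(step) < min_confidence:
                return "paused"      # hand the wheel to the user
            self._approved = False
            self.log.append(step)
            self.index += 1
        return "done"

    def take_over(self, replacement: Optional[str] = None) -> None:
        """Human intervention: optionally replace the pending step, then approve it."""
        if replacement is not None:
            self.steps[self.index] = replacement
        self._approved = True
```

Because the run remembers where it stopped, calling run again after take_over resumes from the pending step instead of restarting the task.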

Yáo emphasizes that in enterprise settings, human intervention also satisfies compliance requirements for privileged actions.

08. Multi‑Agent Coordination: Who Gets the Final Say?

The discussion stresses that multi‑Agent setups are acceptable only if decision authority is centralized. Qǔ summarizes the preferred architecture as "one brain, many hands," meaning a central Agent makes planning and intent decisions while subordinate Agents execute specific roles.

Yáo supports a Super‑Agent or central coordinator model, arguing that distributed decision‑making dramatically increases engineering complexity. He recommends assigning independent workspaces (e.g., using git worktree) to each Agent to keep boundaries clear.
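For the workspace suggestion, a coordinator could give each Agent its own working tree and branch. The helper below only builds the command line and does not run it; the directory layout and the "agent/<name>" branch naming are assumptions made for this sketch. The git options it uses (-C, worktree add, -b) are standard.

```python
from pathlib import PurePosixPath
from typing import List


def worktree_add_command(repo_dir: str, agent: str) -> List[str]:
    """Build a `git worktree add` command giving one Agent its own checkout.

    The workspace is placed next to the repository, e.g. /srv/repo-tester,
    on a new branch named agent/<agent> (a naming choice made for this sketch).
    """
    repo = PurePosixPath(repo_dir)
    workspace = repo.parent / f"{repo.name}-{agent}"
    return [
        "git", "-C", repo_dir,
        "worktree", "add",
        "-b", f"agent/{agent}",
        str(workspace),
    ]
```

The resulting list can be passed to subprocess.run by the central coordinator, so that subordinate Agents never share a working directory.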

09. Observability and Error Attribution

Yáo proposes a three‑layer reliability stack: offline benchmarking, online telemetry, and error attribution. Offline tests must be refreshed as UI or APIs evolve. Online, every prompt, tool parameter, result, error code, and latency should be recorded via standards like OpenTelemetry to enable audit, trace, and replay.
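The online layer amounts to wrapping every tool call so that its inputs, outcome, and timing are captured. The sketch below uses only the standard library and writes JSON lines to an in‑memory list; it is a stand‑in for the kind of record an OpenTelemetry span would carry, not the OpenTelemetry API, and the field names are assumptions.

```python
import json
import time
from typing import Any, Callable, Dict, List


def record_tool_call(sink: List[str], trace_id: str, prompt: str, tool: str,
                     params: Dict[str, Any], call: Callable[..., Any]) -> Any:
    """Invoke call(**params) and append one JSON record describing it to sink."""
    record: Dict[str, Any] = {
        "trace_id": trace_id,
        "prompt": prompt,
        "tool": tool,
        "params": params,
        "result": None,
        "error_code": None,
    }
    start = time.perf_counter()
    try:
        record["result"] = call(**params)
        return record["result"]
    except Exception as exc:
        # The exception type stands in for an error code in this sketch.
        record["error_code"] = type(exc).__name__
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
        sink.append(json.dumps(record, default=str))
```

Because failures are recorded before the exception propagates, the same log supports both audit and later replay of the failing call.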

Qǔ adds that evaluation of GUI Agents should consider not only task success but also interruption frequency, dangerous path traversal, and unnecessary help requests. Agent quality is thus a balance of success rate, interaction burden, safety, and stability.

10. Memory and Experience Accumulation

Memory is highlighted as essential for long‑term usability. Yáo warns that context length cannot grow indefinitely; once a threshold is reached, compression must retain error codes, key nodes, failure reasons, and user preferences.
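Compression of this kind can be sketched as a filter that runs once the history passes a threshold. The assumptions are deliberate simplifications: history entries are dicts with a hypothetical "kind" label, the threshold counts entries rather than tokens, and a few of the most recent entries are always kept.

```python
from typing import Dict, List

# Kinds of entries the talk says must survive compression.
KEEP_KINDS = {"error_code", "key_node", "failure_reason", "user_preference"}


def compress(history: List[Dict], max_items: int = 6,
             keep_recent: int = 2) -> List[Dict]:
    """Below the threshold return history unchanged; above it keep only
    important kinds plus the most recent entries, in original order."""
    if len(history) <= max_items:
        return list(history)
    cutoff = len(history) - keep_recent
    return [
        entry for i, entry in enumerate(history)
        if i >= cutoff or entry["kind"] in KEEP_KINDS
    ]
```

A production version would more likely summarize the dropped entries than discard them outright, but the retention rule for the four kinds stays the same.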

Qǔ stresses "experience retention": every failure, corrective action, or manual override should be absorbed as lasting knowledge so the Agent evolves from a novice to a seasoned assistant.

11. Conclusion: Harness Engineering Remains Vital

The speakers agree that even as models become more capable, Harness Engineering will not become obsolete because real‑world deployment involves permissions, business rules, compliance, exception handling, user preferences, and organizational processes that models alone cannot guarantee. The engineering layer that defines fine‑grained boundaries is what makes an Agent trustworthy enough for production.