Building Human‑in‑the‑Loop Agent Workflows with MCP on OpenLM
This article explains how to design and implement Human‑in‑the‑Loop (HITL) interactions for large‑model agents on Alibaba's OpenLM platform, covering the challenges of server‑side execution, MCP transport extensions, tool‑calling patterns, timeout handling, and UI rendering strategies across multiple client devices.
Background
Human‑in‑the‑Loop (HITL) refers to a loop where humans supervise, intervene, or approve actions taken by an AI agent. In typical scenarios the human can provide input, interrupt execution, or resume the agent after clarification. When agents run on remote servers rather than a user’s PC, implementing HITL becomes complex due to concurrency, distributed execution, and the need for reliable communication channels.
Challenges in Server‑Side Agent Execution
Existing client‑side tools (e.g., Cline, Cursor, Claude Code, iFlow CLI) expose simple `tools/call` interfaces that work well in a single‑process, single‑client model. However, in a micro‑service deployment the agent must handle multiple concurrent sessions, persist and restore graph state, and coordinate responses across nodes. Standard MCP (Model Context Protocol) over stdio transport is easy to implement but does not scale; even HTTP+SSE transport leaves many engineering problems unsolved. Streamable HTTP transport mitigates some issues but still requires substantial redesign.
Solution Overview
The proposed solution leverages MCP to provide a unified HITL mechanism that works both for client‑side and server‑side agents without major architectural changes. The core ideas are:
Introduce a send_inquiry MCP tool that accepts a prompt string and returns an inquiryId to the caller.
Use the MCP Notification channel to push inquiry progress and IDs back to the client.
Wrap existing tool calls with a confirmation step: before executing a tool, the agent calls send_inquiry and waits for the human’s answer (yes/no or detailed input).
Provide a proxy MCP server that can forward tool calls while preserving the HITL flow.
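A framework‑free sketch of the send_inquiry flow described above can clarify the moving parts. The names send_inquiry, inquiryId, and INQUIRY follow the article; the in‑memory stores and the answer_inquiry helper are illustrative assumptions, not the platform's actual implementation:

```python
import uuid

# Server-side bookkeeping (illustrative): open inquiries and pushed frames.
pending_inquiries: dict[str, dict] = {}   # inquiryId -> inquiry record
notifications: list[dict] = []            # frames pushed to connected clients

def send_inquiry(question: str) -> str:
    """Register a question for the human and emit a progress notification."""
    inquiry_id = str(uuid.uuid4())
    pending_inquiries[inquiry_id] = {"question": question, "answer": None}
    # Mirror the notifications/progress frame shown later in the article.
    notifications.append({
        "method": "notifications/progress",
        "params": {
            "meta": {"question": question, "inquiryId": inquiry_id, "type": "INQUIRY"},
            "progressToken": 1,
            "progress": 0,
        },
        "jsonrpc": "2.0",
    })
    return inquiry_id

def answer_inquiry(inquiry_id: str, answer: str) -> None:
    """Called when a client posts the human's reply back to the server."""
    pending_inquiries[inquiry_id]["answer"] = answer
```

In a real MCP server the notification would be sent over the transport rather than appended to a list, but the shape of the data is the same.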
Implementation Details
FastAPI HITL Service
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from langgraph.types import Command

from chatagent import graph_with_menu

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/")
async def read_root():
    return {"message": "Welcome to your FastAPI backend!"}

@app.get("/chat_initiate")
async def initiate_chat(thread_id: str):
    thread_config = {"configurable": {"thread_id": thread_id}}
    # Compiled LangGraph graphs are awaited via ainvoke, not invoke.
    state = await graph_with_menu.ainvoke(
        {"messages": [], "order": [], "finished": False}, config=thread_config
    )
    return {"AIMessage": state["messages"][-1].content, "state": state, "thread_id": thread_id}

@app.get("/chat-continue")
async def continue_chat(thread_id: str, response: str):
    thread_config = {"configurable": {"thread_id": thread_id}}
    # Command(resume=...) feeds the human's answer back into the interrupted graph.
    state = await graph_with_menu.ainvoke(Command(resume=response), config=thread_config)
    return {"AIMessage": state["messages"][-1].content, "state": state, "thread_id": thread_id}
```

MCP Notification Example
```json
{
  "method": "notifications/progress",
  "params": {
    "meta": {
      "question": "What will the weather in Beijing be like tomorrow?",
      "inquiryId": "a4cecc76-2fb3-41bc-97ae-e809059ad68a",
      "type": "INQUIRY"
    },
    "progressToken": 1,
    "progress": 0
  },
  "jsonrpc": "2.0"
}
```

OpenAI‑compatible ChatCompletions Chunk with Embedded Notification
```json
{
  "id": "chatcmpl-202d02d5-68cd-40d1-bd5f-0dc82751ba89",
  "created": 1756272472,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "delta": {
      "chatos_additional_data": {
        "mcp_progress_notification_data": "{\"method\":\"notifications/progress\",\"params\":{\"meta\":{\"question\":\"What will the weather in Beijing be like tomorrow?\",\"inquiryId\":\"a4cecc76-2fb3-41bc-97ae-e809059ad68a\",\"type\":\"INQUIRY\"},\"progressToken\":1,\"progress\":0.0}}"
      }
    }
  }],
  "agent_info": {
    "name": "Tool Call Agent",
    "run_id": "202d02d5-68cd-40d1-bd5f-0dc82751ba89"
  }
}
```

The agent workflow is:
When the agent needs clarification, it calls send_inquiry with a prompt describing the missing information.
The MCP server returns an inquiryId and pushes a Notification frame to all connected clients.
Clients render a UI (web, desktop, or mobile) that displays the question and collects the user’s answer.
Upon receiving the answer, the client posts it back to the MCP server, which resumes the original tool call.
If the user rejects the request, the server returns a TextContent indicating refusal, and the agent can either abort or follow an alternative decision path.
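On the client side, step 3 amounts to spotting inquiry frames in the stream. A sketch of that parsing follows; the field names chatos_additional_data and mcp_progress_notification_data come from the chunk example above, while the function itself is illustrative:

```python
import json

def extract_inquiry(chunk: dict):
    """Return (inquiryId, question) if this ChatCompletions chunk carries an
    INQUIRY notification, else None."""
    for choice in chunk.get("choices", []):
        extra = choice.get("delta", {}).get("chatos_additional_data")
        if not extra:
            continue
        # The notification frame is embedded as a JSON string.
        note = json.loads(extra["mcp_progress_notification_data"])
        meta = note.get("params", {}).get("meta", {})
        if meta.get("type") == "INQUIRY":
            return meta["inquiryId"], meta["question"]
    return None
```

A client would call this on every streamed chunk and, on a hit, open the modal or inline form and later post the answer back with the returned inquiryId.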
YOLO Mode and Decision Strategies
YOLO mode, introduced by Cursor in late 2024, lets the agent bypass human confirmation and execute actions automatically. When YOLO is enabled, the agent must be explicitly prompted to skip HITL calls; otherwise the default behavior remains to request human input. Decision strategies include:
Server‑side decision: The agent’s prompt forces it to avoid HITL tools when YOLO is on.
Client‑side decision: The client auto‑replies on the user's behalf (e.g., a default answer on timeout or a random choice), resolving the inquiry without human input.
For more robust handling, a secondary “decision agent” can be invoked when the primary agent receives a refusal or timeout, allowing the system to generate a fallback answer without human involvement.
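That fallback path can be sketched as follows; resolve_inquiry and decision_agent are hypothetical names, and treating the literal string "timeout" as a sentinel is an assumption for illustration:

```python
from typing import Callable, Optional

def resolve_inquiry(human_answer: Optional[str], question: str,
                    decision_agent: Callable[[str], str]) -> str:
    """Prefer the human's answer; on refusal or timeout, fall back to a
    secondary decision agent so the workflow can still complete."""
    if human_answer in (None, "", "timeout"):
        return decision_agent(question)
    return human_answer
```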
Client‑Side Rendering and Multi‑Device Coordination
Clients consume the SSE stream from the agent’s main endpoint (often OpenAI‑compatible ChatCompletions). When an inquiry frame arrives, the UI renders a modal or inline form, waits for the user, and sends the response via a dedicated API. Timeouts trigger an automatic "no answer" response. In multi‑device scenarios, the server can broadcast the same inquiry via Notification to all logged‑in devices, allowing any device to answer and automatically closing the prompt on the others.
Timeout Configuration
The typical MCP tool‑call timeout is 30 seconds. For HITL interactions the client‑side timeout should be longer than the MCP server's tool‑call timeout, which in turn should exceed the overall service timeout, so that no layer in the chain gives up before a human has had a chance to answer.
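One illustrative way to encode that ordering as configuration (the concrete values are placeholders, not platform defaults beyond the 30‑second MCP norm):

```python
# HITL timeout budget: each outer layer waits longer than the one inside it,
# so the human's answer can propagate before any layer gives up.
TIMEOUTS_SECONDS = {
    "service": 300,      # overall service timeout
    "mcp_server": 600,   # MCP server tool-call timeout, longer than the service's
    "client": 900,       # client waits longest for the human's reply
}

# Enforce the ordering described above at startup.
assert (TIMEOUTS_SECONDS["client"]
        > TIMEOUTS_SECONDS["mcp_server"]
        > TIMEOUTS_SECONDS["service"])
```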
Prompt Engineering for HITL
A sample prompt fragment enforces strict clarification before proceeding:
```
<instruction>
Please follow these steps:
1. **Intent Analysis** – Determine if the user intent is clear.
2. **Clarification** – If unclear, MUST call `send_inquiry` and wait for a human response.
3. **Information Retrieval** – Once intent is clear, call external search tools if needed.
4. **Verification** – Verify retrieved data; if uncertain, repeat search.
5. **Synthesis** – Combine internal knowledge and external data into a final answer.
</instruction>
```

Additional directives such as clarify_user_intent and information_gathering are defined in XML‑like syntax to give the model high‑priority rules for when to invoke HITL tools versus autonomous search.
Tool Description and Dynamic Overrides
In MCP, each tool’s description is merged into the model’s context. To adapt a tool for specific workflows, the description can be overridden via QueryString parameters or by the client replacing parts of the schema after a tools/list call. This enables per‑session customization without redeploying the MCP server.
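A minimal sketch of the client‑side variant, patching descriptions after a tools/list call; apply_overrides and the override table are hypothetical, while the name/description fields follow the MCP tool schema:

```python
def apply_overrides(tools: list[dict], overrides: dict[str, str]) -> list[dict]:
    """Return a per-session copy of the listed tools with selected
    descriptions replaced, leaving the server-side schema untouched."""
    patched = []
    for tool in tools:
        tool = dict(tool)  # shallow copy; don't mutate the original listing
        if tool["name"] in overrides:
            tool["description"] = overrides[tool["name"]]
        patched.append(tool)
    return patched
```

The patched list is what gets merged into the model's context, so two sessions can present the same tool with different guidance without redeploying the MCP server.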
Conclusion
The presented architecture demonstrates how to embed Human‑in‑the‑Loop capabilities into large‑model agent platforms using MCP, without rewriting existing services. By separating confirmation logic into a dedicated MCP tool, supporting notification‑driven UI updates, and providing flexible timeout and decision strategies, developers can build robust, scalable AI assistants that gracefully involve humans when needed.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
