Building Human‑in‑the‑Loop Agent Workflows with MCP on OpenLM

This article explains how to design and implement Human‑in‑the‑Loop (HITL) interactions for large‑model agents on Alibaba's OpenLM platform, covering the challenges of server‑side execution, MCP transport extensions, tool‑calling patterns, timeout handling, and UI rendering strategies across multiple client devices.

Alibaba Cloud Developer

Background

Human‑in‑the‑Loop (HITL) refers to a loop where humans supervise, intervene, or approve actions taken by an AI agent. In typical scenarios the human can provide input, interrupt execution, or resume the agent after clarification. When agents run on remote servers rather than a user’s PC, implementing HITL becomes complex due to concurrency, distributed execution, and the need for reliable communication channels.

Challenges in Server‑Side Agent Execution

Existing client‑side tools (e.g., Cline, Cursor, Claude Code, iFlow CLI) expose simple "tool/call" interfaces that work well in a single‑process, single‑client model. In a micro‑service deployment, however, the agent must handle many concurrent sessions, persist and restore graph state, and coordinate responses across nodes. Standard MCP (Model Context Protocol) over the stdio transport is easy to implement but does not scale; even the HTTP+SSE transport leaves many engineering problems unsolved. The newer Streamable HTTP transport mitigates some of these issues but still requires substantial redesign.

Solution Overview

The proposed solution leverages MCP to provide a unified HITL mechanism that works both for client‑side and server‑side agents without major architectural changes. The core ideas are:

Introduce a `send_inquiry` MCP tool that accepts a prompt string and returns an inquiryId to the caller.

Use the MCP Notification channel to push inquiry progress and IDs back to the client.

Wrap existing tool calls with a confirmation step: before executing a tool, the agent calls send_inquiry and waits for the human’s answer (yes/no or detailed input).

Provide a proxy MCP server that can forward tool calls while preserving the HITL flow.

Implementation Details

FastAPI HITL Service

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from langgraph.types import Command

from chatagent import graph_with_menu

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/")
async def read_root():
    return {"message": "Welcome to your FastAPI backend!"}

@app.get("/chat-initiate")
async def initiate_chat(thread_id: str):
    """Start a new conversation; runs the graph until its first interrupt."""
    thread_config = {"configurable": {"thread_id": thread_id}}
    state = await graph_with_menu.ainvoke(
        {"messages": [], "order": [], "finished": False}, config=thread_config
    )
    return {"AIMessage": state["messages"][-1].content, "state": state, "thread_id": thread_id}

@app.get("/chat-continue")
async def continue_chat(thread_id: str, response: str):
    """Resume an interrupted graph with the human's answer."""
    thread_config = {"configurable": {"thread_id": thread_id}}
    state = await graph_with_menu.ainvoke(Command(resume=response), config=thread_config)
    return {"AIMessage": state["messages"][-1].content, "state": state, "thread_id": thread_id}
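The initiate/continue split above can be illustrated without LangGraph at all: in the sketch below a plain Python generator stands in for the graph, pausing at its first `yield` (the interrupt) and resuming when `send()` delivers the human's answer. All names here are illustrative:

```python
# Hypothetical simulation of the initiate/continue flow: a generator plays
# the role of the graph, yielding whenever it needs human input.
def agent_run():
    answer = yield "Which city do you want the weather for?"  # interrupt
    yield f"Fetching the weather for {answer}..."             # resume

sessions = {}  # thread_id -> suspended generator

def chat_initiate(thread_id: str) -> str:
    gen = agent_run()
    sessions[thread_id] = gen
    return next(gen)                        # run until the first interrupt

def chat_continue(thread_id: str, response: str) -> str:
    return sessions[thread_id].send(response)   # resume with the answer
```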

MCP Notification Example

{
  "method": "notifications/progress",
  "params": {
    "meta": {
      "question": "What will the weather be like in Beijing tomorrow?",
      "inquiryId": "a4cecc76-2fb3-41bc-97ae-e809059ad68a",
      "type": "INQUIRY"
    },
    "progressToken": 1,
    "progress": 0
  },
  "jsonrpc": "2.0"
}
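A helper that emits frames of this shape might look as follows; this is a sketch whose field names simply mirror the example above, and `build_inquiry_notification` is a hypothetical name:

```python
import uuid

def build_inquiry_notification(question: str, token: int = 1) -> dict:
    """Build a notifications/progress frame carrying an inquiry
    (field layout mirrors the example frame in the article)."""
    return {
        "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": {
            "meta": {
                "question": question,
                "inquiryId": str(uuid.uuid4()),
                "type": "INQUIRY",
            },
            "progressToken": token,
            "progress": 0,
        },
    }
```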

OpenAI‑compatible ChatCompletions Chunk with Embedded Notification

{
  "id": "chatcmpl-202d02d5-68cd-40d1-bd5f-0dc82751ba89",
  "created": 1756272472,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "delta": {
      "chatos_additional_data": {
      "mcp_progress_notification_data": "{\"method\":\"notifications/progress\",\"params\":{\"meta\":{\"question\":\"What will the weather be like in Beijing tomorrow?\",\"inquiryId\":\"a4cecc76-2fb3-41bc-97ae-e809059ad68a\",\"type\":\"INQUIRY\"},\"progressToken\":1,\"progress\":0.0}}"
      }
    }
  }],
  "agent_info": {
    "name": "Tool Call Agent",
    "run_id": "202d02d5-68cd-40d1-bd5f-0dc82751ba89"
  }
}

The agent workflow is:

When the agent needs clarification, it calls send_inquiry with a prompt describing the missing information.

The MCP server returns an inquiryId and pushes a Notification frame to all connected clients.

Clients render a UI (web, desktop, or mobile) that displays the question and collects the user’s answer.

Upon receiving the answer, the client posts it back to the MCP server, which resumes the original tool call.

If the user rejects the request, the server returns a TextContent indicating refusal, and the agent can either abort or follow an alternative decision path.
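The confirm‑before‑execute pattern described above can be sketched as a wrapper that gates a real tool call on the human's reply. All names are illustrative, and refusal and timeout come back as plain text (mirroring the TextContent behavior) so the agent can branch on them:

```python
# Hypothetical confirmation wrapper: before running a real tool, ask the
# human via an inquiry function and branch on approval, refusal, or timeout.
def confirmed_call(tool, args: dict, ask) -> str:
    """`ask` plays the role of send_inquiry plus waiting for the answer;
    it returns the human's text, or None on timeout."""
    answer = ask(f"Allow call to {tool.__name__} with {args}? (yes/no)")
    if answer is None:
        return "Inquiry timed out; tool call skipped."
    if answer.strip().lower() != "yes":
        return "User refused the tool call."   # refusal as plain text
    return tool(**args)

def get_weather(city: str) -> str:             # stand-in tool
    return f"Sunny in {city}"
```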

YOLO Mode and Decision Strategies

YOLO ("You Only Live Once") mode, introduced by Cursor in late 2024, lets the agent execute actions automatically without waiting for human confirmation. When YOLO is enabled, the agent's prompt must instruct it to skip HITL calls; otherwise the default behavior is still to request human input. Decision strategies include:

Server‑side decision: The agent’s prompt forces it to avoid HITL tools when YOLO is on.

Client‑side decision: The client auto‑replies on the agent's behalf (for example with a "timeout" or a random answer), effectively pushing the final decision back to the agent.

For more robust handling, a secondary “decision agent” can be invoked when the primary agent receives a refusal or timeout, allowing the system to generate a fallback answer without human involvement.

Client‑Side Rendering and Multi‑Device Coordination

Clients consume the SSE stream from the agent’s main endpoint (often OpenAI‑compatible ChatCompletions). When an inquiry frame arrives, the UI renders a modal or inline form, waits for the user, and sends the response via a dedicated API. Timeouts trigger an automatic "no answer" response. In multi‑device scenarios, the server can broadcast the same inquiry via Notification to all logged‑in devices, allowing any device to answer and automatically closing the prompt on the others.
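The broadcast‑and‑first‑answer‑wins behavior can be modeled with a small registry: every device receives the inquiry frame, the first answer is accepted, and the remaining devices get a close frame. A hypothetical sketch:

```python
# Illustrative multi-device coordinator: one inquiry fans out to every
# connected device; the first answer wins and the others are told to close.
class InquiryBroadcaster:
    def __init__(self):
        self.devices: dict[str, list] = {}    # device_id -> inbox of frames
        self.answered: set[str] = set()

    def register(self, device_id: str) -> None:
        self.devices[device_id] = []

    def broadcast(self, inquiry_id: str, question: str) -> None:
        for inbox in self.devices.values():
            inbox.append({"inquiryId": inquiry_id, "question": question})

    def answer(self, inquiry_id: str, device_id: str, text: str) -> bool:
        """Return True if this device's answer was accepted first."""
        if inquiry_id in self.answered:
            return False                      # another device answered first
        self.answered.add(inquiry_id)
        for other, inbox in self.devices.items():
            if other != device_id:
                inbox.append({"inquiryId": inquiry_id, "close": True})
        return True
```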

Timeout Configuration

A typical MCP tool-call timeout is 30 seconds, far too short for a human to respond. For HITL interactions the timeouts must be layered: the client‑side timeout should be longer than the MCP server timeout, which in turn should exceed the overall service timeout, so that the agent can keep waiting for a human answer before any layer gives up.
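The layering can be demonstrated with `asyncio.wait_for`; the timeout constants below are illustrative, and the actual wait in the demo is scaled down so it runs quickly:

```python
import asyncio

# Illustrative layered timeouts for HITL (values are examples only):
CLIENT_TIMEOUT_S = 600       # UI waits up to 10 minutes for the human
MCP_TOOL_TIMEOUT_S = 540     # MCP tool call gives up slightly earlier
SERVICE_TIMEOUT_S = 480      # inner service timeout, shortest of the three

async def wait_for_human(answer_delay_s: float) -> str:
    await asyncio.sleep(answer_delay_s)      # stand-in for a real human
    return "yes"

async def hitl_tool_call(answer_delay_s: float) -> str:
    """Wrap the wait in a timeout; on expiry, auto-answer 'no answer'."""
    try:
        # Scaled-down timeout so the demo finishes in milliseconds.
        return await asyncio.wait_for(wait_for_human(answer_delay_s),
                                      timeout=0.05)
    except asyncio.TimeoutError:
        return "no answer"
```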

Prompt Engineering for HITL

A sample prompt fragment enforces strict clarification before proceeding:

<instruction>
Please follow these steps:
1. **Intent Analysis** – Determine if the user intent is clear.
2. **Clarification** – If unclear, MUST call `send_inquiry` and wait for a human response.
3. **Information Retrieval** – Once intent is clear, call external search tools if needed.
4. **Verification** – Verify retrieved data; if uncertain, repeat search.
5. **Synthesis** – Combine internal knowledge and external data into a final answer.
</instruction>

Additional directives such as `clarify_user_intent` and `information_gathering` are defined in XML‑like syntax to give the model high‑priority rules for when to invoke HITL tools versus searching autonomously.

Tool Description and Dynamic Overrides

In MCP, each tool’s description is merged into the model’s context. To adapt a tool for specific workflows, the description can be overridden via QueryString parameters or by the client replacing parts of the schema after a tools/list call. This enables per‑session customization without redeploying the MCP server.
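A client‑side sketch of such an override, assuming `tools/list` returned plain dicts (the function name and override format are hypothetical):

```python
# Illustrative per-session description override applied after tools/list:
# replace a tool's generic description with a workflow-specific one.
def apply_overrides(tools: list[dict], overrides: dict[str, str]) -> list[dict]:
    patched = []
    for tool in tools:
        tool = dict(tool)                       # don't mutate the original
        if tool["name"] in overrides:
            tool["description"] = overrides[tool["name"]]
        patched.append(tool)
    return patched
```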

Conclusion

The presented architecture demonstrates how to embed Human‑in‑the‑Loop capabilities into large‑model agent platforms using MCP, without rewriting existing services. By separating confirmation logic into a dedicated MCP tool, supporting notification‑driven UI updates, and providing flexible timeout and decision strategies, developers can build robust, scalable AI assistants that gracefully involve humans when needed.
