Seeing Inside Hermes: Full Observability of Agent Execution with OpenTelemetry
The article explains how Alibaba Cloud’s Hermes observability plugin, built on OpenTelemetry, makes the entire execution process of AI agents visible by tracing reasoning steps, tool calls, token usage, latency, and security risks, enabling precise cost, performance, and error analysis.
Problem
When an AI agent solves a task, the difficulty is not only whether the answer is correct but also understanding what the agent actually did. Existing systems typically expose only the final reply, total token usage, or a single request summary, leaving the internal multi‑round reasoning, tool invocations, context growth, and error points invisible. Four concrete pain points are identified:
Process invisibility: No detailed view of each reasoning round, tool call, or context expansion.
Cost attribution: High token bills cannot be linked to specific rounds, large tool results, or long final outputs.
Performance decomposition: Users know only that a request is slow; they cannot tell whether the first token, the whole generation, or a tool execution caused the slowdown.
Result reproducibility: When the final answer looks plausible but is wrong, there is no trace of which tool was called, what parameters were passed, or how the reasoning deviated.
Solution – Hermes Observability Plugin
The plugin instruments the Python runtime of the Hermes agent framework, creates OpenTelemetry spans around key execution boundaries, and reports trace and metric data via OTLP to any compatible backend (e.g., Alibaba Cloud ARMS). It follows the OpenTelemetry GenAI semantic conventions and extends them with LoongSuite conventions for agent‑specific fields.
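To make "spans around key execution boundaries" concrete, here is a minimal standard-library sketch of the idea. It is not the plugin's code and does not use the OpenTelemetry API: the span helper, the FINISHED_SPANS list, the model and tool names, and the token counts are all hypothetical stand-ins for a real tracer and exporter.

```python
import time
import uuid
from contextlib import contextmanager

FINISHED_SPANS = []  # hypothetical stand-in for an exporter queue
_open_spans = []     # currently open spans, innermost last


@contextmanager
def span(name, attributes=None):
    """Record one execution boundary as a span-like dict."""
    record = {
        "name": name,
        "span_id": uuid.uuid4().hex[:16],
        # The innermost open span becomes the parent, giving a trace tree.
        "parent_id": _open_spans[-1]["span_id"] if _open_spans else None,
        "attributes": dict(attributes or {}),
        "status": "OK",
    }
    start = time.perf_counter()
    _open_spans.append(record)
    try:
        yield record
    except Exception as exc:
        record["status"] = "ERROR"
        record["attributes"]["error.type"] = type(exc).__name__
        raise
    finally:
        _open_spans.pop()
        record["duration_s"] = time.perf_counter() - start
        FINISHED_SPANS.append(record)


def run_task():
    """One agent task: a model call followed by a tool call (both faked)."""
    with span("invoke_agent"):
        with span("chat", {"gen_ai.request.model": "example-model"}) as llm:
            # A real instrumentation would read usage from the model response.
            llm["attributes"]["gen_ai.usage.input_tokens"] = 120
            llm["attributes"]["gen_ai.usage.output_tokens"] = 30
        with span("execute_tool", {"gen_ai.tool.name": "search"}):
            pass  # the tool would run here


run_task()
for s in FINISHED_SPANS:
    print(s["name"], s["parent_id"], s["status"])
```

Child spans finish before their parent, so the model and tool records are appended first and both point at the agent span through parent_id. That parent link is what lets a backend rebuild the per-task trace tree.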
Key Advantages
Uses standard GenAI attributes (e.g., gen_ai.request.model, gen_ai.usage.input_tokens) and adds agent‑specific fields such as gen_ai.tool.name and gen_ai.tool.call.arguments.
Provides both per‑request trace spans and aggregated metrics (call count, error count, latency, token usage, time‑to‑first‑token).
Streaming‑aware time‑to‑first‑token (TTFT) measurement captures the delay before the first token arrives, separating first‑token latency from overall generation latency.
Backend‑agnostic: data are sent via standard OTLP, so the same plugin works with ARMS or any OTLP‑compatible observability platform.
Collects full operation logs and applies a dynamic audit model to flag privilege‑escalation, data‑leak, and prompt‑injection risks.
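The streaming-aware TTFT measurement mentioned in the list above can be sketched as follows. This is an illustrative assumption about how such a measurement might work, not the plugin's implementation: consume_stream and its clock parameter are hypothetical names, and the clock is injectable only so the example is deterministic.

```python
import time


def consume_stream(chunks, clock=time.perf_counter):
    """Drain a streamed response, timing the first non-empty chunk separately."""
    start = clock()
    first_token_at = None
    pieces = []
    for chunk in chunks:
        if first_token_at is None and chunk:
            first_token_at = clock()  # first visible output reached the caller
        pieces.append(chunk)
    end = clock()
    ttft = None if first_token_at is None else first_token_at - start
    return {
        "text": "".join(pieces),
        "time_to_first_token_s": ttft,
        "total_s": end - start,
    }


# Deterministic demo: a fake clock that returns scripted timestamps.
ticks = iter([0.0, 0.5, 2.0])
result = consume_stream(["Hel", "lo"], clock=lambda: next(ticks))
print(result)
```

With the scripted clock, the first chunk arrives at 0.5 s and the stream ends at 2.0 s, so the two numbers separate "slow to start" from "slow to finish".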
Observability Data Model
Each model‑call span records:
gen_ai.request.model
gen_ai.usage.input_tokens
gen_ai.usage.output_tokens
gen_ai.usage.total_tokens
gen_ai.response.time_to_first_token
Each tool‑call span records:
gen_ai.tool.name
gen_ai.tool.call.arguments
gen_ai.tool.call.result
An agent‑level aggregation span captures the total tokens, final output message, and total latency for the whole task.
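With these attributes on every span, token cost can be attributed to individual reasoning rounds. The sketch below assumes the spans have already been exported as plain dictionaries. The round field, the sample values, and the helper names are hypothetical; only the gen_ai.* attribute names come from the data model above.

```python
from collections import defaultdict

IN_TOK = "gen_ai.usage.input_tokens"
OUT_TOK = "gen_ai.usage.output_tokens"

# Hypothetical exported spans for one task; "round" is an assumed field.
spans = [
    {"round": 1, "attributes": {IN_TOK: 200, OUT_TOK: 40}},
    {"round": 1, "attributes": {"gen_ai.tool.name": "search",
                                "gen_ai.tool.call.result": "...long result..."}},
    {"round": 2, "attributes": {IN_TOK: 1500, OUT_TOK: 60}},
    {"round": 3, "attributes": {IN_TOK: 1600, OUT_TOK: 300}},
]


def tokens_per_round(spans):
    """Sum input and output tokens of model-call spans, grouped by round."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for s in spans:
        attrs = s["attributes"]
        if IN_TOK not in attrs:
            continue  # tool spans carry no token usage
        totals[s["round"]]["input"] += attrs[IN_TOK]
        totals[s["round"]]["output"] += attrs.get(OUT_TOK, 0)
    return dict(totals)


def input_growth(per_round):
    """Input-token increase of each round over the previous one."""
    rounds = sorted(per_round)
    return {cur: per_round[cur]["input"] - per_round[prev]["input"]
            for prev, cur in zip(rounds, rounds[1:])}


per_round = tokens_per_round(spans)
growth = input_growth(per_round)
print(per_round)
print(growth)
```

In this made-up trace, input tokens jump by 1300 between rounds 1 and 2, right after the search tool returned, which points at a large tool result as the cost driver. Round 3 is expensive mainly because of its long output.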
Installation & Activation
1. In the Cloud Monitor Service (CMS) 2.0 console, navigate to AI Application Observability → Hermes and copy the generated installation command.
curl -fsSL https://arms-apm-cn-hangzhou-pre.oss-cn-hangzhou.aliyuncs.com/hermes-agent-cms-plugin/hermes-cms.sh | bash -s -- install \
--x-arms-license-key "auto" \
--x-arms-project "YourProject" \
--x-cms-workspace "YourWorkspace" \
--serviceName "hermes" \
--endpoint "https://your-ARMS-OTLP-endpoint/apm/trace/opentelemetry"
2. The script registers hermes-cms on the host and provides the hermes-cms enable, hermes-cms disable, and hermes-cms uninstall commands.
3. Enable the plugin:
hermes-cms enable
4. Start Hermes as usual (e.g., hermes or hermes gateway start).
When the plugin starts, the console prints:
loongsuite-site-bootstrap: started successfully (OpenTelemetry auto-instrumentation initialized).
Verification
Send a few test requests that trigger multi‑round reasoning and tool calls. After about a minute, the CMS console shows:
Model call count and token consumption trends.
Average number of reasoning rounds per request.
Breakdown of latency for Agent, LLM, and Tool phases.
Full trace view that reveals which round invoked which tool, the arguments passed, and the result returned.
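The Agent, LLM, and Tool latency breakdown can be reproduced offline from exported spans. The sketch below assumes that child spans run sequentially without overlap, and that each span carries a hypothetical kind label and an integer duration_ms. Neither field name comes from the plugin.

```python
def latency_breakdown(spans):
    """Split one task's latency into LLM time, tool time, and the remainder."""
    by_kind = {"agent": 0, "llm": 0, "tool": 0}
    for s in spans:
        by_kind[s["kind"]] += s["duration_ms"]
    # Whatever the agent span covers beyond its children is framework overhead.
    other = by_kind["agent"] - by_kind["llm"] - by_kind["tool"]
    return {**by_kind, "other": other}


# Hypothetical trace: one agent span wrapping two model calls and one tool call.
trace = [
    {"kind": "agent", "duration_ms": 4200},
    {"kind": "llm", "duration_ms": 900},
    {"kind": "tool", "duration_ms": 2500},
    {"kind": "llm", "duration_ms": 600},
]

breakdown = latency_breakdown(trace)
slowest = max(("llm", "tool", "other"), key=breakdown.get)
print(breakdown, slowest)
```

Here the tool phase accounts for 2500 of the 4200 ms, so the slowdown comes from tool execution rather than from the model. That is the question the performance-decomposition pain point asks.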
Security Auditing
The plugin records every agent operation, builds an audit view, and flags risky behaviors such as unauthorized access, data export, or malicious prompt injection.
Future Work
Data side: Extend from trace and metric attributes to full log audit and diagnostic capabilities.
Trace side: Add finer‑grained spans for Hermes‑specific stages such as memory lifecycle, delegation orchestration, and runtime recovery.
Governance side: Strengthen data collection controls, fine‑grained governance, and unified masking and security policies.
With the current plugin, Hermes can render a real execution as a structured ReAct trace, attributing token usage and latency to each model call and tool invocation, and providing a security audit view for high‑risk operations.
Alibaba Cloud Observability
