Seeing Inside Hermes: Full Visibility into Agent Execution with OpenTelemetry
The article introduces Alibaba Cloud's Hermes observability plugin built on OpenTelemetry, which transforms the previously opaque AI agent runtime into a fully traceable system by recording every reasoning step, tool invocation, token usage, latency, and security event, enabling precise cost attribution, performance analysis, and audit of high‑risk behaviors.
Problem
Hermes is an autonomous AI agent runtime that performs multi‑round reasoning, tool calls, context expansion and new inference loops. A single request may involve many reasoning steps and tool invocations and consume a large number of tokens, but the system returns only the final answer and a usage summary, so the execution process, cost drivers, performance bottlenecks and error sources remain invisible.
Observability gaps
Process invisibility: No call‑chain showing the number of reasoning rounds, which tools were called, and how they affected later steps.
Unattributed cost: Token usage is aggregated, so the round or tool that caused high expense cannot be identified.
Undifferentiated performance: Only overall latency is visible; it is unclear whether delay originates from first‑token generation, total generation, tool execution, or ReAct loops.
Irreproducible results: When the final answer is incorrect, there is no trace to determine whether a wrong tool call or incomplete result caused the deviation.
Solution – Hermes observability plugin
The plugin instruments the Hermes Python runtime with OpenTelemetry auto‑instrumentation. It creates span objects around the main execution boundaries (agent start, each LLM call, each tool call, and agent end) and exports trace and metric data via the OpenTelemetry Protocol (OTLP) to any compatible backend (e.g., Alibaba Cloud ARMS). The trace follows the OpenTelemetry GenAI semantic conventions and adds LoongSuite‑specific fields for agent‑level attributes.
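The span boundaries described above can be pictured with a small stand-in written in plain Python. This is a sketch only: it does not use the OpenTelemetry SDK or the plugin's real code, and the names Recorder, Span, run_agent, "example-model" and "search" are hypothetical. It shows how an agent span can enclose one LLM-call span and one tool-call span, with each child remembering its parent.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical in-memory span record (not the OpenTelemetry Span class)."""
    name: str
    parent: Optional[str]
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Recorder:
    """Stand-in for a tracer: keeps finished spans in a list."""

    def __init__(self):
        self.finished = []
        self._stack = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name, attributes=None):
        parent = self._stack[-1].name if self._stack else None
        current = Span(name, parent, dict(attributes or {}), start=time.perf_counter())
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


def run_agent(recorder, question):
    """Simulate one agent run: an LLM call followed by a tool call."""
    with recorder.span("agent"):
        with recorder.span("llm_call", {"gen_ai.request.model": "example-model"}) as llm:
            # Assumption: whitespace word count stands in for real token counting.
            llm.attributes["gen_ai.usage.input_tokens"] = len(question.split())
        with recorder.span("tool_call", {"gen_ai.tool.name": "search"}):
            pass  # a real tool would run here
    return recorder.finished


recorder = Recorder()
spans = run_agent(recorder, "what is the weather today")
```

Child spans finish before their parent, so the agent span is recorded last. A real tracer would export these records over OTLP instead of keeping them in a list.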
Key capabilities
Standardized semantics: Uses GenAI attributes such as gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.total_tokens, gen_ai.response.time_to_first_token and custom fields gen_ai.tool.name, gen_ai.tool.call.arguments, gen_ai.tool.call.result.
Rich metrics: Provides per‑request and aggregated counters for call count, error count, latency, token consumption and TTFT, enabling trend analysis of cost and performance.
Backend‑agnostic: Data are sent over OTLP, so the same instrumentation works with ARMS or any other OTLP‑compatible observability platform.
Security audit: Full operation logs are captured; anomaly‑detection models can flag privilege escalation, data exfiltration or malicious prompt injection.
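To make the metrics capability concrete, the following sketch rolls per-call records up into per-model counters. It is an illustration under assumptions, not the plugin's aggregation logic: the records are plain dictionaries keyed by the attribute names listed above, the "error" flag and the model names are hypothetical, and TTFT is assumed to be in seconds.

```python
from collections import defaultdict


def aggregate(records):
    """Roll per-call records up into call, error, token and TTFT totals per model."""
    totals = defaultdict(lambda: {
        "calls": 0, "errors": 0,
        "input_tokens": 0, "output_tokens": 0,
        "ttft_sum": 0.0,
    })
    for rec in records:
        bucket = totals[rec["gen_ai.request.model"]]
        bucket["calls"] += 1
        bucket["errors"] += 1 if rec.get("error") else 0  # "error" is a hypothetical flag
        bucket["input_tokens"] += rec.get("gen_ai.usage.input_tokens", 0)
        bucket["output_tokens"] += rec.get("gen_ai.usage.output_tokens", 0)
        bucket["ttft_sum"] += rec.get("gen_ai.response.time_to_first_token", 0.0)
    # Derive the mean TTFT per model from the running sum.
    return {
        model: {**bucket, "mean_ttft": bucket["ttft_sum"] / bucket["calls"]}
        for model, bucket in totals.items()
    }


records = [
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 100,
     "gen_ai.usage.output_tokens": 20, "gen_ai.response.time_to_first_token": 0.25},
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 300,
     "gen_ai.usage.output_tokens": 40, "gen_ai.response.time_to_first_token": 0.75,
     "error": True},
    {"gen_ai.request.model": "model-b", "gen_ai.usage.input_tokens": 50,
     "gen_ai.usage.output_tokens": 10, "gen_ai.response.time_to_first_token": 0.5},
]
metrics = aggregate(records)
```

Grouping by model is only one possible dimension; the same loop could key on tool name or request ID to answer different cost questions.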
Observable data
Each Hermes execution is rendered as a structured ReAct trace containing:
LLM call attributes (gen_ai.request.model, token counts, TTFT).
Tool call attributes (gen_ai.tool.name, arguments, result).
Agent‑level aggregation (total tokens, final output, total latency).
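As a rough illustration of how such a trace supports cost and latency attribution, the sketch below models one request as a list of steps and derives the agent-level totals from it. The trace layout, the "step", "kind" and "duration_s" keys, the tool name and all numbers are invented for the example; only the gen_ai.* attribute names come from the text above.

```python
# One hypothetical ReAct trace: LLM call, tool call, second LLM call.
trace = [
    {"step": 1, "kind": "llm", "gen_ai.usage.input_tokens": 120,
     "gen_ai.usage.output_tokens": 30, "duration_s": 1.5},
    {"step": 2, "kind": "tool", "gen_ai.tool.name": "web_search", "duration_s": 2.0},
    {"step": 3, "kind": "llm", "gen_ai.usage.input_tokens": 900,
     "gen_ai.usage.output_tokens": 50, "duration_s": 2.5},
]


def step_tokens(step):
    """Tokens consumed by one step; tool steps carry no token counts here."""
    return (step.get("gen_ai.usage.input_tokens", 0)
            + step.get("gen_ai.usage.output_tokens", 0))


def summarize(steps):
    """Agent-level rollup: total tokens, costliest step, latency per step kind."""
    latency_by_kind = {}
    for step in steps:
        kind = step["kind"]
        latency_by_kind[kind] = latency_by_kind.get(kind, 0.0) + step["duration_s"]
    return {
        "total_tokens": sum(step_tokens(s) for s in steps),
        "costliest_step": max(steps, key=step_tokens)["step"],
        "latency_by_kind": latency_by_kind,
    }


summary = summarize(trace)
```

In this made-up trace the second LLM call dominates token usage, which is the kind of finding that an aggregated usage figure alone cannot reveal.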
Installation and activation
1. Obtain the installation command from the Alibaba Cloud CMS console.
2. Execute the generated command:
curl -fsSL https://arms-apm-cn-hangzhou-pre.oss-cn-hangzhou.aliyuncs.com/hermes-agent-cms-plugin/hermes-cms.sh | bash -s -- install \
--x-arms-license-key "auto" \
--x-arms-project "YourProject" \
--x-cms-workspace "YourWorkspace" \
--serviceName "hermes" \
--endpoint "https://YourARMS-OTLPAddress/apm/trace/opentelemetry"
3. Enable the plugin with hermes-cms enable.
4. Start Hermes (e.g., hermes or hermes gateway start).
5. Successful instrumentation is indicated by the log line “loongsuite-site-bootstrap: started successfully (OpenTelemetry auto‑instrumentation initialized).”
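The check in step 5 could be scripted along the following lines. This is a hedged sketch: the helper name instrumentation_started is hypothetical, and it matches two fragments of the success message rather than the whole line, since the exact punctuation Hermes prints may differ from the quoted text. Where Hermes writes its log is not stated here, so the function takes the log text as input.

```python
MARKER_PREFIX = "loongsuite-site-bootstrap:"
MARKER_STATUS = "started successfully"


def instrumentation_started(log_text: str) -> bool:
    """Return True if any log line carries both fragments of the success message."""
    return any(
        MARKER_PREFIX in line and MARKER_STATUS in line
        for line in log_text.splitlines()
    )


# Sample log text, invented for illustration.
sample_log = (
    "INFO starting hermes\n"
    "loongsuite-site-bootstrap: started successfully "
    "(OpenTelemetry auto-instrumentation initialized).\n"
)
started = instrumentation_started(sample_log)
```

In practice the log text would be read from wherever the Hermes process writes its startup output.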
Verification
After enabling the plugin and starting Hermes, send test requests that trigger multiple reasoning rounds and tool calls. Within a few minutes the CMS console shows real‑time trace visualizations, token‑consumption trends, latency breakdowns and security‑audit dashboards.
Future work
Extend from trace and metric collection to full log audit and runtime diagnostics.
Refine Hermes‑specific stages such as memory lifecycle, delegation orchestration and runtime recovery.
Enhance data governance with finer‑grained collection controls, unified desensitization and security policies.
Alibaba Cloud Native