Artificial Intelligence 14 min read

Seeing Inside Hermes: Full Observability of Agent Execution with OpenTelemetry

The article explains how Alibaba Cloud’s Hermes observability plugin, built on OpenTelemetry, makes the entire execution process of AI agents visible by tracing reasoning steps, tool calls, token usage, latency, and security risks, enabling precise cost, performance, and error analysis.

Alibaba Cloud Observability

Apr 27, 2026

Seeing Inside Hermes: Full Observability of Agent Execution with OpenTelemetry

Problem

When an AI agent solves a task, the difficulty is not only whether the answer is correct but also understanding what the agent actually did. Existing systems typically expose only the final reply, total token usage, or a single request summary, leaving the internal multi‑round reasoning, tool invocations, context growth, and error points invisible. Four concrete pain points are identified:

Process invisibility : No detailed view of each reasoning round, tool call, or context expansion.

Cost attribution : High token bills cannot be linked to specific rounds, large tool results, or long final outputs.

Performance decomposition : Users only know the request is slow, but cannot tell whether the first token, the whole generation, or a tool execution caused the slowdown.

Result reproducibility : When the final answer looks plausible but is wrong, there is no trace of which tool was called, what parameters were passed, or how the reasoning deviated.

Solution – Hermes Observability Plugin

The plugin instruments the Python runtime of the Hermes agent framework, creates OpenTelemetry span s around key execution boundaries, and reports trace and metric data via the OTLP protocol to any compatible backend (e.g., Alibaba Cloud ARMS). It follows the OpenTelemetry GenAI semantic conventions and extends them with LoongSuite conventions for agent‑specific fields.

Key Advantages

Uses standard GenAI attributes (e.g., gen_ai.request.model, gen_ai.usage.input_tokens) and adds agent‑specific fields such as gen_ai.tool.name and gen_ai.tool.call.arguments.

Provides both per‑request trace spans and aggregated metrics (call count, error count, latency, token usage, time‑to‑first‑token).

Streaming‑aware TTFT captures the delay of the first token, distinguishing first‑word latency from overall generation latency.

Backend‑agnostic: data are sent via standard OTLP, so the same plugin works with ARMS or any OTLP‑compatible observability platform.

Collects full operation logs and applies a dynamic audit model to flag privilege‑escalation, data‑leak, and prompt‑injection risks.

Observability Data Model

Each model‑call span records:

gen_ai.request.model

gen_ai.usage.input_tokens

gen_ai.usage.output_tokens

gen_ai.usage.total_tokens

gen_ai.response.time_to_first_token

Each tool‑call span records:

gen_ai.tool.name

gen_ai.tool.call.arguments

gen_ai.tool.call.result

An agent‑level aggregation span captures the total tokens, final output message, and total latency for the whole task.

Installation & Activation

1. In the Cloud Monitor Service (CMS) 2.0 console, navigate to AI Application Observability → Hermes and copy the generated installation command.

curl -fsSL https://arms-apm-cn-hangzhou-pre.oss-cn-hangzhou.aliyuncs.com/hermes-agent-cms-plugin/hermes-cms.sh | bash -s -- install \
  --x-arms-license-key "auto" \
  --x-arms-project "YourProject" \
  --x-cms-workspace "YourWorkspace" \
  --serviceName "hermes" \
  --endpoint "https://your-ARMS-OTLP-endpoint/apm/trace/opentelemetry"

2. The script registers hermes-cms on the host and provides hermes-cms enable, hermes-cms disable, and hermes-cms uninstall commands.

3. Enable the plugin: hermes-cms enable 4. Start Hermes as usual (e.g., hermes or hermes gateway start).

When the plugin starts, the console prints:

loongsuite-site-bootstrap: started successfully (OpenTelemetry auto-instrumentation initialized).

Verification

Send a few test requests that trigger multi‑round reasoning and tool calls. After about a minute, the CMS console shows:

Model call count and token consumption trends.

Average number of reasoning rounds per request.

Breakdown of latency for Agent, LLM, and Tool phases.

Full trace view that reveals which round invoked which tool, the arguments passed, and the result returned.

Security Auditing

The plugin records every agent operation, builds an audit view, and flags risky behaviors such as unauthorized access, data export, or malicious prompt injection.

Future Work

Data side : Extend from trace and metric attributes to full log audit and diagnostic capabilities.

Link side : Add finer‑grained spans for Hermes‑specific stages such as memory lifecycle, delegation orchestration, and runtime recovery.

Governance side : Strengthen data collection controls, fine‑grained governance, and unified masking and security policies.

With the current plugin, Hermes can render a real execution as a structured ReAct trace, attributing token usage and latency to each model call and tool invocation, and providing a security audit view for high‑risk operations.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Cloud Native Observability Metrics OpenTelemetry AI Agent Tracing Hermes Security Audit

Written by

Alibaba Cloud Observability

Driving continuous progress in observability technology!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.