How OpenClaw’s New Plugin Reveals Every LLM Decision Step

The OpenClaw CMS plugin 0.1.2 upgrades observability for AI agents by fully restoring multi‑round execution traces, stabilizing concurrent chains, adding STEP spans, and quantifying agent metrics, turning raw trace graphs into actionable insights for debugging, testing, cost control, and cross‑team collaboration.

Alibaba Cloud Observability

Background

OpenClaw‑cms‑plugin is an observability extension for Alibaba Cloud CMS that records every OpenClaw task invocation. It follows the GenAI semantic conventions, enabling fine‑grained tracing of LLM‑tool interactions.

Limitations of earlier versions (e.g., 0.1.1)

Only the first and last LLM request/response are captured; intermediate rounds are omitted.

The trace hierarchy does not reflect the true ReAct execution flow, which can mislead debugging.

In concurrent runs, trace links may break or become interleaved, causing unstable task associations.

Root cause: ReAct multi‑round agent model

An OpenClaw agent implements the ReAct iterative pattern: each round consists of a judgment, tool selection, result absorption, and planning for the next step. Collapsing the whole process into a single LLM span discards the semantics of each intermediate round.
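The ReAct loop described above can be sketched in a few lines of Python. This is an illustrative model only, not the plugin's actual code; `call_llm` and `run_tool` are hypothetical stand-ins for the real LLM and tool invocations:

```python
def react_agent(task, call_llm, run_tool, max_rounds=5):
    """Minimal ReAct loop: in each round the LLM either calls a tool or finishes.

    Collapsing this whole loop into one span (as in 0.1.1) loses the
    per-round judgment / tool-selection / absorption semantics.
    """
    context = [{"role": "user", "content": task}]
    for round_no in range(1, max_rounds + 1):
        reply = call_llm(context)               # judgment + planning
        if reply.get("tool_call") is None:      # model decided it is done
            return reply["text"], round_no
        result = run_tool(reply["tool_call"])   # tool selection + execution
        context.append({"role": "assistant", "content": reply})
        context.append({"role": "tool", "content": result})  # result absorption
    return None, max_rounds
```

Each pass through the loop is one "round" in the sense used by the 0.1.2 STEP spans.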

Key enhancements in version 0.1.2

Multi‑round LLM segmentation

The plugin now emits a distinct span for every LLM → TOOL → LLM segment. It supports structured assistant output blocks (reasoning, text, toolCall) and, after each batch of tool calls, reconstructs the input context for the next LLM round.
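One way to picture the segmentation: a flat stream of assistant output blocks is split into rounds, where a new round starts at the first non-tool block following a batch of toolCall blocks. The block shapes below are illustrative, not the plugin's actual event schema:

```python
def segment_rounds(blocks):
    """Split a flat list of assistant output blocks (reasoning / text /
    toolCall) into per-round segments. A round boundary falls wherever a
    batch of toolCall blocks is followed by a non-tool block."""
    rounds, current = [], []
    prev_was_tool = False
    for block in blocks:
        is_tool = block["type"] == "toolCall"
        if prev_was_tool and not is_tool:   # tool batch ended: new round
            rounds.append(current)
            current = []
        current.append(block)
        prev_was_tool = is_tool
    if current:
        rounds.append(current)
    return rounds
```

Each resulting segment corresponds to one LLM → TOOL → LLM span in the 0.1.2 trace.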

Improved concurrency stability

To avoid write conflicts in parallel executions, the plugin:

Serialises trace writes through a per‑trace queue.

Activates agent‑channel anchors so that each chain retains correct ownership.

Calls endTrace() for non‑destructive cleanup, preventing premature truncation.

Uses the root llm_input self‑healing mechanism to recover from abnormal interruptions.
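The first of these measures, per-trace write serialisation, can be sketched as one lock per trace ID: writes within a trace are ordered, while distinct traces remain parallel. This is a minimal sketch of the idea, not the plugin's implementation:

```python
import threading
from collections import defaultdict

class TraceWriter:
    """Serialise span writes per trace: one lock per trace_id, so concurrent
    chains never interleave within a trace while distinct traces stay parallel."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()   # protects the defaultdict lookups
        self.spans = defaultdict(list)

    def write(self, trace_id, span):
        with self._guard:                # lock/bucket creation is not atomic
            lock = self._locks[trace_id]
            bucket = self.spans[trace_id]
        with lock:                       # writes within one trace are ordered
            bucket.append(span)
```

Serialising only within a trace keeps the fix cheap: unrelated agent runs never contend for the same lock.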

New STEP span

A span with gen_ai.span.kind=STEP records the round number and adds the following attributes:

gen_ai.operation.name=react
gen_ai.react.round
gen_ai.react.finish_reason

This creates a standard hierarchy: ENTRY → AGENT → STEP → (LLM / TOOL …).
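Modelling spans as plain dicts makes the hierarchy concrete. The attribute names are the ones quoted above; the tree shape and the specific finish-reason values are illustrative:

```python
def make_step_span(round_no, finish_reason, children):
    """Build a STEP span carrying the attributes listed in the article."""
    return {
        "gen_ai.span.kind": "STEP",
        "gen_ai.operation.name": "react",
        "gen_ai.react.round": round_no,
        "gen_ai.react.finish_reason": finish_reason,
        "children": children,  # the LLM / TOOL spans of this round
    }

# ENTRY -> AGENT -> STEP -> (LLM / TOOL ...)
trace = {"kind": "ENTRY", "children": [{
    "kind": "AGENT",
    "children": [
        make_step_span(1, "tool_calls", [{"kind": "LLM"}, {"kind": "TOOL"}]),
        make_step_span(2, "stop", [{"kind": "LLM"}]),
    ],
}]}
```

With this shape, a query like "which round produced this tool call?" becomes a simple walk from the TOOL span up to its STEP parent.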

Agent metric overhaul

Three core metrics are now calculated precisely:

agent.message_count – exact message count derived from event.messages.length.

agent.tool_call_count – sequential count of assistant tool‑call blocks.

usage (token usage) – aggregated from llm_output and written once at agent_end.

These metrics provide reliable observability of message, tool, and token consumption.
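The aggregation can be sketched as a single pass over the plugin's event stream. Event and field names here are illustrative assumptions, not the plugin's real schema:

```python
def aggregate_agent_metrics(events):
    """Compute the three agent metrics from a stream of plugin events.

    Mirrors the rules above: message_count from the messages array length,
    tool_call_count from assistant toolCall blocks, usage summed from
    llm_output events and emitted once at agent_end.
    """
    metrics = {
        "agent.message_count": 0,
        "agent.tool_call_count": 0,
        "usage": {"input_tokens": 0, "output_tokens": 0},
    }
    for ev in events:
        if ev["type"] == "messages":
            metrics["agent.message_count"] = len(ev["messages"])
        elif ev["type"] == "assistant_block" and ev["block"] == "toolCall":
            metrics["agent.tool_call_count"] += 1
        elif ev["type"] == "llm_output":
            metrics["usage"]["input_tokens"] += ev["usage"]["input_tokens"]
            metrics["usage"]["output_tokens"] += ev["usage"]["output_tokens"]
    return metrics  # written once at agent_end
```

Writing usage once at agent_end, rather than per round, avoids double-counting tokens across retries.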

Practical benefits

Faster debugging – Each tool invocation is linked to the exact reasoning step, reducing investigation time from minutes to seconds.

Stable concurrent regression testing – Deterministic chain links enable acceptance criteria based on run‑level consistency, STEP rounds, and parent‑child relationships.

Granular cost governance – Precise message, tool, and token counts allow teams to identify high‑consumption patterns and optimise prompts and tool orchestration.

Cross‑role collaboration – Development, testing, and operations share a semantically rich trace, lowering communication overhead.

Rapid incident mitigation – Detailed STEP and finish‑reason data shorten root‑cause analysis when tool parameters, model retries, or concurrency mismatches go wrong.

Reference documentation

For integration details, see the official Alibaba Cloud CMS OpenClaw monitoring guide:

https://help.aliyun.com/zh/cms/cloudmonitor-2-0/monitor-openclaw-applications

Illustrative diagrams

Trace hierarchy before 0.1.2
Multi‑round segmentation example
STEP span attributes
Tags: LLM, AI Operations, cloud monitoring, OpenClaw, agent tracing
Written by

Alibaba Cloud Observability

Driving continuous progress in observability technology!
