How TencentDB Agent Memory Cuts Tokens by 61% and Boosts Success Rate 52% with Mermaid Infinite Canvas and Context Offloading

The article presents a technical deep‑dive into TencentDB Agent Memory's short‑term memory compression. The approach combines context offloading with a Mermaid‑based infinite canvas to reduce token usage by up to 61% while improving task success rates on several long‑session benchmarks, by about 52% (relative) on WideSearch.

Tencent Cloud Developer

May 26, 2026

Short‑Term Memory Compression

The approach compresses an LLM agent's short‑term memory in two layers. First, context offloading stores full tool results in external files and keeps only a JSON‑L summary (summary, node_id, result_ref) in the active context. Second, a Mermaid infinite canvas records each task step as a node carrying its status, summary and timestamp. Together the nodes form a directed graph that captures parallel branches, dependencies and progress. The canvas can be expanded or folded. A folded canvas leaves only its metadata (taskGoal, status, updatedTime, mmdFilePath) in the context.

Design Principles for Symbolic Compression

Universality: the symbols must already be familiar to mainstream LLMs (e.g., Mermaid).

Simplicity: the generation rules should be easy to follow for both the producing and the consuming LLM.

Flexibility: the symbols should allow free expression without over‑constraining the model.

Implementation Details

When a tool call finishes, the raw result is written to refs/*.md. A JSON‑L record is appended to offload-<sessionId>.jsonl containing timestamp, node_id, tool_call, summary, result_ref, tool_call_id and offloaded:true. The same step is also rendered as a Mermaid node in mmds/<task>.mmd, e.g.

003-N4["timeseries-module-structure<br/>status: done<br/>summary: listed the timeseries directory, found core.py…<br/>Timestamp: 2026-04-16T22:19:53.895+08:00"]
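The offloading step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not TencentDB Agent Memory's code. `offload_tool_result` is a hypothetical helper. Files in refs/ are assumed to be named after `tool_call_id`. Each step is assumed to correspond to one finished tool call, so its status is always `done`. A new canvas is assumed to start with a `flowchart TD` header. The directory layout and the record fields follow the description above.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def offload_tool_result(root: Path, session_id: str, task: str, node_id: str,
                        label: str, tool_call: str, tool_call_id: str,
                        raw_result: str, summary: str) -> str:
    """Offload one finished tool call and return the Mermaid node line it produced.

    Hypothetical helper: the layout (refs/, mmds/, offload-<sessionId>.jsonl)
    follows the article; naming refs/ files by tool_call_id is an assumption.
    """
    # Local time with UTC offset, e.g. 2026-04-16T22:19:53.895+08:00
    timestamp = datetime.now().astimezone().isoformat(timespec="milliseconds")

    # 1. The raw tool output leaves the context and goes to an external file.
    ref_path = root / "refs" / f"{tool_call_id}.md"
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(raw_result, encoding="utf-8")

    # 2. Append one JSON Lines record; only this short summary stays in context.
    record = {
        "timestamp": timestamp,
        "node_id": node_id,
        "tool_call": tool_call,
        "summary": summary,
        "result_ref": ref_path.relative_to(root).as_posix(),
        "tool_call_id": tool_call_id,
        "offloaded": True,
    }
    with open(root / f"offload-{session_id}.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

    # 3. Render the step as a flowchart node; #quot; is Mermaid's escape for ".
    safe = summary.replace('"', "#quot;")
    node = (f'{node_id}["{label}<br/>status: done<br/>'
            f'summary: {safe}<br/>Timestamp: {timestamp}"]')
    mmd_path = root / "mmds" / f"{task}.mmd"
    mmd_path.parent.mkdir(parents=True, exist_ok=True)
    if not mmd_path.exists():
        mmd_path.write_text("flowchart TD\n", encoding="utf-8")
    with open(mmd_path, "a", encoding="utf-8") as f:
        f.write(f"    {node}\n")
    return node


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(offload_tool_result(
            Path(d), "s1", "timeseries", "003-N4", "timeseries-module-structure",
            "ls timeseries/", "call_1", "core.py\nutils.py\n",
            "listed the timeseries directory, found core.py"))
```

In this sketch, only the returned node line and the short JSON‑L summary are meant to stay visible to the model. The full output is reachable later through `result_ref`.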

The active MMD can be injected into the LLM context in full. Folded canvases keep only their metadata fields (taskGoal, status, updatedTime, mmdFilePath).
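One way to picture what reaches the model is the hypothetical layout below. The active canvas is pasted in full, while each folded canvas contributes a single JSON line holding its four metadata fields. `CanvasMeta`, `build_canvas_context` and the bracketed section labels are illustrative assumptions; the article does not specify the prompt format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CanvasMeta:
    """What a folded canvas leaves in context (field names from the article)."""
    taskGoal: str
    status: str
    updatedTime: str
    mmdFilePath: str


def build_canvas_context(active_mmd: str, folded: List[CanvasMeta]) -> str:
    """Hypothetical prompt layout: active canvas inline, folded ones as one JSON line each."""
    lines = ["[active task canvas]", active_mmd.rstrip()]
    if folded:
        lines.append("[folded task canvases: metadata only, open mmdFilePath to expand]")
        lines.extend(json.dumps(asdict(m), ensure_ascii=False) for m in folded)
    return "\n".join(lines)


if __name__ == "__main__":
    active = 'flowchart TD\n    003-N4["timeseries-module-structure<br/>status: done"]\n'
    done = CanvasMeta("fix the timeseries bug", "done",
                      "2026-04-16T22:30:00+08:00", "mmds/timeseries.mmd")
    print(build_canvas_context(active, [done]))
```

The design point this illustrates is that a folded canvas costs one short line regardless of how many nodes it holds.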

Experimental Setup

Four benchmark suites were run in ultra‑long sessions (≥50 sequential tasks) without clearing the context:

SWEbench (code‑fixing, 500 problems)

Toolathlon (complex multi‑step tasks)

WideSearch (200 web‑search tasks)

AA‑LCR (800 long‑document summarisation tasks)

For each suite, a baseline run without the plugin was compared against a run with the compression plugin, using Opus 4.6, MiniMax 2.7 and GLM 5.1 as the offload models. Token usage, success rate and accuracy were recorded.

Key Results

WideSearch: token reduction of up to 61.38%; success rate rose from 33% to 50% (+51.52% relative).

SWEbench: token reduction of up to 33.09%; relative completion‑rate lift of 5.82%–9.93% (e.g., 0.584 → 0.642 with GLM 5.1).

Toolathlon: token reduction of up to 26.18%; pass rate rose from 20% to 35%.

AA‑LCR: total tokens cut by ~31%; accuracy rose from 44% to 47.5%.
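The success‑rate gains above are relative changes, not percentage points. The short check below assumes relative gain = (after - before) / before. It reproduces the +51.52% WideSearch figure and the 9.93% upper end of the SWEbench range. The same formula puts Toolathlon at +75% relative (15 points) and AA‑LCR at about +7.95% relative (3.5 points). `relative_gain` is an illustrative helper, not part of the plugin.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100


# Before/after pairs taken from the results listed above.
for name, before, after in [
    ("WideSearch success rate", 33, 50),
    ("SWEbench completion rate (GLM 5.1)", 0.584, 0.642),
    ("Toolathlon pass rate", 20, 35),
    ("AA-LCR accuracy", 44, 47.5),
]:
    print(f"{name}: {before} -> {after} = {relative_gain(before, after):+.2f}%")
```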

Ablation shows that context offloading alone saves ~15% of tokens with no performance gain; adding the Mermaid canvas yields the full 31%–33% savings and the observed success improvements.

Symbol Design Choices

Two Mermaid diagram types were evaluated. Flowchart (a free‑form graph) was about 15% more token‑efficient than StateDiagram. It was also better suited to the open‑ended exploration typical of agent tasks. StateDiagram fits processes driven by a strict lifecycle.
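To make the contrast concrete, the sketch below renders the same steps in both Mermaid syntaxes. In the example, one step fans out into two parallel reads that then join. The helper names and steps are hypothetical, and treating the last step as the end state is an assumption. String length is at best a rough proxy for tokens, so this does not reproduce the ~15% figure. It only shows that stateDiagram adds lifecycle framing (`[*]` start and end states) that open‑ended exploration does not need.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (node id, label)
Edge = Tuple[str, str]  # (from id, to id)


def as_flowchart(steps: List[Step], edges: List[Edge]) -> str:
    # Free-form directed graph: any node may fan out or join; no lifecycle implied.
    lines = ["flowchart TD"]
    lines += [f'    {nid}["{label}"]' for nid, label in steps]
    lines += [f"    {a} --> {b}" for a, b in edges]
    return "\n".join(lines)


def as_state_diagram(steps: List[Step], edges: List[Edge]) -> str:
    # Lifecycle framing: explicit [*] start/end states around the same transitions.
    lines = ["stateDiagram-v2"]
    lines += [f'    state "{label}" as {nid}' for nid, label in steps]
    lines.append(f"    [*] --> {steps[0][0]}")
    lines += [f"    {a} --> {b}" for a, b in edges]
    lines.append(f"    {steps[-1][0]} --> [*]")
    return "\n".join(lines)


if __name__ == "__main__":
    steps = [("N1", "list timeseries dir"), ("N2", "read core.py"),
             ("N3", "read utils.py"), ("N4", "patch bug")]
    edges = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N3", "N4")]
    for render in (as_flowchart, as_state_diagram):
        text = render(steps, edges)
        print(text, f"\n({len(text)} characters)\n")
```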

Memory Folding and Retrieval

The system stores information in four layers:

Raw tool result in refs/*.md.

JSON‑L tool‑call summary in offload-<sessionId>.jsonl.

Task‑step summary as a Mermaid node in mmds/*.mmd.

Metadata (taskGoal, status, updatedTime, mmdFilePath) kept in the active context.

When a node's summary is insufficient, the agent can locate the corresponding JSON‑L entry via node_id, then retrieve the full document via result_ref. Folding replaces a full MMD with its metadata, preserving a lightweight entry point (mmdFilePath) for later expansion.
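The lookup chain and the fold/unfold pair can be sketched as follows. `expand_node`, `fold_canvas` and `unfold_canvas` are hypothetical names. Paths in `result_ref` and `mmdFilePath` are assumed to be relative to one working directory. A node is assumed to possibly match several JSON‑L records (one per tool call), so all matches are returned.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def expand_node(root: Path, session_id: str, node_id: str) -> List[str]:
    """Follow node_id -> JSON-L record(s) -> result_ref -> full raw result."""
    texts = []
    with open(root / f"offload-{session_id}.jsonl", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            # A step may have issued several tool calls, so collect every match.
            if rec.get("node_id") == node_id:
                texts.append((root / rec["result_ref"]).read_text(encoding="utf-8"))
    return texts


def fold_canvas(task_goal: str, status: str, updated_time: str,
                mmd_path: str) -> Dict[str, str]:
    """Replace a full MMD in context with its metadata; the .mmd file stays on disk."""
    return {"taskGoal": task_goal, "status": status,
            "updatedTime": updated_time, "mmdFilePath": mmd_path}


def unfold_canvas(root: Path, meta: Dict[str, str]) -> str:
    """Re-expand a folded canvas through its mmdFilePath entry point."""
    return (root / meta["mmdFilePath"]).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "refs").mkdir()
        (root / "refs" / "call_1.md").write_text("core.py\nutils.py\n", encoding="utf-8")
        rec = {"node_id": "003-N4", "result_ref": "refs/call_1.md", "offloaded": True}
        (root / "offload-s1.jsonl").write_text(json.dumps(rec) + "\n", encoding="utf-8")
        print(expand_node(root, "s1", "003-N4"))
```

A linear scan of the JSON‑L log keeps the sketch simple. A long session might instead index records by node_id.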

Long‑Term Personalised Memory

A separate long‑term memory layer stores user preferences, goals and historical behaviour. On the PersonaMem benchmark (6,000+ messages, 589 questions), the system raised accuracy from ≈48% to 76% (≈59% relative gain).

Productisation and Open‑Source

The solution is shipped as a plugin for the OpenClaw framework and will be released on GitHub at https://github.com/Tencent/TencentDB-Agent-Memory. It is also integrated into Tencent Cloud products such as Qclaw, Lighthouse and ClawPro.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

LLM Agent long-context Memory compression Mermaid Context Offloading Token Savings

Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
