How TencentDB Agent Memory Cuts Tokens by 61% and Boosts Success Rate by 52% with a Mermaid Infinite Canvas and Context Offloading
The article takes a technical deep dive into the short-term memory compression in TencentDB Agent Memory, which combines context offloading with a Mermaid-based infinite canvas. Across several long-session benchmarks it reduces token usage by up to 61% and, in the best case (WideSearch), raises task success rate by more than 50% relative to the baseline.
Short‑Term Memory Compression
The approach applies a two-layer compression to LLM agents. Context offloading stores full tool results in external files and keeps only a JSON‑L summary (summary, node_id, result_ref) in the active context. A Mermaid infinite canvas records each task step as a node carrying its status, summary and timestamp, forming a directed graph that captures parallel branches, dependencies and progress. The canvas can be expanded or folded; folded nodes retain only their metadata (taskGoal, status, updatedTime, mmdFilePath) in the context.
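The canvas layer described above can be pictured as a small graph that is either rendered in full (expanded) or reduced to four metadata fields (folded). The following is a minimal sketch under stated assumptions: the StepNode and Canvas classes, plain node IDs such as "N1", and the rule for deriving a folded status are all hypothetical and are not taken from the TencentDB Agent Memory code.

```python
from dataclasses import dataclass, field


@dataclass
class StepNode:
    node_id: str    # hypothetical plain IDs such as "N1" keep the Mermaid syntax simple
    title: str
    status: str     # e.g. "done" or "running"
    summary: str
    timestamp: str  # ISO-8601, all with the same UTC offset so string order is time order


@dataclass
class Canvas:
    task_goal: str
    mmd_file_path: str
    nodes: list[StepNode] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (parent_id, child_id) dependencies

    def add_step(self, node: StepNode, depends_on: tuple[str, ...] = ()) -> None:
        """Append a step; several steps sharing one parent form parallel branches."""
        self.nodes.append(node)
        for parent_id in depends_on:
            self.edges.append((parent_id, node.node_id))

    def to_mermaid(self) -> str:
        """Expanded form: the full flowchart text that could be injected into the context."""
        lines = ["flowchart TD"]
        for n in self.nodes:
            label = "<br/>".join([
                n.title,
                f"status: {n.status}",
                f"summary: {n.summary}",
                f"Timestamp: {n.timestamp}",
            ]).replace('"', "#quot;")  # Mermaid entity code for a double quote
            lines.append(f'    {n.node_id}["{label}"]')
        for parent_id, child_id in self.edges:
            lines.append(f"    {parent_id} --> {child_id}")
        return "\n".join(lines)

    def fold(self) -> dict:
        """Folded form: only the four metadata fields stay in the context."""
        done = bool(self.nodes) and all(n.status == "done" for n in self.nodes)
        return {
            "taskGoal": self.task_goal,
            "status": "done" if done else "in_progress",  # hypothetical aggregation rule
            "updatedTime": max((n.timestamp for n in self.nodes), default=""),
            "mmdFilePath": self.mmd_file_path,
        }
```

Two steps added with depends_on=("N1",) render as two edges out of N1, which is how a flowchart expresses parallel branches; fold() returns only what would remain in the context once the canvas is collapsed.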
Design Principles for Symbolic Compression
Universality: symbols must be known to mainstream LLMs (e.g., Mermaid).
Simplicity: generation rules should be easy for both producer and consumer LLMs.
Flexibility: symbols should allow free expression without over-constraining the model.
Implementation Details
When a tool call finishes, its raw result is written to refs/*.md, and a JSON‑L record is appended to offload-<sessionId>.jsonl with the fields timestamp, node_id, tool_call, summary, result_ref, tool_call_id and offloaded: true. The same step is also rendered as a Mermaid node in mmds/<task>.mmd, for example:
```mermaid
003-N4["timeseries-module-structure<br/>status: done<br/>summary: Listed the timeseries directory, found core.py…<br/>Timestamp: 2026-04-16T22:19:53.895+08:00"]
```
The active MMD can be injected into the LLM context; folded nodes keep only the metadata fields listed earlier (taskGoal, status, updatedTime, mmdFilePath).
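The offloading step above can be sketched as a single function that moves the raw output to disk and appends one JSON line describing it. This is a minimal sketch, assuming a hypothetical helper name (offload_tool_result), a hypothetical refs/<tool_call_id>.md naming scheme, and a summary supplied by the caller (in the real system it is presumably produced by the offload model); only the field names come from the article.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def offload_tool_result(root: Path, session_id: str, node_id: str, tool_call_id: str,
                        tool_call: str, raw_result: str, summary: str) -> dict:
    """Persist a finished tool call and return the compact record kept for the context.

    Hypothetical helper: the refs/<tool_call_id>.md naming scheme and the
    caller-supplied summary are assumptions, not the plugin's actual API.
    """
    refs_dir = root / "refs"
    refs_dir.mkdir(parents=True, exist_ok=True)
    result_path = refs_dir / f"{tool_call_id}.md"
    result_path.write_text(raw_result, encoding="utf-8")  # full output leaves the context

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "node_id": node_id,
        "tool_call": tool_call,
        "summary": summary,
        "result_ref": result_path.relative_to(root).as_posix(),
        "tool_call_id": tool_call_id,
        "offloaded": True,
    }
    log_path = root / f"offload-{session_id}.jsonl"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record, ensure_ascii=False) + "\n")  # one JSON object per line
    return record
```

Only the returned record (or just its summary, node_id and result_ref) needs to stay in the prompt; the append-only JSON Lines file keeps every call in order for later lookup.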
Experimental Setup
Four benchmark suites were run in ultra‑long sessions (≥50 sequential tasks) without clearing the context:
SWEbench (code‑fixing, 500 problems)
Toolathlon (complex multi‑step tasks)
WideSearch (200 web‑search tasks)
AA‑LCR (800 long‑document summarisation tasks)
Each benchmark was run both without the plugin (baseline) and with the compression plugin, using Opus 4.6, MiniMax 2.7 and GLM 5.1 as the offload models. Token usage, success rate and accuracy were recorded.
Key Results
WideSearch: token reduction of up to 61.38%; success rate up from 33% to 50% (+51.52% relative).
SWEbench: token reduction of up to 33.09%; relative completion-rate lift of 5.82%–9.93% (e.g., 0.584 → 0.642 with GLM 5.1).
Toolathlon: token reduction of up to 26.18%; pass rate up from 20% to 35%.
AA‑LCR: total tokens cut by about 31%; accuracy up from 44% to 47.5%.
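The results above mix relative gains (WideSearch, SWEbench) with raw before/after values (Toolathlon, AA‑LCR). The short calculation below applies the same relative-change formula to all four quoted pairs; the Toolathlon and AA‑LCR relative figures are derived here from the quoted numbers and are not reported in the source.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


# (benchmark, metric, baseline, with plugin), as quoted in the results list
results = [
    ("WideSearch", "success rate (%)", 33, 50),
    ("SWEbench (GLM 5.1)", "completion rate", 0.584, 0.642),
    ("Toolathlon", "pass rate (%)", 20, 35),
    ("AA-LCR", "accuracy (%)", 44, 47.5),
]

for name, metric, before, after in results:
    print(f"{name} {metric}: {before} -> {after} = {relative_change(before, after):+.2f}% relative")
# Relative changes printed: +51.52%, +9.93%, +75.00%, +7.95%
```

This reproduces the article's +51.52% (WideSearch) and the 9.93% upper bound for SWEbench, and makes clear that, for example, Toolathlon's 15-percentage-point rise corresponds to a 75% relative gain.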
The ablation shows that context offloading alone saves about 15% of tokens with no performance gain; adding the Mermaid canvas yields the full 31–33% savings and the observed success-rate improvements.
Symbol Design Choices
Two Mermaid diagram types were evaluated. Flowchart (free‑form graph) outperformed StateDiagram by ~15 % in token efficiency and was better suited to the open‑ended exploration of agent tasks, whereas StateDiagram is appropriate for strict lifecycle‑driven processes.
Memory Folding and Retrieval
The system stores information in four layers:
Raw tool result in refs/*.md.
JSON‑L tool‑call summary in offload-<sessionId>.jsonl.
Task‑step summary as a Mermaid node in mmds/*.mmd.
Metadata (taskGoal, status, updatedTime, mmdFilePath) kept in the active context.
When a node's summary is insufficient, the agent can locate the corresponding JSON‑L entry via node_id, then retrieve the full document via result_ref. Folding replaces a full MMD with its metadata, preserving a lightweight entry point (mmdFilePath) for later expansion.
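The retrieval path above (node_id → JSON‑L record → result_ref → raw file, and mmdFilePath → full canvas) can be sketched as three small lookups. This is a minimal sketch, assuming hypothetical function names and that result_ref and mmdFilePath are stored as paths relative to a common session root; the plugin's actual lookup mechanism is not described in the source.

```python
import json
from pathlib import Path


def find_offload_records(root: Path, session_id: str, node_id: str) -> list[dict]:
    """Return every record in offload-<session_id>.jsonl whose node_id matches."""
    matches = []
    log_path = root / f"offload-{session_id}.jsonl"
    with log_path.open(encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            if record.get("node_id") == node_id:
                matches.append(record)
    return matches


def load_full_results(root: Path, session_id: str, node_id: str) -> list[str]:
    """Follow result_ref from each matching record to the raw tool output in refs/."""
    return [
        (root / record["result_ref"]).read_text(encoding="utf-8")
        for record in find_offload_records(root, session_id, node_id)
    ]


def expand_folded_canvas(root: Path, metadata: dict) -> str:
    """Re-open a folded canvas through the mmdFilePath kept in its metadata."""
    return (root / metadata["mmdFilePath"]).read_text(encoding="utf-8")
```

A linear scan is enough for a sketch; a long-running session with many tool calls might instead keep an in-memory index from node_id to file offsets.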
Long‑Term Personalised Memory
A separate long-term memory layer stores user preferences, goals and historical behaviour. On the PersonaMem benchmark (6000+ messages, 589 questions), the system raised accuracy from ≈48% to 76% (≈59% relative gain).
Productisation and Open‑Source
The solution is shipped as a plugin for the OpenClaw framework and will be released on GitHub at https://github.com/Tencent/TencentDB-Agent-Memory. It is also integrated into Tencent Cloud products such as Qclaw, Lighthouse and ClawPro.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
