How Honcho’s Dialectic User Model Lets Agents Learn Your Preferences Over Time
The article explains how Honcho transforms scattered conversation facts into a structured user model through a dialectic reasoning loop, detailing memory vs. user model differences, tool architecture, recall modes, prefetch caching, cost‑control mechanisms, peer cards, and common pitfalls for building ever‑more personalized AI agents.
01 User Modeling vs. User Memory: A Subtle but Crucial Difference
User memory is a collection of independent, manually written key‑value facts, such as "user is James" or "dislikes comments". A user model, by contrast, is a dynamic, structured representation that Honcho infers through dialectic reasoning, e.g., "a Python developer who prefers concise code explanations and has intermediate knowledge of AI agents".
The original article's comparison table covered six dimensions: source (manual entry vs. inference), update method (manual append vs. asynchronous update after every turn), format (key‑value list vs. natural‑language representation plus a Peer Card), granularity (single fact vs. cross‑session portrait), conflict handling, and injection method.
Hermes’s built‑in memory system (MEMORY.md + USER.md) provides the "user memory" layer, while Honcho adds the complementary "user model" layer.
02 Honcho Architecture: Five Tools, Three Recall Modes
Honcho exposes five tools. The four read tools are listed from cheapest to most expensive, followed by the write tool:
profile: reads the Peer Card; no LLM call.
search: semantic retrieval.
context: conversation snapshot.
reasoning: LLM synthesis; the most expensive.
conclude: writes results back.
The three recall modes determine how these tools are used:
context: automatic injection only; no tools are exposed.
tools: tools are exposed for the agent to call explicitly.
hybrid (default): automatic injection combined with tool availability.
Most users keep the default hybrid mode: automatic injection covers passive scenarios, and the tools are available when the agent needs to query user context actively.
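A minimal sketch of how the three recall modes could gate automatic injection and tool exposure. Only the mode and tool names come from the article; the `RecallMode` enum, the `plan_turn` helper, and its return shape are hypothetical illustrations, not Honcho's or Hermes's API.

```python
from enum import Enum


class RecallMode(Enum):
    CONTEXT = "context"  # automatic injection only
    TOOLS = "tools"      # explicit tool calls only
    HYBRID = "hybrid"    # both (the default)


# Tool names as listed in the article: read tools cheapest first, then the write tool.
TOOLS_BY_COST = ["profile", "search", "context", "reasoning", "conclude"]


def plan_turn(mode: RecallMode = RecallMode.HYBRID) -> tuple[bool, list[str]]:
    """Return (inject_context_automatically, tools_exposed_to_agent)."""
    inject = mode in (RecallMode.CONTEXT, RecallMode.HYBRID)
    exposed = mode in (RecallMode.TOOLS, RecallMode.HYBRID)
    return inject, list(TOOLS_BY_COST) if exposed else []
```

With the default, `plan_turn()` injects context and exposes all five tools; `plan_turn(RecallMode.CONTEXT)` injects context but exposes none.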
03 Dialectic Reasoning Loop: What the Agent Does After Each Turn
After every conversation turn, Honcho runs an asynchronous dialectic loop. Instead of merely recording utterances, it asks an LLM: "Based on the dialogue history, what kind of user is this?" The loop runs up to three passes with early stopping: if the output of Pass 0 is a sufficient signal, Pass 1 and Pass 2 are skipped.
A signal counts as sufficient when the result exceeds 300 characters or contains structured markers (## headings, bullet lists, numbered items). The reasoning level of each pass is set dynamically: Pass 0 uses minimal, Pass 1 uses base, and Pass 2 uses low. Longer user queries raise the level automatically: by one step above 120 characters and by two steps above 400 characters.
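The early-stop rule and the query-length bump can be sketched as follows, under stated assumptions. The article names only the minimal, low, and base levels, so the full `LEVELS` ladder, modeling base as a configurable level, and the caller-supplied `ask` function are all hypothetical. The sketch also applies the sufficiency check after every pass, which is one reading of the early-stop description.

```python
import re
from typing import Callable

# Hypothetical ordered ladder of reasoning levels (assumption: the article
# names only "minimal", "low" and a configurable "base" level).
LEVELS = ["minimal", "low", "medium", "high", "max"]

# Structured markers: "## " headings, "-"/"*" bullets, or "1." numbered items.
_STRUCTURED = re.compile(r"^\s*(##\s|[-*]\s|\d+\.\s)", re.MULTILINE)


def is_sufficient(result: str) -> bool:
    """Signal check: longer than 300 chars, or contains a structured marker."""
    return len(result) > 300 or bool(_STRUCTURED.search(result))


def bump_level(level: str, query: str) -> str:
    """Raise the level one step for queries over 120 chars, two over 400."""
    steps = 2 if len(query) > 400 else 1 if len(query) > 120 else 0
    return LEVELS[min(LEVELS.index(level) + steps, len(LEVELS) - 1)]


def run_dialectic(query: str, ask: Callable[[str, str], str],
                  base_level: str = "medium") -> str:
    """Run up to three passes, stopping at the first sufficient result.

    `ask(query, level)` is a hypothetical LLM call supplied by the caller.
    """
    pass_levels = ["minimal", base_level, "low"]  # Pass 0, Pass 1, Pass 2
    result = ""
    for level in pass_levels:
        result = ask(query, bump_level(level, query))
        if is_sufficient(result):
            break  # early stop: the remaining passes are skipped
    return result
```

Putting the length and marker checks in a cheap function keeps the early-stop decision free of any extra LLM call.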
04 Peer Card vs. Conclude: Two Ways to Write the User Portrait
Honcho maintains two layers of user representation:
Representation: a dynamic, internal document updated after each dialectic pass; its format is free‑form.
Peer Card: a concrete list[str] extracted from the representation, each entry being a clear fact.
To add a single conclusion, agents invoke honcho_conclude, which appends incrementally. To replace the entire card, they call set_peer_card, which rewrites the full list. The dialectic engine automatically keeps the representation up‑to‑date.
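The difference between the two write paths is append versus replace. The in-memory `PeerCard` class below is a hypothetical stand-in that mirrors the described semantics of honcho_conclude and set_peer_card; it is not Honcho's client and does not talk to a server.

```python
from dataclasses import dataclass, field


@dataclass
class PeerCard:
    """Hypothetical in-memory Peer Card: a list of plain-text facts."""
    facts: list[str] = field(default_factory=list)

    def conclude(self, fact: str) -> None:
        """Incremental write (like honcho_conclude): append one fact."""
        self.facts.append(fact)

    def set_peer_card(self, facts: list[str]) -> None:
        """Full rewrite (like set_peer_card): replace the entire list."""
        self.facts = list(facts)


card = PeerCard(["User is a Python developer"])
card.conclude("Prefers short comments")
# card.facts == ["User is a Python developer", "Prefers short comments"]
card.set_peer_card(["Learning LangChain"])
# card.facts == ["Learning LangChain"]
```

The full rewrite discards every existing entry, so it suits correcting or consolidating a card, while single new facts belong on the append path.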
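```python
# Python example (honcho_manager and session_key are assumed to be set up elsewhere)
# Read the user's Peer Card: a list of short, concrete facts
card = honcho_manager.get_peer_card(session_key, peer="user")
# Example content: ["User is a Python developer", "Prefers short comments", "Learning LangChain"]

# Append a single conclusion about the user (incremental write)
honcho_manager.create_conclusion(
    session_key,
    "User strongly prefers Python; will reject TypeScript examples",
    peer="user",
)
```
05 Prefetch Mechanism: Warm‑up Before the Next Turn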
Instead of waiting synchronously on every turn, Honcho prefetches dialectic results in the background, so the next turn consumes the cached result without added latency. The first turn is a special case: it waits synchronously for up to 8 seconds. If that wait times out, the first turn falls back to the fast base context and the dialectic result is used in the following turn. Two independent caches exist: the base context from peer.context(), refreshed at the contextCadence interval, and the dialectic cache, refreshed at the dialecticCadence interval. Their contents are merged and truncated to the token budget.
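The prefetch pattern can be sketched with a single-worker thread pool. This is an assumption-laden sketch, not Honcho's implementation: `Prefetcher`, `merge_context`, and the caller-supplied `dialectic` function are hypothetical, and whitespace-separated words stand in for tokens when truncating to the budget.

```python
import concurrent.futures as cf
from typing import Callable, Optional


class Prefetcher:
    """Fire the dialectic in the background; consume its result next turn."""

    def __init__(self, dialectic: Callable[[list], str],
                 first_turn_timeout: float = 8.0):
        self._pool = cf.ThreadPoolExecutor(max_workers=1)
        self._dialectic = dialectic
        self._timeout = first_turn_timeout
        self._pending: Optional[cf.Future] = None
        self._cached = ""

    def fire(self, history: list) -> None:
        """Start a dialectic run without blocking the current turn."""
        self._pending = self._pool.submit(self._dialectic, history)

    def consume(self, first_turn: bool = False) -> str:
        """Return the freshest dialectic result that is available now."""
        if self._pending is not None:
            if first_turn:
                try:
                    # First turn only: block for up to the timeout.
                    self._cached = self._pending.result(timeout=self._timeout)
                    self._pending = None
                except cf.TimeoutError:
                    return ""  # fall back to base context; result arrives later
            elif self._pending.done():
                self._cached = self._pending.result()
                self._pending = None
        return self._cached  # later turns never wait


def merge_context(base: str, dialectic: str, budget_tokens: int) -> str:
    """Merge both caches and truncate (words used as a rough token proxy)."""
    words = (base + "\n" + dialectic).split()
    return " ".join(words[:budget_tokens])
```

If the first-turn wait times out, the future stays pending, so the next non-first call to `consume()` picks up the result as soon as it is done.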
06 Rhythm Control and Empty‑Streak Back‑off: Cost‑Saving Details
Each dialectic pass costs an LLM call. Honcho controls how often it runs with dialecticCadence (e.g., every 2 turns); light users can raise it to 3‑5 turns. An empty‑streak back‑off widens the interval automatically when consecutive dialectic runs return empty results: each empty run adds one turn, up to a cap of eight times the base interval.
```
dialecticCadence=2, empty_streak=0 → every 2 turns (normal)
dialecticCadence=2, empty_streak=3 → every 5 turns (2+3)
dialecticCadence=2, empty_streak=8 → every 10 turns (min(2+8, 2×8))
```
Stale‑result detection discards unconsumed dialectic outputs older than dialecticCadence × 2 turns, preventing outdated context from contaminating the current conversation.
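The cadence arithmetic reduces to a few small functions. The interval formula min(cadence + empty_streak, 8 × cadence) is read off the three examples and the eight-times cap; the function names, and the assumption that any non-empty result resets the streak to zero, are hypothetical.

```python
def effective_interval(cadence: int, empty_streak: int) -> int:
    """Each consecutive empty run adds one turn, capped at 8x the cadence."""
    return min(cadence + empty_streak, cadence * 8)


def should_run(turn: int, last_run_turn: int, cadence: int,
               empty_streak: int) -> bool:
    """Fire a dialectic pass once the effective interval has elapsed."""
    return turn - last_run_turn >= effective_interval(cadence, empty_streak)


def next_streak(empty_streak: int, result: str) -> int:
    """Assumption: an empty result extends the streak; anything else resets it."""
    return empty_streak + 1 if not result.strip() else 0


def is_stale(current_turn: int, fired_at: int, cadence: int) -> bool:
    """Discard unconsumed results older than cadence x 2 turns."""
    return current_turn - fired_at > cadence * 2
```

With dialecticCadence=2, the cap only matters once the streak passes 14: `effective_interval(2, 20)` is 16, not 22.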
07 Three Peer Types: User, AI, and Custom Entities
Honcho can model not only users but also the AI itself and arbitrary custom entities. The built‑in aliases are peer="user" for the human and peer="ai" for the agent; any other string can serve as a custom peer ID. Four observation switches (all default to true) control who observes whom:
user_observe_me: records the user's own utterances.
user_observe_others: lets the user see the AI's utterances.
ai_observe_me: records the AI's own utterances.
ai_observe_others: lets the AI see the user's utterances.
Disabling the AI observation switches stops AI‑side modeling while keeping user‑side modeling.
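The four switches decide which peers' models ingest each message. The `Observation` dataclass and the `observers` helper below are a hypothetical sketch that follows the article's description of each switch; they are not Honcho's configuration schema, and custom peers are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    user_observe_me: bool = True      # user's model records the user's own utterances
    user_observe_others: bool = True  # user's model sees the AI's utterances
    ai_observe_me: bool = True        # AI's model records the AI's own utterances
    ai_observe_others: bool = True    # AI's model sees the user's utterances


def observers(speaker: str, cfg: Observation) -> set[str]:
    """Return the peers ("user", "ai") whose models ingest this speaker's message."""
    if speaker == "user":
        flags = {"user": cfg.user_observe_me, "ai": cfg.ai_observe_others}
    elif speaker == "ai":
        flags = {"user": cfg.user_observe_others, "ai": cfg.ai_observe_me}
    else:
        raise ValueError(f"unknown speaker: {speaker!r}")
    return {peer for peer, on in flags.items() if on}


# Disabling both AI switches stops AI-side modeling; user-side modeling continues.
no_ai = Observation(ai_observe_me=False, ai_observe_others=False)
```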
Common Pitfalls
Pitfall 1: Honcho installed but no effect – usually caused by a missing api_key. The offline is_available() check does not raise an error when the key is missing, so the failure is silent. Verify the key with:
```bash
cat ~/.hermes/honcho.json | python3 -c "import sys,json; print(json.load(sys.stdin).get('api_key','NOT SET')[:10])"
```
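A slightly more defensive version of the same check can be written in Python. It assumes, as the shell one-liner does, that the key lives under api_key in ~/.hermes/honcho.json; the helper names are hypothetical, and only a short prefix of the key is ever returned.

```python
import json
from pathlib import Path

DEFAULT_CONFIG = Path.home() / ".hermes" / "honcho.json"


def key_status(config: dict) -> str:
    """Report whether api_key is set, without exposing the full key."""
    key = config.get("api_key")
    if not isinstance(key, str) or not key.strip():
        return "missing"
    return f"present ({key[:4]}..., {len(key)} chars)"


def check_config(path: Path = DEFAULT_CONFIG) -> str:
    """Distinguish a missing config file from a file without a usable key."""
    if not path.exists():
        return f"no config file at {path}"
    return key_status(json.loads(path.read_text()))
```

Unlike the one-liner, this separates "file missing" from "key empty or whitespace", which are different fixes.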
Pitfall 2: Peer Card always empty – not a bug. It usually means observation is disabled, the dialogue is still too short, or the backend is a self‑hosted Honcho older than 3.x, which lacks the Peer Card API. Use honcho_reasoning to inspect what the backend has stored.
Pitfall 3: First‑turn latency > 8 s – lower the timeout value in honcho.json (3 s is recommended). After a timeout, the dialectic result appears in the second turn, while the first turn uses the fast base context.
Pitfall 4: Cron jobs polluting the user model – Hermes’s cron guard skips writing when agent_context="cron" or platform="cron", setting _cron_skipped=True. Ensure custom cron tasks pass the correct context field.
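The cron guard amounts to a check before every write. The sketch below mirrors the described condition; `ModelWriter`, its `write` method, and where the `_cron_skipped` flag lives are hypothetical, not Hermes's actual code.

```python
from typing import Optional


class ModelWriter:
    """Hypothetical writer that refuses to update the user model from cron runs."""

    def __init__(self) -> None:
        self.written: list[str] = []
        self._cron_skipped = False

    def write(self, message: str, agent_context: Optional[str] = None,
              platform: Optional[str] = None) -> bool:
        """Append the message unless the call comes from a cron context."""
        self._cron_skipped = agent_context == "cron" or platform == "cron"
        if self._cron_skipped:
            return False  # cron output never reaches the user model
        self.written.append(message)
        return True
```

A custom cron task that omits agent_context="cron" (or platform="cron") falls through to the normal write path, which is exactly how the pollution happens.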
Pitfall 5: Dialectic result mismatches current dialogue – indicates a topic shift. The stale‑result detection automatically discards such results when (current_turn - fired_at) > dialecticCadence × 2, triggering a fresh inference.
Conclusion
Honcho is not a simple memory store; it is a user‑model inference engine that converts fragmented conversation facts into a structured portrait, enabling agents to understand users better over time. Its core strengths are the multi‑Pass dialectic reasoning with early‑stop, the dual‑layer prefetch architecture that balances cost and depth, and fine‑grained rhythm control with empty‑streak back‑off. Additionally, Honcho can model AI peers, offering a bidirectional understanding between human and agent. The next article will explore model‑agnostic messaging, why Hermes adopts the OpenAI standard, and the engineering trade‑offs behind a unified format for over 200 models.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.