Artificial Intelligence 18 min read

Designing a Stateful Multi‑Turn Dialogue Agent on Stateless LLMs

Building a production‑grade multi‑turn dialogue agent requires managing LLM’s statelessness by combining sliding‑window and summary history, implementing three‑layer memory (working, short‑term, long‑term), using explicit state tracking with incremental JSON updates, optimizing context windows, orchestrating tool calls, and adding meta‑control to handle failures and prompt‑injection risks.

Linyb Geek Road

Apr 15, 2026

Designing a Stateful Multi‑Turn Dialogue Agent on Stateless LLMs

1. Problem Analysis

Most people have used multi‑turn dialogue with ChatGPT, but designing a robust multi‑turn dialogue Agent is hard because it must coordinate several sub‑problems: context management, state tracking, memory storage, handling limited context windows, and tool invocation. Each sub‑problem is manageable alone, but their coupling makes overall complexity grow exponentially.

1.1 Dialogue History Management

LLMs are stateless; each call must receive the full conversation history in the prompt. The naive "full concatenation" works for a few rounds but quickly hits the token limit and wastes space on irrelevant early messages. Therefore a history‑management strategy must answer two questions: which messages to keep or discard, and how to represent the retained history.

Three common strategies: Sliding window : keep only the most recent N rounds. Simple but may drop earlier crucial information. Summary compression : when history exceeds a length, use an LLM to summarize early turns into a concise overview and combine it with the recent N rounds. Importance‑based selective retention : assign an importance score to each message (high for explicit user commands, low for chit‑chat) and retain high‑scoring messages.

In practice, a hybrid "summary + sliding window" approach is most common: summarize distant history, keep recent turns verbatim, and prepend a system prompt with persistent task background.

1.2 Memory System

Beyond per‑session context, a useful Agent needs cross‑session memory. The design typically consists of three layers:

Working Memory : the current conversation context (the short‑term window discussed above). High precision but limited capacity. Short‑term Memory : a lightweight key‑value store holding extracted structured slots from recent turns (e.g., "issue: timeout", "environment: production k8s"). These slots do not consume prompt tokens and can be injected on demand. Long‑term Memory : persistent storage of important facts across sessions, usually embeddings stored in a vector database. At the start of a new conversation, relevant memories are retrieved semantically and added to the system prompt.

The engineering challenges differ: working memory faces window‑management issues; short‑term memory hinges on accurate information extraction; long‑term memory must handle relevance decay and update mechanisms to avoid stale facts.

1.3 Dialogue State Tracking

Multi‑turn dialogue requires the Agent to know the current stage, missing information, and whether the user's intent has shifted. Traditional task‑oriented systems use predefined slots, but LLM‑based Agents face open‑ended intents.

Two approaches are described:

Implicit state tracking : rely on the LLM to read the full history each turn and infer the current state. Simple but suffers from attention decay on long histories. Explicit state tracking : after each turn, the LLM outputs a structured JSON state object (task progress, collected info, pending items). The next turn injects this object as part of the system prompt, providing a clear, controllable memory. To reduce token cost, only the delta (incremental changes) is emitted each round.

1.4 Context‑Window Engineering

The hard constraint is the finite LLM context window, even for models with 128K tokens. Strategies are organized in three layers:

Reduce input size : compress history, summarize, and trim tool‑call results to only the fields needed. Dynamically filter tool definitions based on the current topic. Layered storage and on‑demand loading : keep "always‑resident" items (system prompt, current state), "recent" items (sliding window), and "on‑demand" items (long‑term memory, summaries) in external storage, injecting them only when required. Multi‑Agent division : use a primary routing Agent and specialized sub‑Agents for distinct sub‑tasks, each receiving only the context relevant to its task, thereby reducing per‑Agent context pressure. Frameworks such as AutoGen and CrewAI adopt this architecture.

1.5 Tool Invocation and Dialogue Flow Orchestration

A practical Agent must call external tools (APIs, databases, code execution) without breaking conversational flow.

Key design decisions:

When to invoke : the LLM decides based on prompts that specify conditions (e.g., factual queries vs. opinion questions). State continuity across calls : maintain context so that references like "it" correctly point to objects from previous tool results. Graceful failure handling : transform errors (e.g., timeouts, empty results) into user‑friendly messages and optionally log for later retry.

1.6 Meta‑Control Mechanisms

Beyond normal operation, the Agent needs safeguards:

Clarification and confirmation : when confidence in intent is low, ask the user for clarification instead of guessing; thresholds depend on operation risk. Topic switching and intent drift : detect topic changes, pause or store current task state, and resume when the user returns. Security and output control : each input undergoes prompt‑injection detection; sensitive actions (deleting data, payments) require secondary confirmation and audit logging.

1.7 Design Summary

The core challenge is to build a full‑stack stateful dialogue management layer on top of a stateless LLM. History management handles "what was said recently", the three‑layer memory handles "what was said long ago", state tracking ensures the Agent knows "where we are now", context engineering packs the most important information into the limited window, tool orchestration integrates external actions, and meta‑control protects against errors and attacks. These six components are tightly coupled; for example, memory design influences context‑window strategy, and accurate state tracking improves tool‑call reliability.

2. Reference Answer

The author’s concrete answer mirrors the six points above, emphasizing a "summary + sliding window" history strategy, three‑layer memory, explicit JSON state with delta updates, three‑tier context optimization, seamless tool orchestration with error handling, and meta‑control for clarification, topic switching, and security.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

LLM prompt injection multi-turn dialogue context window memory system tool orchestration state tracking

Written by

Linyb Geek Road

Tech notes

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.