QueryEngine: One Instance Equals One Session – Full Breakdown of Claude Code’s Session Lifecycle
The article dissects Claude Code’s QueryEngine class, explaining how each QueryEngine instance represents a single conversation thread, detailing its configuration, state management across turns, the submitMessage workflow, SDK vs REPL modes, persistence mechanisms, four key engineering decisions, and three technical debts.
01 Architecture Positioning: One Instance = One Session
Each QueryEngine instance represents a complete dialogue thread, providing an explicit architectural boundary that isolates state between REPL, sub‑agents, and MCP sessions.
export class QueryEngine {
  // Session state (persisted across turns)
  private config: QueryEngineConfig;
  private mutableMessages: Message[]; // accumulated history
  private abortController: AbortController;
  private permissionDenials: SDKPermissionDenial[];
  private totalUsage: NonNullableUsage; // token consumption
  private readFileState: FileStateCache;
  private discoveredSkillNames = new Set<string>();
  // Main entry
  async *submitMessage(prompt: string | ContentBlockParam[], options?: SubmitMessageOptions): AsyncGenerator<QueryEvent> {}
}
02 QueryEngineConfig: 34 Injected Fields
The engine receives its environment via dependency injection, avoiding internal state for application‑level data. The listing below shows a subset of the fields.
export type QueryEngineConfig = {
  cwd: string; // working directory
  tools: Tools; // available tools
  commands: Command[]; // slash commands
  mcpClients: MCPServerConnection[]; // MCP connections
  canUseTool: CanUseToolFn; // permission check (injected)
  getAppState: () => AppState; // read external state
  setAppState: (f: StateUpdater) => void; // write external state
  initialMessages?: Message[]; // restore history on --continue
  readFileCache: FileStateCache;
  customSystemPrompt?: string;
  maxTurns?: number; // sub‑agent limit
  maxBudgetUsd?: number; // budget in USD
};
Note: getAppState and setAppState are external injections, not owned by the engine.
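To make the injection boundary concrete, here is a minimal sketch under simplifying assumptions. MiniEngine, MiniConfig, and the verbose flag are hypothetical names chosen for illustration, not Claude Code's real types. Two engines share application state through the injected accessors while each keeps its own history.

```typescript
// Hypothetical, simplified stand-ins for the real types.
type AppState = { verbose: boolean };
type StateUpdater = (prev: AppState) => AppState;

type MiniConfig = {
  cwd: string;
  getAppState: () => AppState;
  setAppState: (f: StateUpdater) => void;
  initialMessages?: string[];
};

class MiniEngine {
  // Session-level state: owned by this instance only.
  private messages: string[];

  constructor(private config: MiniConfig) {
    // Copy the array so two engines never share one history.
    this.messages = [...(config.initialMessages ?? [])];
  }

  record(text: string): void {
    this.messages.push(text);
  }

  history(): readonly string[] {
    return this.messages;
  }

  // Application-level state is only reached through the injected accessors.
  isVerbose(): boolean {
    return this.config.getAppState().verbose;
  }

  enableVerbose(): void {
    this.config.setAppState((prev) => ({ ...prev, verbose: true }));
  }
}

// The host application owns the state; engines only receive accessors.
let appState: AppState = { verbose: false };
const shared = {
  getAppState: () => appState,
  setAppState: (f: StateUpdater) => {
    appState = f(appState);
  },
};

const mainEngine = new MiniEngine({ cwd: "/repo", ...shared });
const subAgent = new MiniEngine({ cwd: "/repo", ...shared });

mainEngine.record("hello from the main thread");
subAgent.enableVerbose();
```

In this sketch the sub-agent's change to the verbose flag is visible to the main engine, while the message recorded on the main engine never appears in the sub-agent's history.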
03 submitMessage: A Turn’s Complete Journey
The method performs budget checking, processes user input, rebuilds the system prompt each turn, loads memory files, appends to the mutable message list, runs the core query() AsyncGenerator loop, yields events to the REPL, updates usage, and finally persists the transcript.
async *submitMessage(prompt, options): AsyncGenerator<QueryEvent> {
  // 1. Budget guard
  if (this.config.maxTurns && this.getTurnCount() >= this.config.maxTurns) {
    yield { type: 'budget_exceeded', reason: 'max_turns' };
    return;
  }
  // 2. Process user input
  const { messages: userMessages } = await processUserInput(prompt, ...);
  // 3. Rebuild system prompt (no cache)
  const systemPromptParts = await fetchSystemPromptParts({
    cwd: getCwd(),
    tools: this.config.tools,
    commands: this.config.commands,
    mcpClients: this.config.mcpClients,
  });
  // 4. Load memory files
  const memoryPrompt = await loadMemoryPrompt();
  // 5. Append to history
  this.mutableMessages.push(...userMessages);
  // 6. Core query loop
  for await (const event of query({
    messages: this.mutableMessages,
    system: [...systemPromptParts, memoryPrompt],
    tools: this.config.tools,
    abortSignal: this.abortController.signal,
  })) {
    yield event; // streamed to REPL
    if (event.type === 'assistant') {
      this.mutableMessages.push(event.message);
      this.totalUsage = accumulateUsage(this.totalUsage, event.usage);
    }
  }
  // 7. Persist transcript
  recordTranscript(this.mutableMessages);
  flushSessionStorage();
}
Key details:
System prompts are rebuilt every turn, ensuring immediate effect of changes to CLAUDE.md or tool definitions, at the cost of I/O.
mutableMessages accumulates across turns, directly causing token growth in long sessions.
Events are yielded before being added to history, enabling streaming output.
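The yield-then-append ordering can be observed in a small sketch. It assumes a fake query loop and a simplified usage shape; fakeQuery, TurnRunner, and this accumulateUsage are hypothetical stand-ins, not the real implementations. The consumer records the history length at the moment each event arrives.

```typescript
type Usage = { inputTokens: number; outputTokens: number };
type QueryEvent =
  | { type: "assistant"; message: string; usage: Usage }
  | { type: "progress"; note: string };

// Hypothetical helper playing the role of accumulateUsage in the walkthrough.
function accumulateUsage(total: Usage, delta: Usage): Usage {
  return {
    inputTokens: total.inputTokens + delta.inputTokens,
    outputTokens: total.outputTokens + delta.outputTokens,
  };
}

// Stand-in for the real query() loop: emits a fixed sequence of events.
async function* fakeQuery(): AsyncGenerator<QueryEvent> {
  yield { type: "progress", note: "thinking" };
  yield { type: "assistant", message: "done", usage: { inputTokens: 10, outputTokens: 5 } };
}

class TurnRunner {
  history: string[] = [];
  totalUsage: Usage = { inputTokens: 0, outputTokens: 0 };

  async *run(): AsyncGenerator<QueryEvent> {
    for await (const event of fakeQuery()) {
      yield event; // the consumer sees the event first
      if (event.type === "assistant") {
        // Bookkeeping runs only when the consumer asks for the next event.
        this.history.push(event.message);
        this.totalUsage = accumulateUsage(this.totalUsage, event.usage);
      }
    }
  }
}

async function streamDemo(): Promise<string[]> {
  const runner = new TurnRunner();
  const observed: string[] = [];
  for await (const event of runner.run()) {
    // History length as seen by the consumer when the event arrives.
    observed.push(`${event.type}:${runner.history.length}`);
  }
  observed.push(`end:${runner.history.length}`);
  return observed;
}
```

One consequence of this ordering, at least in the sketch: a consumer that breaks out of the loop immediately after receiving the assistant event never resumes the generator, so the append and the usage update for that event do not run.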
04 Session State Comparison: Cross‑Turn vs Non‑Persistent
How each field behaves across turns:
mutableMessages – full conversation record (persisted)
totalUsage – cumulative token usage for budgeting (persisted)
readFileState – file‑read cache to avoid repeated reads (persisted)
permissionDenials – SDK‑rejected operations (persisted)
discoveredSkillNames – reset each submitMessage (single‑turn)
abortController – recreated after a user abort (single‑turn)
systemPromptParts – rebuilt every turn (not persisted)
prompt – the user input for a single call (not persisted)
The principle is to persist internally generated state (history, usage, cache) while leaving externally supplied data (system prompt, tool definitions) uncached.
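The split between cross-turn and single-turn state can be sketched as follows. SessionState, beginTurn, and endTurn are hypothetical names used only for illustration, and the fields are reduced to the bare minimum.

```typescript
class SessionState {
  // Survive across turns.
  readonly messages: string[] = [];
  totalTokens = 0;
  // Single-turn: cleared at the start of every turn.
  discoveredSkillNames = new Set<string>();

  beginTurn(prompt: string): void {
    this.discoveredSkillNames = new Set<string>(); // per-turn reset
    this.messages.push(prompt);
  }

  endTurn(reply: string, tokens: number, skills: string[]): void {
    this.messages.push(reply);
    this.totalTokens += tokens;
    for (const skill of skills) {
      this.discoveredSkillNames.add(skill);
    }
  }
}

const state = new SessionState();
state.beginTurn("list files");
state.endTurn("here they are", 120, ["fs-skill"]);
state.beginTurn("now summarize");
// At this point the history and token total carry over,
// while the skill set has been emptied for the new turn.
```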
05 SDK Mode vs REPL Mode: Two Faces of the Same Engine
Both interactive REPL and programmatic SDK usage share the same QueryEngine codebase. In SDK mode the engine returns a structured result after the query loop finishes; in REPL mode it streams events.
// SDK mode
if (options.sdkMode) {
  return {
    messages: this.mutableMessages,
    usage: this.totalUsage,
    cost: this.getTotalCost(),
  };
}
// Fast mode example (lightweight model override)
const fastModeState = getFastModeState();
if (fastModeState.enabled) {
  options.model = fastModeState.fastModel; // affects only this call
}
AgentTool (sub‑agent) invokes the engine with sdkMode: true, receives the structured messages, and continues its own workflow.
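One way a caller such as a sub-agent could obtain a structured result from an async generator is through the generator's return value, which a plain for await...of loop discards. The sketch below assumes that approach; whether Claude Code does it this way is not stated in the source, and runTurn, collect, TurnEvent, and TurnResult are hypothetical names.

```typescript
type TurnEvent = { type: "assistant"; text: string };
type TurnResult = { messages: string[]; totalTokens: number };

// Hypothetical engine method: streams events, then returns a summary.
async function* runTurn(prompt: string): AsyncGenerator<TurnEvent, TurnResult> {
  const messages = [prompt];
  const reply = `echo: ${prompt}`;
  yield { type: "assistant", text: reply };
  messages.push(reply);
  // Character counts stand in for real token accounting.
  return { messages, totalTokens: prompt.length + reply.length };
}

// Drain the generator manually so the return value is not lost.
async function collect(
  prompt: string
): Promise<{ events: TurnEvent[]; result: TurnResult }> {
  const gen = runTurn(prompt);
  const events: TurnEvent[] = [];
  while (true) {
    const step = await gen.next();
    if (step.done) {
      return { events, result: step.value };
    }
    events.push(step.value);
  }
}
```

A REPL-style consumer can keep using for await...of and ignore the return value, while a programmatic caller uses something like collect to get both the streamed events and the final summary from the same generator.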
06 Session Persistence: What recordTranscript Does
After each submitMessage, recordTranscript serialises mutableMessages to a JSONL file under ~/.claude/projects/<project_hash>/. The --continue and --resume flags read this file back into initialMessages, allowing a new engine instance to pick up where the previous session left off.
// Persist after each turn
recordTranscript(this.mutableMessages);
flushSessionStorage();
// Restore on construction
constructor(config: QueryEngineConfig) {
  this.mutableMessages = config.initialMessages ?? [];
}
The /clear command calls resetMessages(), clearing mutableMessages and writing a fresh empty transcript.
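The JSONL round trip itself is simple to sketch. The article does not show the real transcript schema, so the TranscriptMessage shape and the toJsonl/fromJsonl helpers below are assumptions, and reading or writing the file on disk is deliberately left out.

```typescript
// Hypothetical message shape; the real transcript schema is not shown in the article.
type TranscriptMessage = { role: "user" | "assistant"; content: string };

// One JSON object per line. JSON.stringify escapes embedded newlines,
// so a multi-line message still occupies exactly one line.
function toJsonl(messages: TranscriptMessage[]): string {
  if (messages.length === 0) return "";
  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
}

// Parse each non-empty line back into a message.
function fromJsonl(text: string): TranscriptMessage[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TranscriptMessage);
}

const transcript: TranscriptMessage[] = [
  { role: "user", content: "first line\nsecond line" },
  { role: "assistant", content: "ok" },
];

const serialized = toJsonl(transcript);
// What a resumed session would hand to the new engine as initialMessages.
const restored = fromJsonl(serialized);
```

Because each record is a self-contained line, a format like this also allows appending new messages without rewriting the whole file, although the snippet in the article passes the full mutableMessages array on every call.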
07 Reusing the Pattern in Your Project
A minimal implementation mirrors the same "one instance = one session" contract, with explicit turn counting, budget checks at entry, per‑turn system‑prompt reconstruction, and a reset() method that clears state without destroying the instance.
class QueryEngine {
  private messages: Message[] = [];
  private totalTokens: TokenUsage = new TokenUsage();
  // config is injected; buildSystemPrompt, callLLM, and persistTranscript are left to the implementer
  constructor(private config: { maxTurns?: number }) {}
  async *submitMessage(prompt: string): AsyncGenerator<Event> {
    // 1. Budget guard
    if (this.config.maxTurns && this.getTurnCount() >= this.config.maxTurns) {
      yield { type: 'budget_exceeded' };
      return;
    }
    // 2. Rebuild system prompt each turn
    const system = await this.buildSystemPrompt();
    // 3. Append user message
    this.messages.push({ role: 'user', content: prompt });
    // 4. LLM loop
    for await (const event of this.callLLM(system, this.messages)) {
      yield event;
      if (event.type === 'assistant') {
        this.messages.push(event.message);
        this.totalTokens.add(event.usage);
      }
    }
    // 5. Persist
    await this.persistTranscript();
  }
  reset(): void {
    this.messages = [];
    this.totalTokens = new TokenUsage();
  }
  getTurnCount(): number {
    return this.messages.filter(m => m.role === 'user').length;
  }
}
08 Design Insights: Four Engineering Decisions
Dependency injection, not centralized state: The engine only holds session‑level state; application‑level state is accessed via injected getAppState/setAppState, allowing safe reuse by sub‑agents.
System prompts are never cached; message history always is: Prompts derived from external files (e.g., CLAUDE.md) are rebuilt each turn, while internally generated data is persisted.
Single codebase serves both SDK and REPL modes: No duplicated implementations; AgentTool benefits from this shared engine.
Budget check as an entry guard: Per‑turn validation avoids background monitoring or timers.
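The entry-guard decision can be sketched as a pure function called at the top of each turn. The walkthrough's guard only checks maxTurns; extending it to maxBudgetUsd, along with the checkBudget name and the "max_budget_usd" reason string, is an assumption made here for illustration.

```typescript
type BudgetConfig = { maxTurns?: number; maxBudgetUsd?: number };
type BudgetVerdict =
  | { ok: true }
  | { ok: false; reason: "max_turns" | "max_budget_usd" };

// Evaluated once at the start of a turn: no timers or background monitors.
function checkBudget(
  config: BudgetConfig,
  turnCount: number,
  spentUsd: number
): BudgetVerdict {
  if (config.maxTurns !== undefined && turnCount >= config.maxTurns) {
    return { ok: false, reason: "max_turns" };
  }
  if (config.maxBudgetUsd !== undefined && spentUsd >= config.maxBudgetUsd) {
    return { ok: false, reason: "max_budget_usd" };
  }
  return { ok: true };
}

const blockedByTurns = checkBudget({ maxTurns: 2 }, 2, 0);
const withinBudget = checkBudget({ maxBudgetUsd: 1 }, 0, 0.5);
const unlimited = checkBudget({}, 100, 100);
```

Note one deliberate difference from the article's snippet: the truthiness test there treats maxTurns: 0 as "no limit", whereas the explicit undefined check in this sketch treats 0 as "no turns allowed". Which behavior is intended is a design choice to make explicitly.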
09 Critical View: Three Technical Debts
Unbounded growth of mutableMessages: Long sessions can cause memory pressure and growing token usage; auto‑compaction will be covered in a future article.
Per‑turn system‑prompt reconstruction incurs I/O: Large projects may suffer noticeable latency; incremental updates or file watching could mitigate this.
Abort controller recreation can race: If a previous turn’s async work hasn’t fully cleaned up, a new controller may coexist, leading to race conditions under rapid user aborts.
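For the abort-controller debt, one possible mitigation (not something the source says Claude Code does) is to capture the signal at the start of each turn, so that late work from an aborted turn checks its own signal rather than the freshly created one. AbortableSession and its methods are hypothetical names for this sketch.

```typescript
class AbortableSession {
  private controller = new AbortController();
  readonly log: string[] = [];

  // Abort the current turn and install a fresh controller for the next one.
  abort(): void {
    this.controller.abort();
    this.controller = new AbortController();
  }

  async runTurn(name: string, work: Promise<void>): Promise<void> {
    // Capture this turn's signal up front; this.controller may be replaced while we wait.
    const signal = this.controller.signal;
    await work;
    if (signal.aborted) {
      this.log.push(`${name}: discarded`);
      return;
    }
    this.log.push(`${name}: committed`);
  }
}

async function abortDemo(): Promise<string[]> {
  const session = new AbortableSession();
  let finishFirst: () => void = () => {};
  const firstWork = new Promise<void>((resolve) => {
    finishFirst = resolve;
  });

  const first = session.runTurn("turn-1", firstWork);
  session.abort(); // user aborts while turn-1 is still in flight
  const second = session.runTurn("turn-2", Promise.resolve());
  finishFirst(); // turn-1's async work completes late
  await Promise.all([first, second]);
  return session.log;
}
```

Had runTurn read this.controller.signal after the await instead of the captured signal, turn-1 would see the new, un-aborted controller and commit its stale result, which is the kind of race the debt describes.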
10 Summary
The core contract is that a QueryEngine instance equals a conversation thread. State that the engine itself creates (history, token usage, file cache) persists across turns, while externally supplied state (system prompts, tool definitions) is rebuilt each turn. The same engine powers both interactive REPL and programmatic SDK usage, enabling lightweight sub‑agent sandboxes. Budget enforcement is performed as an entry guard, keeping overhead minimal. The most pressing technical debt is the unlimited growth of mutableMessages, which will be addressed in the next installment.
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
