How Claude Code’s Speculation Engine Lets AI Finish Your Code Before You Hit Tab
The article dissects Claude Code’s Speculation system, showing how an AI sub‑agent predicts user intent, runs a full edit‑test pipeline in an overlay filesystem, filters results through twelve safety layers, and only commits changes when the user confirms, effectively turning speculative execution into a safe performance boost.
What Speculation Does
The Speculation module forks a child agent that silently completes the next coding step—reading files, editing code, running commands, and passing tests—before the user presses Tab to accept the suggestion.
Phase 1 – Predicting the Next Input
Implemented in promptSuggestion.ts (523 lines), the system uses the same large model as the main session; the canUseTool check denies the fork outright if the model changes, preserving the shared Prompt Cache. This ensures the sub-agent has full context (coding style, project structure, recent file changes). A smaller model would lose this context and prediction accuracy would drop dramatically.
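The same-model constraint can be sketched as a simple guard; the names below (canUseModel, ForkRequest) are illustrative assumptions, not the real canUseTool implementation:

```typescript
// Hypothetical sketch of the same-model guard; the actual canUseTool
// check in Claude Code covers more than the model field.
type ForkRequest = { model: string };
type GuardResult = { allowed: boolean; reason?: string };

function canUseModel(parentModel: string, fork: ForkRequest): GuardResult {
  // Switching models would change the cache key and lose the shared
  // Prompt Cache, so any mismatch is denied outright.
  if (fork.model !== parentModel) {
    return { allowed: false, reason: "model_mismatch_breaks_prompt_cache" };
  }
  return { allowed: true };
}
```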
SUGGESTION_PROMPT Design
```
THE TEST: Would they think "I was just about to type that"?
Format: 2-12 words, match the user's style. Or nothing.
```

The prompt asks the model to generate a suggestion the user would feel they were just about to type; if confidence is low, the model must output nothing.
12‑Layer Filtering System
done – filter "done"
meta_text – filter generic meta messages like "nothing found"
meta_wrapped – filter wrapped placeholders
error_message – filter API error texts
prefixed_label – filter "word: text" patterns
too_few_words – reject <2 words (except commands/yes/no/ok)
too_many_words – reject >12 words
too_long – reject ≥100 characters
multiple_sentences – reject multi‑sentence output
has_formatting – reject line breaks or bold markup
evaluative – filter praise like "looks good"
claude_voice – filter Claude‑style phrasing
This prevents the agent from “talking to itself” or inserting irrelevant commentary.
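The filter chain above can be approximated as an ordered function; the layer names mirror the list, but the specific regexes, thresholds, and short-word whitelist here are assumptions for illustration:

```typescript
// Illustrative sketch of the ordered filter chain. Real patterns in
// promptSuggestion.ts are richer; these rules are simplified stand-ins.
type FilterResult = { ok: true } | { ok: false; reason: string };

const ALLOWED_SHORT = new Set(["yes", "no", "ok"]); // assumed whitelist

function filterSuggestion(text: string): FilterResult {
  const t = text.trim();
  const words = t === "" ? [] : t.split(/\s+/);
  if (t.toLowerCase() === "done") return { ok: false, reason: "done" };
  if (/nothing found/i.test(t)) return { ok: false, reason: "meta_text" };
  if (/^\w+:\s/.test(t)) return { ok: false, reason: "prefixed_label" };
  if (words.length < 2 && !ALLOWED_SHORT.has(t.toLowerCase()) && !t.startsWith("/"))
    return { ok: false, reason: "too_few_words" };
  if (words.length > 12) return { ok: false, reason: "too_many_words" };
  if (t.length >= 100) return { ok: false, reason: "too_long" };
  if ((t.match(/[.!?]\s/g) ?? []).length > 0)
    return { ok: false, reason: "multiple_sentences" };
  if (/\n|\*\*/.test(t)) return { ok: false, reason: "has_formatting" };
  if (/looks good/i.test(t)) return { ok: false, reason: "evaluative" };
  return { ok: true };
}
```

The ordering matters: cheap exact-match checks run before length checks, which run before the fuzzier style filters.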
Cache‑Cold Suppression
If the parent session has more than MAX_PARENT_UNCACHED_TOKENS = 10000 uncached tokens, the system skips suggestion generation to avoid high latency and stale results.
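As a minimal sketch, the gate reduces to one comparison against the constant named in the article (how uncached tokens are counted is not specified here and is left as an input):

```typescript
// Cache-cold suppression gate. The constant comes from the article;
// measuring uncachedTokens is assumed to happen upstream.
const MAX_PARENT_UNCACHED_TOKENS = 10_000;

function shouldSpeculate(uncachedTokens: number): boolean {
  // A cold cache means the fork would pay full-context latency and the
  // suggestion would likely arrive stale - so skip this round entirely.
  return uncachedTokens <= MAX_PARENT_UNCACHED_TOKENS;
}
```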
Phase 2 – Overlay Filesystem & Isolated Execution
The speculation code (speculation.ts, 991 lines) writes to an overlay directory /tmp/claude/speculation/<pid>/<id>/. It follows a classic copy-on-write (COW) pattern:

```
# Read flow
if file modified in overlay → read overlay
else → read original file

# Write flow
copy original → overlay
modify overlay only

# Accept → copyOverlayToMain (merge)
# Reject → safeRemoveOverlay (delete)
```

This guarantees that no real files are altered until the user confirms, providing both safety and consistency across multiple file reads/writes.
State Isolation
forkedAgent.ts creates a sub-agent context where:
readFileState is shallow-cloned from the parent (preserving cached reads)
abortController is a fresh controller
setAppState and mutation callbacks are no-ops, preventing UI side-effects
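A minimal sketch of that fork, with field names following the article but the surrounding types assumed:

```typescript
// Sketch of the isolated fork context; AgentContext is a stand-in type.
interface AgentContext {
  readFileState: Map<string, string>;
  abortController: AbortController;
  setAppState: (s: unknown) => void;
}

function forkContext(parent: AgentContext): AgentContext {
  return {
    // Shallow clone: cached reads carry over, later writes stay local.
    readFileState: new Map(parent.readFileState),
    // Fresh controller so aborting the fork never cancels the parent.
    abortController: new AbortController(),
    // UI mutations become no-ops inside the fork.
    setAppState: () => {},
  };
}
```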
Tool Permission Matrix
Write tools: Edit, Write, NotebookEdit – check permission, then write to overlay.
Safe read tools: Read, Glob, Grep, ToolSearch, LSP, TaskGet, TaskList – normal execution.
Bash commands: read-only commands allowed; non-read-only commands set a bash boundary and abort.
Other tools: denied outright.
Any tool not on the whitelist triggers a denied_tool boundary, enforcing a strict security perimeter.
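The matrix can be expressed as a dispatch function; the tool lists come from the article, while the boundary plumbing is simplified to a return value:

```typescript
// Illustrative dispatch over the permission matrix.
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);
const READ_TOOLS = new Set([
  "Read", "Glob", "Grep", "ToolSearch", "LSP", "TaskGet", "TaskList",
]);

type Decision =
  | { action: "overlay_write" }
  | { action: "execute" }
  | { action: "abort"; boundary: "bash" | "denied_tool" };

function routeTool(name: string, opts?: { bashReadOnly?: boolean }): Decision {
  if (WRITE_TOOLS.has(name)) return { action: "overlay_write" };
  if (READ_TOOLS.has(name)) return { action: "execute" };
  if (name === "Bash") {
    return opts?.bashReadOnly
      ? { action: "execute" }
      : { action: "abort", boundary: "bash" };
  }
  // Anything off the whitelist ends speculation at a denied_tool boundary.
  return { action: "abort", boundary: "denied_tool" };
}
```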
Boundary Types
complete – all tool calls finish; speculation can be merged.
bash – a non-read-only Bash command aborts execution.
edit – a permission-restricted edit aborts.
denied_tool – an unwhitelisted tool aborts.
If the boundary is not complete, the system truncates the unfinished tail and falls back to normal completion.
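The truncation step can be sketched as keeping only the fully executed prefix; the message shape here is an assumption:

```typescript
// On a non-complete boundary, drop the unfinished tail so only fully
// executed tool calls survive; the rest is re-run as a normal query.
type Boundary = "complete" | "bash" | "edit" | "denied_tool";
type Msg = { id: number; finished: boolean };

function truncateAtBoundary(messages: Msg[], boundary: Boundary): Msg[] {
  if (boundary === "complete") return messages;
  // Keep the longest finished prefix.
  const cut = messages.findIndex((m) => !m.finished);
  return cut === -1 ? messages : messages.slice(0, cut);
}
```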
Pipeline – Continuous Speculation
When a round ends with boundary === complete, the system calls generatePipelinedSuggestion to fork another agent for the next predicted step. The pipeline looks like:
```
Round 1: predict "edit A" → execute → boundary=complete
        ↓
Round 2 (pipeline): predict "edit B" → execute
        ↓
Round 3 (pipeline): predict "run tests" → execute
```

The user's Tab acceptance of round 1 automatically promotes round 2's suggestion, and so on.
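A toy model of this promotion: accepting round N hands over the already-forked round N+1 instead of starting cold. Names here are illustrative, not the real speculation.ts API:

```typescript
// Each round optionally carries its pipelined successor.
type Round = { suggestion: string; next?: Round };

function acceptRound(current: Round): Round | undefined {
  // On Tab-accept with boundary === complete, the pipelined successor
  // becomes the new active speculation.
  return current.next;
}

const round3: Round = { suggestion: "run tests" };
const round2: Round = { suggestion: "edit B", next: round3 };
const round1: Round = { suggestion: "edit A", next: round2 };
```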
Accept vs. Reject Flows
Accept (5 steps)
```
// speculation.ts → acceptSpeculation
Step 1: copyOverlayToMain (merge overlay)
Step 2: prepareMessagesForInjection (remove thinking blocks, failed tools, abort messages)
Step 3: inject user + speculated messages into main dialog
Step 4: merge readFileState cache into parent
Step 5: handle boundary
  – if !== complete → truncate tail, trigger normal query
  – if === complete → promote pipelinedSuggestion and start new speculation
```

Reject

The reject path simply runs abort → safeRemoveOverlay → reset state, leaving no file changes, no dialog pollution, and no cache impact.
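The reject path is small enough to sketch in full; the state shape is an assumption:

```typescript
// Minimal sketch of the reject path: abort the fork, remove the overlay,
// reset state - nothing leaks into files, dialog, or cache.
type SpecState = {
  overlayFiles: string[];
  injectedMessages: string[];
  active: boolean;
};

function rejectSpeculation(state: SpecState, abort: AbortController): SpecState {
  abort.abort(); // stop any in-flight tool calls
  // safeRemoveOverlay + reset: return a clean state object.
  return { overlayFiles: [], injectedMessages: [], active: false };
}
```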
Time‑Saved Calculation
```
timeSavedMs = min(acceptedAt, boundary.completedAt) - startTime
```

The formula counts only the interval until the user actually accepts or until speculation finishes, whichever is earlier.
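This is a direct transcription of the formula (timestamps in milliseconds since epoch):

```typescript
// Count only work the user truly skipped: up to the earlier of
// "user accepted" and "speculation finished".
function timeSavedMs(startTime: number, acceptedAt: number, completedAt: number): number {
  return Math.min(acceptedAt, completedAt) - startTime;
}
```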
Co‑existence with Prompt Cache
Forked agents must share the parent’s Prompt Cache; otherwise latency would erase any benefit. The CacheSafeParams type (in forkedAgent.ts) lists parameters that may be overridden without breaking the cache key: systemPrompt, userContext, systemContext, toolUseContext, forkContextMessages. All other API parameters must remain identical, enforced by the rule “DO NOT override any API parameter that differs from the parent request”.
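A sketch of that contract as a check: only whitelisted keys may differ between parent and fork. The key list comes from the article; the comparison function itself is an illustration, not code from forkedAgent.ts:

```typescript
// Keys that CacheSafeParams allows to differ without breaking the
// Prompt Cache key (list per the article).
const CACHE_SAFE_KEYS = new Set([
  "systemPrompt", "userContext", "systemContext",
  "toolUseContext", "forkContextMessages",
]);

function violatesCacheKey(
  parent: Record<string, unknown>,
  fork: Record<string, unknown>,
): string[] {
  // Any non-whitelisted parameter that drifts invalidates the cache key.
  return Object.keys(fork).filter(
    (k) => !CACHE_SAFE_KEYS.has(k) && fork[k] !== parent[k],
  );
}
```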
PR #18143 Lesson
Setting effort:'low' caused a 45× increase in cache writes and dropped the hit rate from 92.7% to 61%.
Changing any API parameter invalidates the cache key, forcing every forked sub‑agent to rewrite the full context cache, turning a cheap optimization into a costly operation.
Key Takeaways
Overlay Filesystem is the foundation; copy‑on‑write isolation makes speculative execution safe.
The design mirrors optimistic concurrency control in databases—assume acceptance, then verify.
Shared Prompt Cache is essential for cost‑effective speculation; any parameter drift kills the cache.
“Or nothing” philosophy and the 12‑layer filter ensure the system never distracts the user with irrelevant output.