How Claude Code’s Speculation Engine Lets AI Finish Your Code Before You Hit Tab
The article dissects Claude Code’s Speculation system, showing how an AI sub‑agent predicts user intent, runs a full edit‑test pipeline in an overlay filesystem, filters results through twelve safety layers, and only commits changes when the user confirms, effectively turning speculative execution into a safe performance boost.
What Speculation Does
The Speculation module forks a child agent that silently completes the next coding step—reading files, editing code, running commands, and passing tests—before the user presses Tab to accept the suggestion.
Phase 1 – Predicting the Next Input
Implemented in promptSuggestion.ts (523 lines), the system uses the same large model as the main session; the canUseTool check denies the fork outright if the model changes, preserving the shared Prompt Cache. This ensures the sub-agent has full context (coding style, project structure, recent file changes). A smaller model would lose this context and prediction accuracy would drop dramatically.
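The same-model constraint can be sketched as a simple guard; the names below (canUseModel, ForkRequest) are illustrative assumptions, not the real canUseTool implementation:

```typescript
// Hypothetical sketch of the same-model guard; the actual canUseTool
// check in Claude Code covers more than the model field.
type ForkRequest = { model: string };
type GuardResult = { allowed: boolean; reason?: string };

function canUseModel(parentModel: string, fork: ForkRequest): GuardResult {
  // Switching models would change the cache key and lose the shared
  // Prompt Cache, so any mismatch is denied outright.
  if (fork.model !== parentModel) {
    return { allowed: false, reason: "model_mismatch_breaks_prompt_cache" };
  }
  return { allowed: true };
}
```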
SUGGESTION_PROMPT Design
```
THE TEST: Would they think "I was just about to type that"?
Format: 2-12 words, match the user's style. Or nothing.
```

The prompt asks the model to generate a suggestion the user would feel they were just about to type; if confidence is low, the model must output nothing.
12‑Layer Filtering System
done – filter "done"
meta_text – filter generic meta messages like "nothing found"
meta_wrapped – filter wrapped placeholders
error_message – filter API error texts
prefixed_label – filter "word: text" patterns
too_few_words – reject <2 words (except commands/yes/no/ok)
too_many_words – reject >12 words
too_long – reject ≥100 characters
multiple_sentences – reject multi‑sentence output
has_formatting – reject line breaks or bold markup
evaluative – filter praise like "looks good"
claude_voice – filter Claude‑style phrasing
This prevents the agent from “talking to itself” or inserting irrelevant commentary.
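The filter chain above can be approximated as an ordered function; the layer names mirror the list, but the specific regexes, thresholds, and short-word whitelist here are assumptions for illustration:

```typescript
// Illustrative sketch of the ordered filter chain. Real patterns in
// promptSuggestion.ts are richer; these rules are simplified stand-ins.
type FilterResult = { ok: true } | { ok: false; reason: string };

const ALLOWED_SHORT = new Set(["yes", "no", "ok"]); // assumed whitelist

function filterSuggestion(text: string): FilterResult {
  const t = text.trim();
  const words = t === "" ? [] : t.split(/\s+/);
  if (t.toLowerCase() === "done") return { ok: false, reason: "done" };
  if (/nothing found/i.test(t)) return { ok: false, reason: "meta_text" };
  if (/^\w+:\s/.test(t)) return { ok: false, reason: "prefixed_label" };
  if (words.length < 2 && !ALLOWED_SHORT.has(t.toLowerCase()) && !t.startsWith("/"))
    return { ok: false, reason: "too_few_words" };
  if (words.length > 12) return { ok: false, reason: "too_many_words" };
  if (t.length >= 100) return { ok: false, reason: "too_long" };
  if ((t.match(/[.!?]\s/g) ?? []).length > 0)
    return { ok: false, reason: "multiple_sentences" };
  if (/\n|\*\*/.test(t)) return { ok: false, reason: "has_formatting" };
  if (/looks good/i.test(t)) return { ok: false, reason: "evaluative" };
  return { ok: true };
}
```

The ordering matters: cheap exact-match checks run before length checks, which run before the fuzzier style filters.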
Cache‑Cold Suppression
If the parent session has more than MAX_PARENT_UNCACHED_TOKENS = 10000 uncached tokens, the system skips suggestion generation to avoid high latency and stale results.
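As a minimal sketch, the gate reduces to one comparison against the constant named in the article (how uncached tokens are counted is not specified here and is left as an input):

```typescript
// Cache-cold suppression gate. The constant comes from the article;
// measuring uncachedTokens is assumed to happen upstream.
const MAX_PARENT_UNCACHED_TOKENS = 10_000;

function shouldSpeculate(uncachedTokens: number): boolean {
  // A cold cache means the fork would pay full-context latency and the
  // suggestion would likely arrive stale - so skip this round entirely.
  return uncachedTokens <= MAX_PARENT_UNCACHED_TOKENS;
}
```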
Phase 2 – Overlay Filesystem & Isolated Execution
The speculation code (speculation.ts, 991 lines) writes to an overlay directory /tmp/claude/speculation/<pid>/<id>/. It follows a classic copy-on-write (COW) pattern:

```
# Read flow
if file modified in overlay → read overlay
else → read original file

# Write flow
copy original → overlay
modify overlay only

# Accept → copyOverlayToMain (merge)
# Reject → safeRemoveOverlay (delete)
```

This guarantees that no real files are altered until the user confirms, providing both safety and consistency across multiple file reads/writes.
State Isolation
forkedAgent.ts creates a sub-agent context where:
readFileState is shallow-cloned from the parent (preserving cached reads)
abortController is a fresh controller
setAppState and mutation callbacks are no-ops, preventing UI side-effects
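A minimal sketch of that fork, with field names following the article but the surrounding types assumed:

```typescript
// Sketch of the isolated fork context; AgentContext is a stand-in type.
interface AgentContext {
  readFileState: Map<string, string>;
  abortController: AbortController;
  setAppState: (s: unknown) => void;
}

function forkContext(parent: AgentContext): AgentContext {
  return {
    // Shallow clone: cached reads carry over, later writes stay local.
    readFileState: new Map(parent.readFileState),
    // Fresh controller so aborting the fork never cancels the parent.
    abortController: new AbortController(),
    // UI mutations become no-ops inside the fork.
    setAppState: () => {},
  };
}
```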
Tool Permission Matrix
Write tools: Edit, Write, NotebookEdit – check permission, then write to overlay.
Safe read tools: Read, Glob, Grep, ToolSearch, LSP, TaskGet, TaskList – normal execution.
Bash commands: read-only commands allowed; non-read-only commands set a bash boundary and abort.
Other tools: denied outright.
Any tool not on the whitelist triggers a denied_tool boundary, enforcing a strict security perimeter.
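The matrix can be expressed as a dispatch function; the tool lists come from the article, while the boundary plumbing is simplified to a return value:

```typescript
// Illustrative dispatch over the permission matrix.
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);
const READ_TOOLS = new Set([
  "Read", "Glob", "Grep", "ToolSearch", "LSP", "TaskGet", "TaskList",
]);

type Decision =
  | { action: "overlay_write" }
  | { action: "execute" }
  | { action: "abort"; boundary: "bash" | "denied_tool" };

function routeTool(name: string, opts?: { bashReadOnly?: boolean }): Decision {
  if (WRITE_TOOLS.has(name)) return { action: "overlay_write" };
  if (READ_TOOLS.has(name)) return { action: "execute" };
  if (name === "Bash") {
    return opts?.bashReadOnly
      ? { action: "execute" }
      : { action: "abort", boundary: "bash" };
  }
  // Anything off the whitelist ends speculation at a denied_tool boundary.
  return { action: "abort", boundary: "denied_tool" };
}
```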
Boundary Types
complete – all tool calls finish; speculation can be merged.
bash – a non-read-only Bash command aborts execution.
edit – a permission-restricted edit aborts.
denied_tool – an unwhitelisted tool aborts.
If the boundary is not complete, the system truncates the unfinished tail and falls back to normal completion.
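The truncation step can be sketched as keeping only the fully executed prefix; the message shape here is an assumption:

```typescript
// On a non-complete boundary, drop the unfinished tail so only fully
// executed tool calls survive; the rest is re-run as a normal query.
type Boundary = "complete" | "bash" | "edit" | "denied_tool";
type Msg = { id: number; finished: boolean };

function truncateAtBoundary(messages: Msg[], boundary: Boundary): Msg[] {
  if (boundary === "complete") return messages;
  // Keep the longest finished prefix.
  const cut = messages.findIndex((m) => !m.finished);
  return cut === -1 ? messages : messages.slice(0, cut);
}
```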
Pipeline – Continuous Speculation
When a round ends with boundary === complete, the system calls generatePipelinedSuggestion to fork another agent for the next predicted step. The pipeline looks like:
```
Round 1: predict "edit A" → execute → boundary=complete
        ↓
Round 2 (pipeline): predict "edit B" → execute
        ↓
Round 3 (pipeline): predict "run tests" → execute
```

The user's Tab acceptance of round 1 automatically promotes round 2's suggestion, and so on.
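A toy model of this promotion: accepting round N hands over the already-forked round N+1 instead of starting cold. Names here are illustrative, not the real speculation.ts API:

```typescript
// Each round optionally carries its pipelined successor.
type Round = { suggestion: string; next?: Round };

function acceptRound(current: Round): Round | undefined {
  // On Tab-accept with boundary === complete, the pipelined successor
  // becomes the new active speculation.
  return current.next;
}

const round3: Round = { suggestion: "run tests" };
const round2: Round = { suggestion: "edit B", next: round3 };
const round1: Round = { suggestion: "edit A", next: round2 };
```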
Accept vs. Reject Flows
Accept (5 steps)
```
// speculation.ts → acceptSpeculation
Step 1: copyOverlayToMain (merge overlay)
Step 2: prepareMessagesForInjection (remove thinking blocks, failed tools, abort messages)
Step 3: inject user + speculated messages into main dialog
Step 4: merge readFileState cache into parent
Step 5: handle boundary
  – if !== complete → truncate tail, trigger normal query
  – if === complete → promote pipelinedSuggestion and start new speculation
```

Reject

The reject path simply runs abort → safeRemoveOverlay → reset state, leaving no file changes, no dialog pollution, and no cache impact.
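The reject path is small enough to sketch in full; the state shape is an assumption:

```typescript
// Minimal sketch of the reject path: abort the fork, remove the overlay,
// reset state - nothing leaks into files, dialog, or cache.
type SpecState = {
  overlayFiles: string[];
  injectedMessages: string[];
  active: boolean;
};

function rejectSpeculation(state: SpecState, abort: AbortController): SpecState {
  abort.abort(); // stop any in-flight tool calls
  // safeRemoveOverlay + reset: return a clean state object.
  return { overlayFiles: [], injectedMessages: [], active: false };
}
```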
Time‑Saved Calculation
```
timeSavedMs = min(acceptedAt, boundary.completedAt) - startTime
```

The formula counts only the interval until the user actually accepts or until speculation finishes, whichever is earlier.
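This is a direct transcription of the formula (timestamps in milliseconds since epoch):

```typescript
// Count only work the user truly skipped: up to the earlier of
// "user accepted" and "speculation finished".
function timeSavedMs(startTime: number, acceptedAt: number, completedAt: number): number {
  return Math.min(acceptedAt, completedAt) - startTime;
}
```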
Co‑existence with Prompt Cache
Forked agents must share the parent’s Prompt Cache; otherwise latency would erase any benefit. The CacheSafeParams type (in forkedAgent.ts) lists parameters that may be overridden without breaking the cache key: systemPrompt, userContext, systemContext, toolUseContext, forkContextMessages. All other API parameters must remain identical, enforced by the rule “DO NOT override any API parameter that differs from the parent request”.
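A sketch of that contract as a check: only whitelisted keys may differ between parent and fork. The key list comes from the article; the comparison function itself is an illustration, not code from forkedAgent.ts:

```typescript
// Keys that CacheSafeParams allows to differ without breaking the
// Prompt Cache key (list per the article).
const CACHE_SAFE_KEYS = new Set([
  "systemPrompt", "userContext", "systemContext",
  "toolUseContext", "forkContextMessages",
]);

function violatesCacheKey(
  parent: Record<string, unknown>,
  fork: Record<string, unknown>,
): string[] {
  // Any non-whitelisted parameter that drifts invalidates the cache key.
  return Object.keys(fork).filter(
    (k) => !CACHE_SAFE_KEYS.has(k) && fork[k] !== parent[k],
  );
}
```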
PR #18143 Lesson
Setting effort:'low' caused a 45× increase in cache writes and dropped the hit rate from 92.7% to 61%.
Changing any API parameter invalidates the cache key, forcing every forked sub‑agent to rewrite the full context cache, turning a cheap optimization into a costly operation.
Key Takeaways
Overlay Filesystem is the foundation; copy‑on‑write isolation makes speculative execution safe.
The design mirrors optimistic concurrency control in databases—assume acceptance, then verify.
Shared Prompt Cache is essential for cost‑effective speculation; any parameter drift kills the cache.
“Or nothing” philosophy and the 12‑layer filter ensure the system never distracts the user with irrelevant output.