Why Grep Is Replacing Vector Indexes: RAG Isn’t Dead, It’s Evolving

The article dissects Claude Code’s LLM‑driven grep search: it shows how multi‑round tool calls replace static vector‑based RAG, presents ripgrep performance benchmarks, compares Claude Code with Cursor and Codex, and argues that zero‑index search suits local codebases while larger projects still need indexing.

dbaplus Community

May 17, 2026

1. Claude Code’s Grep‑only retrieval

Boris Cherny, who created Claude Code, has repeatedly stated that it does not use RAG, embeddings, or a vector database. Instead, the product relies on LLM‑driven grep (Unix text search) to locate code. In an X/Twitter post he notes that early versions used RAG but the team quickly found that “agentic search generally works better.” He makes the same point in a Pragmatic Engineer interview: “Plain glob and grep, driven by the model, beat everything.” Anthropic’s Context Engineering blog adds that Claude Code invokes a GrepTool and a GlobTool to load code into the LLM context, though the exact invocation details are not public.

2. LLM‑driven multi‑round search loop

The core loop works as follows:

Send the user query and the list of available tools to the LLM.

The LLM either returns a textual answer or a tool‑call request.

If a tool is called, execute it, append the result to the conversation history, and invoke the LLM again with the updated context.

The loop ends when the LLM decides the information is sufficient, or when a hard stop occurs (max rounds, budget limit, user interrupt, or permission denial).
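The four steps above can be sketched as a small loop. This is a minimal illustration, not Claude Code's implementation: the type names, the `runSearchLoop` function, and the scripted stand-in for the model are all hypothetical, and only the round-budget hard stop is modeled.

```typescript
// Hypothetical types for illustration; none of these names come from Claude Code.
type ToolCall = { tool: string; input: Record<string, unknown> };
type ModelReply =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; call: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };

type Model = (history: Message[], toolNames: string[]) => ModelReply;
type Tool = (input: Record<string, unknown>) => string;

function runSearchLoop(
  query: string,
  model: Model,
  tools: Record<string, Tool>,
  maxRounds = 8,
): { answer: string | null; rounds: number; history: Message[] } {
  const history: Message[] = [{ role: "user", content: query }];
  const toolNames = Object.keys(tools);

  for (let round = 1; round <= maxRounds; round++) {
    const reply = model(history, toolNames);
    if (reply.kind === "answer") {
      // The model decided it has enough information.
      return { answer: reply.text, rounds: round, history };
    }
    const tool = tools[reply.call.tool];
    // An unknown tool is reported back to the model instead of throwing.
    const result = tool
      ? tool(reply.call.input)
      : `error: unknown tool ${reply.call.tool}`;
    // Append the call and its result so the next round sees them.
    history.push({ role: "assistant", content: JSON.stringify(reply.call) });
    history.push({ role: "tool", content: result });
  }
  // Hard stop: round budget exhausted without a final answer.
  return { answer: null, rounds: maxRounds, history };
}

// Scripted stand-in for the LLM: search once, then answer.
const scriptedModel: Model = (history) =>
  history.some((m) => m.role === "tool")
    ? { kind: "answer", text: "found in bridge/sessionRunner.ts" }
    : { kind: "tool_call", call: { tool: "Grep", input: { pattern: "currentActivity" } } };

const demo = runSearchLoop("Where is activity tracked?", scriptedModel, {
  Grep: () => "bridge/sessionRunner.ts",
});
console.log(demo.rounds, demo.answer); // 2 found in bridge/sessionRunner.ts
```

Because every tool result is appended to the history, each extra round makes the next model call larger, which is the token cost discussed later in the article.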

Four core tools are available:

GrepTool: wraps ripgrep (rg) for regular‑expression file‑content search.

GlobTool: performs filename/path pattern matching.

FileReadTool: reads a specific line range from a file via Node.js fs.

AgentTool: launches a child “Explore” agent that can only use Grep, Glob, and Read, isolating its own context.

2.1 GrepTool output modes

files_with_matches (default): returns only matching file paths; a subsequent Read is usually needed to see code.

content: returns matching lines with surrounding context (e.g., -C 5).

count: returns the number of matches per file.

These modes let the LLM control the amount of information that enters the context window.
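One plausible way to picture the three modes is as a translation from the tool input to ripgrep command-line flags. The exact invocation is not public, so the sketch below is an assumption: `GrepInput`, its `context` field, and `buildRgArgs` are hypothetical names, while the flags themselves (`--files-with-matches`, `--count`, `--line-number`, `--context`, `--glob`, `--regexp`) are standard ripgrep options.

```typescript
type OutputMode = "files_with_matches" | "content" | "count";

// Hypothetical input shape, loosely modeled on the calls shown in the article.
interface GrepInput {
  pattern: string;
  path?: string;
  glob?: string;
  output_mode?: OutputMode;
  context?: number; // lines of context; only used in content mode
}

// Translate a tool input into an argument list for the rg binary.
function buildRgArgs(input: GrepInput): string[] {
  const mode = input.output_mode ?? "files_with_matches";
  const args: string[] = [];

  if (mode === "files_with_matches") args.push("--files-with-matches");
  if (mode === "count") args.push("--count");
  if (mode === "content") {
    args.push("--line-number");
    if (input.context !== undefined) {
      args.push("--context", String(input.context));
    }
  }
  if (input.glob) args.push("--glob", input.glob);
  // --regexp keeps patterns that start with "-" from being read as flags.
  args.push("--regexp", input.pattern);
  if (input.path) args.push(input.path);
  return args;
}

console.log(buildRgArgs({ pattern: "TOOL_VERBS", glob: "*.ts" }));
// [ '--files-with-matches', '--glob', '*.ts', '--regexp', 'TOOL_VERBS' ]
```

Defaulting to paths only means a broad first query costs a few dozen tokens per hit rather than whole code excerpts; the model opts into `content` mode once it knows which file is worth reading.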

2.2 Real‑world four‑round walkthrough

Question: “When the LLM calls GrepTool to track tool usage, how does the bridge record the call?”

Round 1 – Broad search:

Grep({pattern:"GrepTool|tool.*track|tool.*activity", glob:"*.ts"})

returns four files (three under bridge/, one under cli/).

Round 2 – Context view: Switch to content mode on bridge/sessionRunner.ts to see the mapping table showing tool verbs (e.g., Grep → 'Searching').

Round 3 – Full read:

Read({path:"bridge/sessionRunner.ts", startLine:1, endLine:200})

reveals three structures:

Tool‑verb mapping table (18 entries, e.g., Grep → 'Searching').

Summary‑generation function that concatenates verb and target.

Activity parser that extracts JSON events from the session stdout.

Round 4 – Trace usage:

Grep({pattern:"SessionActivity|currentActivity", path:"bridge/", output_mode:"content", "-C":2})

finds three files (bridge/types.ts, bridge/bridgeMain.ts, bridge/bridgeUI.ts) that together form the full tracking chain: tool execution → activity JSON → bridge main process → UI rendering.
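To make the three structures found in Round 3 concrete, here is a hypothetical reconstruction. Only the Grep → 'Searching' entry comes from the walkthrough; the other verbs, the event shape (`type`, `tool`, `target`), and all function names are assumptions made for illustration and are not the contents of bridge/sessionRunner.ts.

```typescript
// 1. Tool-verb mapping table (the real one reportedly has 18 entries).
const TOOL_VERBS: Record<string, string> = {
  Grep: "Searching", // from the article
  Glob: "Finding", // assumed
  Read: "Reading", // assumed
};

interface SessionActivity {
  tool: string;
  summary: string;
}

// 2. Summary generation: concatenate the verb and the target.
function summarize(tool: string, target: string): string {
  const verb = TOOL_VERBS[tool] ?? "Running";
  return `${verb} ${target}`;
}

// 3. Activity parser: pick JSON events out of mixed session stdout.
//    Assumes one JSON object per line; other lines are plain log text.
function parseActivities(stdout: string): SessionActivity[] {
  const activities: SessionActivity[] = [];
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      const event = JSON.parse(trimmed) as {
        type?: unknown;
        tool?: unknown;
        target?: unknown;
      };
      if (
        event.type === "tool_use" &&
        typeof event.tool === "string" &&
        typeof event.target === "string"
      ) {
        activities.push({
          tool: event.tool,
          summary: summarize(event.tool, event.target),
        });
      }
    } catch {
      // Line looked like JSON but was not; skip it.
    }
  }
  return activities;
}

const sampleStdout = [
  "session started",
  '{"type":"tool_use","tool":"Grep","target":"currentActivity"}',
  "{not json",
  '{"type":"text","text":"done"}',
].join("\n");

console.log(parseActivities(sampleStdout));
// [ { tool: 'Grep', summary: 'Searching currentActivity' } ]
```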

3. Why brute‑force search is fast enough

Claude Code uses ripgrep, a grep‑style search tool written in Rust (a separate implementation, not a port of GNU grep) that respects .gitignore, skips binary files, and runs multi‑threaded with SIMD acceleration. The source line import { ripGrep } from '../../utils/ripgrep.js' confirms this.

Five filtering layers shrink the search space before content matching:

Layer 1: .gitignore pruning
Layer 2: Path restriction (e.g., bridge/)
Layer 3: Glob pattern filter
Layer 4: Binary file detection
Layer 5: Regex content match

Example on a 4,471‑file snapshot:

Original files: 4,471
After .gitignore: 4,471 (no node_modules in snapshot)
Path restriction to bridge/: 32
Glob *.ts: 32
Binary detection: 32
Regex match for "SessionActivity|currentActivity": 3 files
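The funnel can be modeled as a chain of filters that reports how many files survive each layer. This is a simplified sketch: `FileEntry`, `runFunnel`, and the toy file list are hypothetical, the glob is reduced to a file-extension check, and real ripgrep applies these checks while walking the directory tree rather than as separate passes.

```typescript
// Hypothetical in-memory stand-in for files on disk.
interface FileEntry {
  path: string;
  ignored: boolean; // in practice decided by .gitignore rules
  binary: boolean; // in practice decided by inspecting the file's bytes
  content: string;
}

interface FunnelCounts {
  total: number;
  afterIgnore: number;
  afterPath: number;
  afterGlob: number;
  afterBinary: number;
  matched: number;
}

function runFunnel(
  files: FileEntry[],
  pathPrefix: string,
  extension: string,
  pattern: RegExp, // must not use the "g" flag: test() would become stateful
): FunnelCounts {
  const afterIgnore = files.filter((f) => !f.ignored); // Layer 1
  const afterPath = afterIgnore.filter((f) => f.path.startsWith(pathPrefix)); // Layer 2
  const afterGlob = afterPath.filter((f) => f.path.endsWith(extension)); // Layer 3
  const afterBinary = afterGlob.filter((f) => !f.binary); // Layer 4
  const matched = afterBinary.filter((f) => pattern.test(f.content)); // Layer 5
  return {
    total: files.length,
    afterIgnore: afterIgnore.length,
    afterPath: afterPath.length,
    afterGlob: afterGlob.length,
    afterBinary: afterBinary.length,
    matched: matched.length,
  };
}

const toyFiles: FileEntry[] = [
  { path: "bridge/types.ts", ignored: false, binary: false, content: "export type SessionActivity = {};" },
  { path: "bridge/bridgeMain.ts", ignored: false, binary: false, content: "let currentActivity = null;" },
  { path: "bridge/util.ts", ignored: false, binary: false, content: "export const x = 1;" },
  { path: "bridge/logo.png", ignored: false, binary: true, content: "" },
  { path: "cli/main.ts", ignored: false, binary: false, content: "currentActivity" },
  { path: "node_modules/pkg/index.ts", ignored: true, binary: false, content: "currentActivity" },
];

console.log(runFunnel(toyFiles, "bridge/", ".ts", /SessionActivity|currentActivity/));
```

On this toy input the counts go 6 → 5 → 4 → 3 → 3 → 2: as in the article's snapshot, the cheap path and glob filters do most of the narrowing before any file content is read.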

ripgrep optimizations:

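SIMD vectorized matching (an AVX2 register holds 32 bytes, so one instruction compares 32 bytes at a time).

Boyer‑Moore‑style skipping for fixed strings.

OS page‑cache reuse.

Memory‑mapped I/O where it is faster (ripgrep chooses between memory maps and buffered reads heuristically).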

Thread‑pool parallelism.

Empirical benchmark on the same 4,500‑file codebase (average of three runs):

Pattern                      ripgrep   GNU grep -r   Speedup
TOOL_VERBS (low‑freq)        0.09 s    2.55 s        ≈28×
async.*generator (regex)     0.10 s    3.30 s        ≈33×
import.*from (high‑freq)     0.10 s    2.45 s        ≈25×

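Both tools scan a similar number of files here (the snapshot has nothing for .gitignore to prune), so the speedup comes from ripgrep’s parallelism and SIMD, not from file‑level pruning. At roughly 0.1 s per query on this codebase, and with a 250 MB project typically still finishing well within a second, an offline index is unnecessary.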
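The speedup column follows directly from the two timing columns. The short check below recomputes it from the benchmark figures, converted to integer milliseconds so the division is exact; `BenchRow` and `speedup` are names introduced here for illustration.

```typescript
interface BenchRow {
  pattern: string;
  ripgrepMs: number;
  gnuGrepMs: number;
}

// Timings from the benchmark table, in milliseconds.
const benchRows: BenchRow[] = [
  { pattern: "TOOL_VERBS", ripgrepMs: 90, gnuGrepMs: 2550 },
  { pattern: "async.*generator", ripgrepMs: 100, gnuGrepMs: 3300 },
  { pattern: "import.*from", ripgrepMs: 100, gnuGrepMs: 2450 },
];

// Speedup is the ratio of the slower time to the faster time.
function speedup(row: BenchRow): number {
  return row.gnuGrepMs / row.ripgrepMs;
}

for (const row of benchRows) {
  console.log(`${row.pattern}: ${Math.round(speedup(row))}x`);
}
// TOOL_VERBS: 28x
// async.*generator: 33x
// import.*from: 25x
```

Note that ripgrep's time is nearly flat across the three patterns, while GNU grep's varies by about a second, so the ratio is driven mostly by the GNU grep column.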

4. Industry comparison & design philosophy

Cursor uses a classic RAG stack with two indexes: a tree‑sitter‑based semantic index (vectors stored in Turbopuffer) and a trigram inverted index (Instant Grep). The semantic index enables cross‑repo search; the trigram index accelerates exact‑match grep.
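For readers unfamiliar with trigram indexes, the sketch below shows the general technique: map every 3-character substring to the files that contain it, intersect those lists to get candidates, then verify each candidate with a real substring check. It is a textbook illustration with hypothetical names (`TrigramIndex`, `trigrams`), not Cursor's Instant Grep implementation.

```typescript
// Split a string into its distinct overlapping 3-character substrings.
function trigrams(text: string): string[] {
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= text.length; i++) {
    grams.add(text.slice(i, i + 3));
  }
  return Array.from(grams);
}

class TrigramIndex {
  private postings = new Map<string, Set<string>>(); // trigram -> file paths
  private docs = new Map<string, string>(); // file path -> content

  add(path: string, content: string): void {
    this.docs.set(path, content);
    for (const gram of trigrams(content)) {
      const paths = this.postings.get(gram) ?? new Set<string>();
      paths.add(path);
      this.postings.set(gram, paths);
    }
  }

  // Exact substring search for a literal of at least 3 characters.
  search(literal: string): string[] {
    if (literal.length < 3) {
      throw new Error("literal must be at least 3 characters long");
    }
    // Step 1: intersect posting lists to get candidate files.
    let candidates: string[] = Array.from(this.docs.keys());
    for (const gram of trigrams(literal)) {
      const paths = this.postings.get(gram);
      if (!paths) {
        candidates = [];
        break;
      }
      candidates = candidates.filter((p) => paths.has(p));
    }
    // Step 2: verify, since sharing all trigrams does not guarantee
    // that they appear contiguously in the file.
    return candidates
      .filter((p) => (this.docs.get(p) ?? "").includes(literal))
      .sort();
  }
}

const index = new TrigramIndex();
index.add("a.ts", "const abcd = 1;");
index.add("b.ts", "abc bcd"); // has trigrams "abc" and "bcd" but not "abcd"
index.add("c.ts", "let x = 2;");
console.log(index.search("abcd")); // [ 'a.ts' ]
```

The index only has to be right about which files cannot match; the final check is still a plain text scan, just over far fewer files. The price is that the index must be built and kept in sync with the working tree.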

Claude Code follows a zero‑index, on‑demand approach: no pre‑built vectors, no offline processing, and all retrieval is performed by the LLM issuing rg commands in real time. The trade‑off is zero startup/maintenance cost versus higher token usage for multi‑round searches.

OpenAI Codex CLI mirrors Claude Code in that it avoids embeddings and vector stores, but it provides a generic shell tool that can run rg, find, cat, etc., leaving parsing of raw output to the model.

Both Claude Code and Codex suggest that, for code‑search tasks where identifiers are precise anchors, a Grep‑driven pipeline can outperform embedding‑based RAG. Academic work (GrepRAG, ISSTA ’26) reports that a single‑round Grep retrieval yields higher exact‑match scores than a vanilla embedding RAG baseline on code‑completion benchmarks.

4.1 Cost‑control mechanisms in Claude Code

Prompt‑cache reuse: identical prefixes across rounds hit the cache, reducing token cost to roughly 10% of the full price for repeated context.

Auto‑compaction: when the conversation approaches the context limit, the system summarizes older rounds and replaces them with concise abstracts.

Sub‑agent isolation: the Explore sub‑agent processes raw grep results in its own context and returns only a summarized conclusion to the main dialog.

These layers keep token inflation manageable but do not eliminate the fundamental trade‑off: more search rounds increase context size.
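A rough model shows why prompt caching matters so much for multi-round search. Each round resends the whole history, so without caching the billed input grows quadratically with the number of rounds. The sketch below is a simplification under stated assumptions: the token counts are invented, cached prefix tokens are billed at the 10% rate quoted above, and any surcharge for writing to the cache is ignored. `estimateInputCost` is a hypothetical helper.

```typescript
interface InputCost {
  withoutCache: number; // billed input tokens at full price
  withCache: number; // full-price-equivalent tokens with prefix caching
}

// freshTokensPerRound[i] is the number of new tokens first sent in round i
// (the initial prompt, then each tool result appended afterwards).
function estimateInputCost(
  freshTokensPerRound: number[],
  cachedRate = 0.1,
): InputCost {
  let prefix = 0; // tokens already sent in earlier rounds
  let withoutCache = 0;
  let withCache = 0;
  for (const fresh of freshTokensPerRound) {
    withoutCache += prefix + fresh; // everything billed at full price
    withCache += prefix * cachedRate + fresh; // prefix billed at the cached rate
    prefix += fresh;
  }
  return {
    withoutCache: Math.round(withoutCache),
    withCache: Math.round(withCache),
  };
}

// Invented example: a 2,000-token prompt, then three tool results.
const cost = estimateInputCost([2000, 1000, 3000, 1500]);
console.log(cost); // { withoutCache: 18500, withCache: 8600 }
```

In this example caching cuts the billed input by more than half, yet the 7,500 fresh tokens are still paid in full, which is the residual cost that compaction and sub-agent isolation target.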

4.2 When Grep fails

Milvus engineers criticize Grep‑only retrieval for token bloat, latency, and lack of semantic understanding. In their benchmark, Grep‑only retrieval needed 14 tool calls, 32.2k tokens, and 59.3 s to locate a 10‑line fix buried in 500 lines of noise. Their open‑source MCP plugin replaces Grep with vector search, which they report cuts token usage by about 40%.

5. Conclusion – Is RAG dead?

Code identifiers are naturally Grep‑friendly, and a single‑round Grep already beats embedding RAG on code‑completion tasks.

Local codebases (tens to hundreds of MB) are small enough for ripgrep to scan in sub‑second time.

Agentic loops let the LLM decide what to search next, turning retrieval into an active, iterative process rather than a static pre‑fetch.

For larger repositories or natural‑language QA, vector‑based retrieval remains valuable. The choice of retrieval strategy should be guided by data scale and the need for semantic matching, not by hype.
