How a 9K‑Star MCP Server Lets Claude Code Scan Millions of Lines in Milliseconds
The codebase-memory-mcp tool builds a tree-sitter-based knowledge graph of a codebase. That graph delivers sub-millisecond queries, up to 120× token savings, zero-dependency deployment, and sharing across agents, and reproducible benchmarks show higher answer quality and far lower resource usage than the traditional file-by-file grep approach.
Performance Overview
Take the Linux kernel: 28 M lines across 75 000 files. Without help, Claude Code reads hundreds of files to answer a question, consuming ~400 000 tokens. codebase-memory-mcp indexes the entire codebase in 3 minutes, answers queries in under 1 ms, and uses only 3 400 tokens. That is a roughly 120× token saving, because the agent queries the graph instead of reading files.
A benchmark on 31 real projects, ranging from a few thousand to hundreds of thousands of lines, compares the tool with a file-by-file baseline: answer quality is 83 % versus 76 %, token consumption 3 400 versus 34 000 (10× less), and tool calls 2.3 versus 4.8 (about half).
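The headline ratios follow directly from the quoted figures. A quick check in Python (the numbers are the article's; the variable names are only labels) shows the Linux saving is about 118×, which rounds to the "120×" above, and the tool-call reduction is about 2.1×:

```python
# Figures quoted in the article; nothing here is measured independently.
LINUX_BASELINE_TOKENS = 400_000   # file-by-file reading on the Linux kernel
GRAPH_TOKENS = 3_400              # answering via the knowledge graph

BENCH = {
    "tokens": (34_000, 3_400),         # (baseline, graph), 31-project benchmark
    "tool_calls": (4.8, 2.3),          # (baseline, graph)
    "answer_quality_pct": (76, 83),    # (baseline, graph)
}

def ratio(baseline: float, improved: float) -> float:
    """How many times smaller `improved` is than `baseline`."""
    return baseline / improved

linux_saving = ratio(LINUX_BASELINE_TOKENS, GRAPH_TOKENS)
token_saving = ratio(*BENCH["tokens"])
call_saving = ratio(*BENCH["tool_calls"])
quality_gain = BENCH["answer_quality_pct"][1] - BENCH["answer_quality_pct"][0]

print(f"Linux token saving:     {linux_saving:.1f}x")   # 117.6x
print(f"Benchmark token saving: {token_saving:.1f}x")   # 10.0x
print(f"Tool-call reduction:    {call_saving:.2f}x")    # 2.09x
print(f"Answer quality gain:    +{quality_gain} points")  # +7 points
```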
Zero‑Dependency Deployment
Installation is a single command:
curl -fsSL https://raw.githubusercontent.com/.../install.sh | bash
It downloads a 7 MB static binary that needs no runtime, Docker, or external services. The binary bundles tree-sitter grammars for 158 languages, an embedded SQLite database, and the Nomic-embed-code vector model (768-dim, int8), all compiled in C.
Knowledge‑Graph Architecture
On first run, the tool parses the entire codebase with tree-sitter, extracts functions, classes, call relations, HTTP routes, imports, data flows, and more, and stores them in a graph. A query then costs O(edges) instead of grep's O(files × lines), an order-of-magnitude advantage.
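To make the extraction step concrete, here is a minimal sketch. It swaps tree-sitter for Python's built-in `ast` module (a substitution, not what codebase-memory-mcp uses), handles only one Python file, and records only DEFINES locations and direct CALLS edges. `build_graph` and `callers_of` are hypothetical names.

```python
import ast
from collections import defaultdict

SOURCE = '''
def parse(path):
    return tokenize(read(path))

def read(path):
    return open(path).read()

def tokenize(text):
    return text.split()

def main():
    parse("a.txt")
'''

def build_graph(source: str, filename: str = "example.py"):
    """Extract DEFINES and CALLS edges from one Python file.

    Stand-in for tree-sitter: uses Python's own `ast` parser, so it only
    understands Python and only direct calls to plain names like `f(x)`.
    """
    tree = ast.parse(source, filename=filename)
    defines = {}               # function name -> (file, line): DEFINES
    calls = defaultdict(set)   # caller -> set of callee names: CALLS
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defines[node.name] = (filename, node.lineno)
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls[node.name].add(inner.func.id)
    return defines, calls

def callers_of(calls, target: str):
    """Answer 'who calls target?' from the extracted CALLS edges,
    without re-reading any file text."""
    return sorted(caller for caller, callees in calls.items() if target in callees)

defines, calls = build_graph(SOURCE)
print(defines["tokenize"])         # ('example.py', 8)
print(callers_of(calls, "read"))   # ['parse']
```

The point of the design is that parsing is paid once; afterwards, "who calls `read`?" is answered from the edge map rather than by grepping every file again.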
The graph includes edge types such as DEFINES (function definition location), CALLS (cross‑file call relations), IMPORTS (module dependencies), HTTP_CALLS (frontend fetch to backend routes), IMPLEMENTS / INHERITS, DATA_FLOWS, and EMITS / LISTENS_ON. These edges are extracted from the tree‑sitter AST, not from regex.
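Typed edges map naturally onto a relational store, which fits the embedded SQLite database the tool ships with. The schema below is a hypothetical illustration (the article does not document the real one): one `nodes` table, one `edges` table with a relation label, and a recursive CTE that follows CALLS edges backwards to find every transitive caller.

```python
import sqlite3

# Hypothetical schema for illustration; not codebase-memory-mcp's actual layout.
EDGE_TYPES = {"DEFINES", "CALLS", "IMPORTS", "HTTP_CALLS", "IMPLEMENTS",
              "INHERITS", "DATA_FLOWS", "EMITS", "LISTENS_ON"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT UNIQUE);
CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                    dst INTEGER REFERENCES nodes(id),
                    rel TEXT NOT NULL);
CREATE INDEX edges_by_dst ON edges(dst, rel);
""")

def add_node(kind: str, name: str) -> None:
    conn.execute("INSERT INTO nodes(kind, name) VALUES (?, ?)", (kind, name))

def add_edge(src: str, dst: str, rel: str) -> None:
    if rel not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {rel}")
    conn.execute("INSERT INTO edges(src, dst, rel) SELECT s.id, d.id, ? "
                 "FROM nodes s, nodes d WHERE s.name = ? AND d.name = ?",
                 (rel, src, dst))

for fn in ("handler", "service", "repo", "query_db", "fetchUsers"):
    add_node("function", fn)
add_node("route", "GET /users")
add_edge("handler", "service", "CALLS")
add_edge("service", "repo", "CALLS")
add_edge("repo", "query_db", "CALLS")
add_edge("fetchUsers", "GET /users", "HTTP_CALLS")  # frontend fetch -> backend route

def edges_into(name: str, rel: str) -> list[str]:
    """Direct sources of `rel` edges pointing at `name`."""
    rows = conn.execute("SELECT s.name FROM edges e "
                        "JOIN nodes s ON s.id = e.src JOIN nodes d ON d.id = e.dst "
                        "WHERE d.name = ? AND e.rel = ? ORDER BY s.name",
                        (name, rel)).fetchall()
    return [r[0] for r in rows]

def transitive_callers(name: str) -> list[str]:
    """Everything that reaches `name` through CALLS edges (recursive CTE;
    UNION rather than UNION ALL drops duplicates, so cycles terminate)."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT e.src FROM edges e JOIN nodes n ON n.id = e.dst
            WHERE n.name = ? AND e.rel = 'CALLS'
            UNION
            SELECT e.src FROM edges e JOIN up ON e.dst = up.id
            WHERE e.rel = 'CALLS')
        SELECT n.name FROM nodes n JOIN up ON n.id = up.id ORDER BY n.name
    """, (name,)).fetchall()
    return [r[0] for r in rows]

print(transitive_callers("query_db"))          # ['handler', 'repo', 'service']
print(edges_into("GET /users", "HTTP_CALLS"))  # ['fetchUsers']
```

The index on `(dst, rel)` is what keeps each backward hop proportional to the edges it touches rather than to the size of the codebase.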
A lightweight hybrid LSP layer adds type inference for nine languages, covering parameter binding, return-type inference, generic substitution, JSX component resolution, and trait/extension parsing.
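Why does type inference matter for a call graph? By name alone, `users.load()` and `orders.load()` are indistinguishable. The toy resolver below (Python's `ast` again, with two hand-written rules; it is not the project's LSP layer, and all names are hypothetical) shows how constructor binding and return-type inference send each call to the right class.

```python
import ast

SOURCE = '''
class UserService:
    def load(self): ...

class OrderService:
    def load(self): ...

def make_orders() -> OrderService:
    return OrderService()

def run():
    users = UserService()
    orders = make_orders()
    users.load()
    orders.load()
'''

def resolve_method_calls(source: str) -> list[str]:
    """Resolve `var.method()` to `Class.method` using two toy rules:
    1. `var = ClassName()` binds var to ClassName (constructor call).
    2. `var = func()` binds var to func's return annotation, if it is a class.
    Only top-level functions and simple statements are handled.
    """
    tree = ast.parse(source)
    classes = {n.name for n in tree.body if isinstance(n, ast.ClassDef)}
    returns = {n.name: n.returns.id for n in tree.body
               if isinstance(n, ast.FunctionDef) and isinstance(n.returns, ast.Name)}
    resolved = []
    for func in (n for n in tree.body if isinstance(n, ast.FunctionDef)):
        env = {}  # variable name -> inferred class, scoped to this function
        for stmt in func.body:
            if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)
                    and isinstance(stmt.value, ast.Call)
                    and isinstance(stmt.value.func, ast.Name)):
                callee = stmt.value.func.id
                inferred = callee if callee in classes else returns.get(callee)
                if inferred in classes:
                    env[stmt.targets[0].id] = inferred
            elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
                f = stmt.value.func
                if isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name):
                    cls = env.get(f.value.id)
                    if cls:
                        resolved.append(f"{cls}.{f.attr}")
    return resolved

print(resolve_method_calls(SOURCE))  # ['UserService.load', 'OrderService.load']
```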
Cross‑Agent Sharing
After indexing, the tool writes a compressed SQLite artifact (.codebase-memory/graph.db.zst). Teams can commit this file: when a teammate clones the repo, the tool detects the artifact, decompresses it in about 2 seconds, and incrementally re-indexes only the local diffs (≈5 seconds). For a 500 k-line project, that turns a 20-minute full index into about 7 seconds.
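The incremental step only needs to know which files differ from the state captured in the shared artifact. One plausible way to decide that is content hashing; this is an assumption for illustration (the project may use git or file timestamps instead), and `plan_incremental_index` and the `manifest` format are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def plan_incremental_index(root: Path, manifest: dict[str, str]):
    """Compare files under `root` with a stored manifest (relative path ->
    SHA-256) and report what must be re-indexed. Returns (plan, new_manifest)."""
    current = {str(p.relative_to(root)): file_digest(p)
               for p in root.rglob("*") if p.is_file()}
    plan = {
        "added": sorted(set(current) - set(manifest)),
        "changed": sorted(p for p in set(current) & set(manifest)
                          if current[p] != manifest[p]),
        "removed": sorted(set(manifest) - set(current)),
    }
    return plan, current

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("def a(): pass\n")
    (root / "b.py").write_text("def b(): pass\n")
    _, manifest = plan_incremental_index(root, {})      # state when the artifact was committed
    (root / "b.py").write_text("def b(): return 1\n")   # teammate's local edit
    (root / "c.py").write_text("def c(): pass\n")
    (root / "a.py").unlink()
    plan, _ = plan_incremental_index(root, manifest)
    print(plan)  # {'added': ['c.py'], 'changed': ['b.py'], 'removed': ['a.py']}
```

Only the three listed files would be re-parsed; everything else comes straight from the decompressed graph, which is why the clone-time cost is seconds rather than a full re-index.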
The same graph is reused by multiple agents (Claude Code, Codex CLI, Cursor, etc.) through the MCP protocol, so no agent builds a duplicate index. By contrast, aider's repo-map is regenerated every session, and graphify's JSON files lack compression and merge safety.
Research Backing
The project is backed by an arXiv preprint (Codebase‑Memory: Tree‑Sitter‑Based Knowledge Graphs for LLM Code Exploration via MCP, arXiv:2603.27277), and its code and benchmark scripts are released so the results can be reproduced.
Supply-chain security measures, including SLSA 3 compliance, an OpenSSF Scorecard, VirusTotal badges, code signing, checksums, and build provenance, are documented in the README, signaling that the tool is held to production-grade standards.
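Of these, checksum verification is something any user can do before running the downloaded binary. A minimal sketch, assuming you already have the binary and a published SHA-256 hex string (where the project publishes its checksums is not stated here; the file name below is a stand-in):

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checksum(path: Path, expected_hex: str) -> bool:
    # compare_digest avoids an early-exit string comparison
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())

with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "codebase-memory-mcp"   # stand-in file, not the real binary
    payload = b"pretend this is the 7 MB binary"
    binary.write_bytes(payload)
    published = hashlib.sha256(payload).hexdigest()  # what a checksum file would list
    ok = verify_checksum(binary, published)
    tampered = verify_checksum(binary, "0" * 64)
    print(ok, tampered)  # True False
```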
Limitations and Outlook
An optional --ui flag launches a 3D interactive graph visualizer at localhost:9749. It is visually impressive, but the author found it of limited practical use in daily workflows.
Open questions remain: will structured code knowledge drive new "graph-friendly" coding styles, or make dead-code detection a standard CI check?
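If dead-code detection did become a CI check, the core query over a call graph is plain reachability. A sketch under simple assumptions: an in-memory CALLS map and a hand-picked set of entry points. `find_dead_code` is a hypothetical name, not a feature the article attributes to codebase-memory-mcp.

```python
from collections import deque

def find_dead_code(calls: dict[str, set[str]], functions: set[str],
                   entry_points: set[str]) -> list[str]:
    """Return functions not reachable from any entry point via CALLS edges.

    Uses breadth-first reachability rather than 'zero incoming edges', so
    code that is only called by other dead code is reported too.
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in calls.get(fn, ()):
            if callee in functions and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(functions - seen)

calls = {
    "main": {"parse", "report"},
    "parse": {"tokenize"},
    "legacy_export": {"format_csv"},   # nothing calls legacy_export
}
functions = {"main", "parse", "report", "tokenize", "legacy_export", "format_csv"}
print(find_dead_code(calls, functions, {"main"}))  # ['format_csv', 'legacy_export']
```

The hard part in practice is the entry-point set: HTTP route handlers, event listeners, reflection, and dynamic dispatch all create callers that a static CALLS edge may miss, which is exactly where typed edges such as HTTP_CALLS and LISTENS_ON would have to feed in.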
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.
