How a 9K‑Star MCP Server Lets Claude Code Scan Millions of Lines in Milliseconds

The codebase-memory-mcp tool builds a tree‑sitter‑based knowledge graph of a codebase. It offers sub‑millisecond queries, up to 120× token savings, zero‑dependency deployment, and cross‑agent sharing, and its reproducible benchmarks show higher answer quality and far lower resource usage than the traditional file‑by‑file grep approach.

Code Mala Tang

Performance Overview

On the Linux kernel (28 M lines, 75 000 files), a traditional Claude Code session reads hundreds of files and consumes ~400 000 tokens. codebase-memory-mcp indexes the entire kernel in 3 minutes, answers queries in under 1 ms, and uses only 3 400 tokens, roughly a 120× saving, because the agent queries the graph instead of reading files.

A benchmark on 31 real projects (ranging from a few thousand to hundreds of thousands of lines) shows 83 % answer quality versus 76 % for a file‑by‑file baseline, 3 400 versus 34 000 tokens consumed (10× fewer), and 2.3 versus 4.8 tool calls (about half).
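The headline ratios follow directly from the reported figures. A quick check, using only the numbers quoted above, shows that the "120×" is a rounding of roughly 118×:

```python
# Figures as reported in the article (tokens and average tool calls).
linux_baseline_tokens = 400_000   # file-by-file Claude Code on the Linux kernel
graph_tokens = 3_400              # codebase-memory-mcp
bench_baseline_tokens = 34_000    # 31-project benchmark baseline
baseline_calls, graph_calls = 4.8, 2.3

def ratio(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b

print(f"Linux kernel token saving: {ratio(linux_baseline_tokens, graph_tokens):.1f}x")  # 117.6x
print(f"Benchmark token saving:    {ratio(bench_baseline_tokens, graph_tokens):.1f}x")  # 10.0x
print(f"Tool calls, graph/baseline: {graph_calls / baseline_calls:.2f}")                # 0.48
```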

Zero‑Dependency Deployment

Installation is a single command:

curl -fsSL https://raw.githubusercontent.com/.../install.sh | bash

It downloads a 7 MB static binary that needs no runtime, Docker, or external services. The binary, written in C, bundles tree‑sitter grammars for 158 languages, an embedded SQLite database, and the Nomic‑embed‑code vector model (768‑dim, int8).
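The article gives only the model's width and precision. As an illustration of what "768‑dim int8" typically implies, here is a minimal sketch of symmetric per‑vector int8 quantization, with cosine similarity computed directly on the quantized codes. The quantization scheme is an assumption for illustration, not a description of the tool's internals.

```python
import math
import random

DIM = 768  # embedding width reported for the bundled model

def quantize_int8(vec):
    """Symmetric per-vector int8 quantization (an assumed scheme).

    Returns the int8 codes and the float scale needed to dequantize them.
    """
    max_abs = max(abs(x) for x in vec) or 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_int8(qa, qb):
    # Per-vector scales are positive constants, so they cancel in the cosine:
    # similarity can be computed on the integer codes directly.
    return cosine(qa, qb)

rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
b = [x + rng.gauss(0.0, 0.5) for x in a]  # a "similar" vector: a plus noise
qa, _ = quantize_int8(a)
qb, _ = quantize_int8(b)
exact, approx = cosine(a, b), cosine_int8(qa, qb)
print(f"float cosine {exact:.4f} vs int8 cosine {approx:.4f}")
```

Storing one byte per dimension instead of four cuts vector storage by 4×, while the similarity ranking barely changes.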

Knowledge‑Graph Architecture

On first run the tool parses the entire codebase with tree‑sitter, extracts functions, classes, call relations, HTTP routes, imports, data flows, and more, and stores them in a graph. A query then costs O(edges), proportional to the edges it actually traverses, instead of the O(files × lines) of a grep scan, an order‑of‑magnitude advantage.
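To make the O(edges) point concrete, here is a minimal sketch that parses each file once and records DEFINES and CALLS edges. It uses Python's built‑in ast module as a stand‑in for tree‑sitter (a swapped‑in parser, not the tool's code), and build_call_graph is a hypothetical helper. Once the graph exists, "who calls X?" is a dictionary lookup that touches only X's edges instead of rescanning every line.

```python
import ast
from collections import defaultdict

def build_call_graph(files):
    """Parse each file once and record DEFINES and CALLS edges.

    Python's ast module stands in for tree-sitter. For simplicity, functions
    are keyed by bare name, and calls inside nested functions are also
    attributed to the enclosing function.
    """
    defines = {}                # DEFINES: function name -> defining file
    callers = defaultdict(set)  # CALLS, reversed: callee name -> caller names
    for path, source in files.items():
        for node in ast.walk(ast.parse(source, filename=path)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defines[node.name] = path
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        callers[inner.func.id].add(node.name)
    return defines, callers

files = {
    "db.py": "def connect():\n    return open_socket()\n\ndef open_socket():\n    pass\n",
    "api.py": "from db import connect\n\ndef handler():\n    return connect()\n",
}
defines, callers = build_call_graph(files)
# "Where is connect defined, and who calls it?" touches only connect's edges.
print(defines["connect"], sorted(callers["connect"]))  # db.py ['handler']
```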

The graph includes edge types such as DEFINES (where a function is defined), CALLS (cross‑file call relations), IMPORTS (module dependencies), HTTP_CALLS (frontend fetch calls linked to the backend routes they hit), IMPLEMENTS / INHERITS, DATA_FLOWS, and EMITS / LISTENS_ON. These edges are extracted from the tree‑sitter AST rather than from regular expressions.
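A typed edge list maps naturally onto the embedded SQLite database. The schema below (tables nodes and edges, column edge_type, helper incoming) is a hypothetical sketch, not the tool's actual schema; it shows how a single indexed query can answer CALLS, IMPORTS, or HTTP_CALLS questions alike.

```python
import sqlite3

# Hypothetical schema for illustration only; not the tool's actual layout.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,           -- 'function', 'module', 'route', ...
    name TEXT NOT NULL,
    file TEXT NOT NULL
);
CREATE TABLE edges (
    src       INTEGER NOT NULL REFERENCES nodes(id),
    dst       INTEGER NOT NULL REFERENCES nodes(id),
    edge_type TEXT NOT NULL       -- 'CALLS', 'IMPORTS', 'HTTP_CALLS', ...
);
CREATE INDEX edges_by_dst ON edges (dst, edge_type);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, "function", "handler", "api.py"),
    (2, "function", "connect", "db.py"),
    (3, "module", "api", "api.py"),
    (4, "module", "db", "db.py"),
    (5, "route", "GET /users", "api.py"),
    (6, "function", "fetchUsers", "web/users.ts"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, 2, "CALLS"),       # handler() calls connect()
    (3, 4, "IMPORTS"),     # module api imports module db
    (6, 5, "HTTP_CALLS"),  # frontend fetch hits the backend route
])

def incoming(conn, name, edge_type):
    """Names of nodes with an `edge_type` edge pointing at the node called `name`."""
    rows = conn.execute(
        """SELECT s.name
           FROM edges e
           JOIN nodes s ON s.id = e.src
           JOIN nodes d ON d.id = e.dst
           WHERE d.name = ? AND e.edge_type = ?
           ORDER BY s.name""",
        (name, edge_type),
    )
    return [row[0] for row in rows]

print(incoming(conn, "connect", "CALLS"))          # ['handler']
print(incoming(conn, "GET /users", "HTTP_CALLS"))  # ['fetchUsers']
```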

A lightweight hybrid LSP provides type inference for nine languages, enabling parameter binding, return‑type inference, generic substitution, JSX component resolution, and trait/extension parsing.
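Type inference matters because name matching alone cannot tell whether u.save() means User.save or Order.save. The toy resolver below binds a variable to its function's declared return type and resolves the method through it. It again uses Python's ast module as a stand‑in, resolve_method_calls is a hypothetical helper, and it handles only straight‑line `x = f()` / `x.m()` patterns; it is not the tool's LSP implementation.

```python
import ast

SOURCE = '''
class User:
    def save(self):
        pass

class Order:
    def save(self):
        pass

def make_user() -> User:
    return User()

def main():
    u = make_user()
    u.save()
'''

def resolve_method_calls(source):
    """Resolve `var.method()` calls to `Class.method` via declared return types."""
    tree = ast.parse(source)
    # Function name -> declared return class (from a `-> ClassName` annotation).
    returns = {
        fn.name: fn.returns.id
        for fn in tree.body
        if isinstance(fn, ast.FunctionDef) and isinstance(fn.returns, ast.Name)
    }
    resolved = []
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        var_types = {}  # local variable -> inferred class
        for stmt in fn.body:
            # `x = f()` binds x to f's declared return type.
            if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)
                    and isinstance(stmt.value, ast.Call)
                    and isinstance(stmt.value.func, ast.Name)
                    and stmt.value.func.id in returns):
                var_types[stmt.targets[0].id] = returns[stmt.value.func.id]
            # `x.method()` resolves through x's inferred class.
            elif (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)
                    and isinstance(stmt.value.func, ast.Attribute)
                    and isinstance(stmt.value.func.value, ast.Name)
                    and stmt.value.func.value.id in var_types):
                cls = var_types[stmt.value.func.value.id]
                resolved.append((fn.name, f"{cls}.{stmt.value.func.attr}"))
    return resolved

print(resolve_method_calls(SOURCE))  # [('main', 'User.save')]
```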

Cross‑Agent Sharing

After indexing, the tool writes a compressed SQLite artifact (.codebase-memory/graph.db.zst). Teams can commit this file: when a teammate clones the repo, the tool detects the artifact, decompresses it in about 2 seconds, and incrementally re‑indexes only the local diffs (≈5 seconds). A 500 k‑line project that would take 20 minutes to index from scratch is therefore ready in about 7 seconds.
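The article does not say how local diffs are detected. One common approach, assumed here, is to store a SHA‑256 hash per file alongside the graph and re‑index only files whose hash changed; plan_incremental_index is a hypothetical helper. Decompressing the .zst artifact is omitted because it needs a third‑party zstd binding.

```python
import hashlib

def file_digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def plan_incremental_index(stored: dict[str, str], current: dict[str, bytes]):
    """Compare per-file hashes saved with the artifact against the working tree.

    Returns (to_reindex, to_delete): new or modified files, and files that
    have disappeared since the artifact was built.
    """
    current_hashes = {path: file_digest(data) for path, data in current.items()}
    to_reindex = sorted(p for p, h in current_hashes.items() if stored.get(p) != h)
    to_delete = sorted(p for p in stored if p not in current_hashes)
    return to_reindex, to_delete

stored = {  # hashes recorded when the committed artifact was built
    "a.py": file_digest(b"x = 1\n"),
    "b.py": file_digest(b"y = 2\n"),
    "old.py": file_digest(b"z = 0\n"),
}
current = {"a.py": b"x = 1\n", "b.py": b"y = 3\n", "new.py": b"w = 4\n"}
print(plan_incremental_index(stored, current))  # (['b.py', 'new.py'], ['old.py'])
```

Only the changed and new files are re‑parsed, so the cost scales with the size of the diff rather than the size of the repository.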

The same graph is reused by multiple agents (Claude Code, Codex CLI, Cursor, etc.) over MCP, so no agent has to build its own duplicate index. By contrast, Aider’s repo‑map is regenerated every session, and graphify’s JSON files lack compression and merge safety.
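Sharing works because every agent is simply an MCP client sending the same JSON‑RPC 2.0 messages to the server. Below is a minimal sketch of a tools/call request. The tool name find_callers and its arguments are hypothetical placeholders, since the server's real tool names are not listed here; an MCP client discovers them with a tools/list request.

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# "find_callers" and {"function": ...} are hypothetical, for illustration only.
msg = make_tool_call(1, "find_callers", {"function": "connect"})
print(msg)
```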

Research Backing

The project is accompanied by an arXiv preprint (Codebase‑Memory: Tree‑Sitter‑Based Knowledge Graphs for LLM Code Exploration via MCP, arXiv:2603.27277), and the code and benchmark scripts are released for reproducibility.

The README also documents the project's supply‑chain security measures: SLSA Level 3 compliance, an OpenSSF Scorecard, VirusTotal badges, code signing, checksums, and build provenance. Together they signal that the tool is aimed at production‑grade use.
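Published checksums only help if someone verifies them. Here is a minimal sketch of checking a downloaded binary against a published SHA‑256 value, assuming the release provides a hex SHA‑256 digest; the file contents below are a placeholder, not the real binary.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checksum(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-256 matches the published hex digest."""
    return sha256_file(path) == expected_hex.strip().lower()

# Placeholder "download" so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example binary contents")
expected = hashlib.sha256(b"example binary contents").hexdigest()
ok = verify_checksum(tmp.name, expected)
print(ok)
os.remove(tmp.name)
```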

Limitations and Outlook

An optional --ui flag launches a 3D interactive graph visualizer at localhost:9749. It is visually impressive, but the author found it of limited practical use in daily workflows.

Open questions remain: will structured code knowledge drive new "graph‑friendly" coding styles, and will graph‑based dead‑code detection become a standard CI check?
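On a call graph, dead‑code detection reduces to reachability: any function not reachable from an entry point via CALLS edges is a candidate. Below is a minimal sketch with a hypothetical find_dead_code helper. In practice, dynamic dispatch, reflection, and exported public APIs produce false positives, which is part of why CI adoption remains an open question.

```python
from collections import deque

def find_dead_code(calls: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Functions not reachable from any entry point by following CALLS edges."""
    all_funcs = set(calls) | {callee for callees in calls.values() for callee in callees}
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:  # breadth-first search over the call graph
        fn = queue.popleft()
        for callee in calls.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return all_funcs - seen

calls = {
    "main": {"parse", "run"},
    "run": {"helper"},
    "legacy_export": {"helper"},  # nothing calls legacy_export
    "parse": set(),
}
print(sorted(find_dead_code(calls, {"main"})))  # ['legacy_export']
```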

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
