Give Your Notes a Memory Layer with QMD: 3‑Command, 26k‑Star Local Search Engine

QMD is an open‑source, MIT‑licensed local search engine written in TypeScript. It combines BM25, vector embeddings from a GGUF model, and an LLM reranker to answer natural‑language queries over thousands of markdown files, with no network calls once its models are downloaded. Installation takes three commands.

Code Mala Tang

Problem

Notes, documents, and meeting minutes scattered across local directories are hard to locate with Spotlight (which only matches filenames and coarse full‑text) or grep (which requires exact keywords). Users need natural‑language queries such as “Where is the discussion about rate limiting?” or “Which document mentions connection pool timeout?”

QMD Overview

QMD is a TypeScript, MIT‑licensed local search engine (26 k GitHub stars). It provides natural‑language search over markdown repositories, Obsidian vaults, and meeting records using only three commands and runs entirely on the local CPU without API keys.

Three‑Layer Hybrid Architecture

First layer – BM25 full‑text search : SQLite FTS5 ranks keyword matches with the BM25 algorithm. Example: the quoted query "connection pool" returns only documents containing that exact phrase.
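To make the keyword layer concrete, here is a toy BM25 scorer over an in-memory corpus. It is only a sketch of the ranking idea: QMD delegates this work to SQLite FTS5, and the names `bm25Scores`, `tokenize`, and the sample documents are hypothetical. The parameters k1 = 1.2 and b = 0.75 are common defaults, assumed here. Note that the sketch scores individual terms and does not do phrase matching.

```typescript
// Toy BM25 scorer; illustrative only, not QMD's FTS5-backed implementation.
type Doc = { id: string; text: string };

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

function bm25Scores(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Map<string, number> {
  const tokens = docs.map((d) => tokenize(d.text));
  const avgLen = tokens.reduce((n, t) => n + t.length, 0) / docs.length;
  const scores = new Map<string, number>();

  // Deduplicate query terms so each one contributes once.
  const terms = Array.from(new Set(tokenize(query)));
  terms.forEach((term) => {
    // Document frequency: number of documents containing the term.
    const df = tokens.filter((t) => t.includes(term)).length;
    if (df === 0) return;
    const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));

    tokens.forEach((t, i) => {
      const tf = t.filter((w) => w === term).length;
      if (tf === 0) return;
      // Length normalization: long documents need more hits for the same score.
      const norm = tf + k1 * (1 - b + (b * t.length) / avgLen);
      const prev = scores.get(docs[i].id) ?? 0;
      scores.set(docs[i].id, prev + (idf * tf * (k1 + 1)) / norm);
    });
  });
  return scores;
}

// Hypothetical two-note corpus.
const corpus: Doc[] = [
  { id: "db.md", text: "Tune the connection pool timeout for the database" },
  { id: "auth.md", text: "Authentication pipeline and login sequence" },
];
const keywordScores = bm25Scores(corpus, "connection pool");
```

Only `db.md` receives a score here, because `auth.md` shares no terms with the query. That gap is what the vector layer is meant to cover.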

Second layer – Vector semantic search : Documents are encoded with the GGUF model embeddinggemma-300M-Q8_0 (≈300 MB) and stored in a vec0 virtual table provided by the sqlite-vec extension. This layer captures synonyms and conceptual similarity, so a search for "auth flow" also matches "authentication pipeline" and "login sequence".
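The semantic layer comes down to comparing a query vector against stored document vectors. The sketch below uses cosine similarity over hand-written three-dimensional vectors. Real embeddings have hundreds of dimensions and come from the model, and whether QMD uses cosine or another distance is an assumption here. The names `cosineSimilarity`, `nearest`, and the sample index are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  // Vectors from different embedding models may differ in dimension
  // and are not comparable even when the dimensions happen to match.
  if (a.length !== b.length) throw new Error("vectors must share a dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type Embedded = { id: string; vector: number[] };

// Brute-force nearest neighbours: score every document, keep the best k.
function nearest(index: Embedded[], queryVector: number[], k: number): Embedded[] {
  return index
    .map((doc) => ({ doc, score: cosineSimilarity(doc.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Hypothetical embeddings, made up for illustration.
const vectorIndex: Embedded[] = [
  { id: "auth.md", vector: [0.9, 0.1, 0.0] },
  { id: "db.md", vector: [0.0, 0.2, 0.9] },
];
const semanticHits = nearest(vectorIndex, [0.8, 0.2, 0.1], 1);
```

A vector store replaces the brute-force scan with an indexed lookup, but the comparison being made is the same.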

Third layer – LLM rerank : After BM25 and vector search produce candidates, the qwen3‑reranker‑0.6b model (≈640 MB) scores each result from 0 to 10 according to how well the document matches the intent of the query.

The BM25 and vector candidate lists are merged with Reciprocal Rank Fusion (RRF), with the original query weighted double to protect exact matches. Reranker scores are then blended in by position: the top 3 results favor retrieval (75 % retrieval / 25 % reranker), while results after rank 11 shift to 40 % retrieval / 60 % reranker.
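A minimal sketch of weighted Reciprocal Rank Fusion may help here. Each ranked list contributes weight / (k + rank) to a document's fused score. The constant k = 60 is the value commonly used for RRF and is an assumption about QMD, as are the function name `reciprocalRankFusion` and the sample file names.

```typescript
// One ranked result list; ids are ordered best first.
type RankedList = { weight: number; ids: string[] };

// Weighted RRF: score(d) = sum over lists of weight / (k + rank of d in that list).
function reciprocalRankFusion(lists: RankedList[], k = 60): Array<[string, number]> {
  const fused = new Map<string, number>();
  for (const { weight, ids } of lists) {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      fused.set(id, (fused.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(fused.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical lists: the original query counts double, the expanded query once.
const fusedRanking = reciprocalRankFusion([
  { weight: 2, ids: ["pool.md", "db.md"] },           // BM25, original query
  { weight: 2, ids: ["db.md", "pool.md", "net.md"] }, // vector, original query
  { weight: 1, ids: ["net.md", "db.md"] },            // BM25, expanded query
]);
```

In this example `db.md` ends up first because it appears in all three lists. `net.md` tops the expanded-query list but stays last overall, since that list carries half the weight.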

Installation – Three Commands

npm install -g @tobilu/qmd
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
qmd query "how is rate limiting designed"

The first run automatically downloads three GGUF models from HuggingFace: embeddinggemma-300M (≈300 MB), qwen3‑reranker‑0.6b (≈640 MB), and qmd‑query‑expansion‑1.7B (≈1.1 GB).

All models total about 2 GB and are cached in ~/.cache/qmd/models/. Subsequent searches follow these steps:

Expand the query into three versions (original ×2 plus one LLM‑generated variant).

Run BM25 and vector search for each version in parallel.

Collect the top 30 candidates and merge them with RRF.

Score the merged list with the LLM reranker.

Output the final list using position‑aware blending.

A single search on an M4 MacBook takes roughly 200‑500 ms.

MCP Server Integration with Claude Desktop

QMD includes an MCP Server that can be referenced in Claude Desktop’s configuration, allowing Claude to query the local note store directly.

{
  "mcpServers": {
    "qmd": {
      "command": "npx",
      "args": ["@tobilu/qmd", "mcp"]
    }
  }
}
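If the Claude Desktop configuration already lists other MCP servers, the `qmd` entry has to be merged in without overwriting them. The helper below is a hypothetical sketch of that merge on a parsed config object. `withQmdServer`, the type names, and the `some-other-server` entry are illustrative and not part of QMD or Claude Desktop.

```typescript
type McpServer = { command: string; args: string[] };
type ClaudeConfig = {
  mcpServers?: Record<string, McpServer>;
  [key: string]: unknown; // preserve any unrelated settings
};

// Returns a new config with the qmd server added; the input is not mutated.
function withQmdServer(config: ClaudeConfig): ClaudeConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      qmd: { command: "npx", args: ["@tobilu/qmd", "mcp"] },
    },
  };
}

// Hypothetical existing configuration with one unrelated server.
const existingConfig: ClaudeConfig = {
  mcpServers: { files: { command: "npx", args: ["some-other-server"] } },
};
const mergedConfig = withQmdServer(existingConfig);
console.log(JSON.stringify(mergedConfig, null, 2));
```

Writing the printed JSON back to the configuration file and restarting Claude Desktop would then expose both servers.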

You can then ask Claude, for example, “Which notes did I write about WebAssembly?” and receive answers grounded in QMD’s search results. This gives agents access to a private knowledge base without uploading data to third‑party servers.

Chinese Language Support

The default embeddinggemma-300M model is optimized for English, so retrieval quality on Chinese, Japanese, or Korean documents is poor. The recommended replacement is Qwen3‑Embedding‑0.6B, which supports 119 languages and ranks highly on the MTEB leaderboard. The model is selected through the environment variable QMD_EMBED_MODEL. After changing it, the entire corpus must be re‑encoded because vectors from different models are incompatible.

Why QMD Beats Spotlight, grep, and Commercial Solutions

Spotlight : only filename and coarse full‑text matching; no semantic understanding; results sorted by time or name.

grep / ripgrep : requires exact keywords; no synonym handling; no ranking; no natural‑language queries.

Obsidian built‑in search : limited to the vault; basic keyword matching plus graph links; no semantic search.

Commercial SaaS (Notion AI, Mem, Reflect) : data is sent to external servers, requires a subscription, and depends on network connectivity, raising privacy concerns.

QMD positions itself as fully local, semantic‑aware, cross‑directory, and open‑source. It suits users with thousands of markdown files who want natural‑language search without exposing their data.

SDK for Custom Applications

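import { QMDStore } from '@tobilu/qmd';

// An explicit dbPath is required; the index is kept in this SQLite file.
const store = await QMDStore.create({ dbPath: './my-search.db' });
// Register a directory of markdown files as a named collection.
await store.addCollection({ path: '~/notes', name: 'notes' });
// Compute vector embeddings for the indexed documents.
await store.embed();

// Natural-language query, capped at five results.
const results = await store.search({
  query: 'authentication flow',
  limit: 5
});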

The SDK requires an explicit dbPath to avoid silently creating databases.

Blending Rationale: Query Expansion, Top‑Rank Bonus, Position‑Aware Weighting

Query Expansion : the original query weight is doubled and an LLM‑generated variant is added. This protects exact matches; a quoted query like "connection pool" will not be diluted by the expanded version.

Top‑Rank Bonus : documents that rank first in any individual search path receive an additional RRF score of +0.05, and documents at ranks 2‑3 receive +0.02. This compensates for RRF’s tendency to demote a document that tops one path but ranks lower in the others.

Position‑Aware Blending : for the top 3 results, the system trusts retrieval (75 %) more than the reranker (25 %). After rank 11, the trust shifts to 40 % retrieval / 60 % reranker because the reranker can better discriminate ambiguous candidates. The README notes that pure RRF can dilute exact matches when expanded queries diverge, so these weighted strategies are intentional.
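The bonus and the position-aware weights can be sketched as three small functions. Several details are assumptions and not confirmed behavior: the article gives no weight for ranks 4‑10, so the sketch uses 60 % retrieval there as an intermediate value; it treats rank 11 itself as part of the last tier; and it assumes the retrieval score is already normalized to the range 0‑1. All function names are hypothetical.

```typescript
// Additive bonus for a document's best rank in any single search path.
function topRankBonus(bestRankInAnyPath: number): number {
  if (bestRankInAnyPath === 1) return 0.05;
  if (bestRankInAnyPath <= 3) return 0.02;
  return 0;
}

// Share of the final score taken from retrieval; the reranker gets the rest.
function retrievalWeight(fusedRank: number): number {
  if (fusedRank <= 3) return 0.75;
  if (fusedRank <= 10) return 0.6; // ASSUMPTION: not stated in the article
  return 0.4; // ASSUMPTION: rank 11 is included in this tier
}

// Blend a retrieval score (assumed 0-1) with a reranker score (0-10).
function blendedScore(
  fusedRank: number,
  retrievalScore: number,
  rerankerScore0to10: number,
): number {
  const w = retrievalWeight(fusedRank);
  return w * retrievalScore + (1 - w) * (rerankerScore0to10 / 10);
}

// Same scores, different positions: the reranker's low opinion (4/10)
// costs more further down the list.
const nearTop = blendedScore(1, 0.8, 4);   // 0.75 * 0.8 + 0.25 * 0.4
const furtherDown = blendedScore(12, 0.8, 4); // 0.40 * 0.8 + 0.60 * 0.4
```

With identical inputs, the document near the top keeps most of its retrieval score, while the one further down is pulled toward the reranker's verdict.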

Who Should Use QMD

Suitable :

Hundreds to thousands of markdown notes, meeting minutes, or documentation.

Predominantly textual content (code search has dedicated tools).

Desire natural‑language queries instead of regex.

Privacy‑conscious users who do not want data sent to third parties.

Willingness to download ~2 GB of models on first install.

Not Ideal :

Fewer than 100 notes (Spotlight or grep suffice).

Primary use case is code search (AST‑based tools are better).

Need sub‑millisecond response times (QMD averages 200‑500 ms per query).

Very limited device resources (the three models occupy ~1 GB RAM).

Author Background

QMD was created by Tobi Lütke, founder and CEO of Shopify. The project lives in his personal repository (created Dec 2025, last push Jun 1 2026) and follows a “stable over flashy” philosophy: using proven technologies like BM25, RRF, GGUF models, and SQLite rather than experimental graph‑RAG or multimodal embeddings.

Repository: https://github.com/tobi/qmd

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
