Beyond mem0: How YC CEO’s Open‑Source AI Memory Engine Uses Regex Instead of LLMs to Power a Knowledge Graph
The article dissects GBrain, an open‑source AI memory engine from Y Combinator’s Garry Tan, showing how a dual‑engine contract, zero‑LLM regex‑based knowledge‑graph extraction, and a layered hybrid retrieval pipeline boost P@5 from ~18 to 49.1 while detailing engineering trade‑offs, batch‑write work‑arounds, weighting constants, and reliability mechanisms.
Overview
GBrain is an MIT‑licensed personal‑knowledge store built on Postgres. Its package.json describes it as a “Postgres‑native personal knowledge brain with hybrid RAG search”.
Benchmark results
Pure BM25 keywords – P@5 ≈ 18, R@5 ≈ 75
Pure vector RAG – P@5 ≈ 18, R@5 ≈ 80
Hybrid search + RRF (no graph) – P@5 ≈ 18, R@5 ≈ 85
Full stack (default config) – P@5 49.1, R@5 97.9
The graph layer is the primary source of the jump from ~18 to 49.1 P@5.
1. Contract‑first dual‑engine design
Two database engines implement the BrainEngine contract (src/core/engine.ts, 2,145 lines, ~47 operations):
PGLite engine (src/core/pglite-engine.ts) – Postgres 17 compiled to WebAssembly, zero‑config, single‑process, suitable for brains < 50 K pages.
Postgres engine (src/core/postgres-engine.ts) – native Postgres + pgvector, works with Supabase or self‑hosted deployments for larger or shared brains.
Both expose a discriminated kind field ("pglite" | "postgres") instead of using instanceof. This avoids prototype‑chain breakage across dynamic imports and enables compile‑time exhaustiveness checks. The factory (src/core/engine-factory.ts) selects the implementation based on kind, so commands such as gbrain init --pglite and remote MCP calls share the same code path.
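A minimal sketch of why a discriminated kind field gives compile-time exhaustiveness. The interfaces below are hypothetical and heavily trimmed (the names EngineBaseSketch, dataDir, connectionUrl, and describeEngine are assumptions for illustration, not GBrain's actual contract):

```typescript
// Hypothetical, trimmed-down contract; the real BrainEngine has ~47 operations.
interface EngineBaseSketch {
  getPage(slug: string): string | undefined;
}
interface PgliteEngineSketch extends EngineBaseSketch {
  kind: "pglite";
  dataDir: string; // assumed field
}
interface PostgresEngineSketch extends EngineBaseSketch {
  kind: "postgres";
  connectionUrl: string; // assumed field
}
type BrainEngineSketch = PgliteEngineSketch | PostgresEngineSketch;

function describeEngine(engine: BrainEngineSketch): string {
  // Switching on the string tag works even when the object was created by a
  // dynamically imported module, where instanceof could fail.
  switch (engine.kind) {
    case "pglite":
      return `embedded PGLite at ${engine.dataDir}`;
    case "postgres":
      return `native Postgres at ${engine.connectionUrl}`;
    default: {
      // If a third kind is ever added to the union, this line stops compiling.
      const unreachable: never = engine;
      return unreachable;
    }
  }
}
```

The never assignment in the default branch is the standard TypeScript idiom for forcing every variant to be handled.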
2. Zero‑LLM knowledge graph construction
All graph extraction is performed with regular expressions and verb‑matching; no LLM calls are made. Core logic resides in src/core/link-extraction.ts (line 1 229).
4‑pass extraction chain
extractEntityRefs(content) runs four passes in strict priority:
Markdown links – matches [Name](dir/slug) with optional relative paths.
Qualified wikilink (v0.17+) – matches [[source-id:DIR_PATTERN/…]], e.g. [[wiki:topics/ai]].
Unqualified / generic wikilinks – matches Obsidian‑style links such as [[people/alice|Alice]] or bare names [[bare-name]]. Unresolved slugs are marked needsResolution: true and later handled by SlugResolver.
Frontmatter – maps specific YAML fields directly to typed edges.
A whitelist regex DIR_PATTERN restricts directories to people|companies|meetings|concepts|deal|civic|project. Functions stripCodeBlocks() and maskRanges() remove code fragments and prevent overlapping matches.
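The priority-plus-masking idea can be sketched as follows. This is an illustrative approximation covering only two of the four passes (markdown links, then generic wikilinks) and only inline code stripping. The regexes, the EntityRefSketch shape, and the function names are assumptions; only the directory whitelist and the needsResolution flag come from the description above.

```typescript
interface EntityRefSketch {
  slug: string;
  label: string;
  needsResolution: boolean;
}

// Directory whitelist as listed in the article; the patterns themselves are illustrative.
const MD_LINK_RE =
  /\[([^\]]+)\]\((?:\.{1,2}\/)*((?:people|companies|meetings|concepts|deal|civic|project)\/[\w-]+)\)/g;
const WIKILINK_RE = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
const DIR_PREFIX_RE = /^(?:people|companies|meetings|concepts|deal|civic|project)\//;
const INLINE_CODE_RE = /`[^`\n]*`/g;

// Replace a matched span with spaces so offsets stay stable and later passes skip it.
function blank(match: string): string {
  return " ".repeat(match.length);
}

function extractRefsSketch(content: string): EntityRefSketch[] {
  const refs: EntityRefSketch[] = [];

  // Pre-step: links inside inline code must never be extracted.
  let text = content.replace(INLINE_CODE_RE, blank);

  // Pass 1 (highest priority): markdown links; matched ranges are masked out.
  text = text.replace(MD_LINK_RE, (match: string, label: string, slug: string) => {
    refs.push({ slug, label, needsResolution: false });
    return blank(match);
  });

  // Pass 2: wikilinks in whatever text is still unmasked.
  let m: RegExpExecArray | null;
  while ((m = WIKILINK_RE.exec(text)) !== null) {
    const target = m[1].trim();
    const label = (m[2] ?? target).trim();
    // Bare names without a whitelisted directory are deferred to a resolver.
    refs.push({ slug: target, label, needsResolution: !DIR_PREFIX_RE.test(target) });
  }
  return refs;
}
```

Masking with same-length whitespace, rather than deleting the match, keeps character offsets valid for any later pass that needs surrounding context.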
Edge‑type inference
The function inferLinkType (link-extraction.ts:694) determines the edge type using a priority list of verb regexes:
if (FOUNDED_RE.test(context)) return 'founded';
if (INVESTED_RE.test(context)) return 'invested_in';
if (ADVISES_RE.test(context)) return 'advises';
if (WORKS_AT_RE.test(context)) return 'works_at';
return 'mentions';
Verb regexes cover dozens of expressions, for example:
WORKS_AT_RE matches more than 60 forms such as “CEO of”, “engineer at”, “VP at”.
INVESTED_RE matches “invested in”, “led the seed”, “early investor”.
FOUNDED_RE matches “founded”, “co‑founded”, “founder of”.
If no verb matches, the edge defaults to mentions.
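To make the priority chain concrete, here is a small self-contained sketch. The regexes are deliberately tiny stand-ins built from the example phrases quoted above; the ADVISES_RE phrases are an assumption, since the article gives no examples for that pattern, and the real patterns are far broader.

```typescript
// Illustrative patterns only; the real regexes cover many more phrasings.
const FOUNDED_RE = /\b(?:co-?founded|founded|founder of)\b/i;
const INVESTED_RE = /\b(?:invested in|led the seed|early investor)\b/i;
const ADVISES_RE = /\b(?:advises|advisor to)\b/i; // assumed phrases
const WORKS_AT_RE = /\b(?:CEO of|engineer at|VP at)\b/i;

type LinkTypeSketch = "founded" | "invested_in" | "advises" | "works_at" | "mentions";

// First matching pattern wins, so the order of the checks encodes priority.
function inferLinkTypeSketch(context: string): LinkTypeSketch {
  if (FOUNDED_RE.test(context)) return "founded";
  if (INVESTED_RE.test(context)) return "invested_in";
  if (ADVISES_RE.test(context)) return "advises";
  if (WORKS_AT_RE.test(context)) return "works_at";
  return "mentions";
}
```

Because the checks run in order, a sentence such as "Alice founded Acme and is CEO of Acme" would yield founded rather than works_at in this sketch.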
Frontmatter‑derived edges
person → company (field company/companies) creates an outgoing works_at edge.
person → company (field founded) creates an outgoing founded edge.
company → person (field key_people) creates an incoming works_at edge.
deal → person (field investors) creates an incoming invested_in edge.
meeting → person (field attendees) creates an incoming attended edge.
The direction logic ensures that a frontmatter entry like key_people: [Pedro] on companies/stripe yields people/pedro → companies/stripe (works_at), not the reverse.
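The field-to-edge mapping lends itself to a small rule table. The sketch below transcribes the five rules listed above; the targetDir values, the toSlug helper, and all type names are assumptions made for illustration.

```typescript
type EdgeDirectionSketch = "outgoing" | "incoming";

interface FrontmatterRuleSketch {
  pageType: string;
  fields: string[];
  linkType: string;
  direction: EdgeDirectionSketch;
  targetDir: string; // assumption: directory the referenced entity lives in
}

// Rule table transcribed from the article; targetDir values are assumed.
const FRONTMATTER_RULES: FrontmatterRuleSketch[] = [
  { pageType: "person", fields: ["company", "companies"], linkType: "works_at", direction: "outgoing", targetDir: "companies" },
  { pageType: "person", fields: ["founded"], linkType: "founded", direction: "outgoing", targetDir: "companies" },
  { pageType: "company", fields: ["key_people"], linkType: "works_at", direction: "incoming", targetDir: "people" },
  { pageType: "deal", fields: ["investors"], linkType: "invested_in", direction: "incoming", targetDir: "people" },
  { pageType: "meeting", fields: ["attendees"], linkType: "attended", direction: "incoming", targetDir: "people" },
];

interface EdgeSketch {
  from: string;
  to: string;
  type: string;
}

// Hypothetical slug helper: "Alice Smith" -> "alice-smith".
function toSlug(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, "-");
}

function edgesFromFrontmatter(
  pageSlug: string,
  pageType: string,
  frontmatter: Record<string, string[]>
): EdgeSketch[] {
  const edges: EdgeSketch[] = [];
  for (const rule of FRONTMATTER_RULES) {
    if (rule.pageType !== pageType) continue;
    for (const field of rule.fields) {
      for (const name of frontmatter[field] ?? []) {
        const other = `${rule.targetDir}/${toSlug(name)}`;
        // "incoming" flips the edge so the referenced entity is the source.
        edges.push(
          rule.direction === "outgoing"
            ? { from: pageSlug, to: other, type: rule.linkType }
            : { from: other, to: pageSlug, type: rule.linkType }
        );
      }
    }
  }
  return edges;
}
```

With these rules, key_people on a company page produces an edge that starts at the person, which matches the works_at direction produced from a person page's company field.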
Batch insertion to bypass Postgres parameter limit
Each put_page triggers addLinksBatch, which inserts edges via a JSONB‑based bulk statement, avoiding the 65 535‑parameter ceiling and fixing array‑literal crashes (see issue #1861).
INSERT ... SELECT FROM jsonb_to_recordset($1::jsonb->'rows')
JOIN pages ON CONFLICT DO NOTHING RETURNING 1
The trade‑off of a zero‑LLM approach is coverage: relationships phrased in ways the predefined verb regexes do not cover fall back to mentions. The README claims a 17 K‑page brain can build the full graph in seconds.
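A rough sketch of how a JSONB payload keeps the bind-parameter count at one regardless of batch size. The table name links, its columns, the page aliases, and the helper names are all hypothetical; only the jsonb_to_recordset($1::jsonb->'rows') shape, ON CONFLICT DO NOTHING, and RETURNING 1 are taken from the excerpt above.

```typescript
interface LinkRowSketch {
  from_slug: string;
  to_slug: string;
  link_type: string;
}

// Hypothetical schema; the article elides the real column list.
const ADD_LINKS_SQL = `
INSERT INTO links (from_page_id, to_page_id, link_type)
SELECT f.id, t.id, r.link_type
FROM jsonb_to_recordset($1::jsonb->'rows')
  AS r(from_slug text, to_slug text, link_type text)
JOIN pages f ON f.slug = r.from_slug
JOIN pages t ON t.slug = r.to_slug
ON CONFLICT DO NOTHING
RETURNING 1`;

// The whole batch travels as a single JSON string bound to $1.
function buildAddLinksQuery(rows: LinkRowSketch[]): { text: string; values: [string] } {
  return { text: ADD_LINKS_SQL, values: [JSON.stringify({ rows })] };
}

// For comparison: placeholders a multi-row VALUES insert would need.
function placeholdersForValuesInsert(rowCount: number, columnsPerRow = 3): number {
  return rowCount * columnsPerRow;
}
```

With three columns per row, a plain multi-row VALUES insert would exceed the 65 535-parameter ceiling past 21 845 rows, whereas the JSONB form always binds exactly one parameter.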
3. Four‑layer retrieval pipeline
The hybrid pipeline (src/core/search/hybrid.ts, line 1968) consists of eight stages:
intent classify
↓
expansion (optional)
↓
hybrid search
├─ vector (HNSW on pgvector)
├─ keyword (BM25)
├─ relational (typed‑edge recall)
├─ source‑aware re‑rank
└─ RRF fusion → top 30
↓
graph augment (type‑aware traversal)
↓
reranker (zerank‑2 cross‑encoder)
↓
token‑budget enforcement
↓
deduplication (slug‑based)
↓
results
The relational layer (traverseGraph(slug, depth, opts)) raises P@5 from ~18 to 49.1 by traversing typed edges (e.g., following invested_in then works_at to discover founders of invested companies). RRF fusion (RRF_K = 60) merges rankings without score normalization.
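Reciprocal Rank Fusion needs no score normalization because it looks only at rank positions: each list contributes 1 / (k + rank) to a document's fused score. A minimal sketch with k = 60 and the top-30 cut-off mentioned above (the function name and the use of slugs as identifiers are assumptions):

```typescript
const RRF_K = 60;

// Fuse several ranked lists of slugs; ranks are 1-based within each list.
function rrfFuse(rankings: string[][], topN = 30): Array<{ slug: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((slug, index) => {
      const rank = index + 1;
      scores.set(slug, (scores.get(slug) ?? 0) + 1 / (RRF_K + rank));
    });
  }
  return Array.from(scores, ([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Example: a vector ranking and a keyword ranking that disagree on order.
const fusedExample = rrfFuse([
  ["a", "b", "c"], // e.g. vector results
  ["b", "c", "a"], // e.g. keyword results
]);
```

In the example, b ranks first after fusion because it sits near the top of both lists (1/62 + 1/61), ahead of a (1/61 + 1/63) and c (1/63 + 1/62).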
Hard‑coded weighting constants
COMPILED_TRUTH_BOOST = 2.0; // double weight for body chunks after RRF
BACKLINK_BOOST_COEF = 0.05; // logarithmic boost for backlinks
ADJACENCY_BOOST = 1.05; // boost for locally hubbed results
CROSS_SOURCE_BOOST = 1.10; // cross‑source verification boost
SESSION_DEMOTE = 0.95; // demote multiple results from same session
CROSS_SOURCE_BOOST increases weight when an entity appears in multiple independent sources, acting as a cross‑verification heuristic.
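The article lists the constants but not the formula that combines them, so the following is only one plausible reading: every factor is applied multiplicatively to the fused score, with the backlink term growing logarithmically. The ScoredHitSketch fields and the exact combination are assumptions, not GBrain's actual scoring code.

```typescript
const COMPILED_TRUTH_BOOST = 2.0;
const BACKLINK_BOOST_COEF = 0.05;
const ADJACENCY_BOOST = 1.05;
const CROSS_SOURCE_BOOST = 1.10;
const SESSION_DEMOTE = 0.95;

// Hypothetical per-result signals; field names are illustrative.
interface ScoredHitSketch {
  rrfScore: number;
  isCompiledTruth: boolean;
  backlinks: number;
  hasAdjacentHit: boolean;
  independentSources: number;
  earlierHitsFromSameSession: number;
}

// Assumed combination: all adjustments multiply the post-RRF score.
function applyBoostsSketch(hit: ScoredHitSketch): number {
  let score = hit.rrfScore;
  if (hit.isCompiledTruth) score *= COMPILED_TRUTH_BOOST;
  // Logarithmic growth: the tenth backlink matters less than the first.
  score *= 1 + BACKLINK_BOOST_COEF * Math.log(1 + hit.backlinks);
  if (hit.hasAdjacentHit) score *= ADJACENCY_BOOST;
  if (hit.independentSources > 1) score *= CROSS_SOURCE_BOOST;
  // Each earlier result from the same session demotes this one a bit more.
  score *= Math.pow(SESSION_DEMOTE, hit.earlierHitsFromSameSession);
  return score;
}
```

Under this reading the boosts stay small relative to COMPILED_TRUTH_BOOST, so they mostly act as tie-breakers among results with similar fused scores.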
4. Orthogonal organization: Brain ⟂ Source
A “brain” is a physical database instance (PGLite file or Postgres instance) selected via --brain <id> or a .gbrain-mount dotfile. Each page stores a source_id, allowing the same slug (e.g., people/alice) to exist in different sources without conflict.
Configuration values are resolved through six progressive layers, including CLI flag → environment variable → dotfile → path‑based inference → default fallback.
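A precedence chain like this is commonly implemented as an ordered list of resolvers where the first defined value wins. The sketch below covers the layers named above; the environment variable name GBRAIN_BRAIN, the layer labels, and both function names are hypothetical.

```typescript
type ResolverSketch = () => string | undefined;

// Walk the layers in priority order and stop at the first one that yields a value.
function resolveSetting(
  layers: Array<{ name: string; resolve: ResolverSketch }>
): { value: string; from: string } {
  for (const layer of layers) {
    const value = layer.resolve();
    if (value !== undefined && value !== "") return { value, from: layer.name };
  }
  throw new Error("no layer produced a value; the last layer should be a default");
}

// Hypothetical wiring for the brain id; inputs are injected to keep this testable.
function resolveBrainId(
  cliFlag: string | undefined,
  env: Record<string, string | undefined>,
  dotfileValue: string | undefined,
  inferredFromPath: string | undefined
): { value: string; from: string } {
  return resolveSetting([
    { name: "cli-flag", resolve: () => cliFlag },
    { name: "env", resolve: () => env["GBRAIN_BRAIN"] }, // assumed variable name
    { name: "dotfile", resolve: () => dotfileValue },
    { name: "path-inference", resolve: () => inferredFromPath },
    { name: "default", resolve: () => "default" },
  ]);
}
```

Returning the winning layer's name alongside the value makes it easy to explain to a user why a particular brain was selected.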
5. Reliability engineering
Batch‑retry self‑healing
src/core/retry.ts implements decorrelated jitter with three default retries for bulk operations (addLinksBatch, addTimelineEntriesBatch, upsertChunks), preventing thundering‑herd spikes.
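Decorrelated jitter picks each delay at random between a base value and three times the previous delay, capped at a maximum, so that clients that failed together do not retry together. A sketch under assumed values (the base of 100 ms, the cap of 5 000 ms, and both function names are illustrative; only the default of three retries comes from the text):

```typescript
// Delay sequence: next = min(cap, random value between base and previous * 3).
function decorrelatedDelays(
  retries: number,
  baseMs: number,
  capMs: number,
  rng: () => number = Math.random
): number[] {
  const delays: number[] = [];
  let previous = baseMs;
  for (let i = 0; i < retries; i++) {
    const upper = previous * 3;
    previous = Math.min(capMs, baseMs + rng() * (upper - baseMs));
    delays.push(previous);
  }
  return delays;
}

// Retry wrapper; sleep is injected so callers can plug in a real timer or a test stub.
async function withRetrySketch<T>(
  op: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  retries = 3
): Promise<T> {
  const delays = decorrelatedDelays(retries, 100, 5000); // assumed base and cap
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= retries) throw err; // retries exhausted: surface the error
      await sleep(delays[attempt]);
    }
  }
}
```

Because each delay depends on the previous random draw rather than on the attempt number alone, two callers that start in lockstep drift apart after the first retry.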
Background‑work registry
Five background sinks are registered in order and drained via finishCliTeardown to ensure fire‑and‑forget writes are not lost on CLI exit:
facts/queue
last‑retrieved
search/hybrid cache
eval‑capture
volunteer‑events
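The registry idea can be sketched as an ordered list of named drain functions that the CLI awaits before exiting. The class below is an illustration, not the actual finishCliTeardown code; in particular, continuing past a failing sink is an assumption about the desired behavior.

```typescript
type DrainFnSketch = () => Promise<void>;

class BackgroundRegistrySketch {
  private sinks: Array<{ name: string; drain: DrainFnSketch }> = [];

  register(sinkName: string, drain: DrainFnSketch): void {
    this.sinks.push({ name: sinkName, drain });
  }

  names(): string[] {
    return this.sinks.map((s) => s.name);
  }

  // Drain sinks one by one in registration order. A failing sink is recorded
  // but does not stop later sinks from flushing (assumed policy).
  async drainAll(): Promise<string[]> {
    const failed: string[] = [];
    for (const sink of this.sinks) {
      try {
        await sink.drain();
      } catch {
        failed.push(sink.name);
      }
    }
    return failed;
  }
}

// Register the five sinks in the order listed above, with placeholder drains.
const registry = new BackgroundRegistrySketch();
for (const sinkName of ["facts/queue", "last-retrieved", "search/hybrid cache", "eval-capture", "volunteer-events"]) {
  registry.register(sinkName, async () => {
    /* flush pending writes here */
  });
}
```

A teardown hook would then await registry.drainAll() before the process exits, so fire-and-forget writes get a chance to complete.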
Fail‑closed trust boundary
The flag ctx.remote in src/core/operations.ts marks remote calls; only ctx.remote === false is trusted. Privileged actions are guarded by this fail‑closed check, mitigating risks from leaked OAuth tokens.
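The strict comparison is what makes the check fail-closed: a missing or malformed flag is treated as remote. A minimal sketch (the OpContextSketch type and the helper names are hypothetical):

```typescript
interface OpContextSketch {
  remote?: boolean;
}

// Only an explicit false counts as local; undefined is NOT trusted.
function isTrustedLocal(ctx: OpContextSketch): boolean {
  return ctx.remote === false;
}

// Guard for privileged actions: refuses unless the call is provably local.
function requireLocal(ctx: OpContextSketch, action: string): void {
  if (!isTrustedLocal(ctx)) {
    throw new Error(`"${action}" is not allowed for remote or unmarked callers`);
  }
}
```

A looser check such as !ctx.remote would be fail-open, because a context that never set the flag would pass as trusted.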
PGLite advisory lock
Because PGLite is single‑process, src/core/pglite-lock.ts provides an advisory lock using PID liveness checks and heartbeats to avoid WAL corruption and PID‑reuse errors.
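Combining a PID liveness check with a heartbeat covers both failure modes: a crashed holder (dead PID) and a recycled PID that now belongs to an unrelated process (live PID, stale heartbeat). The sketch below shows only the staleness decision; the LockRecordSketch shape and the 30-second threshold are assumptions.

```typescript
// Hypothetical contents of the lock record.
interface LockRecordSketch {
  pid: number;
  heartbeatAtMs: number;
}

// A lock may be taken over if its owner is gone OR has stopped heartbeating.
function isLockStale(
  lock: LockRecordSketch,
  nowMs: number,
  isPidAlive: (pid: number) => boolean,
  maxHeartbeatAgeMs = 30000 // assumed threshold
): boolean {
  // Dead owner: the process crashed without releasing the lock.
  if (!isPidAlive(lock.pid)) return true;
  // Live PID but old heartbeat: likely PID reuse by an unrelated process.
  return nowMs - lock.heartbeatAtMs > maxHeartbeatAgeMs;
}
```

In Node.js the isPidAlive callback is typically implemented with process.kill(pid, 0), which sends no signal and only tests whether the process exists.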
Limitations
The zero‑LLM graph relies on predefined verb patterns; coverage may drop on domains with different linguistic conventions. PGLite permits only a single writer, requiring service pause for large sync operations. Frontmatter‑derived edges lack provenance columns, limiting safe deletions. Graph‑traversal truncation detection can produce false positives/negatives and is slated for future improvement.
Shuge Unlimited
