RAG vs. LLM Wiki vs. GBrain: Which Architecture Best Powers Agent Memory?

The article analyzes why AI agents forget, then compares three memory architectures—RAG, LLM Wiki, and GBrain—detailing their strengths, weaknesses, scalability, latency, compounding knowledge, and autonomy, and offers guidance on choosing the right approach for different use cases.

Spring Full-Stack Practical Cases

May 20, 2026

1. Introduction

Most agent tutorials overlook that the context window is not persistent memory; it behaves like a whiteboard that is wiped after each session. A context window may hold up to one million tokens, but performance degrades once it contains roughly 300-400 k tokens (about 30-40% of the limit), and everything in it disappears when the conversation ends.
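The threshold above is plain arithmetic, and an agent loop can check it before each turn. The following is a minimal sketch, assuming the lower 30% bound quoted above; the function names and the idea of "offloading" to external memory are illustrative, not taken from any particular framework.

```python
def usable_budget(context_limit: int, degrade_percent: int = 30) -> int:
    """Tokens that fit before quality is expected to degrade.

    degrade_percent defaults to the lower bound (30%) quoted in the text.
    Integer math avoids floating-point rounding surprises.
    """
    return context_limit * degrade_percent // 100


def needs_offload(tokens_in_context: int, context_limit: int,
                  degrade_percent: int = 30) -> bool:
    """True once the context has reached the degradation threshold."""
    return tokens_in_context >= usable_budget(context_limit, degrade_percent)


budget = usable_budget(1_000_000)          # 300_000 tokens
print(budget, needs_offload(350_000, 1_000_000))
```

A check like this only says when to move information out of the window; where it goes is the subject of the three architectures below.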

2. Comparing the Three Architectures

2.1 Architecture 1: RAG – Retriever

RAG wins on scale but loses on depth. It excels when you have tens of thousands of rapidly changing documents and need immediate answers; neither of the other two architectures matches its throughput. The workflow is simple: embed → store → retrieve → generate. Mature frameworks such as LangChain, LlamaIndex, and many others have standardized this pipeline, so most teams can assemble it from off-the-shelf components.
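The four steps can be sketched in a few lines of standard-library Python. This is a toy illustration, not how LangChain or LlamaIndex implement the pipeline: the bag-of-words `embed` function stands in for a real embedding model, `VectorStore` is a hypothetical in-memory list, and the "generate" step stops at building the prompt instead of calling an LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store: keeps each document next to its vector."""

    def __init__(self) -> None:
        self.rows = []  # list of (document, vector) pairs

    def add(self, doc: str) -> None:
        self.rows.append((doc, embed(doc)))

    def retrieve(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, passages: list) -> str:
    """Generate step, minus the model call: pack retrieved text into a prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


store = VectorStore()
for doc in [
    "Refunds are processed within 14 days of the return request.",
    "The VPN client must be updated every quarter.",
    "Expense reports are due on the fifth business day of each month.",
]:
    store.add(doc)

question = "How long do refunds take?"
top = store.retrieve(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Because nothing is precomputed beyond the vectors, updating a document only means re-embedding it, which is what makes the approach cheap to keep current.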

RAG handles large corpora (e.g., a company with 200 k internal documents) and can re‑embed changed documents to keep answers up‑to‑date. However, the fundamental chunk‑splitting problem remains: a 30‑page spec is broken into ~500‑word fragments, causing compliance‑related information to be split across vectors and often missed. RAG also repeats reasoning for each query, cannot learn autonomously, and is purely reactive. Latency accumulates across embedding, vector search, re‑ranking, and context packing—acceptable for a single query but costly for agents that make dozens of tool calls per loop.
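The chunk-splitting problem is easy to reproduce. The sketch below uses a fixed-size word splitter scaled down from ~500 words to 8 so the effect is visible on three sentences; the policy text and the `chunk_words` helper are invented for illustration.

```python
def chunk_words(text: str, size: int) -> list:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunks_containing(chunks: list, phrase: str) -> list:
    """Indices of the chunks that contain `phrase`."""
    return [i for i, chunk in enumerate(chunks) if phrase in chunk]


spec = (
    "Customer records must be retained for seven years. "
    "This requirement does not apply to records of minors, "
    "which must be deleted when the account is closed."
)

chunks = chunk_words(spec, size=8)
rule_at = chunks_containing(chunks, "seven years")
exception_at = chunks_containing(chunks, "minors")

for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
print("rule in chunks", rule_at, "exception in chunks", exception_at)
```

The retention rule and its exception end up in different chunks, each with its own vector, so a query that matches only one of them retrieves half of the policy.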

2.2 Architecture 2: LLM Wiki – Compiler

LLM Wiki wins on depth but struggles with scale. It is ideal when the source set is under a thousand documents and you want knowledge to compound as each new source is ingested. Karpathy’s gist proposes compiling sources into a persistent, inter‑linked wiki rather than retrieving raw text at query time. The architecture has three layers:

Raw sources (PDFs, articles, bookmarks) – immutable.

Generated markdown wiki (summaries, entity pages, concept definitions) – owned by the LLM.

Configuration file (e.g., CLAUDE.md) that defines naming conventions, cross‑reference rules, and conflict handling.

When a new document is added, the LLM reads existing wiki pages, updates 10‑15 affected pages, adds cross‑references, marks contradictions, and creates new entity pages. Queries then benefit from the accumulated knowledge, producing richer answers that are archived as new wiki pages, further compounding future queries. The approach also includes an automated audit workflow that detects orphan pages, outdated statements, and missing concepts.
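Part of the audit step can be done without an LLM at all. The following is a minimal sketch under two assumptions that are not from the source: pages cross-reference each other with `[[page-name]]` links, and a "missing concept" is interpreted as a link that points to a page that does not exist yet. Detecting outdated statements would still need the model and is not covered here.

```python
import re

# Assumed link syntax: [[page-name]]
LINK = re.compile(r"\[\[([^\]]+)\]\]")


def outgoing_links(body: str) -> set:
    """Names of all pages referenced from a page body."""
    return set(LINK.findall(body))


def audit(wiki: dict, root: str = "index") -> dict:
    """Report orphan pages and links to pages that do not exist.

    `wiki` maps page name -> markdown body. The root page is never an orphan.
    """
    linked = set()
    for name, body in wiki.items():
        linked |= outgoing_links(body) - {name}  # ignore self-links
    orphans = sorted(n for n in wiki if n != root and n not in linked)
    missing = sorted(linked - set(wiki))
    return {"orphans": orphans, "missing": missing}


wiki = {
    "index": "Start here: [[rag]] and [[llm-wiki]].",
    "rag": "Retrieval pipeline. Compare with [[llm-wiki]].",
    "llm-wiki": "Compiled knowledge. See [[gbrain]] for skills.",
    "chunking": "Notes on splitting documents into fragments.",
}

report = audit(wiki)
print(report)
```

In this example `chunking` is reported as an orphan because nothing links to it, and `gbrain` is reported as missing because it is referenced but has no page.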

Limitations appear as the corpus grows beyond a few hundred sources: navigation based on BM25/grep breaks down by around 10 k sources, and the per-ingestion compute cost far exceeds RAG's embedding cost. The system also remains passive: unless explicitly prompted, it does not act on knowledge, check for contradictions, or trigger actions, which may be a problem in regulated environments.

2.3 Architecture 3: GBrain – Operator

GBrain introduces autonomous action. It was built by Y Combinator CEO Garry Tan for personal AI agents (OpenClaw, Hermes, Claude Code) and open‑sourced as a codebase rather than a commercial product. The design keeps the runtime thin (~200 lines of resolver code) while delegating most logic to “skills”—fat markdown files that describe when to trigger, what checks to perform, how to chain with other skills, and quality standards.

Each skill declares its triggers, tools, and write targets, enabling deterministic execution and auditability. Example skill definition:

name: enrich
version: 1.0.0
description: |
  Enrich brain pages with tiered enrichment protocol.
  Creates and updates person/company pages with compiled
  truth, timeline, and cross-links.
triggers:
  - "enrich"
  - "create person page"
  - "update company page"
  - "who is this person"
tools:
  - get_page
  - put_page
  - search
  - add_link
  - add_timeline_entry
mutating: true
writes_to:
  - people/
  - companies/

The resolver routes intents to six skill categories (always‑on, brain‑ops, content ingestion, thinking, task execution, settings). Skills can call deterministic code (SQL, API, file ops) for tasks that should not be left to the LLM, reducing hallucinations. GBrain’s knowledge base currently holds ~18 k pages; it can be backed by Postgres + pgvector for scaling, but the architecture assumes a single expert operator who maintains the skill library.
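To make the "thin runtime, fat skills" idea concrete, here is a minimal sketch of a resolver that matches an intent against declared triggers and checks a write against `writes_to`. It is not GBrain's actual resolver (see RESOLVER.md in the repository for that): the `Skill` dataclass, the substring matching, and the read-only `lookup` skill are assumptions made for illustration. Only the `enrich` fields mirror the skill definition shown above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    name: str
    triggers: list
    tools: list
    mutating: bool = False
    writes_to: list = field(default_factory=list)


def resolve(intent: str, skills: list) -> Optional[Skill]:
    """Return the first skill with a trigger phrase contained in the intent."""
    text = intent.lower()
    for skill in skills:
        if any(trigger in text for trigger in skill.triggers):
            return skill
    return None


def can_write(skill: Skill, path: str) -> bool:
    """Allow a write only for mutating skills, and only under declared prefixes."""
    return skill.mutating and any(path.startswith(p) for p in skill.writes_to)


enrich = Skill(
    name="enrich",
    triggers=["enrich", "create person page", "update company page",
              "who is this person"],
    tools=["get_page", "put_page", "search", "add_link", "add_timeline_entry"],
    mutating=True,
    writes_to=["people/", "companies/"],
)
# Hypothetical read-only skill, added so the resolver has a second candidate.
lookup = Skill(name="lookup", triggers=["find page"], tools=["search"])

skills = [enrich, lookup]
chosen = resolve("Who is this person on the call?", skills)
print(chosen.name if chosen else None)
print(can_write(enrich, "people/jane-doe.md"), can_write(enrich, "meetings/notes.md"))
```

Keeping routing and write checks in plain code like this is what makes a skill's behavior auditable: the declared triggers and targets can be enforced before the LLM is involved.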

3. Trade‑offs and Decision Guidance

The choice starts from the agent’s responsibility:

If you need a production‑ready knowledge assistant that can index tens of thousands of frequently changing documents, RAG is the pragmatic path.

If you have a few hundred sources and want the knowledge to compound over time, LLM Wiki provides deeper, linked understanding.

If autonomous action, continuous enrichment, and custom workflows are essential, and you can invest in the engineering, a skill-based architecture such as GBrain is appropriate.

In practice, many systems combine the three: RAG for large‑scale retrieval, Wiki for compiled knowledge, and skills for action.
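The guidance above can be condensed into a rule of thumb. The sketch below is one possible encoding, not a formal decision procedure: the 1,000-source cutoff comes from the "under a thousand documents" figure given for LLM Wiki, while the function name, the parameters, and the choice to default everything else to RAG are assumptions.

```python
def choose_architecture(num_sources: int, changes_often: bool,
                        needs_autonomy: bool) -> str:
    """Pick a starting architecture from the three discussed in this article."""
    if needs_autonomy:
        # Autonomous action and custom workflows point to skills.
        return "skills"
    if num_sources < 1_000 and not changes_often:
        # Small, stable corpus: let knowledge compound in a compiled wiki.
        return "llm-wiki"
    # Large or fast-changing corpus: retrieval scales best.
    return "rag"


print(choose_architecture(200_000, changes_often=True, needs_autonomy=False))
print(choose_architecture(300, changes_often=False, needs_autonomy=False))
print(choose_architecture(300, changes_often=False, needs_autonomy=True))
```

The function returns a single starting point; a hybrid system would apply the same questions per workload instead of once per project.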

4. Future Trends

The three‑way landscape is converging. Upcoming LLM Wiki v2 adds a retrieval layer on top of the compiled wiki, GBrain’s skills are being backed by vector stores, and enterprise platforms (e.g., Neo4j) are building unified knowledge layers that blend graph, vector search, and semantic reasoning. By 2026 the boundary between retrieval, compilation, and action is expected to dissolve into a single knowledge operating system.

For immediate exploration, the article recommends LangChain’s RAG tutorial, Karpathy’s 200‑line LLM Wiki schema, and GBrain’s open‑source repository (read RESOLVER.md and THIN_HARNESS_FAT_SKILLS.md first).
