Boost AI Smarts and Cut Costs with Open‑Source Memory and Compression Tools

AI chats are costly because the same background context must be repeated in every conversation. This article presents two open‑source projects, mempalace and caveman, which together provide a large‑scale memory system and aggressive token compression, dramatically cutting token usage and cost while preserving reasoning ability.


Repeatedly restating background information makes AI conversations costly: Claude forgets a project after a few days, ChatGPT overflows its context window, and Copilot needs the full project described again every time.

mempalace – a palace‑memory system for AI

Conversations and project documents are organized into a hierarchy of wing → hall → room → closet, where each project, topic, or participant forms a wing that is further subdivided. This structure lets the model locate relevant memories explicitly instead of relying on its own judgment.
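To make the layout concrete, here is a minimal sketch of that hierarchy as a directory tree; the names and the file‑per‑closet format are illustrative assumptions, not mempalace's actual storage scheme.

from pathlib import Path

# Illustrative layout only - mempalace's real storage format may differ.
root = Path("palace")
room = root / "project-alpha" / "auth" / "decisions"  # wing / hall / room
room.mkdir(parents=True, exist_ok=True)
(room / "2025-06.md").write_text("Decided: rotate JWT signing keys monthly.")  # a closet

# Explicit retrieval: walk a known path instead of asking the model to guess.
for closet in (root / "project-alpha" / "auth").rglob("*.md"):
    print(closet, "->", closet.read_text())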

AAAK compression protocol

AAAK is a lossless "AI‑specific abbreviation language". The README reports a 30× compression ratio, allowing roughly three months of conversation to be loaded in about 170 tokens.
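The article does not reproduce the codebook, so as a stand‑in, the sketch below shows the general idea of a lossless dictionary‑based abbreviation scheme; the vocabulary here is invented, and AAAK's real one is certainly different.

# Toy abbreviation codec - the ABBREV table is made up, not AAAK's vocabulary.
# The ≈ sentinel keeps codes collision-free so the round trip stays lossless.
ABBREV = {
    "authentication": "≈auth",
    "decided to": "≈dec",
    "refactor": "≈rf",
    "database migration": "≈dbm",
}

def compress(text: str) -> str:
    for phrase, code in ABBREV.items():
        text = text.replace(phrase, code)
    return text

def expand(text: str) -> str:
    for phrase, code in ABBREV.items():
        text = text.replace(code, phrase)
    return text

memo = "decided to refactor authentication before the database migration"
packed = compress(memo)
assert expand(packed) == memo  # lossless round trip, as the README claims
print(len(memo), "->", len(packed), "chars")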

A cost comparison of three ways to restore context:

Direct paste of all context: exceeds the context window, infeasible.

LLM auto‑summary: ~650 K tokens, $507 / year.

mempalace wake‑up: ~170 tokens, $0.70 / year.

The system runs locally, makes no API calls, and does not upload any data.

Benchmark results

On the LongMemEval benchmark, mempalace achieves a recall@5 (R@5) of 96.6 %, meaning the correct memory appears among the top five retrieved results for 96.6 % of queries. The optional rerank module can raise this to 100 %.

Installation

pip install mempalace
claude mcp add mempalace -- python -m mempalace.mcp_server

After installation, a query such as "What decisions did we make last month about the auth module?" triggers Claude to call mempalace's search tool and retrieve the answer from stored memory.
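For reference, an MCP tool call travels as a JSON‑RPC tools/call request; the sketch below shows its rough shape, but the tool name and argument fields are guesses, not mempalace's documented schema.

# Approximate shape of the request Claude Code sends to the MCP server.
# "search" and its arguments are assumptions - check mempalace's tool listing.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search",                      # assumed tool name
        "arguments": {
            "query": "auth module decisions",  # distilled from the question
            "since": "last month",             # assumed time filter
        },
    },
}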

caveman – reducing AI verbosity

caveman forces the model to use "caveman‑style" phrasing, cutting output token consumption by 65 %–75 % while preserving technical content.
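As an invented illustration of the style (not actual tool output), compare a typical assistant answer with its caveman rendering:

Standard: "Great question! The re‑render happens because your callback is recreated on every render, so the child component receives a new prop each time. You can fix this by memoizing the callback with useCallback."

Caveman: "Callback recreated each render → child sees new prop → re‑render. Fix: wrap in useCallback."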

Token‑saving benchmark

Explain React re‑render bug: 1180 → 159 tokens (87 % saved)

Fix auth middleware token expiry check: 704 → 121 tokens (83 % saved)

Configure PostgreSQL connection pool: 2347 → 380 tokens (84 % saved)

Explain git rebase vs merge: 702 → 292 tokens (58 % saved)

Implement React error boundary: 3454 → 456 tokens (87 % saved)

Average: 1214 → 294 tokens (65 % saved)

An average 65 % reduction in output tokens translates directly into roughly two‑thirds lower output cost.

Preserving reasoning

caveman compresses only the output tokens; thinking and reasoning tokens remain unchanged. Code blocks, technical terms, and error messages are kept intact, while filler phrases such as "Sure, let me take a look at that problem" are removed.

A March 2026 paper reports that limiting large‑model outputs to concise answers improves accuracy by 26 percentage points.

Adjustable compression levels

Lite: remove interjections, keep grammar.

Full (default): full caveman mode.

Ultra: extreme compression, telegram‑style.
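An invented example of the same finding at each level (again, not real tool output):

Lite: "The token check runs before the refresh, so expired tokens slip through."

Full: "Token check runs before refresh. Expired tokens slip through."

Ultra: "Check before refresh → expired tokens pass."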

caveman‑compress – shrinking memory files

claude‑md‑preferences.md: 706 → 285 tokens (59.6 % saved)

claude‑md‑project.md: 1122 → 687 tokens (38.8 % saved)

todo‑list.md: 627 → 388 tokens (38.1 % saved)

Average: 898 → 494 tokens (45 % saved)
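To check the savings on your own memory files, you can count tokens before and after with the tiktoken library; this measuring harness is a sketch, and the compressed‑file name is hypothetical, not something caveman‑compress is documented to produce.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_count(path: str) -> int:
    # Count tokens the way OpenAI-style tokenizers would.
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

before = token_count("claude-md-project.md")
after = token_count("claude-md-project.compressed.md")  # hypothetical output name
print(f"{before} -> {after} tokens ({(before - after) / before:.0%} saved)")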

Combined effect

mempalace (input side) addresses the “AI forgets project history” problem with palace‑memory organization and AAAK compression. caveman (output side) addresses the “AI is too verbose” problem with caveman phrasing and waste‑word removal. Both tools run locally, require no API calls, and together provide a large‑scale memory palace and a lean output pipeline, dramatically lowering token costs while keeping reasoning intact.

Tags: open-source, token compression, AI memory, LLM efficiency, caveman, mempalace
Written by Geek Labs

Daily shares of interesting GitHub open-source projects. AI tools, automation gems, technical tutorials, open-source inspiration.
