Tagged articles

Memory compression

14 articles · Page 1 of 1

May 30, 2026 · Artificial Intelligence

Can MIT’s Attention Matching Cut LLM Memory 50× Without Accuracy Loss?

MIT researchers introduce Attention Matching, a latent‑space KV‑cache compaction technique that reduces large‑language‑model memory usage by up to 50× with negligible accuracy loss, outperforming token pruning, summarization, and prior compaction methods on benchmarks such as QuALITY, LongHealth, and AIME‑2025.

Attention Matching · Benchmark · KV cache

0 likes · 13 min read

Tencent Cloud Developer

May 26, 2026 · Artificial Intelligence

How TencentDB Agent Memory Cuts Tokens by 61% and Boosts Success Rate 52% with Mermaid Infinite Canvas and Context Offloading

The article presents a technical deep dive into TencentDB Agent Memory’s short‑term memory compression, which combines context offloading and a Mermaid‑based infinite canvas to reduce token usage by up to 61% while improving task success rates by over 50% across multiple long‑session benchmarks.

Agent · Context Offloading · LLM

0 likes · 45 min read

AI Waka

May 12, 2026 · Artificial Intelligence

Is 3‑Bit KV Cache the Ultimate Solution? An In‑Depth Evaluation of Google’s TurboQuant

Through ten experiments on three LLMs, this study measures TurboQuant’s 3‑bit KV‑cache compression, revealing that while quality remains strong, speed gains vary by model, memory savings depend on implementation, and attention‑entropy analysis explains why 2‑bit compression degrades performance.

Attention Entropy · Inference Performance · KV cache

0 likes · 14 min read

AI Tech Publishing

Apr 29, 2026 · Artificial Intelligence

Why Do AI Agents Forget and Hallucinate? A Complete Guide to KV‑Cache Memory Mechanisms

The article explains that AI agents’ forgetting and hallucinations stem from token‑level attention scores that cause key‑value cache entries to be evicted before they can be retrieved. It then surveys KV‑cache basics, naive cache growth, StreamingLLM windowing, SnapKV’s attention‑guided compression, token‑retention studies, and Memory Sparse Attention, compares these methods, and discusses practical system pitfalls and design implications.

AI agents · KV cache · Memory Sparse Attention

0 likes · 20 min read

SuanNi

Apr 3, 2026 · Artificial Intelligence

How GEMS Lets a 6B Open‑Source Model Beat Top Closed‑Source Image Generators

The article presents the GEMS (Agent‑Native Multimodal Generation with Memory and Skills) framework, detailing its multi‑agent loop, hierarchical memory compression, on‑demand skill modules, and extensive benchmark results that show a lightweight 6B model surpassing larger proprietary systems on complex image‑generation tasks.

GEMS · Memory compression · Multimodal AI

0 likes · 14 min read

Geek Labs

Apr 3, 2026 · Industry Insights

Top GitHub Projects: LLM Memory Compression Tool, AI Code Review Plugin, and WeCom CLI

This article reviews three hot open‑source projects—TurboQuant Plus for compressing LLM memory, a Claude‑Code plugin that leverages Codex for AI‑driven code review, and the Rust‑based WeCom CLI for terminal control of Enterprise WeChat—detailing their features, usage, and target users.

AI Code Review · Claude · LLM

0 likes · 8 min read

PaperAgent

Mar 26, 2026 · Artificial Intelligence

TurboQuant: How Google’s New Vector Quantization Cuts KV Memory 6× and Boosts Speed

TurboQuant, presented at ICLR 2026, introduces a theoretically grounded vector quantization technique that reduces large‑language‑model key‑value cache memory by at least six times and achieves up to eight‑fold speedups with zero accuracy loss. It combines PolarQuant’s polar‑coordinate compression with a 1‑bit QJL error‑correction step, as demonstrated on benchmarks such as LongBench and GloVe.

AI inference · Benchmarking · Memory compression

0 likes · 10 min read

Xiaolei Talks DB

Feb 25, 2026 · Databases

Engula: Redis‑Compatible In‑Memory Database Cutting Memory Use by 50%

Engula is a Redis‑compatible, high‑performance in‑memory database that cuts memory usage by up to 50% through compression and metadata optimization while incurring only about 10% performance overhead. This article details its architecture, testing methodology, and benchmark results.

In-Memory Database · Memory compression · Performance Benchmark

0 likes · 7 min read

Deepin Linux

Apr 11, 2025 · Fundamentals

Understanding ZRAM: Linux Memory Compression and Swap Optimization

This article explains the ZRAM technology in Linux, covering its principles, configuration steps, kernel integration, performance optimizations, and practical use cases for improving memory utilization on embedded devices, Android, and legacy PCs.

Memory compression · kernel · swap

0 likes · 24 min read

Open Source Linux

May 26, 2023 · Operations

Boost Linux Performance with zSwap, zRAM, and Zstandard Compression

This article explains how Linux memory compression techniques such as zSwap, zRAM, and the Zstandard algorithm reduce I/O pressure, extend flash lifespan, and improve overall system performance, while also covering their drawbacks and step‑by‑step activation procedures.

Linux · Memory compression · Performance Optimization

0 likes · 6 min read

MaGe Linux Operations

May 18, 2023 · Operations

Boost Linux Performance with zSwap, zRAM, and zstd Compression

Memory compression techniques such as Linux's zSwap and zRAM, together with the zstd algorithm, reduce I/O latency and extend effective RAM capacity by compressing swap pages. They offer performance gains at the cost of CPU overhead and configuration complexity. This guide explains their principles, advantages, drawbacks, and activation steps.

Linux · Memory compression · System Performance

0 likes · 6 min read

Coolpad Technology Team

Nov 6, 2021 · Mobile Development

Analysis of Intermittent Unresponsive Touch Events in Feishu Caused by Process D State and Memory Compression

The article investigates why the Feishu app sometimes fails to respond to swipe gestures after a hot start, tracing the issue to the app entering a D (uninterruptible) state during memory compression, and demonstrates how adjusting CPU priority for compression threads can reduce the problem's occurrence.

Android · Input Events · Memory compression

0 likes · 8 min read

Programmer DD

Jul 19, 2021 · Backend Development

How Redis Ziplist Compresses Memory and When to Use It

This article explains Redis's ziplist compressed list structure, its internal fields, lookup algorithm, performance characteristics, configuration thresholds for Hash and List types, and demonstrates a real‑world use case with memory‑saving calculations and experimental results.

Data Structures · Memory compression · Redis

0 likes · 11 min read

OPPO Kernel Craftsman

Feb 21, 2020 · Fundamentals

Overview of Linux Memory Compression Technologies: zSwap, zRAM, and zCache

Linux reduces RAM pressure through three main compression mechanisms—zSwap, which caches compressed pages before writing to swap; zRAM, a RAM‑backed compressed block device; and zCache, a file‑page compressor—each paired with specialized allocators (zsmalloc, zbud, z3fold) and configurable algorithms, offering trade‑offs in speed, ratio, CPU load, and fragmentation.

Linux · Memory compression · Performance

0 likes · 12 min read
