
AutoMQ Memory Cache Design and Mitigating Netty PooledByteBufAllocator Memory Fragmentation

The article explains AutoMQ's memory‑cache architecture, compares LogCache and BlockCache designs, analyzes Netty's internal and external memory fragmentation caused by the Buddy and PageRun/PoolSubpage allocators, and presents mitigation techniques such as ByteBufSeqAlloc to reduce OOM risks.


Kafka relies on mmap‑based file caching for low‑latency streaming, while AutoMQ, which stores data in object storage, cannot use mmap and therefore adopts a pure‑memory caching approach to improve efficiency and reduce operational complexity.

AutoMQ implements two cache mechanisms tailored to different workloads: LogCache for tail‑read (hot) data using a FIFO eviction policy and exclusive memory space, and BlockCache for cold‑read (high‑throughput) scenarios employing LRU eviction and large‑block pre‑fetching (≈4 MiB) from object storage.
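The two eviction policies can be illustrated with a minimal plain-Java sketch (the class and method names here are illustrative, not AutoMQ's actual implementation; `LinkedHashMap` supports both orders via its `accessOrder` flag):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal eviction-policy sketch: LinkedHashMap gives FIFO (insertion
// order, as in LogCache) or LRU (access order, as in BlockCache)
// depending on its accessOrder constructor flag. Names are illustrative.
public class EvictionDemo {
    static <K, V> Map<K, V> boundedCache(int maxEntries, boolean lru) {
        return new LinkedHashMap<K, V>(16, 0.75f, lru) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> fifo = boundedCache(2, false); // LogCache-style
        fifo.put("a", "1"); fifo.put("b", "2");
        fifo.get("a");            // a FIFO cache ignores accesses
        fifo.put("c", "3");       // evicts "a", the oldest insertion
        System.out.println(fifo.keySet()); // [b, c]

        Map<String, String> lru = boundedCache(2, true);   // BlockCache-style
        lru.put("a", "1"); lru.put("b", "2");
        lru.get("a");             // touching "a" makes it most recent
        lru.put("c", "3");        // evicts "b", the least recently used
        System.out.println(lru.keySet()); // [a, c]
    }
}
```

FIFO suits tail reads, where the oldest data is the least likely to be read again soon; LRU suits cold reads, where recently fetched blocks may be scanned repeatedly.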

Both caches use off‑heap DirectMemory backed by Netty's PooledByteBufAllocator to lessen JVM GC pressure.
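A JDK-level illustration of the same idea, using `java.nio` rather than Netty (Netty's `PooledByteBufAllocator.DEFAULT.directBuffer(size)` plays the analogous role, with pooling on top):

```java
import java.nio.ByteBuffer;

// Off-heap ("direct") buffers live outside the GC-managed Java heap,
// so a large cache built on them does not inflate GC scan and pause
// times. Netty's PooledByteBufAllocator serves the same purpose, and
// additionally pools the memory to amortize allocation cost.
public class DirectMemoryDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);         // on-heap, GC-scanned
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // off-heap

        System.out.println("heap.isDirect()   = " + heap.isDirect());   // false
        System.out.println("direct.isDirect() = " + direct.isDirect()); // true
    }
}
```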

During stress testing of AutoMQ 1.0.0 RC, OOM errors appeared despite allocated memory being well below the 6 GiB limit. Investigation revealed a large discrepancy between the memory requested by AutoMQ and the actual memory allocated by Netty, caused by severe memory fragmentation.

The fragmentation originates from Netty's allocator:

Older Netty (< 4.1.52) uses a Buddy allocation algorithm, leading to high internal and external fragmentation, especially for mixed‑size allocations.

Newer Netty (≥ 4.1.52) adopts a PageRun/PoolSubpage strategy, reducing fragmentation but still vulnerable under continuous allocate‑release cycles.
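The difference between the two strategies shows up in how a request is rounded up to an allocatable size. Below is a simplified comparison; the size-class formula is a jemalloc-style approximation of what Netty ≥ 4.1.52 does, not its exact size table:

```java
// Internal-fragmentation comparison: a Buddy allocator rounds every
// request up to the next power of two, while jemalloc-style size
// classes (the model Netty >= 4.1.52 follows) round up to the next
// multiple of a quarter of the enclosing power of two -- a much
// tighter fit for mixed-size allocations.
public class RoundingDemo {
    // Buddy: next power of two >= size.
    static long buddyRound(long size) {
        long p = 1;
        while (p < size) p <<= 1;
        return p;
    }

    // Simplified jemalloc-style size class: round up to a multiple of
    // spacing = 2^(k-2), where 2^k is the largest power of two < size.
    static long sizeClassRound(long size) {
        long spacing = Math.max(16, Long.highestOneBit(size - 1) >> 2);
        return ((size + spacing - 1) / spacing) * spacing;
    }

    public static void main(String[] args) {
        long request = 5000; // bytes
        long buddy = buddyRound(request);      // 8192 -> ~39% wasted
        long clazz = sizeClassRound(request);  // 5120 -> ~2.3% wasted
        System.out.printf("buddy=%d sizeClass=%d%n", buddy, clazz);
    }
}
```

For a 5000-byte request the Buddy scheme wastes nearly 40% of the block, while the size-class scheme wastes about 2%, which is why the newer allocator fragments far less under mixed-size workloads.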

To eliminate fragmentation, AutoMQ introduces ByteBufSeqAlloc, which requests memory in fixed‑size chunks and slices them with ByteBuf#retainSlice, ensuring zero internal and external fragmentation for LogCache. For BlockCache, AutoMQ caches large raw data blocks and decodes them on demand, optionally splitting them into uniform 1 MiB pieces to avoid fragmentation.
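The sequential-slicing idea behind ByteBufSeqAlloc can be sketched in plain Java. This is a simplified single-threaded model using `java.nio`; the real implementation hands out Netty ByteBuf slices via `retainSlice` and must also deal with reference counting and concurrency:

```java
import java.nio.ByteBuffer;

// Sequential slab allocator sketch: grab one fixed-size chunk up
// front, then satisfy each request by slicing off the next `size`
// bytes. Consecutive requests are packed back to back, so nothing is
// rounded up (no internal fragmentation) and no gaps open between
// neighbors (no external fragmentation). Simplified model of AutoMQ's
// ByteBufSeqAlloc, which slices Netty ByteBufs with retainSlice.
public class SeqAllocDemo {
    private final int chunkSize;
    private ByteBuffer chunk;
    private int offset;

    SeqAllocDemo(int chunkSize) {
        this.chunkSize = chunkSize;
        this.chunk = ByteBuffer.allocateDirect(chunkSize);
    }

    ByteBuffer alloc(int size) {
        if (size > chunkSize) throw new IllegalArgumentException("request exceeds chunk size");
        if (offset + size > chunkSize) {                  // chunk exhausted:
            chunk = ByteBuffer.allocateDirect(chunkSize); // start a fresh one
            offset = 0;
        }
        ByteBuffer slice = chunk.duplicate();
        slice.position(offset).limit(offset + size);
        offset += size;
        return slice.slice(); // a view of exactly `size` bytes
    }

    public static void main(String[] args) {
        SeqAllocDemo alloc = new SeqAllocDemo(4 * 1024 * 1024); // 4 MiB chunks
        ByteBuffer a = alloc.alloc(5000);
        ByteBuffer b = alloc.alloc(5000);
        // Each slice is exactly the requested size -- nothing rounded up.
        System.out.println(a.capacity() + " " + b.capacity()); // 5000 5000
    }
}
```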

An example of the buffer-usage metric AutoMQ logs:
ByteBufAllocMetric{allocatorMetric=PooledByteBufAllocatorMetric(usedDirectMemory: 2294284288; ...), allocatedMemory=1870424720, 1/write_record=1841299456, 11/block_cache=0, ..., pooled=true, direct=true} (com.automq.stream.s3.ByteBufAlloc)
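Plugging the numbers from that metric line into a fragmentation ratio, 1 − requested/used, quantifies the gap between what AutoMQ asked for and what Netty reserved. (The formula is a straightforward reading of the log line, not AutoMQ's official metric definition.)

```java
// Fragmentation ratio from the sample metric line above:
//   usedDirectMemory = what Netty actually reserved off-heap
//   allocatedMemory  = what AutoMQ requested
public class FragmentationRatio {
    static double ratio(long usedDirectMemory, long requestedByApp) {
        return 1.0 - (double) requestedByApp / usedDirectMemory;
    }

    public static void main(String[] args) {
        long used = 2_294_284_288L;      // usedDirectMemory in the log line
        long requested = 1_870_424_720L; // allocatedMemory in the log line
        System.out.printf("fragmentation = %.1f%%%n", 100 * ratio(used, requested));
        // prints: fragmentation = 18.5%
    }
}
```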

With these strategies, AutoMQ keeps off‑heap fragmentation below 35 % and prevents further OOM incidents, while recommending that users upgrade to Netty ≥ 4.1.52 and consider large‑block allocations when using Netty as a caching layer.

Tags: Backend · Java · Netty · DirectMemory · Memory Fragmentation · Cache Design
Written by

High Availability Architecture

Official account for High Availability Architecture.
