Memcached Slab Allocator Explained: Memory Management & Scaling
This article explains Memcached's slab allocator, key concepts such as items, chunks, slab classes, and pages, the slab calcification problem, and how master‑slave double‑layer and L1 cache architectures enable high concurrency, high availability, and linear scaling.
1. Introduction to Memcached Memory Allocation Principles
While installing and using Memcached commands is enough for most development tasks, diagnosing online issues requires understanding its memory allocation management.
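By default, Memcached uses a mechanism called the Slab Allocator, which divides memory into fixed‑size blocks. This avoids the fragmentation caused by repeatedly allocating and freeing variably sized objects, at the cost of some unused space inside each block.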
Before describing the allocation principle, the following key terms are defined:
Item
An element to be stored, measured in bytes; think of it as an object.
Chunk
The memory space used to cache an item; analogous to a storage compartment.
Slab Class
A group of chunks of a specific size, e.g., 80 B, 96 B, etc.
Page
The memory region assigned to a slab class, default 1 MB. After a page is assigned, it is split into chunks according to the slab class size; a page can be divided by only one slab class.
With these concepts in place, we can walk through how Memcached allocates memory.
Figure 1: Memcached initialization diagram. When Memcached starts, parameters such as the initial slab class size (80 B) and growth factor (1.5) determine the generated slab class table.
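The slab class table described in Figure 1 can be approximated with a few lines of Python. This is a minimal sketch, not Memcached's actual code: the function name `slab_class_sizes` is hypothetical, and it ignores details of the real implementation such as byte alignment of chunk sizes and per-item header overhead. It only shows how an initial size of 80 B and a growth factor of 1.5 produce the sequence of chunk sizes.

```python
def slab_class_sizes(initial=80, factor=1.5, page_size=1024 * 1024):
    """Return the chunk size of each slab class, smallest first.

    Simplified model: each class is `factor` times larger than the
    previous one, and no class may exceed one page.
    """
    if initial < 1 or factor <= 1:
        raise ValueError("initial must be >= 1 and factor must be > 1")
    sizes = []
    size = initial
    while size <= page_size:
        sizes.append(size)
        # Truncate to whole bytes; always grow by at least one byte
        # so the loop is guaranteed to terminate.
        size = max(size + 1, int(size * factor))
    return sizes


if __name__ == "__main__":
    for class_id, chunk_size in enumerate(slab_class_sizes()[:5], start=1):
        print(f"slab class {class_id}: {chunk_size} B chunks")
```

With the parameters from the figure, the first classes come out as 80 B, 120 B, 180 B, and 270 B, which is why a 123 B item ends up in the 180 B class in the example that follows.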
Figure 2: Slab distribution diagram.
When a request to store a 123 B item arrives, Memcached selects the smallest slab class whose chunks can hold the item (180 B in this example, since the 80 B and 120 B classes are too small). If that class has no free chunk, Memcached assigns a page (default 1 MB) to it.
Figure 3: Page allocation diagram.
The 1 MB page is divided into ⌊1,048,576 B / 180 B⌋ = 5825 chunks (76 B of the page are left over), and the 123 B item is stored in one of them. The remaining 57 B of that chunk go unused.
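The class selection and page split can be checked with a short calculation. The sketch below assumes the simplified class table from the example (80 B, 120 B, 180 B, ...); the names `pick_slab_class` and `chunks_per_page` are illustrative and do not correspond to Memcached functions.

```python
import bisect

PAGE_SIZE = 1024 * 1024                  # default page size: 1 MB
CLASS_SIZES = [80, 120, 180, 270, 405]   # first few classes from the example


def pick_slab_class(item_size, class_sizes=CLASS_SIZES):
    """Return the smallest chunk size that can hold item_size bytes."""
    idx = bisect.bisect_left(class_sizes, item_size)
    if idx == len(class_sizes):
        raise ValueError(f"item of {item_size} B does not fit any listed class")
    return class_sizes[idx]


def chunks_per_page(chunk_size, page_size=PAGE_SIZE):
    """Number of whole chunks a single page is split into."""
    return page_size // chunk_size


if __name__ == "__main__":
    chunk = pick_slab_class(123)
    print(chunk)                    # 180
    print(chunks_per_page(chunk))   # 5825
    print(chunk - 123)              # 57 bytes unused inside the chunk
```

The unused bytes inside each chunk are the price of fixed-size chunks: a smaller growth factor gives more classes and less waste per item, but spreads pages across more classes.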
As time passes, memory becomes fully allocated, as shown in Figure 4.
Figure 4: Memory slab allocation diagram. Some slab classes may receive no pages at all.
If all slabs and their chunks are exhausted and a new 123 B item arrives, Memcached triggers its eviction mechanism.
Memcached first checks whether the 180 B slab class contains expired items; if not, it evicts using LRU within that slab class only. Eviction does not touch other slab classes, and a page assigned to a slab class stays with that class until Memcached restarts.
This behavior is known as the slab calcification problem: after running for a while, memory has been distributed across slab classes according to the original access pattern. If the pattern of item sizes changes, the classes now in demand evict frequently even though other classes still hold unused chunks, which lowers the hit rate. Restarting the cache resolves it.
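A toy model makes the effect easier to see. The class below is a deliberately simplified sketch, not Memcached's implementation: `TinySlabCache` and its methods are hypothetical, it tracks only chunk counts rather than real memory, it ignores expiry, and it uses a tiny 1800 B page so the numbers stay small. It does keep the two rules from the text: LRU eviction happens only within one slab class, and a page never moves between classes.

```python
from collections import OrderedDict


class TinySlabCache:
    """Toy slab allocator with per-class LRU and no page reassignment."""

    def __init__(self, total_pages, class_sizes=(80, 120, 180), page_size=1800):
        self.free_pages = total_pages
        self.page_size = page_size
        self.class_sizes = sorted(class_sizes)
        self.capacity = {s: 0 for s in self.class_sizes}        # chunks owned
        self.items = {s: OrderedDict() for s in self.class_sizes}
        self.evictions = 0

    def _class_for(self, size):
        for chunk_size in self.class_sizes:
            if size <= chunk_size:
                return chunk_size
        raise ValueError("item too large for any slab class")

    def set(self, key, size):
        cls = self._class_for(size)
        lru = self.items[cls]
        if key in lru:
            lru.move_to_end(key)            # refresh recency
            return True
        if len(lru) >= self.capacity[cls]:  # no free chunk in this class
            if self.free_pages > 0:         # grab an unassigned page
                self.free_pages -= 1
                self.capacity[cls] += self.page_size // cls
            elif lru:                       # evict within this class only
                lru.popitem(last=False)
                self.evictions += 1
            else:                           # class owns no page at all
                return False
        lru[key] = size
        return True

    def delete(self, key, size):
        # Frees the chunk, but the page stays with its slab class.
        self.items[self._class_for(size)].pop(key, None)

    def free_chunks(self, cls):
        return self.capacity[cls] - len(self.items[cls])


if __name__ == "__main__":
    cache = TinySlabCache(total_pages=3)
    for i in range(44):                     # old pattern: 70 B items
        cache.set(f"small:{i}", 70)
    for i in range(44):                     # old data goes away
        cache.delete(f"small:{i}", 70)
    for i in range(30):                     # new pattern: 150 B items
        cache.set(f"large:{i}", 150)
    print(cache.free_chunks(80), cache.evictions)
```

In this run the 80 B class ends up holding 44 empty chunks while the 180 B class, which received only the last page, evicts 20 of the 30 large items.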
High Concurrency & High Availability
2. Master‑Slave Double‑Layer Structure
By sharding data, a group of Memcached instances replaces a single instance, addressing single‑port capacity and traffic limits. However, if a cache node fails, requests fall back to the backend DB. Consistent hashing can mitigate this loss.
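To illustrate why consistent hashing limits the damage of a node failure, here is a minimal hash ring with virtual nodes. It is a sketch under assumptions: the class `HashRing`, the node names, and the choice of MD5 with 100 virtual nodes per server are all illustrative and are not taken from the architecture described in this article.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []                      # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        digest = hashlib.md5(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("no cache nodes available")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["cache-a", "cache-b", "cache-c"])   # hypothetical nodes
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.remove("cache-b")                               # simulate a failure
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved")
```

Only the keys that lived on the removed node are remapped; keys on the surviving nodes keep their placement, so the miss storm is limited to roughly one node's share of the data.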
In a Weibo‑like workload, consistent hashing still has drawbacks:
High hit‑rate requirements (≥99 %) mean that a single node failure sharply reduces overall cache hit rate.
Request drift occurs when a node becomes temporarily unreachable; updates are written to another node and become invisible to the original node after recovery.
We address these single‑point issues by introducing a master‑slave cache structure (Figure 5).
Figure 5: Master‑slave double‑layer cache.
Writes are performed on both master and slave (dual‑write). Reads first query the master; if the master returns empty, the slave is consulted.
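The dual-write and read-fallback rules can be sketched as follows. Plain dictionaries stand in for the master and slave Memcached groups, and the class name `MasterSlaveCache` is hypothetical; a real deployment would use a Memcached client and shard each layer across several nodes.

```python
class MasterSlaveCache:
    """Dual-write to master and slave; read master first, then slave."""

    def __init__(self, master, slave):
        self.master = master    # dict used as a stand-in for the master layer
        self.slave = slave      # dict used as a stand-in for the slave layer

    def set(self, key, value):
        # Dual-write: both layers receive every update.
        self.master[key] = value
        self.slave[key] = value

    def get(self, key):
        value = self.master.get(key)
        if value is None:
            # Master returned nothing (miss or node loss): try the slave.
            value = self.slave.get(key)
        return value


if __name__ == "__main__":
    master, slave = {}, {}
    cache = MasterSlaveCache(master, slave)
    cache.set("timeline:42", "cached timeline")
    master.clear()                      # simulate losing the master's data
    print(cache.get("timeline:42"))     # still served by the slave
```

Because the slave holds a full copy, losing a master node costs one extra cache lookup instead of a trip to the backend DB.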
3. Horizontal Linear Scaling
The double‑layer structure solves the single‑point problem, but bandwidth saturation and request volume still limit scalability.
To achieve linear scaling, we increase the number of data replicas, distributing load across multiple nodes.
In practice, we add an L1 cache layer above the master (Figure 6).
Figure 6: L1 cache architecture.
Writes are applied in the order master → slave → all L1 groups. If a write fails, the key is deleted instead, so the next read misses and falls back to a lower layer rather than returning stale data.
Reads select an L1 group, then hash to a specific node; if the L1 group misses, the request falls back to master and then slave, with successful reads being written back to L1.
When traffic reaches a threshold, additional L1 groups are added, achieving linear capacity growth.
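The read and write paths through the L1 layer can be sketched like this. It is an illustration under assumptions: the class `L1Cluster` is hypothetical, dictionaries stand in for Memcached nodes, the master and slave are each collapsed into a single dictionary, the L1 group is chosen at random, nodes within a group are chosen by CRC32 modulo, and the delete-on-write-failure step is omitted for brevity.

```python
import random
import zlib


class L1Cluster:
    """L1 groups in front of a master/slave pair, with write-back on read."""

    def __init__(self, master, slave, l1_groups):
        self.master = master          # dict stand-in for the master layer
        self.slave = slave            # dict stand-in for the slave layer
        self.l1_groups = l1_groups    # list of groups; each group is a list of dicts

    @staticmethod
    def _node(group, key):
        # Hash the key to one node inside the chosen group.
        return group[zlib.crc32(key.encode("utf-8")) % len(group)]

    def set(self, key, value):
        # Write order from the text: master, then slave, then every L1 group.
        self.master[key] = value
        self.slave[key] = value
        for group in self.l1_groups:
            self._node(group, key)[key] = value

    def get(self, key):
        group = random.choice(self.l1_groups)   # pick one L1 group
        node = self._node(group, key)
        value = node.get(key)
        if value is not None:
            return value
        for layer in (self.master, self.slave): # fall back on an L1 miss
            value = layer.get(key)
            if value is not None:
                node[key] = value               # write back to the L1 node
                return value
        return None


if __name__ == "__main__":
    groups = [[{}, {}], [{}, {}]]               # two L1 groups of two nodes
    cluster = L1Cluster({}, {}, groups)
    cluster.set("hot:1", "payload")
    groups.append([{}, {}])                     # scale out: add an empty L1 group
    print(cluster.get("hot:1"))
```

A newly added L1 group starts empty and fills itself through the write-back path, which is what allows groups to be added as traffic grows without a separate warm-up step.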
Even with the double‑layer structure, a cold slave (or master) can become a bottleneck. To mitigate this, we configure an entire slave group as an L1 resource, allowing slaves to handle hot requests, and occasionally let the master act as an L1 group (Figure 7).
Figure 7: Slave and master simultaneously serving as L1.
Source: http://blog.csdn.net/wongson/article/details/47418039
