Understanding Distributed Caching and the memcached Architecture
This article explains the fundamentals of caching, describes how memcached works as a distributed in‑memory cache, and details the hashing algorithms—including remainder hashing and consistent hashing with virtual nodes—used by clients to achieve scalable cache distribution.
In high‑concurrency environments, massive read/write requests overwhelm database disk I/O, making caching essential; the article introduces the concept of cache, distinguishes between single‑node and distributed caches, and focuses on the distributed cache service memcached.
It first clarifies the notion of cache within computer architecture by describing the storage hierarchy (registers, L1/L2/L3 caches, main memory, disks) and illustrates how faster storage layers act as caches to accelerate data access.
Extending this to application systems, the article shows the typical flow where a request first checks the cache and, on a miss, falls back to the database, updating the cache after retrieving data.
memcached, originally developed by Brad Fitzpatrick at Danga Interactive, is presented as a high‑performance distributed memory cache used by many large web services to reduce database load and improve response time.
The key characteristics of memcached are listed: simple protocol, libevent‑based event handling, in‑memory storage, and a design where memcached nodes do not communicate with each other.
Since memcached nodes are independent, distribution is achieved on the client side; the client decides which server stores a given key based on a hashing algorithm.
The basic remainder method uses the formula CRC($key)%N, where N is the number of servers, but suffers from heavy re‑hashing when servers are added or removed.
Consistent hashing improves this by mapping both servers and keys onto a 0‑2^32 ring; a key is stored on the first server encountered clockwise, limiting the impact of server changes to a small segment of the ring.
An example shows adding a fifth node (node5) and explains that only the key ranges between node5‑node2 and node5‑node4 are affected, demonstrating the reduced redistribution.
The optimized consistent hashing algorithm introduces virtual nodes, assigning multiple hash positions to each physical server to achieve a more uniform key distribution even with few physical nodes.
In summary, the article provides a comprehensive overview of cache fundamentals and details how memcached implements distributed caching through client‑side hashing, remainder hashing, and consistent hashing techniques.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
