Understanding Distributed Caching and the memcached Architecture

This article explains the fundamentals of caching, describes how memcached works as a distributed in‑memory cache, and details the hashing algorithms—including remainder hashing and consistent hashing with virtual nodes—used by clients to achieve scalable cache distribution.

Architect
Architect
Architect
Understanding Distributed Caching and the memcached Architecture

In high‑concurrency environments, massive read/write requests overwhelm database disk I/O, making caching essential; the article introduces the concept of cache, distinguishes between single‑node and distributed caches, and focuses on the distributed cache service memcached.

It first clarifies the notion of cache within computer architecture by describing the storage hierarchy (registers, L1/L2/L3 caches, main memory, disks) and illustrates how faster storage layers act as caches to accelerate data access.

Extending this to application systems, the article shows the typical flow where a request first checks the cache and, on a miss, falls back to the database, updating the cache after retrieving data.

memcached, originally developed by Brad Fitzpatrick at Danga Interactive, is presented as a high‑performance distributed memory cache used by many large web services to reduce database load and improve response time.

The key characteristics of memcached are listed: simple protocol, libevent‑based event handling, in‑memory storage, and a design where memcached nodes do not communicate with each other.

Since memcached nodes are independent, distribution is achieved on the client side; the client decides which server stores a given key based on a hashing algorithm.

The basic remainder method uses the formula CRC($key)%N, where N is the number of servers, but suffers from heavy re‑hashing when servers are added or removed.

Consistent hashing improves this by mapping both servers and keys onto a 0‑2^32 ring; a key is stored on the first server encountered clockwise, limiting the impact of server changes to a small segment of the ring.

An example shows adding a fifth node (node5) and explains that only the key ranges between node5‑node2 and node5‑node4 are affected, demonstrating the reduced redistribution.

The optimized consistent hashing algorithm introduces virtual nodes, assigning multiple hash positions to each physical server to achieve a more uniform key distribution even with few physical nodes.

In summary, the article provides a comprehensive overview of cache fundamentals and details how memcached implements distributed caching through client‑side hashing, remainder hashing, and consistent hashing techniques.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed Systemscachingconsistent hashingMemcachedhashing algorithms
Architect
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.