Designing a High‑Availability, Auto‑Scaling KV Storage System Based on Memcached and Redis
This article examines common NoSQL key‑value stores such as Memcached and Redis, compares their strengths and limitations, and proposes a distributed architecture with routing, storage, management, and migration nodes that achieves high availability, automatic fault‑tolerance, load balancing, and elastic scaling.
Unlike the early Internet era, modern social and mobile applications generate massive read/write requests and explosive data growth, making traditional relational databases a performance and scalability bottleneck. NoSQL databases, especially key‑value (KV) stores like Memcached and Redis, offer simple data models, high performance, and easy scalability.
KV stores are widely used both as caches and as primary data stores. The classic cache‑plus‑DB pattern stores frequently accessed data in a cache (e.g., Memcached) to reduce database load, while writes still go directly to the DB and cache entries are invalidated.
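The cache-plus-DB pattern above can be sketched as a few lines of code. This is a minimal illustration, not a real Memcached client: `cache` and `db` are in-memory dicts standing in for the cache layer and the relational database, and the function names are assumptions.

```python
cache = {}   # stands in for Memcached
db = {}      # stands in for the relational database

def read(key):
    # Serve from the cache when possible; on a miss, fall back to the
    # DB and populate the cache for subsequent reads.
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # Writes go straight to the DB; the cache entry is invalidated
    # rather than updated, so the next read repopulates it.
    db[key] = value
    cache.pop(key, None)
```

Invalidating on write (rather than updating the cache in place) is what keeps the cache from serving stale data when two writers race.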
Memcached stores data only in memory, making it suitable for cache‑layer use cases such as static pages, configuration data, hot user profiles, and near‑real‑time statistics. However, it cannot improve write performance, suffers frequent cache misses when hot data is widely dispersed, and loses all data on node failure, requiring a cache warm‑up period that can stress the backing database.
Redis improves on Memcached by providing persistence (AOF and RDB), master‑slave replication with read/write separation, and a rich set of data structures (string, hash, list, set, sorted set, etc.). These features enable Redis to serve both as a cache and as a primary store for persistent data, offering higher read/write performance, atomic operations, and use‑case flexibility such as counters, leaderboards, de‑duplication, and precise expiration.
Despite its advantages, Redis lacks built‑in automatic fault‑tolerance; a master or slave failure can cause temporary request failures and may lead to data inconsistency. Full‑copy replication consumes significant memory and can impact cluster performance, and online scaling is complex, requiring careful capacity planning.
The article proposes a high‑availability, auto‑elastic KV storage system that combines the protocols of Memcached and Redis, uses memory as the primary medium, and provides ultra‑high read/write throughput, automatic failover, load balancing, and elastic scaling.
System Architecture
The system is distributed across four node types: stateless routing nodes that hash keys to storage nodes; storage nodes that handle reads and writes and report heartbeats; a management node that maintains routing tables and decides on failover, load balancing, and scaling; and a migration node that copies or moves data during rebalancing.
All nodes except the routing nodes run in a primary‑backup (master‑slave) mode, and multiple routing nodes can sit behind an LVS load balancer for fault tolerance.
Key Technical Points
Data Distribution
Data is distributed using consistent hashing with virtual nodes to achieve balanced key placement and monotonicity, ensuring minimal data movement when nodes are added or removed.
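A minimal consistent-hash ring with virtual nodes can make the scheme concrete. The hash function, replica count, and class name here are illustrative assumptions, not the article's actual implementation:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring; each physical node owns `vnodes` points."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []          # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Insert one ring point per virtual node, keeping the ring sorted.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def lookup(self, key):
        # The first virtual node clockwise from the key's hash owns it.
        h = self._hash(key)
        i = bisect.bisect(self.ring, (h, ""))
        return self.ring[i % len(self.ring)][1]
```

Monotonicity follows directly: removing a node only reassigns the keys that node owned, since every other key's nearest clockwise ring point is unchanged.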
Automatic Fault‑Tolerance and Recovery
Storage nodes periodically send heartbeats to the management node. If a master stops reporting, the backup is promoted, the routing table is updated, and traffic is redirected. The system supports both full‑copy and incremental recovery modes, automatically choosing based on the downtime duration.
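The heartbeat-driven failover logic might look like the sketch below. The timeout values, the routing-table shape, and the rule for choosing a recovery mode are all assumptions for illustration; the article does not specify them.

```python
import time

HEARTBEAT_TIMEOUT = 5.0      # seconds without a heartbeat => node failed
INCREMENTAL_WINDOW = 60.0    # short outages recover incrementally

class ManagementNode:
    def __init__(self, routing_table):
        # routing_table maps shard -> {"master": node, "backup": node}
        self.routing_table = routing_table
        self.last_seen = {}

    def heartbeat(self, node, now=None):
        self.last_seen[node] = now if now is not None else time.time()

    def check(self, now):
        """Promote backups for failed masters; yield failover events."""
        for shard, pair in self.routing_table.items():
            master = pair["master"]
            down_for = now - self.last_seen.get(master, now)
            if down_for > HEARTBEAT_TIMEOUT:
                # Promote the backup and update the routing table.
                pair["master"], pair["backup"] = pair["backup"], master
                # Pick the recovery mode for the failed node by outage length.
                mode = "incremental" if down_for <= INCREMENTAL_WINDOW else "full-copy"
                yield shard, master, mode
```

The key idea is that the decision is centralized: storage nodes only report, and the management node alone mutates the routing table, so routing nodes always see a consistent view.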
Load Balancing and Elastic Scaling
Storage nodes report I/O and capacity metrics. The management node detects load or capacity imbalance and instructs the migration node to move virtual nodes (and their keys) from overloaded to under‑utilized nodes. When cluster load exceeds thresholds, new nodes are added; when load drops, nodes are removed, all while preserving consistent‑hash monotonicity.
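The rebalancing decision described above can be reduced to a small policy function. The 20% imbalance threshold and the single scalar load metric are assumptions here; a real system would weigh I/O, capacity, and migration cost separately.

```python
IMBALANCE_THRESHOLD = 1.2   # migrate when a node is 20% above the average

def plan_migration(loads):
    """loads: dict of node -> load metric.

    Returns a (source, destination) pair when the most-loaded node
    exceeds the cluster average by the threshold, else None.
    """
    avg = sum(loads.values()) / len(loads)
    src = max(loads, key=loads.get)
    dst = min(loads, key=loads.get)
    if loads[src] > avg * IMBALANCE_THRESHOLD and src != dst:
        return src, dst
    return None
```

In the proposed architecture the management node would run a check like this periodically, then hand the chosen virtual nodes to the migration node, so data movement never blocks the storage nodes' request path.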
Images illustrating the NoSQL landscape, system architecture, consistent‑hash process, fault‑tolerance flow, and node scaling are included in the original article.
Conclusion
Providing high availability and automatic elastic scaling for a KV storage system addresses critical operational challenges such as node failures, disk crashes, and sudden traffic spikes. It also reduces operational overhead and risk, ensuring a durable, stable service.
Art of Distributed System Architecture Design