Redis Partitioning: How to Store Data Across Multiple Redis Instances
This article explains the concept, benefits, methods, and practical considerations of partitioning data across multiple Redis instances, covering range and hash partitioning, consistent hashing, implementation options, drawbacks, and recommended tools such as Redis Cluster and Twemproxy.
Partitioning is the process of splitting your data across multiple Redis instances so that each instance holds only a subset of keys. This article first introduces the concept of partitioning and then discusses how to use it with Redis.
Why partitioning is useful
Partitioning in Redis serves two main purposes: it allows you to build a larger database by aggregating the memory of multiple machines, and it enables elastic scaling of compute power and network bandwidth across multiple hosts.
Partitioning basics
There are several partitioning schemes. Suppose we have four Redis instances R0, R1, R2, R3 and keys such as user:1, user:2, etc. Different schemes map a given key to a specific Redis server.
Range partitioning
The simplest method is range partitioning, where a range of object IDs is mapped to a particular instance (e.g., user IDs 0‑10000 to R0, 10001‑20000 to R1, and so on). This works in practice but requires maintaining a table that maps ranges to instances for every kind of object, which can be cumbersome.
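The range scheme above can be sketched as a simple lookup table; this is a minimal illustration (the boundaries and instance names R0‑R3 are the article's examples, extended here in natural order):

```python
# Range partitioning sketch: a table maps ID ranges to instance names.
# Ranges and instance assignments are illustrative.
RANGES = [
    (0, 10000, "R0"),
    (10001, 20000, "R1"),
    (20001, 30000, "R2"),
    (30001, 40000, "R3"),
]

def instance_for_id(object_id: int) -> str:
    """Return the instance responsible for the given object ID."""
    for low, high, instance in RANGES:
        if low <= object_id <= high:
            return instance
    raise KeyError(f"no instance configured for ID {object_id}")
```

The drawback the article mentions is visible here: every object type needs its own table, and the table must be updated by hand as data grows.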
Hash partitioning: Use a hash function (e.g., CRC32) to convert the key name into a number. For example, the key foobar yields crc32(foobar) ≈ 93024922.
Apply a modulo operation to map the number to one of the four instances. 93024922 % 4 = 2, so foobar is stored on R2. Most languages use the % operator for this.
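In Python, this scheme fits in a few lines using the standard library's CRC32 (note the 93024922 figure in the text is the article's illustrative value; the code below computes the real checksum rather than hard-coding it):

```python
import zlib

INSTANCES = ["R0", "R1", "R2", "R3"]

def instance_for_key(key: str) -> str:
    """Hash partitioning: CRC32 of the key name, modulo the
    number of instances, selects the target server."""
    checksum = zlib.crc32(key.encode("utf-8"))  # deterministic 32-bit hash
    return INSTANCES[checksum % len(INSTANCES)]
```

The mapping is deterministic, so every client computing it independently agrees on where each key lives; the weakness, discussed below, is that changing the number of instances remaps almost every key.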
More advanced hash partitioning includes consistent hashing, which many Redis clients and proxies implement.
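The point of consistent hashing is that adding or removing a node only remaps the keys that hashed to that node, instead of reshuffling nearly everything as plain modulo does. A minimal sketch with virtual nodes (not a production implementation; the class and method names are ours):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (a sketch)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual nodes per server, for balance
        self._ring = []            # sorted list of (hash, node) tuples
        for node in nodes:
            self.add_node(node)

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for_key(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Removing a node deletes only its virtual nodes from the ring, so keys that mapped to the surviving nodes keep their placement.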
Partition implementation approaches
Client-side partitioning: The client directly selects the correct node for each key. Many Redis client libraries support this.
Proxy-assisted partitioning: The client sends requests to a proxy (e.g., Twemproxy), which forwards them to the appropriate Redis instance based on a configured strategy.
Query routing: A request may be sent to any instance, which then makes sure the query reaches the correct node. Redis Cluster implements a hybrid form of query routing with the client's help: the instance does not forward the request itself but redirects the client to the right node.
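In Redis Cluster's hybrid scheme, a node that does not own a key's hash slot answers with a MOVED redirection error of the form `MOVED <slot> <host>:<port>`, and the client retries against the indicated node. A sketch of parsing such a redirect (the error format is from the Redis Cluster protocol; the helper name is ours):

```python
def parse_moved(error: str):
    """Parse a Redis Cluster MOVED redirection error, e.g.
    'MOVED 3999 127.0.0.1:6381', into (slot, host, port)."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        raise ValueError(f"not a MOVED redirect: {error!r}")
    slot = int(parts[1])
    host, _, port = parts[2].rpartition(":")
    return slot, host, int(port)
```

Smart cluster clients cache these redirects to build a slot map, so subsequent requests go to the right node on the first try.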
Redis partitioning drawbacks
Operations that involve multiple keys across different instances are not directly supported.
Multi‑key transactions are unavailable.
Large single keys (e.g., huge sorted sets) cannot be sharded because partitioning is key‑based.
Management becomes more complex: you must handle multiple RDB/AOF files and persist data across several hosts.
Adding or removing nodes is intricate; while Redis Cluster supports transparent rebalancing, other schemes may require manual pre‑sharding.
Data store vs. cache usage
When Redis is used as a persistent data store, each key must consistently map to the same instance, limiting flexibility. As a cache, you can change the key‑to‑node mapping to improve availability and scalability, and consistent hashing can automatically redirect keys when a node fails.
Pre‑sharding
Because adding or removing nodes is complex for a persistent store, a common strategy is to start with a large number of small instances (e.g., 32 or 64) to provide headroom for future growth.
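Pre-sharding works because it separates two mappings: keys hash over a fixed number of instances (say 64), while a separate table records which host runs each instance. Scaling only edits the second table, so no key is ever rehashed. A sketch with illustrative host names:

```python
import zlib

NUM_INSTANCES = 64  # fixed from day one; keys always hash over 64 shards

# Instance index -> host. Moving instances to new hosts only edits this map.
host_of = {i: "host-a" for i in range(NUM_INSTANCES)}

def shard_for_key(key: str) -> int:
    """The key -> instance mapping never changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_INSTANCES

# Scaling out: migrate half of the instances to a new machine.
for i in range(NUM_INSTANCES // 2, NUM_INSTANCES):
    host_of[i] = "host-b"

def host_for_key(key: str) -> str:
    return host_of[shard_for_key(key)]
```

After the migration, every key still hashes to the same instance number; only the host you connect to for half of the instances has changed.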
When scaling, you add a new server and migrate half of the instances to it, repeating the process as capacity grows. Redis replication minimizes downtime: start new empty instances on the new server, configure them as replicas of the instances being moved, wait for the data to sync, stop the clients, update the configuration with the new addresses, send SLAVEOF NO ONE to the promoted replicas, restart the clients, and finally shut down the old instances.
Practical partitioning
For production use, the recommended solution is Redis Cluster, which offers automatic sharding and high availability since version 3.0 (released April 1 2015). Redis Cluster combines client‑side partitioning with query routing.
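Redis Cluster does its sharding over 16384 hash slots: every key maps to the slot CRC16(key) mod 16384, using the CRC16-CCITT (XModem) variant, and keys that share a `{hash tag}` land in the same slot so multi-key operations on them remain possible. A sketch of the slot computation:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XModem variant), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384. When the key contains a
    non-empty {...} hash tag, only the tag is hashed, so tagged
    keys share a slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # non-empty tag
            key = key[start + 1:end]
    return crc16_xmodem(key.encode("utf-8")) % 16384
```

Each cluster node owns a subset of the 16384 slots, and rebalancing moves whole slots between nodes rather than rehashing individual keys.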
Twemproxy framework
Twemproxy, developed by Twitter, is a fast, single‑threaded proxy for Memcached and Redis protocols written in C. It can automatically partition keys across multiple Redis nodes and hide unavailable nodes from clients, reducing single‑point‑of‑failure risk.
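Twemproxy pools are configured in YAML; a minimal sketch of a Redis pool (the pool name, addresses, and timeout values here are illustrative, not a recommended production configuration) might look like:

```yaml
alpha:
  listen: 127.0.0.1:22121
  hash: fnv1a_64          # hash function applied to key names
  distribution: ketama    # consistent hashing across the server list
  redis: true             # speak the Redis protocol, not memcached
  auto_eject_hosts: true  # temporarily hide failed nodes from clients
  server_retry_timeout: 30000
  server_failure_limit: 3
  servers:
    - 127.0.0.1:6379:1
    - 127.0.0.1:6380:1
```

With `auto_eject_hosts` enabled, a node that fails repeatedly is ejected from the ring and its keys are redistributed, which is acceptable for cache use but loses data placement for a persistent store.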
Client‑side consistent hashing
Alternatively, you can use client libraries that implement consistent hashing (e.g., redis‑rb, Predis). Check the full list of Redis clients to find a mature implementation for your programming language.