Scaling Redis QPS from 50k to 500k: Sharding, Replication, and Performance Tweaks
This guide explains how to scale Redis from 50,000 to 500,000 queries per second (QPS) by applying horizontal sharding, read‑write separation with master‑slave replication, memory and data‑structure optimizations, and network‑level tuning, with concrete examples and configuration tips along the way.
Horizontal Scaling (Sharding)
Divide the key space across multiple Redis nodes to spread load and achieve linear parallel processing. Use consistent hashing or fixed slots (e.g., Redis Cluster’s 16384 slots) for routing; clients or proxies map requests to the appropriate node.
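The slot routing described above can be sketched in pure Python. Redis Cluster computes CRC16 (XModem variant) of the key modulo 16384, honoring `{hash tag}` syntax so related keys can be pinned to the same slot. This is a minimal illustration of the algorithm, not a substitute for a real cluster client:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the checksum Redis Cluster uses for slot routing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so e.g. {user1000}.following and {user1000}.followers share a slot.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # empty tags {} fall back to the whole key
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

A client or proxy keeps a slot→node map and sends each command to `hash_slot(key)`'s owner; co-locating related keys via hash tags is what makes multi-key operations possible on a sharded keyspace.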
Each instance maintains its own memory and event loop, eliminating single‑point bottlenecks. For example, an e‑commerce session system split a workload that saturated a single node at 50k QPS across ten shards, reaching 500k QPS in aggregate.
Client
  │  hash‑slot routing
  ├── Redis Node A (150k)
  ├── Redis Node B (150k)
  ├── Redis Node C (150k)
  └── Redis Node D (150k)
Read‑Write Separation and Master‑Slave Replication
Direct write requests to the master node while distributing read requests across multiple replicas to increase read throughput. Asynchronous replication (master→slave) ensures eventual consistency.
Techniques such as the replication backlog and replica output buffering keep resynchronization cheap after brief disconnects. Clients or proxies (e.g., Twemproxy, Codis) route commands based on node role; semi‑synchronous or multi‑master setups require careful consistency trade‑offs. A social platform shifted 80% of reads to four slaves, raising overall QPS from 50k to roughly 400‑500k while monitoring replication delay.
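The routing decision itself is simple enough to sketch: writes go to the master, reads round‑robin across replicas. This is a minimal, server‑free illustration of the policy a proxy or smart client applies; the endpoint strings and the (incomplete) write‑command set are hypothetical placeholders:

```python
import itertools

# Illustrative subset; a real router would cover Redis's full command table.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "LPUSH", "HSET", "EXPIRE"}

class ReadWriteRouter:
    """Send writes to the master; spread reads over replicas round-robin."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def endpoint_for(self, command: str) -> str:
        if command.upper() in WRITE_COMMANDS:
            return self.master
        return next(self._replicas)

router = ReadWriteRouter("redis-master:6379",
                         ["replica-1:6379", "replica-2:6379"])
```

Because replication is asynchronous, a read routed to a replica may briefly lag the master; read‑your‑writes flows should either pin to the master or tolerate that window.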
Memory and Data‑Structure Optimization
Improve per‑request efficiency by designing compact keys, choosing appropriate data structures, and reducing memory and CPU overhead.
Key tactics include using compact encodings (ziplist/listpack, intset), setting a suitable maxmemory-policy, and employing short keys with compact values to lower memory bandwidth.
Avoid oversized keys (e.g., huge List/Set) that cause O(N) operations; use batch commands (MGET, pipelining) to cut network round‑trips; enable high‑performance allocators like jemalloc or tcmalloc to reduce fragmentation. An advertising system re‑encoded user profiles into compact binary forms, combined with batch MGET and pipelining, achieving a 2‑3× QPS increase per instance and 500k QPS cluster‑wide.
Network and Deployment Optimization
Reduce network latency and system‑call overhead to increase per‑connection throughput.
Practices include using persistent connections, enabling TCP_NODELAY, tuning kernel parameters (somaxconn, net.core.somaxconn, tcp_tw_reuse), deploying on high‑speed networks (10GbE/40GbE), applying CPU affinity and NUMA optimizations, and for Redis 6+ enabling I/O threads for parallel network handling while keeping command execution on the main thread.
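The settings above translate into a few lines of configuration. The values below are illustrative starting points, not tuned recommendations; benchmark before adopting them:

```conf
# redis.conf (Redis 6+): parallel network I/O; commands still run on the main thread
io-threads 4
io-threads-do-reads yes
tcp-backlog 511

# /etc/sysctl.conf: kernel-side connection handling
net.core.somaxconn = 4096
net.ipv4.tcp_tw_reuse = 1
```

Note that Redis caps its effective listen backlog at the kernel's somaxconn, so the two backlog settings should be raised together.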
In ultra‑high QPS scenarios, consider RDMA or other memory‑direct hardware acceleration.
Mike Chen's Internet Architecture
