Overview of Alibaba Cloud Redis Architecture and Optimizations
This article summarizes Alibaba Cloud Redis’s architecture, including single-node, high‑availability, cluster, read‑write separation, disaster‑recovery options, and hybrid storage, and details its kernel optimizations such as AOF/Binlog enhancements, multi‑threaded I/O, Memcache compatibility, and performance improvements.
This article compiles a talk by Alibaba engineer Xia Zhou, detailing the development and current status of Alibaba Cloud Redis.
Redis Introduction
Redis (Remote Dictionary Server) is a key‑value storage system. Its advantages include:
Ease of use: rich data‑structure support and modules.
High performance: efficient data‑structure design, fully in‑memory operations.
Reliability: master‑slave synchronization and persistence.
Alibaba Cloud Redis Architecture
Overall Architecture
The architecture evolved in stages: single‑node → single‑machine master‑slave → cluster → read‑write separation → intra‑city disaster recovery → inter‑city multi‑active.
The Alibaba Cloud overall architecture includes six supporting systems:
HA control system – monitors Redis instance health.
Log collection system – gathers slow‑query, access, and other logs.
Monitoring system – collects performance metrics such as basic info, keys, strings, etc.
Online migration system – rebuilds instances from backup files when the underlying physical machine fails.
Backup system – backs up Redis instances to OSS, supporting user‑defined schedules and a 7‑day retention.
Task control system – handles creation, configuration changes, backups, and tracks task execution and errors.
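The article does not say how the HA control system checks instance health. As a minimal sketch, assuming a probe that simply sends PING over the Redis serialization protocol (RESP) and expects +PONG within a timeout, the check could look like the following. The function names and the one-second timeout are illustrative, not Alibaba Cloud's implementation.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def is_pong(reply: bytes) -> bool:
    """A healthy instance answers PING with the simple string +PONG."""
    return reply.startswith(b"+PONG")


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if the instance answers PING within the timeout.

    Hypothetical helper: connection errors and timeouts count as unhealthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(encode_command("PING"))
            return is_pong(conn.recv(64))
    except OSError:
        return False
```

A real prober would likely require several consecutive failures before triggering a failover, so that one dropped packet does not demote a healthy master.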
Dual‑Replica Architecture
Applicable scenarios: pure cache, data persistence.
Performance: 80,000 to 100,000 QPS.
Connection path: internal SLB → Redis.
Access method: compatible with all open‑source clients.
SLA: 2 replicas, high‑availability, sub‑second failover.
Cluster Dual‑Replica Architecture
Applicable scenarios: large data volume, high performance requirements.
Performance: 1M QPS.
Connection path: internal SLB → Redis.
Access method: compatible with all open‑source clients.
SLA: sharded 2‑replica high‑availability.
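The article does not describe how keys are assigned to shards. For reference, open‑source Redis Cluster maps each key to one of 16384 hash slots using CRC16 (XMODEM variant), and hashes only the text inside the first non‑empty {...} so that related keys land on the same shard. The sketch below shows that open‑source scheme; whether Alibaba Cloud's cluster uses the same mapping is an assumption.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str, slots: int = 16384) -> int:
    """Map a key to a slot, honoring the {hash tag} convention."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag replaces the key as the hashed content.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % slots
```

With this mapping, keys such as {user1}.profile and {user1}.cart share a slot, which is what makes multi‑key commands possible on a sharded instance.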
Read‑Write Separation Architecture
Applicable scenarios: read‑heavy, write‑light, large keys, no strong consistency requirement.
Advantages: linear scaling of read/write capacity, supports all commands, transparent to users.
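The routing idea behind read‑write separation can be sketched as follows: writes go to the master, reads rotate across read replicas. This is a simplified illustration. The class name, the abbreviated write‑command set, and the round‑robin policy are assumptions, and because replicas may lag behind the master, the sketch only suits workloads without a strong consistency requirement, as noted above.

```python
import itertools

# Abbreviated, illustrative list; a real router would cover every write command.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE", "HSET", "LPUSH", "SADD", "ZADD"}


class ReadWriteRouter:
    """Send writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, command: str):
        """Return the node that should execute the given command."""
        if command.upper() in WRITE_COMMANDS or self._replicas is None:
            return self.master
        return next(self._replicas)
```

For example, a router built with one master and replicas r1 and r2 sends SET to the master and successive GET commands to r1, r2, r1, and so on.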
Intra‑City Disaster Recovery
Provides high data availability: hot standby across two data centers allows seamless failover, and after a failed data center recovers, incremental sync via Binlog avoids the traffic spike of a full resynchronization.
Inter‑City Multi‑Active Disaster Recovery
Applicable scenarios: multi‑region read/write, proximity‑based access.
Advantages: tolerates N‑1 data‑center failures, eventual consistency, cross‑region disaster recovery.
Hybrid Storage
Applicable scenarios: massive data volumes with moderate performance requirements, for example video streaming and e‑commerce.
Advantages: 100% Redis compatibility, hot‑cold data separation, high cost‑effectiveness, asynchronous write‑back of cold data to RocksDB, single instance supports TB‑scale storage.
Alibaba Cloud Redis Kernel Optimizations
Compatibility with Memcache protocol.
High‑availability probing.
AOF Binlog support.
Persistence system enhancements.
Security encryption.
I/O and connection optimizations.
Persistence System Refactor
Retains historical AOF logs, removes AOF rewrite, expands AOF log information, and adopts a new data organization: full RDB + historical AOF.
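One way to picture the "full RDB + historical AOF" organization is a snapshot taken at a known operation ID plus an ordered log of later operations. Restoring then means loading the snapshot and replaying the log entries that follow it. The sketch below assumes each log entry carries a monotonically increasing op_id; the entry layout and the optional up_to_op_id cutoff are illustrative, not the actual on‑disk format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    op_id: int                   # assumed monotonically increasing
    command: str                 # "SET" or "DEL" in this sketch
    key: str
    value: Optional[str] = None


def restore(snapshot, snapshot_op_id, log, up_to_op_id=None):
    """Rebuild a dataset from a full snapshot plus retained log entries.

    The log must be ordered by op_id. Entries already covered by the
    snapshot are skipped; replay stops after up_to_op_id when given.
    """
    data = dict(snapshot)        # leave the snapshot itself untouched
    for entry in log:
        if entry.op_id <= snapshot_op_id:
            continue
        if up_to_op_id is not None and entry.op_id > up_to_op_id:
            break
        if entry.command == "SET":
            data[entry.key] = entry.value
        elif entry.command == "DEL":
            data.pop(entry.key, None)
    return data
```

Because the log is kept rather than rewritten away, the same replay loop can also bring a lagging replica up to date from the last op_id it applied.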
Asynchronous AOF Write Improvements
Added biowrite mode.
Lightweight lock queue.
Dedicated BIO thread for write operations.
Reduces impact of slow writes on the main thread, boosting performance.
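The pattern described above, where the main thread hands log records to a dedicated background thread through a queue, can be sketched in a few lines. This is an illustration of the general technique, not the biowrite implementation: the class name is hypothetical, and Python's queue.Queue stands in for the lightweight lock queue.

```python
import queue
import threading


class BackgroundAofWriter:
    """Append-only log whose disk writes happen on a dedicated thread."""

    _STOP = object()  # sentinel that tells the writer thread to exit

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "ab")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def append(self, record: bytes) -> None:
        """Called by the main thread; only enqueues, never touches the disk."""
        self._queue.put(record)

    def _drain(self) -> None:
        """Writer thread: a slow write stalls this loop, not the caller."""
        while True:
            record = self._queue.get()
            if record is self._STOP:
                break
            self._file.write(record)
            self._file.flush()

    def close(self) -> None:
        """Flush everything queued so far, then stop the writer thread."""
        self._queue.put(self._STOP)
        self._thread.join()
        self._file.close()
```

The trade‑off of this design is that records still in the queue are lost if the process crashes, so a production system has to bound the queue and decide how much unflushed data it can tolerate.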
Memcache Protocol Support
Supports both text and binary protocols, fully compatible.
Reuses Redis String objects to store memcached data.
Leverages native Redis synchronization and persistence mechanisms.
New version adds scanning, backup, and richer statistics.
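To illustrate how memcached data can live in ordinary string values, the sketch below handles the memcached text‑protocol set and get commands and stores each item as a single byte string, with the 32‑bit client flags packed in front of the payload. The packing layout and the class name are assumptions made for illustration, a dict stands in for the Redis string keyspace, and expiry is parsed but ignored.

```python
import struct


class MemcacheTextAdapter:
    """Serve memcached text-protocol set/get on top of a string keyspace."""

    def __init__(self):
        self.strings = {}  # stand-in for Redis String objects

    def handle(self, request: bytes) -> bytes:
        header, _, rest = request.partition(b"\r\n")
        parts = header.split()
        if not parts:
            return b"ERROR\r\n"
        verb = parts[0]
        if verb == b"set" and len(parts) == 5:
            key, flags, size = parts[1], int(parts[2]), int(parts[4])
            payload = rest[:size]
            if len(payload) != size or rest[size:size + 2] != b"\r\n":
                return b"CLIENT_ERROR bad data chunk\r\n"
            # Assumed layout: 4-byte big-endian flags, then the raw payload.
            self.strings[key] = struct.pack(">I", flags) + payload
            return b"STORED\r\n"
        if verb == b"get" and len(parts) >= 2:
            out = []
            for key in parts[1:]:
                stored = self.strings.get(key)
                if stored is not None:
                    flags = struct.unpack(">I", stored[:4])[0]
                    payload = stored[4:]
                    out.append(b"VALUE %b %d %d\r\n%b\r\n"
                               % (key, flags, len(payload), payload))
            out.append(b"END\r\n")
            return b"".join(out)
        return b"ERROR\r\n"
```

Because each item is just a string value, it would pass through the same replication and persistence paths as any other Redis key, which is the benefit the list above points to.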
Hybrid Storage Optimizations
All keys and hot data stay in memory.
Cold data stored in RocksDB.
Hot‑data performance matches native Redis.
100% Redis compatibility.
Cold‑data write‑back to RocksDB is asynchronous, not blocking the main thread.
Single instance can handle TB‑scale data.
Network I/O multi‑threading optimizations further improve throughput.
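The hot/cold split described in this section can be sketched as follows: every key stays in memory, only the most recently used values stay hot, and evicted values are written to cold storage by a background thread so the caller never waits on the write. This is a toy model under several assumptions: a plain dict stands in for RocksDB, eviction is least‑recently‑used by value count, deletes are omitted, and set and get are expected to be called from a single main thread, as in Redis.

```python
import queue
import threading

_MISSING = object()


class HybridStore:
    """All keys in memory, hot values in memory, cold values written back."""

    def __init__(self, max_hot_values: int):
        self.max_hot_values = max_hot_values
        self.keys = set()        # every key, hot or cold
        self.hot = {}            # insertion-ordered: oldest entry first
        self.cold = {}           # stand-in for RocksDB
        self._flushing = {}      # evicted values not yet written to cold
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        threading.Thread(target=self._write_back, daemon=True).start()

    def set(self, key, value):
        with self._lock:
            self.keys.add(key)
            self.hot.pop(key, None)
            self.hot[key] = value            # most recently used goes last
            while len(self.hot) > self.max_hot_values:
                victim = next(iter(self.hot))
                self._flushing[victim] = self.hot.pop(victim)
                self._pending.put(victim)    # enqueue only; no disk I/O here

    def get(self, key):
        with self._lock:
            if key not in self.keys:
                return None
            if key in self.hot:
                return self.hot[key]
            value = self._flushing.get(key, _MISSING)
        if value is _MISSING:
            value = self.cold[key]           # cold read
        self.set(key, value)                 # promote back to hot
        return value

    def _write_back(self):
        while True:
            key = self._pending.get()
            with self._lock:
                value = self._flushing.get(key, _MISSING)
            if value is not _MISSING:
                self.cold[key] = value       # slow write, outside the lock
                with self._lock:
                    if self._flushing.get(key, _MISSING) is value:
                        del self._flushing[key]
            self._pending.task_done()

    def flush(self):
        """Block until all queued write-backs have completed."""
        self._pending.join()
```

The _flushing map is what keeps reads correct while a write‑back is in flight: a value leaves memory only after its cold copy exists.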
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
