
Redis Optimization for Vivo Push Platform: Architecture, Bottlenecks, and Solutions

To sustain the Vivo Push Platform's massive real‑time traffic, engineers re‑architected its two Redis clusters: they trimmed msg‑cluster capacity by roughly 43 %, split the cluster in two, randomized hotspot‑prone keys, and introduced three‑level caching — cutting peak CPU load from over 95 % to about 70 %, halving average response time, and reducing overall Redis load by about 15 % during peaks.

vivo Internet Technology

Vivo Push Platform provides a real‑time message push service for developers, supporting hundreds of millions of notifications with a peak throughput of 1.4 million pushes per second and a daily volume of up to 15 billion messages. The platform relies on long‑lived connections and demands high concurrency and low latency.

The platform uses two Redis clusters: a msg cluster for storing message bodies and expiration, and a client cluster for client state information. Both clusters originally ran Redis Cluster mode with a large number of master nodes (220 masters, 4.4 TB total capacity).

During a high‑traffic event (5.2 billion messages in 30 minutes), the msg cluster suffered severe hot‑spot issues: a single node reached 24 674 connections and 23.46 GB memory, with response times around 500 ms and availability dropping to 85 %.

Optimization was carried out in four main areas:

Capacity Optimization : Analyzed Redis snapshots with the open‑source RDR tool and found that ~80 % of keys start with mi: and belong to single‑push messages. By promptly deleting delivered single‑push messages and aggregating identical content, the msg cluster's capacity was reduced from 3.65 TB to 2.09 TB (a reduction of roughly 43 %).
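The deduplication idea can be sketched in Python, with a plain dict standing in for the msg cluster (the mi:body: / mi:ref: key names and helper functions are illustrative, not the platform's actual schema):

```python
import hashlib

# A plain dict stands in for the msg Redis cluster in this sketch.
store = {}

def save_message(target_id: str, body: bytes) -> str:
    """Store one shared copy of each message body, keyed by a content
    hash, plus a lightweight per-target reference — so identical
    content pushed to many targets is not duplicated per user."""
    digest = hashlib.sha1(body).hexdigest()
    body_key = f"mi:body:{digest}"            # shared body, stored once
    store.setdefault(body_key, body)
    ref_key = f"mi:ref:{target_id}:{digest}"  # per-target reference
    store[ref_key] = body_key
    return ref_key

def ack_delivery(ref_key: str) -> None:
    """Delete a single-push message's reference promptly on delivery,
    instead of waiting for its TTL to expire."""
    store.pop(ref_key, None)
```

Pushing the same body to two targets then keeps only one copy of the payload, and acknowledged references are reclaimed immediately.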

Cluster Splitting : Separated message bodies and waiting queues into two independent clusters and upgraded to Redis 4.x. Two migration strategies were evaluated; the chosen “dual‑read, single‑write” approach kept data intact while gradually shifting reads to the new cluster.
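One reading of the "dual‑read, single‑write" strategy can be sketched as a thin wrapper (this is an interpretation of the approach, with dicts standing in for the old and new clusters):

```python
class DualReadSingleWrite:
    """Migration sketch: all writes go to the old cluster only (so data
    stays intact if the migration is rolled back), while reads prefer
    the new cluster and fall back to the old one, warming the new
    cluster as traffic flows."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def set(self, key, value):
        # Single write path: the old cluster remains the source of truth.
        self.old[key] = value

    def get(self, key):
        if key in self.new:            # prefer the new cluster
            return self.new[key]
        value = self.old.get(key)      # fall back to the old cluster
        if value is not None:
            self.new[key] = value      # warm the new cluster on a miss
        return value
```

Once the new cluster serves nearly all reads, writes can be cut over and the old cluster retired.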

Hot‑Key Mitigation : Discovered that the Snowflake‑generated messageId caused a hotspot on the mii:0 key: at lower traffic the 12‑bit sequence portion of the id was almost always zero, so ids bucketed by their low bits all landed on bucket 0. Randomizing the sequence's start value (0‑1023) and replacing the HEXISTS check eliminated the hotspot.
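A minimal Snowflake‑style generator illustrates the fix (the common 41‑bit timestamp / 10‑bit worker / 12‑bit sequence layout is assumed here; the platform's exact layout is not given):

```python
import random
import time

class Snowflake:
    """Snowflake-style id sketch. The key change: when the millisecond
    rolls over, the sequence starts at a random value in 0-1023 instead
    of 0, so ids bucketed by their low bits no longer pile up on 0."""

    def __init__(self, worker_id: int, randomize: bool = True):
        self.worker_id = worker_id & 0x3FF   # 10 bits
        self.last_ms = -1
        self.randomize = randomize
        self.seq = 0

    def next_id(self) -> int:
        ms = int(time.time() * 1000)
        if ms != self.last_ms:
            self.last_ms = ms
            # The fix: random start value instead of resetting to 0.
            self.seq = random.randint(0, 1023) if self.randomize else 0
        else:
            self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
        return (ms << 22) | (self.worker_id << 12) | self.seq
```

With `randomize=False`, every quiet millisecond produces an id whose low 12 bits are 0 — exactly the pattern that concentrated load on one key.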

Client‑Redis Concurrency Reduction : Introduced three‑level caching (static info, frequently changing info, encryption parameters) and added cache‑validation via broker responses and connection events. This reduced client‑Redis calls by ~20 % and increased cache hit rates (cache1 ≈ 52 %, cache2 ≈ 30 %).
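The layered client cache with event‑driven invalidation can be sketched as follows (the level names and TTL values are illustrative assumptions, not the platform's actual configuration):

```python
import time

class ThreeLevelCache:
    """Client-side cache split by change frequency: 'static' for rarely
    changing info, 'dynamic' for frequently changing info, 'crypto' for
    encryption parameters. A fresh hit in any level skips the round
    trip to client-Redis."""

    def __init__(self, redis_get):
        self.redis_get = redis_get                      # fallback fetch
        self.ttls = {"static": 3600, "dynamic": 60, "crypto": 300}
        self.levels = {name: {} for name in self.ttls}

    def get(self, level: str, key: str):
        entry = self.levels[level].get(key)
        if entry and entry[1] > time.time():            # fresh hit
            return entry[0]
        value = self.redis_get(key)                     # miss: fetch
        self.levels[level][key] = (value, time.time() + self.ttls[level])
        return value

    def invalidate(self, level: str, key: str):
        # Called when a broker response or connection event signals
        # that the cached state may be stale.
        self.levels[level].pop(key, None)
```

Counting calls into the fallback shows the effect: repeated reads hit the cache, and an invalidation forces exactly one refetch.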

Post‑optimization results include:

Msg‑Redis peak CPU load reduced from over 95 % to about 70 %.

Average response time fell from 1.2 ms to 0.5 ms.

Overall Redis load decreased by ~15 % during peak traffic.

The experience highlights key best practices for Redis in high‑concurrency systems: ensure key randomness, avoid large keys, monitor slot distribution, and perform regular capacity and hotspot analysis.
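Slot distribution can be checked offline by reproducing Redis Cluster's key‑to‑slot mapping — CRC16 (XMODEM variant) modulo 16384 (hash‑tag handling is omitted here for brevity):

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM, the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots.
    (Real Redis first extracts a {hash tag} if present; omitted.)"""
    return crc16(key.encode()) % 16384
```

Running a sample of production key names through `key_slot` and counting hits per slot (or per master) quickly reveals skew before it becomes a hotspot.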

Tags: performance optimization, push platform, Redis, high concurrency, cluster scaling, hot-key mitigation
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
