
How Alibaba Ensured Fair, High‑Performance Live Streaming for Double 11 Cat Night

This article details how Alibaba’s entertainment team designed and implemented a robust, cloud‑native architecture for the 2019 Double 11 “Cat Night” live‑stream, addressing technical goals, fairness, high‑concurrency spikes, dynamic routing, peak‑shaving, downstream protection, and multi‑screen interaction to deliver a seamless consumer experience.

Alibaba Cloud Developer

Background and Scale

The 2019 “Cat Night”, the Double 11 entertainment show, attracted viewers from more than 200 countries and regions. It drew 51.44 million viewers of the charity live stream and more than 100 million likes. Although the system ran for only one day, it had to deliver both visual spectacle and commercial value.

Defining Technical Goals

The product team set three primary objectives: provide a visual feast for consumers, deliver tangible benefits (e.g., product sales and charity rewards), and ensure merchants can successfully promote goods. These business goals were broken down into technical targets such as stability, experience consistency, full reward distribution, no financial loss, team growth, and system knowledge retention.

Goal‑Setting Process

The technical team first examined industry trends, market size, business requirements, and team capabilities. Then they decomposed the goals, identified key metrics and responsible teams, and finally quantified challenging indicators with milestone deadlines.

Ensuring Fair and Consistent Experience

Because the show ran on multiple apps (Taobao, Youku, Tmall), all interactive features had to appear simultaneously on each platform. Fairness was threatened by network latency, signal delay, and production delays. Four mechanisms were introduced:

Synchronize client and server clocks via CSN and wireless RPC gateway polling.

Measure and compensate for on‑site and production latency.

Align operation events through repeated rehearsals with hosts and directors.

Insert SEI timestamps into the live stream so the client can start interactions at the correct moment.
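
The first and last mechanisms above can be sketched in a few lines. This is only an illustration under stated assumptions, not the team's implementation: it assumes each polling response carries a server timestamp, that network delay is roughly symmetric, and that the SEI payload decodes to a server-clock timestamp. The names `TimeSample`, `estimate_offset`, `local_start_time`, and `should_start` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeSample:
    t_send: float    # client clock when the request left (seconds)
    t_server: float  # server clock stamped into the response
    t_recv: float    # client clock when the response arrived


def estimate_offset(samples):
    """Return (offset, rtt), where offset = server clock - client clock.

    Uses the sample with the smallest round trip, since it is the one
    least distorted by queuing delay.
    """
    best = min(samples, key=lambda s: s.t_recv - s.t_send)
    rtt = best.t_recv - best.t_send
    # Assumption: symmetric delay, so the server stamped at the midpoint.
    offset = best.t_server - (best.t_send + rtt / 2)
    return offset, rtt


def local_start_time(server_start, offset):
    """Convert a start time on the server clock to the client's clock."""
    return server_start - offset


def should_start(sei_server_ts, server_start):
    """True once the timestamp embedded in the stream reaches the start time."""
    return sei_server_ts >= server_start
```

Keying the interaction to the timestamp carried in the stream, rather than to wall-clock time alone, means a viewer whose video is delayed sees the interaction begin at the same point in the show as everyone else.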

Handling High‑Concurrency Pulse Traffic

Baseline traffic persisted throughout the event, while each interactive round generated a short burst of pulse traffic on top of it. The team applied three core tactics: full‑chain pressure testing, application warm‑up, and anti‑abuse (“anti‑brush”) rate limiting. Additional optimizations included peak‑shaving, dynamic routing, and downstream protection.

Dynamic Routing

All requests normally passed through the wireless RPC gateway, but the system could dynamically shift a portion of traffic to CSN polling when gateway load approached limits, ensuring stability.
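
A minimal sketch of such a load-based split follows. The thresholds, the linear ramp, and the names `polling_share` and `choose_route` are assumptions made for illustration; the article does not describe how the actual split was computed.

```python
import random


def polling_share(gateway_load, soft_limit=0.7, hard_limit=0.9, max_share=0.5):
    """Fraction of requests to divert from the gateway to polling.

    gateway_load is utilisation in [0, 1]. Nothing is diverted below
    soft_limit; the share ramps linearly up to max_share at hard_limit.
    All four numbers are illustrative, not values from the event.
    """
    if gateway_load <= soft_limit:
        return 0.0
    if gateway_load >= hard_limit:
        return max_share
    ramp = (gateway_load - soft_limit) / (hard_limit - soft_limit)
    return ramp * max_share


def choose_route(gateway_load, rng=random.random):
    """Pick a route for one request; rng is injectable for testing."""
    if rng() < polling_share(gateway_load):
        return "polling"
    return "gateway"
```

A gradual ramp avoids flipping all traffic at a single threshold, which could itself create a spike on the fallback path.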

Peak‑Shaving and Off‑Peak Strategies

Peak‑shaving involved randomizing client submissions within a time window, limiting per‑user request rates for red‑packet rain, and pre‑fetching final‑prize queries. Off‑peak tactics separated public‑domain and private‑domain interactions in time and used middleware to reduce private‑domain pressure.
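
Two of the peak‑shaving tactics lend themselves to a short sketch: spreading client submissions randomly across a window, and capping each user's request rate. The fixed-window counter below is one common way to cap rates and is an assumption here, as are the names `jittered_delay` and `PerUserLimiter`; the article does not say which algorithm was used.

```python
import random


def jittered_delay(window_seconds, rng=random.random):
    """Seconds a client waits before submitting, uniform in [0, window)."""
    return rng() * window_seconds


class PerUserLimiter:
    """Fixed-window counter: at most `limit` requests per user per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # user_id -> (window_index, requests_seen)

    def allow(self, user_id, now):
        """Return True if the request at time `now` (seconds) is admitted."""
        index = int(now // self.window)
        last_index, count = self.counts.get(user_id, (index, 0))
        if last_index != index:
            count = 0  # a new window has started for this user
        if count >= self.limit:
            return False
        self.counts[user_id] = (index, count + 1)
        return True
```

With a 3-second window, for example, a burst of submissions that would otherwise arrive in the same instant is spread over three seconds, which lowers the peak request rate without changing the total.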

Downstream Protection

Since the reward platform was a downstream dependency, the team avoided double calls by separating high‑value and low‑value prize pools, guiding users with more treasure boxes toward the high‑value pool, and monitoring treasure‑box distribution in real time to trigger pre‑emptive measures.
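
The pool split can be pictured as follows. This is a hedged sketch: the threshold rule, the stock check, and the idea of raising the threshold as the pre‑emptive measure are assumptions for illustration, and `choose_pool`, `eligible_users`, and `suggest_threshold` are hypothetical names.

```python
def choose_pool(box_count, threshold, high_remaining):
    """Pick exactly one pool per user so the reward platform is called once."""
    if box_count >= threshold and high_remaining > 0:
        return "high"
    return "low"


def eligible_users(box_counts, threshold):
    """Number of users whose treasure-box count reaches the threshold."""
    return sum(1 for count in box_counts if count >= threshold)


def suggest_threshold(box_counts, high_capacity):
    """Smallest threshold keeping high-pool eligibility within capacity.

    Intended to be re-run as the live distribution of box counts changes,
    so the threshold can be adjusted before the draw.
    """
    threshold = 1
    while eligible_users(box_counts, threshold) > high_capacity:
        threshold += 1
    return threshold
```

Deciding the pool before calling the reward platform means each user triggers a single downstream request, rather than trying one pool and falling back to the other.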

On‑Site Large‑Screen and Small‑Screen Interaction

Multiple rehearsals revealed issues such as signal interference and touch‑screen failures. A four‑level contingency plan was established, ranging from pre‑event touch‑screen checks to hot‑standby machines and manual keyboard control.

Key Takeaways

Pre‑plans must be executable, rehearsed, and have backup‑of‑backup options.

Technical PM skills are essential: anticipate risks, define clear metrics, and ensure delivery.

Craftsmanship matters: optimize performance and experience without over‑engineering.

Document tools, components, and processes for future teams.

Continuous post‑mortem and knowledge sharing broaden perspective beyond one’s specialty.

Conclusion

Even a system used only once a year can achieve reliable, fair, and high‑performance operation through disciplined goal setting, rigorous testing, dynamic routing, and comprehensive contingency planning.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
