
Design and Evolution of Live Streaming Bullet Comment System: From HTTP Polling to Long Connection

Meipai's live-stream bullet-comment platform progressed from an initial HTTP-polling design supporting one million concurrent users, through a high-availability dual-machine-room architecture, to a scalable long-connection system with gRPC-based routing, dynamic degradation, and caching, solving message-ordering and Redis-bottleneck problems along the way while keeping the user experience seamless.

Meitu Technology

This article introduces the design and evolution of Meipai's live streaming bullet comment (danmu) system, which has evolved through three phases to support millions of concurrent users.

Phase 1: Quick Launch

The initial requirement was rapid deployment with support for one million concurrent users. The team adopted an HTTP polling approach first, planning to migrate to long connections later. The system treats gifts, comments, and user data uniformly as messages, stored in Redis sorted sets. The message model uses ZADD for writes (with the score set to relative time), ZRANGEBYSCORE for polling new messages every two seconds, and ZRANGE for retrieving user lists. The write path is: frontend → Kafka → processor → Redis; the read path is: frontend → Redis.

A critical concurrency issue was message loss caused by out-of-order writes. The solution was twofold: write all messages for the same live room and message type to the same Kafka partition, and use synchronized blocks to serialize the Redis writes. The invariant is simple: messages must be written in ascending order by message number.
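The two ordering measures can be sketched together: a stable partition key so one (room, type) stream always lands on one Kafka partition, and a per-stream lock that rejects any write whose message number is not strictly ascending. Names, the partition count, and the drop-stale policy are illustrative (the article uses Java `synchronized`; `threading.Lock` plays the same role here).

```python
import threading
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # illustrative; tune to the real topic

def choose_partition(room_id: str, msg_type: str) -> int:
    """Same room + same message type always hash to the same partition,
    so Kafka preserves their relative order. crc32 is stable across runs,
    unlike Python's built-in hash()."""
    key = f"{room_id}:{msg_type}".encode()
    return zlib.crc32(key) % NUM_PARTITIONS

class SerialRedisWriter:
    """One lock per (room, type) stream: the Redis write (ZADD in the real
    system) is applied only in ascending message-number order."""
    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._last_seq = defaultdict(int)
        self.written = []  # stand-in for the Redis sorted set

    def write(self, room_id, msg_type, seq, payload):
        key = (room_id, msg_type)
        with self._locks[key]:
            if seq <= self._last_seq[key]:
                return False  # stale or duplicate: drop rather than reorder
            self._last_seq[key] = seq
            self.written.append((key, seq, payload))
            return True
```

Serializing per stream rather than globally keeps unrelated rooms from blocking each other while still guaranteeing order within a room.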

Three major problems emerged after launch:

1. Messages accumulated in Kafka because Redis writes were serialized. Solved by dynamically adjusting the number of Kafka partitions and the locking strategy based on the observed delay level.
2. The Redis slave became a performance bottleneck under frequent ZRANGEBYSCORE calls. Solved by adding a local cache on the frontend servers and automatically reducing the number of messages returned as room size grows.
3. Replay traffic competed with live traffic for Redis CPU. Solved by backing messages up to MySQL after a stream ends and serving replays from separate Redis instances.
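The fix for the second problem (frontend local cache plus room-size-dependent response sizing) might look like the sketch below. The TTL, the size thresholds, and all names are assumptions; the article only says the cache sits on the frontend servers and the returned message count shrinks as the audience grows.

```python
import time

class FrontendLocalCache:
    """Per-room cache on a frontend server: poll requests within `ttl`
    seconds are served locally instead of hitting the Redis slave, and
    bigger rooms get fewer messages per response to cap fan-out cost."""
    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self._cache = {}  # room_id -> (fetched_at, messages)

    @staticmethod
    def max_messages(room_size):
        # Illustrative tiers, not the production values.
        if room_size < 1_000:
            return 100
        if room_size < 100_000:
            return 50
        return 20

    def get(self, room_id, room_size, fetch_from_redis, now=None):
        now = time.monotonic() if now is None else now
        hit = self._cache.get(room_id)
        if hit and now - hit[0] < self.ttl:
            msgs = hit[1]                       # served from local cache
        else:
            msgs = fetch_from_redis(room_id)    # one ZRANGEBYSCORE per ttl
            self._cache[room_id] = (now, msgs)
        # Return only the newest N messages for this room size.
        return msgs[-self.max_messages(room_size):]
```

However many clients poll a hot room, each frontend server issues at most one Redis read per TTL window for it, which is what relieves the slave.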

Phase 2: High Availability

The system implements dual machine-room deployment with a primary and a secondary room. Writes go to the primary room, while reads are distributed across both rooms. Comprehensive degradation strategies and full-link business monitoring were also put in place. The system successfully handled four TFBOYS live streams, with peak concurrency approaching one million users, 28.6 million views, 29.8 million comments, and 2.623 billion likes.
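The write/read split across the two machine rooms reduces to a small routing rule. The room names and the round-robin read policy are assumptions for illustration; the article only states that writes target the primary room and reads are spread over both.

```python
ROOMS = ["idc-a", "idc-b"]  # hypothetical machine-room identifiers
PRIMARY = "idc-a"

def route(op: str, request_id: int) -> str:
    """Writes always go to the primary machine room (single source of
    truth); reads alternate across both rooms, halving the per-room
    Redis read load and leaving a warm standby for failover."""
    if op == "write":
        return PRIMARY
    return ROOMS[request_id % len(ROOMS)]
```

If one room is lost, reads can be degraded to the surviving room without touching the write path.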

Phase 3: Long Connection Migration

The long-connection architecture includes: a routing service providing percentage-based gray release (canary rollout) and blacklist/whitelist control by uid, deviceId, and client version; client support for both long and short connections, with automatic degradation to polling after three failed connection attempts; a connection layer that only maintains persistent connections and carries no business logic; a push layer that manages user-to-room subscriptions; and inter-service communication via the tardis framework, built on gRPC with etcd for service discovery.
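The routing decision described above, plus the client's three-failure fallback, can be sketched as one function. The function name, argument shapes, and the crc32 bucketing are illustrative; the source specifies only the inputs (uid/deviceId/version lists, a rollout percentage) and the three-attempt degradation rule.

```python
import zlib

def use_long_connection(uid, percent, whitelist=frozenset(),
                        blacklist=frozenset(), failures=0):
    """Decide long connection vs. short connection (HTTP polling).

    - After 3 failed long-connection attempts the client degrades
      to polling (the article's automatic-degradation rule).
    - Blacklist forces polling; whitelist forces long connection.
    - Otherwise a stable hash of uid gates the percentage rollout,
      so each user gets a consistent answer as `percent` grows.
    """
    if failures >= 3:
        return False
    if uid in blacklist:
        return False
    if uid in whitelist:
        return True
    bucket = zlib.crc32(str(uid).encode()) % 100
    return bucket < percent
```

Raising `percent` from 0 to 100 migrates users in stable cohorts, and the blacklist gives an instant per-user kill switch if a client misbehaves.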

The message model uses a subscription-push pattern. For small rooms, the push layer notifies the connection layer, which then pulls and delivers messages. For large rooms (many subscribers), the system automatically degrades to a broadcast model. Thanks to the gray-release and blacklist/whitelist controls, the migration went smoothly and users perceived no change.
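The small-room/large-room switch might look like the sketch below. The threshold value and the plan format are assumptions; the article says only that rooms with many subscribers degrade from per-subscriber notify-and-pull to broadcast.

```python
BROADCAST_THRESHOLD = 10_000  # illustrative cutoff for "large room"

def delivery_plan(subscribers_by_node):
    """subscribers_by_node: {connection_node: [conn_ids]}.

    Small room: the push layer notifies the owning connection-layer
    node for each subscriber, which then pulls and delivers.
    Large room: degrade to a single broadcast per connection node,
    so cost scales with node count, not subscriber count."""
    total = sum(len(conns) for conns in subscribers_by_node.values())
    if total < BROADCAST_THRESHOLD:
        return [("notify_pull", node, conn)
                for node, conns in subscribers_by_node.items()
                for conn in conns]
    return [("broadcast", node) for node in subscribers_by_node]
```

The degradation matters because per-subscriber notifications are O(subscribers), while broadcast is O(connection nodes), which is what keeps a million-viewer room tractable.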

Tags: Real-time Messaging · System Architecture · Live Streaming · High Availability · Redis · Kafka · Long Connection · Bullet Comment System
Written by

Meitu Technology

Curating Meitu's technical expertise, valuable case studies, and innovation insights. We deliver quality technical content to foster knowledge sharing between Meitu's tech team and outstanding developers worldwide.
