Backend Development

Design and Optimization of a High‑Performance Bullet Chat System for Southeast Asian Live Streaming

This article details the design, bandwidth optimization, and reliability strategies of a custom bullet‑chat system for Southeast Asian live streaming, covering background challenges, problem analysis, compression, request throttling, long‑polling versus WebSocket trade‑offs, and a short‑polling solution that successfully supported 700 k concurrent users.

Background

To better support Southeast Asian live streaming, a bullet‑chat feature was added. The first version, powered by Tencent Cloud, suffered from frequent stutters and dropped messages, prompting the development of an in‑house system capable of handling up to one million concurrent viewers per room.

Problem Analysis

The system faces three main issues:

Bandwidth pressure – at one million concurrent viewers, delivering 15 messages every 3 seconds in responses exceeding 3 KB each (payload plus HTTP headers) works out to roughly 8 Gbps (1,000,000 × 3 KB ÷ 3 s ≈ 1 GB/s), dangerously close to the 10 Gbps of available bandwidth.

Weak networks causing stutter and loss – already observed in production.

Performance and reliability – projected QPS can surpass 300 k, demanding robust handling during peak events.

Bandwidth Optimization

Four measures were taken:

Enable HTTP compression (gzip can reduce size by over 40%).

Simplify response structures.

Reorder content to increase redundancy, improving compression ratios.

Frequency control, in two forms. Bandwidth control: the server returns a request‑interval parameter with each response, so it can throttle how often clients poll. Sparse control: during low‑traffic periods, the server instructs clients to delay their next request, avoiding unnecessary calls.
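To illustrate the first and third measures, here is a minimal sketch (not the team's actual code; the class and payload shape are hypothetical) showing how a batch of bullet messages with repetitive field names compresses under gzip:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // gzip-compress a string and return the compressed bytes
    static byte[] gzip(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // a batch of 15 bullet messages; the repeated field names are the
        // "redundancy" that grouping similar content exploits
        StringBuilder payload = new StringBuilder("[");
        for (int i = 0; i < 15; i++) {
            payload.append("{\"uid\":").append(1000 + i)
                   .append(",\"name\":\"user").append(i)
                   .append("\",\"text\":\"hello from SEA\"},");
        }
        payload.setLength(payload.length() - 1);
        payload.append("]");
        byte[] raw = payload.toString().getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(payload.toString());
        System.out.printf("raw=%d bytes, gzip=%d bytes, saved=%.0f%%%n",
            raw.length, zipped.length,
            100.0 * (raw.length - zipped.length) / raw.length);
    }
}
```

Because every message repeats the same JSON keys, grouping them into one batch before compression is exactly what makes gzip's dictionary matching effective.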

Bullet‑Chat Stutter and Loss Analysis

Choosing a delivery mechanism (push vs pull) is critical.

Long Polling via AJAX

The client opens an AJAX request that the server holds open until an event occurs or a timeout expires. Enabling HTTP Keep‑Alive reuses the underlying TCP connection, reducing handshake overhead. Advantages: low latency and broad browser compatibility. Disadvantage: the server must hold many concurrent connections open.
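The server-side "hold" can be sketched with a blocking queue (a simplified illustration, not the article's implementation; the class name is hypothetical): each request parks until a message arrives or the hold timeout expires, and an empty response tells the client to re‑poll.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Minimal long-poll hold: one queue per room/channel
class LongPollChannel {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // called by the send side when a new bullet message arrives
    void publish(String message) {
        pending.offer(message);
    }

    // called per client request: block up to holdMillis waiting for an event
    String poll(long holdMillis) throws InterruptedException {
        String msg = pending.poll(holdMillis, TimeUnit.MILLISECONDS);
        return msg != null ? msg : ""; // empty body = timed out, client re-polls
    }
}
```

In production this blocking style would be replaced by asynchronous request handling so a parked request does not pin a thread, but the hold‑until‑event‑or‑timeout contract is the same.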

WebSockets

WebSocket provides true bidirectional communication with minimal framing overhead (2–10 bytes per server‑to‑client frame, plus a 4‑byte masking key on client‑to‑server frames). It eliminates per‑request HTTP header overhead and offers stronger real‑time capabilities.

Long Polling vs WebSockets

Both rely on long‑lived TCP connections. TCP keep‑alive probes detect dead connections based on three kernel parameters:

keepalive_time : idle time before the first probe is sent (default 2 hours)

keepalive_intvl : interval between successive probes (default 75 s)

keepalive_probes : number of unanswered probes before the connection is declared dead (default 9 on Linux)
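On Linux these map to sysctls, so the defaults above can be tightened system‑wide; the values below are illustrative, not the article's recommendation:

```shell
# Shorten TCP keep-alive detection (illustrative values, not tuned for production)
sysctl -w net.ipv4.tcp_keepalive_time=60     # start probing after 60 s idle
sysctl -w net.ipv4.tcp_keepalive_intvl=10    # probe every 10 s
sysctl -w net.ipv4.tcp_keepalive_probes=3    # give up after 3 unanswered probes
```

Note that keep‑alive must also be enabled per socket (e.g. `setsockopt(SO_KEEPALIVE)` or Java's `Socket.setKeepAlive(true)`) for these settings to take effect.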

In weak Southeast Asian networks, TCP connections often drop, making detection intervals critical. For Long Polling the shortest detection interval is min(keepalive_intvl, polling_interval) , while for WebSockets it is min(keepalive_intvl, client_sending_interval) . Because connections may already be broken when the next packet is sent, TCP keep‑alive offers limited benefit, and WebSockets also struggle under poor network conditions.

Given these constraints, the team adopted a “short‑polling” approach for bullet‑chat delivery.

Reliability and Performance

The service was split into two parts: a sending side handling complex logic and a pulling side serving high‑frequency read requests. This separation prevents one side from overwhelming the other and simplifies scaling.

On the pulling side, a local cache stores recent bullet messages fetched via RPC. Data is sharded by time into a ring buffer that retains only the latest 60 seconds, enabling lock‑free reads and writes and high throughput.
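The time-sharded ring buffer described above can be sketched as follows (a simplified model under stated assumptions: one slot per second, copy‑on‑write slot contents so reads never take a lock; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Ring buffer sharded by time: slot i holds the messages for epoch second s
// where s % 60 == i, so only the latest 60 seconds are ever retained.
class BulletRingBuffer {
    private static final int WINDOW_SECONDS = 60;

    private record Slot(long epochSecond, List<String> messages) {}

    private final AtomicReferenceArray<Slot> slots =
        new AtomicReferenceArray<>(WINDOW_SECONDS);

    // append a message under the given timestamp; old slot contents for a
    // different second are simply overwritten (they have aged out)
    void append(long epochSecond, String message) {
        int idx = (int) (epochSecond % WINDOW_SECONDS);
        while (true) {
            Slot cur = slots.get(idx);
            List<String> base = (cur != null && cur.epochSecond() == epochSecond)
                ? cur.messages() : List.of();
            List<String> next = new ArrayList<>(base);
            next.add(message);
            // copy-on-write + CAS: readers see immutable snapshots, no locks
            if (slots.compareAndSet(idx, cur, new Slot(epochSecond, List.copyOf(next)))) {
                return;
            }
        }
    }

    // read messages newer than sinceSecond (exclusive) up to nowSecond (inclusive)
    List<String> read(long sinceSecond, long nowSecond) {
        List<String> out = new ArrayList<>();
        for (long s = sinceSecond + 1; s <= nowSecond; s++) {
            Slot slot = slots.get((int) (s % WINDOW_SECONDS));
            if (slot != null && slot.epochSecond() == s) {
                out.addAll(slot.messages());
            }
        }
        return out;
    }
}
```

Each pull request then only scans the slots between the client's last-seen timestamp and now, so reads cost O(elapsed seconds) regardless of total message volume.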

On the sending side, rate limiting discards excess messages, and auxiliary features (avatar fetching, profanity filtering) are designed to fail gracefully, ensuring core message delivery continues.
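A discard-on-overflow limiter like the one described can be sketched as a per-second counter (a minimal illustration, not the team's implementation; the class name and window granularity are assumptions):

```java
import java.util.concurrent.atomic.AtomicLong;

// Permits at most `limit` sends per one-second window; excess messages
// are discarded rather than queued, protecting the core delivery path.
class DiscardingRateLimiter {
    private final long limit;
    private final AtomicLong windowStart = new AtomicLong();
    private final AtomicLong count = new AtomicLong();

    DiscardingRateLimiter(long limitPerSecond) {
        this.limit = limitPerSecond;
    }

    // returns true if the message may be sent, false if it should be dropped
    boolean tryAcquire(long nowMillis) {
        long window = nowMillis / 1000;
        long start = windowStart.get();
        // first caller in a new window resets the counter
        if (window != start && windowStart.compareAndSet(start, window)) {
            count.set(0);
        }
        return count.incrementAndGet() <= limit;
    }
}
```

Dropping rather than queueing is the right failure mode here: bullet chat is ephemeral, so a stale message delivered late is worth less than keeping the pipeline responsive.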

Summary

During the Double‑12 event, despite a brief Redis outage, the system reliably supported 700 k concurrent users, meeting the performance goals.

backend · performance · scalability · network · WebSocket · long polling · bullet-chat
Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
