How Bilibili Scaled Live Chat with GOIM: Architecture and Performance Optimizations

This article explains how Bilibili built the high‑stability, high‑availability, low‑latency GOIM live‑chat system, detailing its component modules, memory and module optimizations, network redesign, testing results, and ongoing monitoring to handle millions of concurrent users.

dbaplus Community

Background

Live‑streaming chat (弹幕, "bullet comments") requires three guarantees: high stability (steady connections), high availability (automatic fail‑over when a node crashes), and low latency (message delay under 1 s). The GOIM (B‑Station Live Chat) service was built to meet these requirements.

GOIM Architecture

The system consists of the following components (see the GOIM architecture diagram):

Client: establishes a long‑lived connection to a Comet server.

Comet: maintains the long‑lived TCP/WebSocket connection, handles low‑level protocol details and keeps the link alive.

Logic: performs per‑message processing such as authentication, IP filtering, and blacklist checks.

Router: stores session information and maps each user to a specific machine for routing.

Kafka (third‑party): distributed publish/subscribe queue; each message is tagged with a topic for scalable distribution.

Job: runs on multiple machines, consumes the messages that Logic publishes to Kafka, and forwards them to all Comet instances.
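The message flow through these components can be sketched in a single process. This is a minimal illustration, not GOIM's code: a Go channel stands in for the Kafka topic, and the Message, Logic, Comet, and RunJob names and fields are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical chat message; the field names are illustrative.
type Message struct {
	Room string
	User string
	Text string
}

// Logic performs per-message checks before a message enters the queue.
type Logic struct {
	blacklist map[string]bool
	queue     chan<- Message // stands in for a Kafka topic (assumption)
}

// Publish rejects blacklisted users and enqueues everything else.
func (l *Logic) Publish(m Message) bool {
	if l.blacklist[m.User] {
		return false
	}
	l.queue <- m
	return true
}

// Comet holds the connections of one server, grouped by room.
type Comet struct {
	mu    sync.Mutex
	rooms map[string][]chan Message // room -> per-connection outboxes
}

// NewComet returns a Comet with no connections.
func NewComet() *Comet {
	return &Comet{rooms: make(map[string][]chan Message)}
}

// Subscribe registers a connection in a room and returns its outbox.
func (c *Comet) Subscribe(room string) <-chan Message {
	ch := make(chan Message, 16)
	c.mu.Lock()
	c.rooms[room] = append(c.rooms[room], ch)
	c.mu.Unlock()
	return ch
}

// Push delivers a message to every local connection in the room.
func (c *Comet) Push(m Message) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, ch := range c.rooms[m.Room] {
		select {
		case ch <- m:
		default: // drop when a slow connection's outbox is full
		}
	}
}

// RunJob drains the queue and forwards each message to all Comet instances.
func RunJob(queue <-chan Message, comets []*Comet) {
	for m := range queue {
		for _, c := range comets {
			c.Push(m)
		}
	}
}

func main() {
	queue := make(chan Message, 64)
	logic := &Logic{blacklist: map[string]bool{"spammer": true}, queue: queue}
	comets := []*Comet{NewComet(), NewComet()}
	a := comets[0].Subscribe("room-1")
	b := comets[1].Subscribe("room-1")

	logic.Publish(Message{Room: "room-1", User: "alice", Text: "hello"})
	logic.Publish(Message{Room: "room-1", User: "spammer", Text: "ads"})
	close(queue)
	RunJob(queue, comets)

	fmt.Println((<-a).Text, (<-b).Text)
}
```

Because Job forwards every message to every Comet, no component needs to know which server holds which room member; each Comet simply ignores rooms it has no connections for.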

GOIM evolved from an earlier project called Gopush, adding optimizations specific to massive live‑chat workloads.

Optimization Paths

Memory Optimizations

Single memory block per message: messages are aggregated in a Job object; Comet holds only a pointer to the aggregated block, eliminating duplicate allocations.

Per‑user memory on the stack: each user's temporary data is allocated on the stack of that user's dedicated goroutine, avoiding heap fragmentation.

Self‑managed memory pools: critical paths in the Comet module replace ad‑hoc new/malloc‑style allocations with pre‑allocated pools, reducing GC pressure.
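A self‑managed pool of the kind described above might look like the following sketch. It assumes fixed‑size buffers carved from one large slab and kept on a free list; the Pool and Buffer types and their methods are hypothetical names, not GOIM's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Buffer is a fixed-size scratch area linked into a free list.
type Buffer struct {
	buf  []byte
	next *Buffer
}

// Bytes returns the buffer's backing storage.
func (b *Buffer) Bytes() []byte { return b.buf }

// Pool hands out pre-allocated buffers instead of allocating per message.
type Pool struct {
	mu   sync.Mutex
	free *Buffer
	num  int // buffers created per refill; assumed to be > 0
	size int // bytes per buffer
}

// NewPool pre-allocates num buffers of size bytes each.
func NewPool(num, size int) *Pool {
	p := &Pool{num: num, size: size}
	p.grow()
	return p
}

// grow carves num buffers out of one large slab and links them together,
// so a refill costs two allocations regardless of num.
func (p *Pool) grow() {
	slab := make([]byte, p.num*p.size)
	bufs := make([]Buffer, p.num)
	for i := range bufs {
		bufs[i].buf = slab[i*p.size : (i+1)*p.size : (i+1)*p.size]
		bufs[i].next = p.free
		p.free = &bufs[i]
	}
}

// Get pops a buffer from the free list, refilling the list when it is empty.
func (p *Pool) Get() *Buffer {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.free == nil {
		p.grow()
	}
	b := p.free
	p.free = b.next
	b.next = nil
	return b
}

// Put returns a buffer to the free list for reuse.
func (p *Pool) Put(b *Buffer) {
	p.mu.Lock()
	b.next = p.free
	p.free = b
	p.mu.Unlock()
}

func main() {
	pool := NewPool(4, 1024)
	b := pool.Get()
	n := copy(b.Bytes(), "hello")
	fmt.Println(n, len(b.Bytes()))
	pool.Put(b)
	fmt.Println(pool.Get() == b) // the most recently returned buffer is reused
}
```

The three‑index slice expression caps each buffer at its own size, so an append on one buffer cannot silently overwrite its neighbour in the slab.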

Module Optimizations

Parallel, non‑interfering message distribution: each Comet channel operates independently, preventing contention between streams.

Controlled concurrency: a fixed pool of worker goroutines is created ahead of time; asynchronous tasks are dispatched to this pool to avoid sudden spikes.

Sharded global locks: locks for socket pools and online‑user tables are partitioned by CPU core count, reducing lock contention.
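The last two points can be illustrated together. The sketch below assumes an online‑user table sharded by a hash of the user ID, with one shard per CPU core, and a fixed pool of worker goroutines fed through a bounded channel. OnlineTable, WorkerPool, and their methods are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
	"sync"
)

// shard guards one slice of the online-user table with its own lock.
type shard struct {
	mu    sync.RWMutex
	users map[string]string // user ID -> room ID
}

// OnlineTable splits the table so unrelated users rarely share a lock.
type OnlineTable struct{ shards []*shard }

// NewOnlineTable creates a table with n independently locked shards.
func NewOnlineTable(n int) *OnlineTable {
	t := &OnlineTable{shards: make([]*shard, n)}
	for i := range t.shards {
		t.shards[i] = &shard{users: make(map[string]string)}
	}
	return t
}

// pick hashes the user ID to choose a shard.
func (t *OnlineTable) pick(user string) *shard {
	h := fnv.New32a()
	h.Write([]byte(user))
	return t.shards[h.Sum32()%uint32(len(t.shards))]
}

// Set records the room a user is in, locking only that user's shard.
func (t *OnlineTable) Set(user, room string) {
	s := t.pick(user)
	s.mu.Lock()
	s.users[user] = room
	s.mu.Unlock()
}

// Count sums the shard sizes, taking each read lock in turn.
func (t *OnlineTable) Count() int {
	total := 0
	for _, s := range t.shards {
		s.mu.RLock()
		total += len(s.users)
		s.mu.RUnlock()
	}
	return total
}

// WorkerPool runs tasks on a fixed number of goroutines started up front.
type WorkerPool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewWorkerPool starts the workers immediately; backlog bounds queued tasks.
func NewWorkerPool(workers, backlog int) *WorkerPool {
	p := &WorkerPool{tasks: make(chan func(), backlog)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// Submit blocks when the backlog is full, which caps in-flight work.
func (p *WorkerPool) Submit(task func()) { p.tasks <- task }

// Close stops accepting tasks and waits for the workers to drain the queue.
func (p *WorkerPool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	table := NewOnlineTable(runtime.NumCPU()) // one shard per core
	pool := NewWorkerPool(4, 128)
	for i := 0; i < 1000; i++ {
		user := fmt.Sprintf("user-%d", i)
		pool.Submit(func() { table.Set(user, "room-1") })
	}
	pool.Close()
	fmt.Println(table.Count())
}
```

Because Submit blocks once the backlog is full, a burst of tasks slows the producer down instead of spawning an unbounded number of goroutines.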

Network Optimizations

Initially all services ran in a single IDC, causing bandwidth bottlenecks and single‑point failures. The architecture was redesigned to a multi‑IDC topology (see the multi‑IDC deployment diagram):

Deploy entry points in several IDC locations; the Svrlist module routes users to the nearest stable node.

Continuously monitor drop‑rate per IDC and dynamically adjust routing based on real‑time statistics.

Automatically disable failed servers to maintain 100 % message delivery.

Apply traffic shaping to respect ISP bandwidth caps.

Cross‑IDC traffic traverses public networks, so redundant telecom lines and backup paths between IDC‑1 and IDC‑2 were added to improve stability and reduce packet loss.
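The routing rules listed above can be sketched as a small selection function. This is an assumption about how such a policy could work, not a description of the Svrlist module: "nearest" is approximated by measured latency, "stable" by a drop‑rate threshold, and the IDCStats type and PickIDC function are hypothetical.

```go
package main

import "fmt"

// IDCStats is a hypothetical per-IDC snapshot; field names are illustrative.
type IDCStats struct {
	Name      string
	Healthy   bool    // false once failure detection disables the entry point
	DropRate  float64 // fraction of dropped connections in the last window
	LatencyMs float64 // measured latency from the user's region
}

// PickIDC returns the lowest-latency healthy IDC whose drop rate is at most
// maxDrop. If no healthy IDC meets the threshold, it falls back to the
// healthy IDC with the lowest drop rate. It reports false when every IDC
// is disabled.
func PickIDC(stats []IDCStats, maxDrop float64) (string, bool) {
	best, fallback := -1, -1
	for i, s := range stats {
		if !s.Healthy {
			continue // disabled entry points never receive traffic
		}
		if fallback < 0 || s.DropRate < stats[fallback].DropRate {
			fallback = i
		}
		if s.DropRate <= maxDrop && (best < 0 || s.LatencyMs < stats[best].LatencyMs) {
			best = i
		}
	}
	if best >= 0 {
		return stats[best].Name, true
	}
	if fallback >= 0 {
		return stats[fallback].Name, true
	}
	return "", false
}

func main() {
	stats := []IDCStats{
		{Name: "idc-1", Healthy: true, DropRate: 0.08, LatencyMs: 12},
		{Name: "idc-2", Healthy: true, DropRate: 0.01, LatencyMs: 30},
		{Name: "idc-3", Healthy: false, DropRate: 0.00, LatencyMs: 5},
	}
	name, ok := PickIDC(stats, 0.05)
	fmt.Println(name, ok)
}
```

In this example idc-3 is closest but disabled and idc-1 exceeds the drop‑rate threshold, so the user is sent to idc-2.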

Testing and Results

2015 stress test (see the 2015 test data): two physical machines each handled ~250 k concurrent users, with 20 to 50 messages/s pushed per stream. Peak throughput reached 50 messages/s per stream and 24.4 M messages/s overall. CPU was saturated, while memory stayed around 4 GB and network traffic around 3 GB, indicating that CPU was the bottleneck.
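As a rough sanity check on these figures, the sketch below multiplies machines, users per machine, and push rate. It assumes the overall number counts per‑connection deliveries and that every user receives every message of a stream; the source does not state how the 24.4 M figure was measured, and ExpectedDeliveries is a hypothetical helper.

```go
package main

import "fmt"

// ExpectedDeliveries returns machines * usersPerMachine * ratePerSec: the
// number of per-connection deliveries per second if every connected user
// receives every message pushed to a stream (an assumption).
func ExpectedDeliveries(machines, usersPerMachine, ratePerSec int) int {
	return machines * usersPerMachine * ratePerSec
}

func main() {
	expected := ExpectedDeliveries(2, 250_000, 50) // figures from the 2015 test
	measured := 24.4e6                             // reported overall throughput
	fmt.Printf("expected %d msg/s, measured/expected = %.3f\n",
		expected, measured/float64(expected))
}
```

Under that assumption the reported 24.4 M messages/s is close to the 25 M ceiling implied by 500 k users at 50 messages/s.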

2016 optimization (see the 2016 test data): heap allocations were replaced with stack allocation and memory pools, and the system was consolidated onto a single machine supporting 1 M concurrent users. Throughput increased dramatically, and the limiting factor shifted from CPU to network traffic volume.

Monitoring and Fault Detection

Simulated clients generate traffic to measure message arrival rates.

Real‑time CPU profiling (pprof) captures snapshots for performance analysis.

Whitelist specific users to collect server‑side logs for issue tracking.

Server load monitoring with SMS alerts for rapid response.

These mechanisms provide low‑cost, high‑efficiency observability and enable continuous improvement of the service.

Conclusion

The GOIM system demonstrates a comprehensive approach to building a highly stable, highly available, and low‑latency live‑chat platform. By refining memory management, controlling module concurrency, sharding locks, and redesigning the network topology across multiple IDC sites, Bilibili raised per‑machine capacity from roughly 250 k to 1 M concurrent users while maintaining 100% message delivery.
