Design and Implementation of Bilibili's Low‑Latency Cloud Gaming Platform Using WebRTC
Bilibili built a cross‑platform cloud‑gaming service that leverages WebRTC with tuned jitter buffers, unordered data channels, adaptive input‑report rates, and a custom kernel driver to deliver sub‑100 ms latency, dynamic bitrate control, and haptic feedback, overcoming typical latency, stutter, and flexibility limitations of existing solutions.
With the widespread adoption of high‑speed networks and 5G, cloud gaming products have become increasingly common. Cloud gaming lowers the entry barrier for players by eliminating the need for powerful hardware and large game downloads.
The industry faces three major challenges: high latency and streaming stutter, a market dominated by a few providers, and limited flexibility and high cost for third‑party solutions. To address these bottlenecks, Bilibili developed its own cloud gaming platform.
Bilibili's cloud gaming platform is designed for cross-platform use. It supports haptic feedback and local and remote multiplayer, and it can even be deployed privately on a user's PC.
WebRTC
For web‑based access, Bilibili chose WebRTC as the underlying protocol. Although WebTransport is emerging, WebRTC enjoys broader browser support. WebRTC uses ICE for NAT traversal, SRTP for secure audio/video transport, and SCTP for data channels.
Jitter Buffer
The SRTP module includes a jitter buffer that reorders received RTP packets, then puts audio/video frames and GOPs back in order before decoding. A longer jitter buffer smooths playback, but under typical wireless plus public-network conditions it adds 500 ms of display latency, which is unacceptable for cloud gaming.
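The trade-off can be illustrated with a minimal reordering buffer. This is a sketch only, not the jitter buffer used by WebRTC or by Bilibili: the `RtpPacket` and `JitterBuffer` names are hypothetical, sequence-number wraparound is ignored, and every packet is simply held for a fixed target delay.

```typescript
// Illustrative reordering buffer; names and behavior are assumptions, not WebRTC internals.
interface RtpPacket {
  seq: number;       // RTP sequence number (wraparound ignored for brevity)
  arrivalMs: number; // local arrival time in milliseconds
}

class JitterBuffer {
  private held: RtpPacket[] = [];
  private readonly targetDelayMs: number;

  // targetDelayMs: how long a packet waits so that late packets can still be slotted in.
  constructor(targetDelayMs: number) {
    this.targetDelayMs = targetDelayMs;
  }

  // Insert a packet and keep the buffer sorted by sequence number.
  push(packet: RtpPacket): void {
    this.held.push(packet);
    this.held.sort((a, b) => a.seq - b.seq);
  }

  // Release, in sequence order, packets from the head once the head has waited long enough.
  pop(nowMs: number): RtpPacket[] {
    const ready: RtpPacket[] = [];
    let head = this.held[0];
    while (head !== undefined && nowMs - head.arrivalMs >= this.targetDelayMs) {
      ready.push(head);
      this.held.shift();
      head = this.held[0];
    }
    return ready;
  }
}
```

A larger `targetDelayMs` gives reordered packets more time to arrive, but every frame pays that delay, which is why the buffer size directly shows up as display latency.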
Reducing Jitter Buffer Delay
In Safari, audio and video tracks are attached separately to a <video> element, resulting in independent jitter buffers and lower latency but possible A/V desynchronization. Chrome synchronizes audio and video, causing larger jitter buffers. By adjusting three private Chrome WebRTC settings, Bilibili reduced local playback latency to ~50 ms (up to 100 ms on 2.4 GHz Wi‑Fi).
Encoding with x264 with B-frames disabled and the zerolatency tune lowers latency further; libvpx and openh264 require larger buffers.
Handling Stutter After Reducing Jitter Buffer
Shorter buffers increase the risk of stutter during network jitter. Bilibili uses RTCP transport feedback with Google Congestion Control (GCC) to adjust the bitrate dynamically to network conditions, which prevents stalls while preserving frame rate.
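As a rough illustration of bitrate adaptation, the sketch below applies the loss-based update rule described in the GCC Internet-Draft (draft-ietf-rmcat-gcc): back off when loss exceeds 10%, probe upward when loss is under 2%, otherwise hold. GCC also runs a delay-based controller, which is omitted here. The function name and the clamping bounds are assumptions, and this is not Bilibili's implementation.

```typescript
// Loss-based part of GCC only (thresholds from the GCC Internet-Draft);
// the delay-based controller is intentionally left out of this sketch.
function updateBitrate(
  currentBps: number,
  lossFraction: number, // fraction of packets lost in the last feedback interval, 0..1
  minBps: number,       // assumed lower bound for the encoder
  maxBps: number        // assumed upper bound for the encoder
): number {
  let next = currentBps;
  if (lossFraction > 0.1) {
    // Heavy loss: reduce in proportion to the loss.
    next = currentBps * (1 - 0.5 * lossFraction);
  } else if (lossFraction < 0.02) {
    // Negligible loss: probe for more bandwidth.
    next = currentBps * 1.05;
  }
  // Between 2% and 10% loss the rate is held.
  return Math.min(maxBps, Math.max(minBps, Math.round(next)));
}
```

The result would be fed to the encoder as its new target bitrate, so that picture quality rather than frame rate absorbs a drop in available bandwidth.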
Ultra‑Low‑Latency Control Protocol
Control signals use WebRTC DataChannel. By configuring the channel for unordered, unreliable transmission (similar to UDP), only the latest control data is processed, avoiding latency caused by retransmissions.
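A minimal sketch of such a channel follows. `ordered` and `maxRetransmits` are standard `RTCDataChannelInit` options; the small interfaces are declared locally only so the sketch compiles without browser typings, and the `openControlChannel` helper, the `"control"` label, and the `LatestOnlyFilter` class are hypothetical names for illustration.

```typescript
// Structural stand-ins for the browser types; a real RTCPeerConnection fits ChannelFactory.
interface ChannelInit {
  ordered?: boolean;
  maxRetransmits?: number;
}
interface ChannelFactory<C> {
  createDataChannel(label: string, init?: ChannelInit): C;
}

// ordered: false    -> messages are delivered as they arrive (no head-of-line blocking)
// maxRetransmits: 0 -> a lost message is never retransmitted
const CONTROL_CHANNEL_INIT: ChannelInit = { ordered: false, maxRetransmits: 0 };

function openControlChannel<C>(pc: ChannelFactory<C>, label: string = "control"): C {
  return pc.createDataChannel(label, CONTROL_CHANNEL_INIT);
}

// Receiver side: process a message only if it is newer than the last one processed.
// Sequence-number wraparound is ignored in this sketch.
class LatestOnlyFilter {
  private lastSeq = -1;

  accept(seq: number): boolean {
    if (seq <= this.lastSeq) return false; // stale or duplicate: drop it
    this.lastSeq = seq;
    return true;
  }
}
```

In a browser, `openControlChannel(new RTCPeerConnection())` would return an `RTCDataChannel` with UDP-like delivery semantics.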
Adaptive Hardware Feedback Rate
Instead of a fixed 120 Hz input report rate, Bilibili implements a dynamic rate ranging from 10 Hz to 120 Hz. High‑frequency digital inputs trigger immediate 120 Hz reports, while analog inputs are classified into high, medium, and low frequency zones with corresponding adaptive rates (e.g., 10 Hz for low‑frequency joystick noise).
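The rate selection might be sketched as follows. Only the 10 Hz and 120 Hz bounds come from the description above; the 60 Hz medium rate, the two thresholds, and the use of per-report axis change as the classification signal are assumptions, as are all names.

```typescript
// A controller input as seen by the rate selector (hypothetical shape).
type PadInput =
  | { kind: "digital" }                // button press or release
  | { kind: "analog"; delta: number }; // change of a normalized axis (0..1) since the last report

const MAX_RATE_HZ = 120; // upper bound stated in the article
const MIN_RATE_HZ = 10;  // lower bound stated in the article
const MID_RATE_HZ = 60;  // assumption: rate for the medium zone
const HIGH_DELTA = 0.25; // assumption: fast stick movement
const LOW_DELTA = 0.02;  // assumption: below this, treat the change as joystick noise

function reportRateHz(input: PadInput): number {
  if (input.kind === "digital") return MAX_RATE_HZ; // buttons are always reported at full rate
  const d = Math.abs(input.delta);
  if (d >= HIGH_DELTA) return MAX_RATE_HZ;
  if (d >= LOW_DELTA) return MID_RATE_HZ;
  return MIN_RATE_HZ;
}

// Interval until the next report for the given input, in milliseconds.
function reportIntervalMs(input: PadInput): number {
  return 1000 / reportRateHz(input);
}
```

With these assumed thresholds, an idle stick that only produces noise is reported every 100 ms, while a button press or a fast stick movement is reported at the full 120 Hz rate.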
Dynamic reporting reduces upstream congestion and improves responsiveness, achieving an average response latency of 0.667 ms compared to 4.834 ms with a fixed 120 Hz rate.
Data Packet Assembly
Each packet contains a sequence number, session ID, packet type, and payload. Because the DataChannel is treated like raw UDP, the sequence number lets the receiver restore ordering and discard stale packets. Session IDs differentiate multiple controllers per client. Payloads follow little-endian binary structures, such as the XINPUT_GAMEPAD struct for controller data.
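A possible encoding is sketched below. The payload follows the 12-byte XINPUT_GAMEPAD layout from the Windows XInput headers; the header field widths (32-bit sequence number, 8-bit session ID, 8-bit packet type) and the type code are assumptions, since only the field names are given above.

```typescript
// Field layout of XINPUT_GAMEPAD (12 bytes, little-endian on Windows).
interface XInputGamepad {
  wButtons: number;      // uint16 button bitmask
  bLeftTrigger: number;  // uint8
  bRightTrigger: number; // uint8
  sThumbLX: number;      // int16
  sThumbLY: number;      // int16
  sThumbRX: number;      // int16
  sThumbRY: number;      // int16
}

const HEADER_BYTES = 6;        // assumption: uint32 sequence + uint8 session ID + uint8 type
const GAMEPAD_BYTES = 12;      // sizeof(XINPUT_GAMEPAD)
const PACKET_TYPE_GAMEPAD = 1; // hypothetical type code

function encodeGamepadPacket(seq: number, sessionId: number, pad: XInputGamepad): Uint8Array {
  const buf = new ArrayBuffer(HEADER_BYTES + GAMEPAD_BYTES);
  const view = new DataView(buf);
  const LE = true; // DataView defaults to big-endian, so request little-endian explicitly
  view.setUint32(0, seq, LE);
  view.setUint8(4, sessionId);
  view.setUint8(5, PACKET_TYPE_GAMEPAD);
  view.setUint16(6, pad.wButtons, LE);
  view.setUint8(8, pad.bLeftTrigger);
  view.setUint8(9, pad.bRightTrigger);
  view.setInt16(10, pad.sThumbLX, LE);
  view.setInt16(12, pad.sThumbLY, LE);
  view.setInt16(14, pad.sThumbRX, LE);
  view.setInt16(16, pad.sThumbRY, LE);
  return new Uint8Array(buf);
}

// Read back only the header, for example to check the sequence number before parsing the payload.
function decodeHeader(packet: Uint8Array): { seq: number; sessionId: number; type: number } {
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return { seq: view.getUint32(0, true), sessionId: view.getUint8(4), type: view.getUint8(5) };
}
```

Keeping the payload byte-compatible with XINPUT_GAMEPAD means the server side can copy it into the struct without per-field conversion.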
Game Control
Bilibili’s solution injects a kernel‑mode driver that emulates an Xbox controller, allowing compatibility with games using XInput, RawInput, DirectInput, or Windows.Gaming.Input. This driver‑level approach avoids costly API hooking and works across Unity, Unreal, and anti‑cheat protected titles.
Haptic Feedback
While the Web GamepadHapticActuator API works on PCs, mobile platforms lack fine-grained vibration control. Bilibili created the cross-platform BiliHaptic API, which uses timed vibration patterns to emulate intensity levels on Android and the web, and uses Core Haptics on iOS.
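One plausible way to emulate intensity with timed patterns is duty cycling: within each short period the motor is on for a fraction of the time proportional to the desired intensity. The sketch below builds such a pattern in the format accepted by the Vibration API (`navigator.vibrate`). It is not the BiliHaptic implementation; the function name and the 20 ms default period are assumptions.

```typescript
// Build an alternating [vibrate, pause, vibrate, pause, ...] pattern in milliseconds.
// intensity is clamped to 0..1; periodMs is the length of one on/off cycle (assumed default).
function intensityToPattern(intensity: number, durationMs: number, periodMs: number = 20): number[] {
  const level = Math.min(1, Math.max(0, intensity));
  if (level === 0 || durationMs <= 0) return [];
  if (level === 1) return [durationMs]; // full strength: one continuous pulse

  const onMs = Math.round(periodMs * level);
  const offMs = periodMs - onMs;
  if (onMs === 0) return [];            // too weak to represent at this period
  if (offMs === 0) return [durationMs]; // rounds up to continuous vibration

  const pattern: number[] = [];
  // Emit only whole periods that fit inside the requested duration.
  for (let t = 0; t + periodMs <= durationMs; t += periodMs) {
    pattern.push(onMs, offMs);
  }
  return pattern;
}
```

In a browser that supports the Vibration API, the result could be passed directly as `navigator.vibrate(intensityToPattern(0.5, 200))`.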
Conclusion
Bilibili’s self‑developed cloud gaming platform combines modular SDKs, cross‑platform compatibility, and ultra‑low latency techniques. When deployed on user‑owned servers, an SFU forwards audio/video streams for collaborative play; server‑side deployment offers higher stability and even lower latency.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.