How Baidu’s Unified Long‑Connection Service Scales Millions of Real‑Time Connections
This article details Baidu’s internally built unified long‑connection service in Go, covering its motivation, architecture, functional implementation, performance optimizations, multi‑business support, deployment strategy, and lessons learned for delivering secure, high‑concurrency, low‑latency real‑time connectivity across mobile applications.
Introduction
In the mobile‑Internet era, real‑time and interactive services require long‑connection capabilities. This article introduces Baidu’s internally built unified long‑connection service implemented in Go, describing its design, functional implementation and performance optimizations.
Abstract
Long‑connection services keep a persistent bi‑directional channel between client and server, which lets the server push data without waiting for a client request. Keeping such a service low‑latency, highly concurrent and stable is hard, and the cost multiplies when each business builds and operates its own long‑connection service.
Unified Long‑Connection Service Goals
Provide a secure, high‑concurrency, low‑latency, easy‑to‑integrate, low‑cost long‑connection capability for Baidu’s internal apps (live streaming, messaging, push, cloud control, etc.).
Support multi‑business reuse of a single connection.
Offer clear access procedures and external interfaces.
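Reusing one connection for several businesses implies that every frame carries something that tells the access layer which business it belongs to. The article does not describe the wire format, so the following is only a minimal sketch under assumed conventions: a 2‑byte business ID, a 4‑byte payload length, then the payload. The names Frame, Encode and Decode are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Frame is a hypothetical envelope; the real wire format is not given in the article.
type Frame struct {
	BusinessID uint16 // which business (messaging, live, push, ...) owns the payload
	Payload    []byte
}

const headerLen = 6 // 2 bytes business ID + 4 bytes payload length

// Encode serializes a frame as: business ID | payload length | payload (big-endian).
func Encode(f Frame) []byte {
	buf := make([]byte, headerLen+len(f.Payload))
	binary.BigEndian.PutUint16(buf[0:2], f.BusinessID)
	binary.BigEndian.PutUint32(buf[2:6], uint32(len(f.Payload)))
	copy(buf[headerLen:], f.Payload)
	return buf
}

// Decode parses one frame from the front of buf and reports how many bytes it consumed.
func Decode(buf []byte) (Frame, int, error) {
	if len(buf) < headerLen {
		return Frame{}, 0, errors.New("short header")
	}
	n := int(binary.BigEndian.Uint32(buf[2:6]))
	if len(buf)-headerLen < n {
		return Frame{}, 0, errors.New("short payload")
	}
	payload := make([]byte, n)
	copy(payload, buf[headerLen:headerLen+n])
	f := Frame{BusinessID: binary.BigEndian.Uint16(buf[0:2]), Payload: payload}
	return f, headerLen + n, nil
}

func main() {
	// Two businesses share one byte stream, as they would share one connection.
	stream := append(Encode(Frame{1, []byte("chat")}), Encode(Frame{2, []byte("live")})...)
	for len(stream) > 0 {
		f, n, err := Decode(stream)
		if err != nil {
			panic(err)
		}
		fmt.Printf("business=%d payload=%s\n", f.BusinessID, f.Payload)
		stream = stream[n:]
	}
}
```

With a header like this, the receiving side can dispatch each frame to the right business handler without the businesses knowing about each other.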
Functional Implementation
Boundary and Requirements
The service must separate its responsibilities from business logic while satisfying diverse business scenarios such as messaging, live‑streaming and push.
Key requirements include connection establishment/maintenance, upstream request forwarding, and downstream data push.
Supported Scenarios
Messaging: unicast and batch‑unicast for private messages and limited‑size groups.
Live streaming: multicast to millions of viewers.
PUSH: batch‑unicast to a fixed audience.
Architecture Overview
The system consists of four layers: Unified Long‑Connection SDK (client side), Control Layer, Access Layer, and Routing Layer.
SDK
Obtain token, access point and protocol from the control layer.
Establish and maintain the connection, trigger reconnection on failure.
Forward business SDK requests to the service.
Receive data from the service and deliver it to the business SDK.
Control Layer
Generate and verify device tokens.
Distribute appropriate access points based on client attributes.
Apply small‑traffic (canary) control policies.
Access Layer
Manage connections, connection IDs, and groups.
Forward upstream requests to business back‑ends and write back responses.
Handle downstream push to the appropriate SDK.
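The connection, connection‑ID and group bookkeeping described above can be pictured as two maps guarded by a lock. This is a simplified, single‑process sketch and not the service's actual data structure: Conn, Manager and their methods are hypothetical names, and Send only records the payload instead of writing to a socket.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn stands in for a client connection; Send would write to the socket.
type Conn struct {
	ID   uint64
	Sent [][]byte
}

func (c *Conn) Send(p []byte) { c.Sent = append(c.Sent, p) }

// Manager tracks live connections and group membership on one instance.
type Manager struct {
	mu     sync.Mutex
	nextID uint64
	conns  map[uint64]*Conn
	groups map[string]map[uint64]struct{}
}

func NewManager() *Manager {
	return &Manager{conns: map[uint64]*Conn{}, groups: map[string]map[uint64]struct{}{}}
}

// Add registers a new connection and assigns it the next connection ID.
func (m *Manager) Add() *Conn {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.nextID++
	c := &Conn{ID: m.nextID}
	m.conns[c.ID] = c
	return c
}

// Join adds a known connection to a group (for example a live room).
func (m *Manager) Join(group string, id uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.conns[id]; !ok {
		return
	}
	if m.groups[group] == nil {
		m.groups[group] = map[uint64]struct{}{}
	}
	m.groups[group][id] = struct{}{}
}

// Remove drops a connection and its memberships, deleting groups left empty.
func (m *Manager) Remove(id uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.conns, id)
	for name, members := range m.groups {
		delete(members, id)
		if len(members) == 0 {
			delete(m.groups, name)
		}
	}
}

// Multicast sends p to every member of group and returns the recipient count.
func (m *Manager) Multicast(group string, p []byte) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for id := range m.groups[group] {
		m.conns[id].Send(p)
		n++
	}
	return n
}

func main() {
	m := NewManager()
	a, b := m.Add(), m.Add()
	m.Join("room-1", a.ID)
	m.Join("room-1", b.ID)
	fmt.Println("recipients:", m.Multicast("room-1", []byte("hello")))
}
```

A single mutex keeps the sketch short. At real connection counts the maps would more likely be sharded so that one busy group does not serialize every other operation.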
Routing Layer
Maintain the mapping between device identifiers and connection identifiers, so that a push addressed to a device can be routed to the connection that currently serves it.
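A minimal in‑memory sketch of such a mapping follows. The article does not say how the routing layer stores its data, so the Route fields (an access‑node name plus a connection ID) and the Router API are assumptions for illustration. A real deployment would need shared storage rather than a per‑process map.

```go
package main

import (
	"fmt"
	"sync"
)

// Route says where a device is currently connected (hypothetical fields).
type Route struct {
	AccessNode string // which access-layer instance holds the connection
	ConnID     uint64 // connection ID on that instance
}

// Router maps device identifiers to their current route.
type Router struct {
	mu     sync.RWMutex
	routes map[string]Route
}

func NewRouter() *Router { return &Router{routes: map[string]Route{}} }

// Bind records the route for a device, replacing any previous one.
func (r *Router) Bind(deviceID string, rt Route) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.routes[deviceID] = rt
}

// Unbind removes the route only if it is still the one given. A late
// disconnect event for an old connection then cannot erase the route
// written by a newer reconnection.
func (r *Router) Unbind(deviceID string, rt Route) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.routes[deviceID] == rt {
		delete(r.routes, deviceID)
	}
}

// Lookup returns the route a push for deviceID should follow.
func (r *Router) Lookup(deviceID string) (Route, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	rt, ok := r.routes[deviceID]
	return rt, ok
}

func main() {
	r := NewRouter()
	old := Route{AccessNode: "access-a", ConnID: 1}
	cur := Route{AccessNode: "access-b", ConnID: 7}
	r.Bind("device-42", old)
	r.Bind("device-42", cur)   // the device reconnected elsewhere
	r.Unbind("device-42", old) // stale disconnect event arrives late
	rt, ok := r.Lookup("device-42")
	fmt.Println(rt, ok)
}
```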
Core Process
Connection establishment: SDK obtains token and protocol, then connects to the access layer.
Connection maintenance: periodic heartbeat.
Upstream request: business SDK sends request, access layer forwards to business server.
Downstream push: server pushes via routing layer, access layer writes to the connection, SDK delivers to business SDK.
Performance Optimizations
Multi‑Protocol Support
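The connection layer abstracts the underlying protocol (TCP/TLS, QUIC, WebSocket, etc.), while the session layer handles business logic. Because session code never depends on a concrete protocol, protocols can be added or upgraded seamlessly.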
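In Go, this kind of split is naturally expressed as an interface that the session layer depends on, with one adapter per protocol. The sketch below is an assumption about how that could look, not the service's real code: Transport, streamTransport and echoSession are hypothetical names, and net.Pipe stands in for a real TCP or TLS connection.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"net"
)

// Transport is all the session layer sees (hypothetical interface).
type Transport interface {
	ReadMsg() ([]byte, error)
	WriteMsg([]byte) error
	Close() error
}

const maxMsg = 1 << 20 // refuse messages above 1 MiB (assumed limit)

// streamTransport adapts any byte-stream net.Conn (plain TCP, or TLS through
// *tls.Conn) by length-prefixing each message. A QUIC or WebSocket adapter
// would implement the same interface in its own way.
type streamTransport struct{ c net.Conn }

func (t streamTransport) ReadMsg() ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(t.c, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxMsg {
		return nil, errors.New("message too large")
	}
	msg := make([]byte, n)
	_, err := io.ReadFull(t.c, msg)
	return msg, err
}

func (t streamTransport) WriteMsg(p []byte) error {
	buf := make([]byte, 4+len(p))
	binary.BigEndian.PutUint32(buf, uint32(len(p)))
	copy(buf[4:], p)
	_, err := t.c.Write(buf)
	return err
}

func (t streamTransport) Close() error { return t.c.Close() }

// echoSession is session-layer logic; it never touches the concrete protocol.
func echoSession(t Transport) error {
	msg, err := t.ReadMsg()
	if err != nil {
		return err
	}
	return t.WriteMsg(append([]byte("echo:"), msg...))
}

// roundTrip runs one request and reply over an in-memory connection pair.
func roundTrip(msg string) (string, error) {
	clientConn, serverConn := net.Pipe()
	client, server := streamTransport{clientConn}, streamTransport{serverConn}
	defer client.Close()
	defer server.Close()
	go echoSession(server)
	if err := client.WriteMsg([]byte(msg)); err != nil {
		return "", err
	}
	reply, err := client.ReadMsg()
	return string(reply), err
}

func main() {
	reply, err := roundTrip("ping")
	fmt.Println(reply, err)
}
```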
Request‑Forwarding and Downstream Task Groups
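Request forwarding and downstream push run in separate goroutine pools (task groups), sized per business according to its QPS. This isolation avoids head‑of‑line blocking between businesses, and reusing goroutines reduces GC pressure.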
Deployment
Access points are deployed in East, North and South China, plus Hong Kong for overseas traffic.
Deployment is clustered, with domain‑based traffic splitting.
Each instance is capped at 100k‑200k connections to improve stability.
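A per‑instance connection cap can be enforced with an atomic counter checked when a connection is accepted. This is a minimal sketch of that idea only: the Limiter type is hypothetical, and the tiny limit used in main exists just to make the behavior visible.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Limiter caps the number of simultaneously open connections on one instance.
type Limiter struct {
	cur int64 // current connection count; accessed atomically
	max int64
}

func NewLimiter(max int64) *Limiter { return &Limiter{max: max} }

// Acquire reserves a slot for a new connection, or returns false at the cap.
func (l *Limiter) Acquire() bool {
	if atomic.AddInt64(&l.cur, 1) > l.max {
		atomic.AddInt64(&l.cur, -1) // over the cap: undo the reservation
		return false
	}
	return true
}

// Release frees a slot when a connection closes.
func (l *Limiter) Release() { atomic.AddInt64(&l.cur, -1) }

// Current reports the number of slots in use.
func (l *Limiter) Current() int64 { return atomic.LoadInt64(&l.cur) }

func main() {
	l := NewLimiter(2) // a real instance would use a value such as 100000
	fmt.Println(l.Acquire(), l.Acquire(), l.Acquire())
	l.Release()
	fmt.Println(l.Acquire(), l.Current())
}
```

An accept loop would call Acquire before the handshake, close the socket immediately when it returns false, and call Release when the connection ends.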
Business Integration
Typical steps: evaluate required capabilities, estimate user scale, integrate SDK on client, adapt server interfaces, and request resources.
Summary and Future Plans
The service now supports tens of millions of concurrent connections, upstream QPS in the millions, and high‑throughput downstream pushes. The main lessons learned are to keep requirement boundaries clear, to keep the design simple yet robust, and to balance performance against operability. Future work focuses on finer‑grained network metrics, intelligent client‑side adaptation, and broader scenario coverage.
Architecture & Thinking
🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.