How to Build a Scalable WebSocket Long‑Connection Gateway with Netty
This article explains the challenges of server‑push in HTTP, reviews WebSocket as the mainstream solution, and details the design, implementation, session management, monitoring, and performance testing of a Netty‑based distributed WebSocket long‑connection gateway used at iQIYI.
Background
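HTTP is a stateless request/response protocol built on TCP: the client always initiates the exchange, so the server cannot send data on its own. Push scenarios such as real‑time notifications require exactly that. Short polling and long polling work around the limitation, but they add latency and waste resources on repeated or held‑open requests. WebSocket, introduced alongside HTML5 and standardized as RFC 6455, provides a full‑duplex channel over a single TCP connection and has become the mainstream solution for server push.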
iQIYI Use Cases
The iQIYI platform uses WebSocket for user comments, real‑time identity verification, liveness detection and other scenarios that need immediate data synchronization to the browser.
Problems with Existing Implementations
The existing WebSocket implementations suffer from inconsistent technology stacks, tight coupling with business systems, no session sharing across nodes, single‑node deployment that limits scalability, and insufficient monitoring and alerting.
Design of a Unified WebSocket Long‑Connection Gateway
Key Features
Centralized long‑connection management and push capability.
Decoupled from business logic.
Simple HTTP push interface for any language.
Distributed architecture supporting horizontal scaling and high availability.
Multi‑device message synchronization.
Multi‑dimensional monitoring and alerting.
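As a rough illustration of the "simple HTTP push interface" feature, the sketch below builds a push request with the JDK's built‑in HTTP client types. The article does not publish the real endpoint or payload schema, so the URL `http://gateway.example.internal/api/push` and the JSON fields `userId` and `message` are hypothetical placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class PushRequests {
    // Hypothetical endpoint: the real gateway URL is not given in the article.
    static final URI PUSH_ENDPOINT = URI.create("http://gateway.example.internal/api/push");

    /** Escapes the characters that would otherwise break a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the (assumed) JSON body: which user to push to, and what to send. */
    static String payload(String userId, String message) {
        return "{\"userId\":\"" + jsonEscape(userId) + "\",\"message\":\"" + jsonEscape(message) + "\"}";
    }

    /** Builds the POST request; nothing is sent until a client executes it. */
    static HttpRequest build(String userId, String message) {
        return HttpRequest.newBuilder(PUSH_ENDPOINT)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload(userId, message), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("u1", "cover filter finished");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(payload("u1", "cover filter finished"));
    }
}
```

A business service would hand the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Because the interface is plain HTTP, the same call can be made from any language with an HTTP client, which is the point of this feature.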
Technology Choice
Netty was selected for its high‑performance, event‑driven, asynchronous, non‑blocking I/O model and its strong community support.
Session Sharing Solution
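Two approaches were considered for session sharing in a cluster. With a registration‑center mapping, a registry records which node holds each user's connections, and a push is routed to that node. With an event‑broadcast mechanism, every push is published as an event to all nodes, and each node delivers it only to the connections it holds locally. The lighter‑weight event‑broadcast solution was chosen; it can be implemented on RocketMQ, Redis Pub/Sub, or ZooKeeper. RocketMQ was ultimately adopted for its high throughput, reliability and ease of integration.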
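The event‑broadcast idea can be sketched without any messaging library. In the example below, `BroadcastBus` is an in‑memory stand‑in for a RocketMQ topic consumed in broadcast mode; it is not RocketMQ's API, and the class names `PushEvent`, `BroadcastBus` and `GatewayNode` are illustrative assumptions rather than the gateway's actual types.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** A push addressed to a user, regardless of which node holds the connections. */
final class PushEvent {
    final String userId;
    final String payload;
    PushEvent(String userId, String payload) { this.userId = userId; this.payload = payload; }
}

/** In-memory stand-in for a broadcast topic: every subscribed node sees every event. */
final class BroadcastBus {
    private final List<GatewayNode> subscribers = new CopyOnWriteArrayList<>();
    void subscribe(GatewayNode node) { subscribers.add(node); }
    void publish(PushEvent event) {
        for (GatewayNode node : subscribers) node.onEvent(event);
    }
}

/** One gateway instance; it only knows about the connections it holds itself. */
final class GatewayNode {
    final String name;
    // userId -> payloads written to that user's local connections (stands in for channel writes)
    private final Map<String, List<String>> localOutbox = new ConcurrentHashMap<>();

    GatewayNode(String name) { this.name = name; }

    /** Records that the user has a connection on this node. */
    void connect(String userId) { localOutbox.putIfAbsent(userId, new CopyOnWriteArrayList<>()); }

    /** Delivers the event if the user is connected here; otherwise ignores it. */
    boolean onEvent(PushEvent event) {
        List<String> outbox = localOutbox.get(event.userId);
        if (outbox == null) return false; // user is on another node (or offline)
        outbox.add(event.payload);
        return true;
    }

    List<String> delivered(String userId) {
        List<String> outbox = localOutbox.get(userId);
        return outbox == null ? List.of() : List.copyOf(outbox);
    }
}

final class BroadcastDemo {
    public static void main(String[] args) {
        BroadcastBus bus = new BroadcastBus();
        GatewayNode a = new GatewayNode("node-a");
        GatewayNode b = new GatewayNode("node-b");
        bus.subscribe(a);
        bus.subscribe(b);
        a.connect("alice");
        bus.publish(new PushEvent("alice", "hello"));
        System.out.println(a.name + " delivered: " + a.delivered("alice"));
        System.out.println(b.name + " delivered: " + b.delivered("alice"));
    }
}
```

The trade‑off is visible in `publish`: every node receives every event, so there is no shared registry to keep consistent, at the cost of each node discarding events for users it does not hold.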
System Architecture
The overall architecture of the gateway is shown in Figure 1.
Session Management
The SessionManager component maintains a hash table mapping user IDs to UserSession objects. Each UserSession can hold multiple ChannelSession objects (one per connection). When the number of channels for a user exceeds a configured limit, the oldest channel is closed to conserve resources. The relationship among SessionManager, UserSession and ChannelSession is illustrated in Figure 2.
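The structure described above might look roughly like the following. This is a minimal sketch, not the gateway's code: the real `ChannelSession` presumably wraps a Netty `Channel`, whereas here it is a plain object with an `open` flag, and method names such as `register`, `unregister` and `channelsOf` are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** One connection. A real implementation would wrap a Netty Channel. */
final class ChannelSession {
    final String channelId;
    private boolean open = true;
    ChannelSession(String channelId) { this.channelId = channelId; }
    synchronized void close() { open = false; }
    synchronized boolean isOpen() { return open; }
}

/** All connections of one user, kept oldest-first so eviction is cheap. */
final class UserSession {
    private final int maxChannels;
    private final Deque<ChannelSession> channels = new ArrayDeque<>();

    UserSession(int maxChannels) { this.maxChannels = maxChannels; }

    /** Adds a channel and closes the oldest ones while the limit is exceeded. */
    synchronized void add(ChannelSession channel) {
        channels.addLast(channel);
        while (channels.size() > maxChannels) {
            channels.removeFirst().close();
        }
    }

    synchronized void remove(ChannelSession channel) { channels.remove(channel); }
    synchronized boolean isEmpty() { return channels.isEmpty(); }

    synchronized List<String> channelIds() {
        List<String> ids = new ArrayList<>();
        for (ChannelSession c : channels) ids.add(c.channelId);
        return ids;
    }
}

/** Hash table from user ID to that user's sessions. */
final class SessionManager {
    private final Map<String, UserSession> users = new ConcurrentHashMap<>();
    private final int maxChannelsPerUser;

    SessionManager(int maxChannelsPerUser) {
        if (maxChannelsPerUser < 1) throw new IllegalArgumentException("limit must be at least 1");
        this.maxChannelsPerUser = maxChannelsPerUser;
    }

    void register(String userId, ChannelSession channel) {
        users.compute(userId, (id, session) -> {
            if (session == null) session = new UserSession(maxChannelsPerUser);
            session.add(channel);
            return session;
        });
    }

    void unregister(String userId, ChannelSession channel) {
        users.computeIfPresent(userId, (id, session) -> {
            session.remove(channel);
            return session.isEmpty() ? null : session; // returning null drops the user entry
        });
    }

    List<String> channelsOf(String userId) {
        UserSession session = users.get(userId);
        return session == null ? List.of() : session.channelIds();
    }

    int userCount() { return users.size(); }
}

final class SessionManagerDemo {
    public static void main(String[] args) {
        SessionManager manager = new SessionManager(2);
        ChannelSession c1 = new ChannelSession("c1");
        manager.register("u1", c1);
        manager.register("u1", new ChannelSession("c2"));
        manager.register("u1", new ChannelSession("c3")); // exceeds the limit, so c1 is closed
        System.out.println(manager.channelsOf("u1") + ", c1 open: " + c1.isOpen());
    }
}
```

Keeping the channels in insertion order means the oldest connection is always at the head of the deque, so enforcing the per‑user limit does not require scanning or sorting.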
Monitoring and Alerting
Metrics such as connection count, user count, JVM, CPU and memory are exposed via Micrometer, collected by Prometheus, and visualized in Grafana. Alert rules are configured in Grafana and trigger the internal alarm platform when anomalies are detected.
Performance Testing
Two 4‑core 16 GB virtual machines were used as server and client. The gateway opened 20 ports and each of 20 clients established 50,000 connections, achieving up to one million concurrent connections (Figure 3). Sending a single message to all connections took about 10 seconds (Figure 4). With 10 connections per user, 600 concurrent requests for 120 seconds yielded a TPS of over 1,600 (Figure 5).
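The figures above can be cross‑checked with a little arithmetic. The sketch below only derives numbers from the quoted values; it assumes that each client targets a single listening port, which the article does not state. Under that assumption, 50,000 connections per port stay below the 65,535 source ports available for one (client IP, server IP, server port) combination, which would explain why 20 ports were opened.

```java
final class LoadTestMath {
    // Values quoted in the test description.
    static final int CLIENTS = 20;
    static final int CONNECTIONS_PER_CLIENT = 50_000;
    static final int BROADCAST_SECONDS = 10;   // "about 10 seconds"
    static final int TPS = 1_600;              // "over 1,600"
    static final int TEST_SECONDS = 120;

    // Upper bound on source ports for one client IP talking to one server IP:port.
    static final int MAX_SOURCE_PORTS = 65_535;

    static long totalConnections() {
        return (long) CLIENTS * CONNECTIONS_PER_CLIENT;
    }

    /** Assumption: each client connects to exactly one listening port. */
    static boolean fitsOneListenPort() {
        return CONNECTIONS_PER_CLIENT <= MAX_SOURCE_PORTS;
    }

    /** Approximate per-second write rate during the broadcast test. */
    static long broadcastWritesPerSecond() {
        return totalConnections() / BROADCAST_SECONDS;
    }

    /** Lower bound on requests completed in the TPS test. */
    static long requestsCompleted() {
        return (long) TPS * TEST_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println("total connections: " + totalConnections());
        System.out.println("fits one port per client: " + fitsOneListenPort());
        System.out.println("broadcast writes/s: " + broadcastWritesPerSecond());
        System.out.println("requests completed: " + requestsCompleted());
    }
}
```

In other words, the broadcast test corresponds to roughly 100,000 connection writes per second, and the TPS test to at least 192,000 completed requests over the two minutes.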
Business Case: Image Filter Notification
When a creator uploads a video cover, an asynchronous task applies filter effects. Once processing completes, the results are pushed to the browser via the gateway, as illustrated in Figure 6.
Benefits
Integrating the gateway reduces development time from days to minutes, improves code maintainability, and lowers operational costs while providing reliable push capabilities.
Conclusion
WebSocket remains the primary technology for server push. A dedicated long‑connection gateway abstracts communication details, enables horizontal scaling, and offers built‑in monitoring, making it a valuable component for modern services.
21CTO
