How to Build a Scalable WebSocket Long‑Connection Gateway with Netty

This article explains the challenges of server‑push in HTTP, reviews WebSocket as the mainstream solution, and details the design, implementation, session management, monitoring, and performance testing of a Netty‑based distributed WebSocket long‑connection gateway used at iQIYI.

21CTO

Background

HTTP is a stateless request/response protocol built on TCP in which only the client can start an exchange, whereas push scenarios such as real‑time notifications require the server to initiate data transfer. Short polling and long polling partially solve the problem but add latency and waste resources. WebSocket, introduced alongside HTML5 and standardized as RFC 6455, provides a full‑duplex channel over a single TCP connection and has become the mainstream solution for server push.

iQIYI Use Cases

The iQIYI platform uses WebSocket for user comments, real‑time identity verification, liveness detection and other scenarios that need immediate data synchronization to the browser.

Problems with Existing Implementations

The existing WebSocket implementations suffer from inconsistent technology stacks, tight coupling with business systems, no session sharing across nodes, single‑node deployments that limit scalability, and insufficient monitoring and alerting.

Design of a Unified WebSocket Long‑Connection Gateway

Key Features

Centralized long‑connection management and push capability.

Decoupled from business logic.

Simple HTTP push interface for any language.

Distributed architecture supporting horizontal scaling and high availability.

Multi‑device message synchronization.

Multi‑dimensional monitoring and alerting.
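As an illustration of what an HTTP push interface makes possible, the sketch below builds a push request with the JDK's `java.net.http` API. The source does not document the actual interface, so the `/push` path, the JSON field names `userId` and `message`, and the `PushRequests` helper are all hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper; the real gateway API is not described in the article. */
class PushRequests {
    /** Escapes backslashes and double quotes only; enough for this sketch, not a full JSON encoder. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the assumed JSON body: which user to push to and what to send. */
    static String body(String userId, String message) {
        return "{\"userId\":" + quote(userId) + ",\"message\":" + quote(message) + "}";
    }

    /** Builds a POST to the assumed /push endpoint; nothing is sent until a client executes it. */
    static HttpRequest build(String gatewayBaseUrl, String userId, String message) {
        return HttpRequest.newBuilder(URI.create(gatewayBaseUrl + "/push"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(userId, message)))
                .build();
    }
}
```

A business service would send the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Because the contract is plain HTTP plus JSON, a caller in any other language needs only an HTTP client.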

Technology Choice

Netty was selected for its high‑performance, event‑driven, asynchronous, non‑blocking I/O model and for its strong community support.

Session Sharing Solution

Two approaches were considered for sharing sessions across a cluster: a registry that maps each session to the node holding it, and an event broadcast in which every node receives each push event and delivers it only to the sessions it holds locally. The lighter‑weight event‑broadcast approach was chosen; candidate implementations were RocketMQ, Redis Pub/Sub and ZooKeeper. RocketMQ was ultimately adopted for its high throughput, reliability and ease of integration.
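The event‑broadcast idea can be sketched in a few lines. In this minimal example an in‑memory `Broadcaster` stands in for the message queue topic, and a list of delivered payloads stands in for writes to WebSocket channels. All class and method names are hypothetical and are not taken from the iQIYI implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** A push request addressed to a user, not to a specific node. */
class PushEvent {
    final String userId;
    final String payload;

    PushEvent(String userId, String payload) {
        this.userId = userId;
        this.payload = payload;
    }
}

/** Stand-in for a broadcast topic: every subscribed node receives every event. */
class Broadcaster {
    private final List<GatewayNode> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(GatewayNode node) {
        subscribers.add(node);
    }

    void publish(PushEvent event) {
        for (GatewayNode node : subscribers) {
            node.onEvent(event);
        }
    }
}

/** One gateway instance; it only knows about the connections it holds itself. */
class GatewayNode {
    final String name;
    // userId -> payloads delivered locally (stands in for writing to the user's channels)
    private final Map<String, List<String>> localSessions = new ConcurrentHashMap<>();

    GatewayNode(String name) {
        this.name = name;
    }

    /** Records that the user has a connection on this node. */
    void connect(String userId) {
        localSessions.putIfAbsent(userId, new CopyOnWriteArrayList<>());
    }

    /** Called for every broadcast event; users not connected to this node are ignored. */
    void onEvent(PushEvent event) {
        List<String> outbox = localSessions.get(event.userId);
        if (outbox != null) {
            outbox.add(event.payload);
        }
    }

    List<String> delivered(String userId) {
        return localSessions.getOrDefault(userId, List.of());
    }
}
```

The trade‑off is that every node processes every event, including events for users it does not hold. In exchange, no shared session‑to‑node mapping has to be kept consistent, which is what makes the approach lightweight.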

System Architecture

The overall architecture of the gateway is shown in Figure 1.

Figure 1: WebSocket gateway architecture

Session Management

The SessionManager component maintains a hash table mapping user IDs to UserSession objects. Each UserSession can hold multiple ChannelSession objects (one per connection). When the number of channels for a user exceeds a configured limit, the oldest channel is closed to conserve resources. The relationship among SessionManager, UserSession and ChannelSession is illustrated in Figure 2.
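The structure described above can be sketched as follows. The class names come from the article, but the fields, method names and locking strategy are assumptions made for illustration. In a real Netty gateway, `ChannelSession` would wrap an `io.netty.channel.Channel` instead of a boolean flag.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** One physical connection; here a flag stands in for the underlying channel. */
class ChannelSession {
    final String channelId;
    private volatile boolean open = true;

    ChannelSession(String channelId) {
        this.channelId = channelId;
    }

    void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }
}

/** All connections of one user, kept oldest first. */
class UserSession {
    private final Deque<ChannelSession> channels = new ArrayDeque<>();
    private final int maxChannels;

    UserSession(int maxChannels) {
        this.maxChannels = maxChannels;
    }

    /** Adds a channel, then closes and drops the oldest ones while over the limit. */
    synchronized void add(ChannelSession channel) {
        channels.addLast(channel);
        while (channels.size() > maxChannels) {
            channels.removeFirst().close();
        }
    }

    synchronized void remove(ChannelSession channel) {
        channels.remove(channel);
    }

    synchronized int size() {
        return channels.size();
    }
}

/** Hash table from user ID to that user's sessions. */
class SessionManager {
    private final Map<String, UserSession> users = new ConcurrentHashMap<>();
    private final int maxChannelsPerUser;

    SessionManager(int maxChannelsPerUser) {
        this.maxChannelsPerUser = maxChannelsPerUser;
    }

    /** Registers a new connection; compute() keeps create-and-add atomic per user. */
    void register(String userId, ChannelSession channel) {
        users.compute(userId, (id, session) -> {
            UserSession s = (session == null) ? new UserSession(maxChannelsPerUser) : session;
            s.add(channel);
            return s;
        });
    }

    /** Removes a connection and drops the user entry once no channels remain. */
    void unregister(String userId, ChannelSession channel) {
        users.computeIfPresent(userId, (id, session) -> {
            session.remove(channel);
            return session.size() == 0 ? null : session;
        });
    }

    int channelCount(String userId) {
        UserSession s = users.get(userId);
        return s == null ? 0 : s.size();
    }

    int userCount() {
        return users.size();
    }
}
```

Keeping channels in insertion order makes "close the oldest" a constant‑time operation at the head of the deque. Removing empty `UserSession` entries keeps the user count equal to the number of users with at least one live connection.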

Figure 2: SessionManager component

Monitoring and Alerting

Metrics such as connection count, user count, and JVM, CPU and memory usage are exposed via Micrometer, collected by Prometheus, and visualized in Grafana. Alert rules are configured in Grafana and trigger the internal alarm platform when anomalies are detected.

Performance Testing

Two 4‑core, 16 GB virtual machines were used, one as the server and one as the client. The gateway listened on 20 ports and each of 20 clients established 50,000 connections, reaching up to one million concurrent connections (Figure 3). Sending a single message to all connections took about 10 seconds (Figure 4). With 10 connections per user, a load of 600 concurrent requests sustained for 120 seconds yielded a throughput of more than 1,600 TPS (Figure 5).
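The arithmetic behind these figures is worked through below. The source does not say why 20 ports were opened. A likely reason is that a TCP connection is identified by source IP, source port, destination IP and destination port, so a single client IP can hold at most 65,535 connections to one server IP and port. The fan‑out figure assumes that each request is a push delivered to all 10 of a user's connections, which the source does not state explicitly.

```java
class LoadTestMath {
    /** Upper bound on source ports one client IP can use toward a single server IP and port. */
    static final int MAX_PORTS_PER_DESTINATION = 65_535;

    static long totalConnections(int clients, int connectionsPerClient) {
        return (long) clients * connectionsPerClient;
    }

    /** True if one client IP could hold this many connections to a single server port. */
    static boolean fitsOnOneServerPort(long connections) {
        return connections <= MAX_PORTS_PER_DESTINATION;
    }

    /** Lower bound on requests completed when at least `tps` is sustained for `seconds`. */
    static long totalRequests(int tps, int seconds) {
        return (long) tps * seconds;
    }

    /** Channel writes per second, assuming every request fans out to all of a user's connections. */
    static long channelWritesPerSecond(int tps, int connectionsPerUser) {
        return (long) tps * connectionsPerUser;
    }

    public static void main(String[] args) {
        System.out.println(totalConnections(20, 50_000));     // 1000000
        System.out.println(fitsOnOneServerPort(1_000_000));   // false
        System.out.println(fitsOnOneServerPort(50_000));      // true
        System.out.println(totalRequests(1_600, 120));        // 192000
        System.out.println(channelWritesPerSecond(1_600, 10)); // 16000
    }
}
```

Under these assumptions, one million connections from a single client machine cannot target one listening port, while 50,000 per port fits comfortably. A rate of 1,600 TPS over 120 seconds corresponds to at least 192,000 completed requests.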

Figure 3: Million connections
Figure 4: Push latency
Figure 5: TPS test

Business Case: Image Filter Notification

When a creator uploads a video cover, an asynchronous task applies filter effects. Once processing completes, the results are pushed to the browser via the gateway, as illustrated in Figure 6.

Figure 6: Image filter notification

Benefits

Integrating the gateway reduces development time from days to minutes, improves code maintainability, and lowers operational costs while providing reliable push capabilities.

Conclusion

WebSocket remains the primary technology for server push. A dedicated long‑connection gateway abstracts communication details, enables horizontal scaling, and offers built‑in monitoring, making it a valuable component for modern services.
