Building a Scalable Distributed WebSocket Cluster Using Spring Cloud & Consistent Hashing

This article analyzes the challenges of multi‑user WebSocket communication in a clustered environment, compares Netty and Spring WebSocket implementations, and presents two practical solutions—session broadcast and a consistent‑hashing based routing scheme—complete with code samples, gateway configuration, and load‑balancing considerations.

IT Architects Alliance

While developing a project that required real-time communication among many users, the author ran into two problems: handling WebSocket handshake requests in a cluster, and sharing WebSocket sessions across the cluster's nodes.

Scenario : Four servers are involved—one SSL‑enabled gateway server, one Redis + MySQL server, and two application servers forming a cluster. The gateway terminates HTTPS/WSS, while each app server handles both stateless HTTP requests and long‑lived WebSocket connections. Users must be able to send both one‑to‑one and group messages.

Technology stack includes Eureka for service discovery, Redis for session sharing and pub/sub, Spring Boot, Zuul, Spring Cloud Gateway, Spring WebSocket, Ribbon for load balancing, Netty as a low‑level NIO framework, and a consistent‑hashing algorithm for routing.

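WebSocketSession vs HttpSession : In Spring's WebSocket support each connection is represented by a WebSocketSession. It wraps a live connection held by one node and cannot be serialized to Redis, so true session sharing across nodes is impossible. By contrast, HttpSession can be shared using spring-session-data-redis together with the Spring Boot Redis starter (spring-boot-starter-redis in older Spring Boot releases, spring-boot-starter-data-redis in current ones).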

Solution evolution :

Using Netty directly: a Netty handler creates a ChannelGroup and broadcasts messages to all channels. Drawbacks include poor integration with Spring Cloud, duplicated business logic, difficulty registering with Eureka, and the need to implement REST endpoints separately.

Using Spring WebSocket: Spring Boot provides seamless integration. The author shows the Maven dependency, a configuration class implementing WebSocketConfigurer, and a message‑handling class extending TextWebSocketHandler. This approach simplifies development and aligns with the rest of the Spring Cloud ecosystem.

After evaluating both, the author chose Spring WebSocket for its convenience and consistency with other services.

From Zuul to Spring Cloud Gateway : Zuul 1.0 does not support WebSocket forwarding, and Zuul 2.0, although it adds WebSocket support, is not integrated into the Spring Boot / Spring Cloud stack. The gateway is therefore migrated to Spring Cloud Gateway. The essential SSL termination and routing settings are provided in a YAML snippet (port 443, keystore configuration, service discovery, and route definitions).

Session broadcast solution : The gateway receives a broadcast request (in the author's application, one sent by a teacher), retrieves the IP list of all cluster nodes via Eureka, and forwards the request to every node. Each node checks its local user-to-session map and sends the message only if it holds a matching session. This method is simple, but every node processes every request, so CPU cycles are wasted on the nodes that hold no relevant session.
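
The per-node check described above can be sketched in plain Java. This is a minimal illustration rather than the author's code: the class names LocalSessionRegistry and SessionBroadcast are hypothetical, and a Consumer<String> stands in for a live Spring WebSocketSession so that the sketch runs without any framework.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Per-node registry sketch: maps a user id to the connection this node holds for that user. */
class LocalSessionRegistry {
    // Assumption: Consumer<String> stands in for sending a text message on a WebSocketSession.
    private final Map<String, Consumer<String>> sessions = new ConcurrentHashMap<>();

    /** Called when a user's WebSocket connection is established on this node. */
    void register(String userId, Consumer<String> sender) {
        sessions.put(userId, sender);
    }

    /** Called when the connection closes. */
    void unregister(String userId) {
        sessions.remove(userId);
    }

    /** Runs on every node for every broadcast request; returns true only if this node held the session. */
    boolean deliver(String userId, String message) {
        Consumer<String> sender = sessions.get(userId);
        if (sender == null) {
            return false; // this node did the lookup for nothing
        }
        sender.accept(message);
        return true;
    }
}

/** Gateway-side fan-out sketch: forward the request to every node and count actual deliveries. */
class SessionBroadcast {
    static int broadcast(List<LocalSessionRegistry> nodes, String userId, String message) {
        int delivered = 0;
        for (LocalSessionRegistry node : nodes) {
            if (node.deliver(userId, message)) {
                delivered++;
            }
        }
        return delivered;
    }
}
```

In a real deployment the loop in broadcast would be one HTTP call per node address obtained from Eureka. The sketch makes the cost visible: the number of lookups grows with the number of nodes, while at most one of them delivers a one-to-one message.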

Consistent‑hashing solution :

Build a hash ring where each physical node may have multiple virtual nodes.

When a node goes down, remove its real and virtual nodes from the ring; when a node comes up, add them back.

Store the ring in Redis and use Redis pub/sub to push updates to all gateways, avoiding frequent reads.

Clients first send an HTTP request containing their user‑id; the gateway hashes the id, looks up the target IP on the ring, and returns it. The client then opens the WebSocket connection to that specific server.
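
The steps above can be sketched as a small hash ring in plain Java. The source does not specify the hash function, the number of virtual nodes, or how virtual nodes are named, so the MD5-based hash, the "#i" suffix, and the class name HashRing are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hash ring sketch: each physical node is placed on the ring several times as virtual nodes. */
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Node up: add the physical node's virtual nodes to the ring. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node); // "#i" naming is an assumption
        }
    }

    /** Node down: remove every ring position that belongs to the physical node. */
    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** Returns the node at the first ring position at or after the key's hash, wrapping around. */
    String nodeFor(String userId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("hash ring is empty");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(userId));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** First four bytes of the MD5 digest, read as an unsigned 32-bit ring position. */
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A gateway holding such a ring would answer the client's initial HTTP request with ring.nodeFor(userId). Because removeNode only deletes positions, users that were already mapped to a surviving node keep their mapping when another node goes down.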

Handling node state changes:

Node down : Detect via Eureka events, delete the node’s entries from the ring, and prevent further routing to it.

Node up : Update the ring and either force all clients to reconnect (simple but disruptive) or selectively disconnect sessions that would be misrouted (more complex).
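
The selective option for a node coming up can be sketched as follows: after the ring is rebuilt, each node checks its local users against the new ring and disconnects only those that now belong elsewhere. The class name RingRebalance, the MD5-based hash, and the value of 100 virtual nodes are assumptions for illustration, not details from the source.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: after the ring changes, find the local sessions that now belong to another node. */
class RingRebalance {
    static final int VIRTUAL_NODES = 100; // assumed value

    /** Builds a ring holding VIRTUAL_NODES positions per physical node. */
    static TreeMap<Long, String> buildRing(Collection<String> nodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
        return ring;
    }

    /** Owner of a user id on the given ring; the ring must not be empty. */
    static String ownerOf(TreeMap<Long, String> ring, String userId) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(userId));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Users connected to localNode whose owner on the new ring is a different node. */
    static List<String> misrouted(TreeMap<Long, String> newRing, String localNode,
                                  Collection<String> localUserIds) {
        List<String> result = new ArrayList<>();
        for (String userId : localUserIds) {
            if (!localNode.equals(ownerOf(newRing, userId))) {
                result.add(userId);
            }
        }
        return result;
    }

    /** First four bytes of the MD5 digest, read as an unsigned 32-bit ring position. */
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Each node would close the sessions returned by misrouted so that those clients reconnect through the gateway and land on the new node. With consistent hashing, adding a node moves keys only onto that new node, so the returned list is typically a fraction of the local sessions rather than all of them.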

Load-balancing with Ribbon : The author attempted to customize Ribbon by extending AbstractLoadBalancerRule so that it hashes on the user id, but encountered two issues: (1) the custom rule was applied to requests for other services as well, mixing up cross-service routing, and (2) the key handed to the rule's choose method is always a default key rather than anything derived from the request, so the user id is not available at the point where the server is chosen. As a temporary workaround, the client performs a separate HTTP request to obtain the target IP before establishing the WebSocket connection.

Summary : Two viable approaches are presented for multi‑user WebSocket communication in a cluster. The first, session broadcast, is easy to implement but inefficient under high concurrency. The second, a consistent‑hashing based routing scheme, offers better scalability at the cost of added complexity in hash‑ring management and load‑balancer customization.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
