
WebSocket Cluster Deployment in a Customer Service IM System

This article explains how a large-scale customer service instant messaging platform uses clustered WebSocket services, Redis, Nginx load balancing, and message‑queue broadcasting to achieve real‑time, reliable message delivery and robust reconnection handling across multiple servers.

Zhuanzhuan Tech

1. Background

Zhuanzhuan is a leading second‑hand trading platform in China with over a hundred million users. When users encounter problems in the Zhuanzhuan app, they can contact customer service via online chat or hotline.

The customer service IM system, developed in‑house, is a crucial tool for communication between users and agents, featuring robot assistants, human agents, session routing, and skill‑group management. This system leverages many open‑source frameworks and middleware; this article explains how WebSocket is applied in the IM system.

2. WebSocket Cluster

2.1 WebSocket Protocol

Instant Messaging (IM) requires real‑time communication. In a support system, users and agents must exchange messages instantly.

WebSocket (WS) is a full‑duplex protocol over a single TCP connection that allows the server to push data to the client, providing low overhead and strong real‑time capabilities. The IM system adopts WS to enable simultaneous sending and receiving of data.

2.2 Clustered Deployment of WebSocket Service

In production we cannot rely on a single WS server; a failure would be catastrophic, so we deploy WS services in a cluster.

Because each WS connection is a long-lived link to one specific server, different users end up connected to different machines in the cluster. When user A wants to send a message to user C, the system must first locate the server that holds C's connection.

First, Nginx is configured with the ip_hash load-balancing strategy, so that a user whose IP has not changed is routed to the same server each time they connect, provided the set of upstream servers stays the same.
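The sticky behavior of IP-based hashing can be illustrated with a small sketch. This is not the article's configuration or Nginx's internal algorithm: the upstream names are hypothetical and crc32 is a stand-in hash. The one detail taken from Nginx's documented behavior is that ip_hash keys on the first three octets of an IPv4 address.

```python
import zlib

# Hypothetical upstream names; the article does not list its servers.
UPSTREAMS = ["ws-1", "ws-2", "ws-3"]


def pick_upstream(client_ip: str, upstreams=UPSTREAMS) -> str:
    """Deterministically map a client IPv4 address to one upstream."""
    # Use only the first three octets as the key, as ip_hash does for IPv4,
    # so clients in the same /24 network land on the same server.
    key = ".".join(client_ip.split(".")[:3])
    # crc32 is a stand-in; it is not the hash Nginx uses internally.
    return upstreams[zlib.crc32(key.encode()) % len(upstreams)]


# The same IP always maps to the same upstream.
first = pick_upstream("10.0.0.5")
second = pick_upstream("10.0.0.5")
same_server = first == second
```

The flip side is that a client whose IP changes, for example when moving from 4G to Wi-Fi, may be mapped to a different server, which is the situation section 2.4 deals with.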

After establishing a WS connection, the server records the user’s UID together with its hostname in Redis.
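A minimal sketch of this bookkeeping follows. To stay self-contained it uses an in-memory dictionary in place of Redis, and the class and method names are hypothetical; in a real deployment each operation would be a Redis read or write under a key scheme the article does not specify.

```python
from typing import Dict, Optional


class ConnectionRegistry:
    """In-memory stand-in for the Redis mapping of UID -> WS hostname."""

    def __init__(self) -> None:
        self._host_by_uid: Dict[str, str] = {}

    def register(self, uid: str, hostname: str) -> Optional[str]:
        """Record that uid is connected to hostname; return the previous host, if any."""
        previous = self._host_by_uid.get(uid)
        self._host_by_uid[uid] = hostname
        return previous

    def lookup(self, uid: str) -> Optional[str]:
        """Return the hostname currently recorded for uid, or None."""
        return self._host_by_uid.get(uid)

    def unregister(self, uid: str, hostname: str) -> bool:
        """Remove the entry only if it still points at hostname."""
        if self._host_by_uid.get(uid) == hostname:
            del self._host_by_uid[uid]
            return True
        return False


registry = ConnectionRegistry()
registry.register("C", "ws-2")
```

Returning the previous host from `register` is a design choice made for this sketch: it lets the caller notice that the user was recorded on a different server before.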

Each WS instance runs its own consumer in broadcast mode: every instance can see every message, and each one handles only the messages whose tag matches its own hostname.
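The broadcast-plus-tag idea can be modeled with a toy topic. The article does not name its message queue, so `BroadcastTopic`, `MqMessage`, and `WsConsumer` are hypothetical names, and the filtering here happens in the consumer, whereas a real MQ may filter by tag on the broker side.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MqMessage:
    tag: str   # hostname of the WS instance that should handle the message
    body: str


@dataclass
class BroadcastTopic:
    """Toy broadcast topic: every subscriber sees every published message."""
    subscribers: List[Callable[[MqMessage], None]] = field(default_factory=list)

    def subscribe(self, handler: Callable[[MqMessage], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, message: MqMessage) -> None:
        for handler in self.subscribers:
            handler(message)


@dataclass
class WsConsumer:
    hostname: str
    handled: List[str] = field(default_factory=list)

    def on_message(self, message: MqMessage) -> None:
        # Ignore messages addressed to other instances.
        if message.tag != self.hostname:
            return
        self.handled.append(message.body)


topic = BroadcastTopic()
ws1, ws2 = WsConsumer("ws-1"), WsConsumer("ws-2")
topic.subscribe(ws1.on_message)
topic.subscribe(ws2.on_message)
topic.publish(MqMessage(tag="ws-2", body="hello"))
```

After the publish call, only `ws2.handled` contains the message; `ws1` saw it and dropped it.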

2.3 Message Sending Flow

User A sends a message to user C; the message reaches WS‑1, which looks up C’s connection info from Redis.

WS‑1 determines that C is connected to WS‑2, sets the MQ message tag to “ws‑2”, and publishes the message.

WS‑2’s consumer receives the tagged MQ message and pushes the message to user C via its WS connection.
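The three steps above can be sketched as two functions, one for the sending server and one for the receiving consumer. Everything here is an illustrative assumption: a plain dictionary stands in for Redis, a callback stands in for the MQ producer, a list stands in for a WS connection, and the payload fields are made up. The article also does not say what happens when the recipient has no recorded connection, so the sketch simply returns None in that case.

```python
from typing import Callable, Dict, List, Optional


def route_message(
    host_by_uid: Dict[str, str],
    publish: Callable[[str, dict], None],
    from_uid: str,
    to_uid: str,
    text: str,
) -> Optional[str]:
    """Sender side: look up the recipient's WS host and publish with that tag.

    Returns the tag used, or None when the recipient has no recorded connection.
    """
    target_host = host_by_uid.get(to_uid)
    if target_host is None:
        return None  # offline handling is outside the article's scope
    publish(target_host, {"from": from_uid, "to": to_uid, "text": text})
    return target_host


def deliver(
    hostname: str,
    local_sockets: Dict[str, List[dict]],
    tag: str,
    payload: dict,
) -> bool:
    """Consumer side: push only if the tag matches and the user is connected here."""
    if tag != hostname:
        return False
    outbox = local_sockets.get(payload["to"])
    if outbox is None:
        return False
    outbox.append(payload)  # stands in for writing to the WS connection
    return True


# A on ws-1 sends "hi" to C, who is connected to ws-2.
host_by_uid = {"A": "ws-1", "C": "ws-2"}
published: List[tuple] = []
tag_used = route_message(
    host_by_uid, lambda tag, p: published.append((tag, p)), "A", "C", "hi"
)
ws2_sockets: Dict[str, List[dict]] = {"C": []}
delivered = deliver("ws-2", ws2_sockets, *published[0])
```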

2.4 Reconnection Handling

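Network conditions can cause disconnections or IP changes, especially when a mobile client switches between 4G and Wi-Fi. After an IP change the client may reconnect to a different server while its old connection lingers on the previous one, potentially leading to duplicate messages and extra resource consumption.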

We mitigate this with two strategies:

Immediate cleanup: when a WS server detects that the stored connection info for a user does not match the current server, it updates Redis and broadcasts an MQ message to close the stale connection on other servers.

Periodic cleanup: the front‑end periodically sends heartbeat messages; the WS server monitors heartbeats and closes abandoned connections.
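Both strategies can be sketched in one small model of a WS instance. This is an illustration under stated assumptions, not the article's implementation: `WsServer` and its methods are hypothetical, a shared dictionary stands in for Redis, the MQ broadcast is reduced to a return value the caller would act on, and the 30-second heartbeat timeout is an arbitrary choice.

```python
import time
from typing import Dict, List, Optional


class WsServer:
    """Minimal model of one WS instance; connections are tracked by UID only."""

    def __init__(self, hostname: str, registry: Dict[str, str],
                 heartbeat_timeout: float = 30.0) -> None:
        self.hostname = hostname
        self.registry = registry                  # shared dict standing in for Redis
        self.heartbeat_timeout = heartbeat_timeout  # assumed value, in seconds
        self.last_seen: Dict[str, float] = {}

    def on_connect(self, uid: str, now: Optional[float] = None) -> Optional[str]:
        """Register uid here; return the host holding a stale connection, if any."""
        now = time.monotonic() if now is None else now
        previous = self.registry.get(uid)
        self.registry[uid] = self.hostname
        self.last_seen[uid] = now
        # The caller would broadcast a "close uid" MQ message tagged with this host.
        return previous if previous not in (None, self.hostname) else None

    def on_close_request(self, uid: str) -> bool:
        """Immediate cleanup: drop the local connection when another server asks."""
        return self.last_seen.pop(uid, None) is not None

    def on_heartbeat(self, uid: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if uid in self.last_seen:
            self.last_seen[uid] = now

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Periodic cleanup: close connections whose heartbeat is overdue."""
        now = time.monotonic() if now is None else now
        expired = [uid for uid, seen in self.last_seen.items()
                   if now - seen > self.heartbeat_timeout]
        for uid in expired:
            del self.last_seen[uid]
            # Clear the registry entry only if it still points at this server.
            if self.registry.get(uid) == self.hostname:
                del self.registry[uid]
        return expired


registry: Dict[str, str] = {}
ws1, ws2 = WsServer("ws-1", registry), WsServer("ws-2", registry)
ws1.on_connect("C", now=0.0)
stale_host = ws2.on_connect("C", now=5.0)   # C reconnects on another server
closed = ws1.on_close_request("C")          # ws-1 reacts to the close broadcast
```

Checking the registry entry before deleting it in `sweep` keeps an old server from erasing the record of a newer connection on another instance.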

3. Summary

The article outlines the main flow of message transmission in a customer‑service IM system using WebSocket, emphasizing the need for real‑time, consistent, ordered delivery, as well as reconnection and heartbeat mechanisms that require close collaboration between back‑end and front‑end teams.

Author: Li Shuai, R&D Engineer, Zhuanzhuan Platform.
