Design and Implementation of an Online Customer Service Instant Messaging System
This article details the design and implementation of an online customer service instant messaging system. It covers requirements analysis, the client‑server network model, the HTTP and WebSocket protocols, and distributed architecture choices (modulo routing rules, Redis shared memory, server‑master synchronization, and message‑queue broadcasting), and explains why Netty was selected as the development framework.
1. Overview
The company operates both telephone and online customer service. The telephone service is provided by the YTalk platform, which has been in production since 2020. Online service currently relies on a third‑party provider, increasing cost and limiting customization. To address these issues, an in‑house online customer service system was launched on June 20, 2021, and continuously enriched thereafter.
The core of the online service is an instant messaging (IM) system that enables real‑time text, file, and voice communication. This article shares the design considerations, technology choices, and pitfalls encountered while building the IM system, aiming to deepen readers' understanding and provide practical guidance.
2. System Design
2.1 Requirements Analysis
The IM system must support two user groups: customer service agents and customers (who may connect via web, app, or WeChat H5). Chat sessions are initiated by customers and assigned to an available agent. Agents can query the full chat history of any user. Voice and video are not required, and direct peer‑to‑peer chats between customers are prohibited.
2.2 Network Model
Client‑Server: clients communicate through a server that forwards messages. The server holds all connection information, which simplifies monitoring, but its maximum connection count becomes a performance bottleneck.
Peer‑to‑Peer: direct client‑to‑client connections offer privacy and avoid a central bottleneck, but they require NAT traversal and do not fit the agent‑centric, server‑mediated communication pattern of a customer service system.
Given the constraints, the Client-Server model was chosen.
2.3 Application Layer Protocol
Both HTTP long‑polling and WebSocket can deliver server‑pushed messages, but WebSocket provides a full‑duplex, persistent connection after a single handshake, making it the better fit for IM.
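For reference, the WebSocket upgrade is a single HTTP handshake; a minimal exchange looks like the following (the path and host are illustrative, and the key/accept pair is the example from RFC 6455):

```http
GET /im/chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the 101 response, the same TCP connection carries WebSocket frames in both directions with no further HTTP overhead.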
2.4 Distributed Architecture
To avoid a single‑point failure and support scaling, several distributed solutions were evaluated.
2.4.1 Modulo Routing Rule
Clients are assigned to a server based on a simple modulo of their identifier. This approach is easy to implement and has low overhead, but it suffers from load imbalance and requires all clients to reconnect when server count changes.
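The rule and its main weakness can be sketched in a few lines (class and method names are hypothetical):

```java
public class ModuloRouter {
    // Route a client to one of serverCount servers by hashing its id.
    // floorMod keeps the result non-negative even for negative hash codes.
    public static int route(String clientId, int serverCount) {
        return Math.floorMod(clientId.hashCode(), serverCount);
    }

    // Count how many of the given clients would land on a different server
    // if the cluster grows from oldCount to newCount servers.
    public static long reassigned(String[] clientIds, int oldCount, int newCount) {
        long moved = 0;
        for (String id : clientIds) {
            if (route(id, oldCount) != route(id, newCount)) moved++;
        }
        return moved;
    }
}
```

With plain modulo, most clients change servers whenever the server count changes, which is exactly why every client must reconnect after a scaling event.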
2.4.2 Redis Shared Memory
A central Redis store maintains a clientId:serverId mapping. When a client connects, its server writes the mapping; on disconnect, the server deletes it. Because a check‑then‑write sequence spans multiple Redis commands and is therefore not atomic, Lua scripts or explicit locking are needed to avoid race conditions during rapid reconnects.
Consider a client that disconnects and immediately reconnects: T1 is the original store, T2 the delete issued when the old connection closes, and T3 the store for the new connection. Both operations must be guarded:

delete:
if (delete.id == old.id) {
    del(delete.id); // ensure a late T2 delete does not remove the T3 entry
}

store:
if (newConnection.time >= old.time) {
    store(newConnection.id, newConnection); // ensure a late T1 store does not overwrite the T3 entry
}

This method introduces latency in routing‑table updates, which can cause temporary message delivery failures.
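The same guards can be shown with an in‑process map; this sketch uses a ConcurrentHashMap as a stand‑in for Redis (in production the checks would live in a Lua script so Redis executes them atomically; all names here are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class RoutingTable {
    // clientId -> connection record (owning server plus connect timestamp)
    public static final class Conn {
        public final String serverId;
        public final long time;
        public Conn(String serverId, long time) {
            this.serverId = serverId;
            this.time = time;
        }
    }

    private final ConcurrentHashMap<String, Conn> table = new ConcurrentHashMap<>();

    // Store only if the new connection is at least as recent as the old one,
    // so a delayed T1 store cannot overwrite a newer T3 store.
    public void store(String clientId, Conn conn) {
        table.merge(clientId, conn, (old, fresh) -> fresh.time >= old.time ? fresh : old);
    }

    // Delete only if the entry still belongs to the connection being closed,
    // so a stale T2 delete cannot remove a newer T3 entry.
    public void delete(String clientId, String serverId, long time) {
        table.computeIfPresent(clientId, (k, old) ->
            (old.serverId.equals(serverId) && old.time == time) ? null : old);
    }

    public Conn lookup(String clientId) {
        return table.get(clientId);
    }
}
```

merge and computeIfPresent each run their remapping function atomically per key, which is the property the Lua script provides on the Redis side.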
2.4.3 Server‑Master Synchronization
Each server stores its local connections and synchronizes them to a designated master server, which holds the global routing table. The master forwards messages to the appropriate server. A single master simplifies consistency, but introduces a single point of failure; therefore a master‑backup pair or a distributed consensus protocol is required for high availability.
2.4.4 Broadcast Strategy via Message Queue
Instead of selecting a target server, every server publishes incoming messages to a message‑queue topic. All servers consume the topic and deliver the message locally if they hold the corresponding client connection. This eliminates the need for a global routing table and simplifies failure handling, though the message queue can become a bottleneck under extreme load.
// Server receives a message
public void receive(Message message) {
mqTopic.send(message);
}
// Server listens to the queue
mqTopic.addListener(message -> sendToClient(message));
public void sendToClient(Message message) {
Channel channel = map.get(message.getUserId());
if (channel != null) {
    channel.writeAndFlush(message); // write() alone only buffers; flushing actually sends
}
}

The broadcast approach was chosen as the most suitable for the online customer service system's traffic profile.
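A self‑contained simulation makes the delivery rule concrete; an in‑memory listener list stands in for the message‑queue topic, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastDemo {
    // The "topic": every server subscribes; every message is fanned out to all.
    public static final List<Server> topic = new ArrayList<>();

    public static void publish(String userId, String text) {
        for (Server s : topic) s.onMessage(userId, text);
    }

    public static class Server {
        // Local connections: userId -> buffer standing in for a client channel.
        private final Map<String, StringBuilder> localConnections = new HashMap<>();

        public Server() {
            topic.add(this); // subscribe to the shared topic on startup
        }

        public void connect(String userId) {
            localConnections.put(userId, new StringBuilder());
        }

        // Deliver only if this server holds the target client's connection;
        // otherwise the broadcast is silently ignored.
        public void onMessage(String userId, String text) {
            StringBuilder channel = localConnections.get(userId);
            if (channel != null) channel.append(text);
        }

        public String inbox(String userId) {
            StringBuilder sb = localConnections.get(userId);
            return sb == null ? null : sb.toString();
        }
    }
}
```

No server needs to know where a client lives: each one consumes the full stream and keeps only what it can deliver, trading extra consumption work for a simpler, routing‑table‑free design.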
3. Development Framework
Java offers three I/O models: BIO (blocking), NIO (non‑blocking), and AIO (asynchronous). BIO ties one thread to each connection and scales poorly, while AIO lacks mature support on Linux (where it is layered over epoll rather than true kernel asynchronous I/O), so NIO was selected.
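To make the non‑blocking model concrete, here is a minimal, self‑contained NIO sketch: a single Selector thread accepts and reads a loopback connection, which is the event‑loop pattern that Netty wraps and industrializes (all names are illustrative):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class NioDemo {
    // Accept one connection and echo its bytes back to the caller,
    // driving everything from a single selector loop.
    public static String roundTrip(String msg) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);            // non-blocking accept
            server.register(selector, SelectionKey.OP_ACCEPT);

            // A plain blocking client on the other side of the loopback.
            SocketChannel client = SocketChannel.open(
                (InetSocketAddress) server.getLocalAddress());
            client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
            client.shutdownOutput();                    // signal EOF to the server

            ByteBuffer buf = ByteBuffer.allocate(256);
            StringBuilder received = new StringBuilder();
            boolean done = false;
            while (!done) {
                selector.select();                      // block until a channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel conn = server.accept();
                        conn.configureBlocking(false);  // non-blocking read
                        conn.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        buf.clear();
                        int n = ((SocketChannel) key.channel()).read(buf);
                        if (n < 0) {                    // peer closed its write side
                            done = true;
                            key.channel().close();
                        } else {
                            buf.flip();
                            received.append(StandardCharsets.UTF_8.decode(buf));
                        }
                    }
                }
                selector.selectedKeys().clear();        // consume processed events
            }
            client.close();
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One selector thread can multiplex thousands of such channels; the bookkeeping this sketch does by hand (key dispatch, buffer management, partial reads) is what Netty's event loop and codec pipeline take over.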
Netty abstracts Java NIO, handling the connection lifecycle, TCP packet splitting and sticking (half packets), idle detection, and write‑buffer backpressure. It supports WebSocket out of the box, aligns with the company's Java stack, and offers high performance through a reactor thread model, zero‑copy, and memory pooling. Consequently, Netty was adopted as the primary framework for the IM system.
4. Conclusion
System design involves evaluating multiple architectural options and balancing business requirements with technical trade‑offs. This article presented the reasoning behind the chosen network model, protocol, distributed routing strategy, and development framework for an online customer service IM system, providing insights that can guide similar projects.
Yang Money Pot Technology Team
Enhancing service efficiency with technology.