
Design Principles and Architecture of Large‑Scale Instant Messaging Systems

This article explores core concepts, ID design, read/write fan-out, push‑pull models, and industry implementations for large‑scale instant messaging systems, discussing trade‑offs in message diffusion, unique identifier strategies, real‑time delivery, ordering, unread counts, multi‑device sync, and deployment considerations.

Sohu Tech Products

Core Concepts of IM Systems

The article defines key entities such as users, messages (text, emoji, image, video, file), conversations, groups, terminals (Android, iOS, Web), unread counts, user status, relationship chains, single‑chat, group‑chat, and customer‑service scenarios.

Read‑Fan‑Out vs Write‑Fan‑Out

In read‑fan‑out (pull), each conversation has a single mailbox: a message is written once, and each user reads from every conversation mailbox that contains new messages. Writes are lightweight, but reads are heavy. Advantages include simple write logic and a natural per‑conversation history; the drawback is the heavy read load.

In write‑fan‑out (push), each user has their own mailbox: a message is written to the sender’s mailbox and copied to each recipient’s mailbox. Writes become heavy, especially for group chats, while reads are lightweight and multi‑device synchronization is easier.

In feed systems these patterns are also called fan‑in/fan‑out.
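The two storage models in this section can be contrasted with a minimal in-memory sketch. The class and method names below are hypothetical and chosen only for illustration; a real system would back the mailboxes with a database or cache rather than Python lists.

```python
from collections import defaultdict


class ReadFanOutStore:
    """One mailbox per conversation: a message is written once."""

    def __init__(self):
        self.mailboxes = defaultdict(list)  # conversation_id -> [message, ...]

    def send(self, conversation_id, message):
        self.mailboxes[conversation_id].append(message)  # single write

    def read(self, conversation_ids):
        # The reader must visit every conversation it belongs to.
        return {cid: list(self.mailboxes[cid]) for cid in conversation_ids}


class WriteFanOutStore:
    """One mailbox per user: a message is copied to every participant."""

    def __init__(self):
        self.mailboxes = defaultdict(list)  # user_id -> [message, ...]

    def send(self, sender_id, recipient_ids, message):
        for user_id in [sender_id, *recipient_ids]:  # one write per participant
            self.mailboxes[user_id].append(message)

    def read(self, user_id):
        return list(self.mailboxes[user_id])  # single read
```

The cost asymmetry is visible in the loops: read‑fan‑out iterates on the read path, write‑fan‑out iterates on the write path, which is why large groups make the latter expensive.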

Unique ID Design

Common ID generation strategies include UUID, Snowflake, DB‑step allocation, Redis/DB auto‑increment, and custom rules. In IM, IDs are needed for conversations and messages.
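As an illustration of the Snowflake strategy, here is a minimal single-threaded sketch using the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits. The epoch value, the class name, and the injectable `clock_ms` parameter are assumptions made for this example; it is not Twitter's implementation and omits locking.

```python
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary fixed epoch (assumption for this sketch)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now = self.clock_ms()
        if now < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & MAX_SEQUENCE
            if self.sequence == 0:
                # Sequence exhausted in this millisecond: wait for the next one.
                while now <= self.last_ms:
                    now = self.clock_ms()
        else:
            self.sequence = 0
        self.last_ms = now
        timestamp_part = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
        worker_part = self.worker_id << SEQUENCE_BITS
        return timestamp_part | worker_part | self.sequence
```

IDs from one generator are strictly increasing. Across workers they are only roughly time-ordered, because two workers can issue IDs within the same millisecond.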

Message ID considerations:

Prefer monotonic increasing IDs for better storage locality.

Globally incremental IDs (e.g., Snowflake, whose IDs are time‑ordered but strictly increasing only within a single worker) work for write‑fan‑out.

User‑level incremental suits single‑chat scenarios.

Conversation‑level incremental (often continuous) helps detect lost messages in read‑fan‑out.
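The last point can be sketched as follows: when IDs within a conversation are continuous, a client can detect lost messages by comparing the incoming ID with the last one it saw. The function name is hypothetical, and the sketch assumes IDs increase by exactly 1 per message.

```python
def find_missing_ids(last_seen_id, incoming_id):
    """Return the IDs skipped between the last message seen and a new one.

    Assumes IDs within one conversation increase by exactly 1 per message.
    """
    if incoming_id <= last_seen_id:
        return []  # duplicate or late delivery, nothing is missing
    return list(range(last_seen_id + 1, incoming_id))
```

A non-empty result tells the client which messages to pull again from the server before displaying the new one.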

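When both user IDs are 32‑bit integers, a conversation ID can be generated by packing them into one 64‑bit value. Placing the smaller user ID in the from_user_id position ensures that both participants derive the same ID: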

conversation_id = ${from_user_id} << 32 | ${to_user_id}

If user IDs exceed 32 bits, a string concatenation is required, which is less efficient. An alternative is a global incremental ID with a mapping table linking from_user_id, to_user_id, and conversation_id.
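A minimal sketch of both variants follows. The function names and the `:` separator are assumptions for illustration; the IDs are sorted before combining so that either participant computes the same result.

```python
MAX_UINT32 = 0xFFFFFFFF


def conversation_id(user_a, user_b):
    """Pack two 32-bit user IDs into one 64-bit conversation ID."""
    low, high = sorted((user_a, user_b))
    if low < 0 or high > MAX_UINT32:
        raise ValueError("user IDs must fit in 32 bits")
    return (low << 32) | high


def conversation_key(user_a, user_b):
    """String fallback for user IDs wider than 32 bits."""
    low, high = sorted((user_a, user_b))
    return f"{low}:{high}"
```

The integer form is compact and indexes well; the string form has no width limit but costs more to store and compare.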

Push, Pull, and Hybrid Models

Three ways to deliver new messages:

Push: Server actively pushes new messages to all client endpoints.

Pull: Clients poll for messages, typically used for historical retrieval.

Push‑Pull hybrid: Server pushes a notification; the client then pulls the actual message list, optionally with periodic pulls for reliability.

The hybrid model mitigates message loss in pure push scenarios and works best with write‑fan‑out.
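The hybrid flow might look like the sketch below, where the server pushes only a lightweight signal and the client pulls everything after its cursor. The `Server` and `Client` classes are hypothetical, and callbacks stand in for real long-lived connections.

```python
class Server:
    def __init__(self):
        self.mailbox = []      # messages for one user, in arrival order
        self.subscribers = []  # callbacks standing in for open connections

    def deliver(self, text):
        self.mailbox.append(text)
        for notify in self.subscribers:
            notify()  # push only a "new message" signal, not the payload

    def pull(self, cursor):
        """Return the messages after `cursor` and the new cursor."""
        return self.mailbox[cursor:], len(self.mailbox)


class Client:
    def __init__(self, server):
        self.server = server
        self.cursor = 0
        self.received = []
        server.subscribers.append(self.on_notify)

    def on_notify(self):
        self.sync()

    def sync(self):
        # Also safe to call on a timer: pulling from the cursor is idempotent.
        messages, self.cursor = self.server.pull(self.cursor)
        self.received.extend(messages)
```

Because `sync` always pulls from the stored cursor, a dropped notification delays a message until the next pull instead of losing it.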

Industry Solutions

WeChat uses write‑fan‑out combined with a push‑pull hybrid, limits group size to 500, and employs multi‑data‑center architecture with Paxos for consistency. Its ID generator is based on DB‑step allocation with user‑level incremental IDs.

DingTalk started with write‑fan‑out and later shifted to read‑fan‑out for massive groups, leveraging Tablestore’s primary‑key auto‑increment to achieve user‑level incremental IDs.

Twitter (a feed system) popularized Snowflake for globally incremental IDs, initially using write‑fan‑out and later a hybrid of write‑ and read‑fan‑out for high‑profile users.

Ensuring Real‑Time Delivery and Ordering

Transport choices include TCP sockets, UDP sockets, and HTTP long‑polling. Message ordering issues arise from load‑balanced HTTP, sharding strategies, and asynchronous processing. Solutions involve long‑connection heartbeats, front‑end sequence IDs, and careful sharding (prefer sharding by from_user_id, so that one sender's messages pass through the same shard in order).
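Two of these ideas can be sketched briefly: routing by sender so that one sender's messages share a shard, and a receiver-side buffer that uses front-end sequence IDs to restore order. All names are hypothetical, and the sketch assumes integer user IDs and per-sender sequence numbers that start at 1 and increase by 1.

```python
def shard_for(from_user_id, shard_count):
    """Route all messages from one sender to the same shard."""
    return from_user_id % shard_count


class OrderedInbox:
    """Releases one sender's messages in sequence order, buffering early arrivals."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq
        self.pending = {}  # seq -> message that arrived ahead of its turn

    def receive(self, seq, message):
        """Accept a message and return those now deliverable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate, already delivered or already buffered
        self.pending[seq] = message
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A production version would also bound the buffer and time out on gaps that never fill, which this sketch leaves out.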

User Online Status

Online status can be stored in Redis (or Redis Cluster/Codis) or via distributed consistent hashing. Heartbeat updates are recommended to keep status accurate despite server or network failures.
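Heartbeat-based status can be sketched as a last-seen timestamp with a time-to-live. The in-memory dictionary below is a stand-in for the Redis store mentioned above, where a comparable effect could be had by refreshing a key's expiry on each heartbeat. The class name, the 30-second default, and the injectable clock are assumptions.

```python
import time


class PresenceTracker:
    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.last_seen = {}  # user_id -> time of last heartbeat

    def heartbeat(self, user_id):
        self.last_seen[user_id] = self.clock()

    def is_online(self, user_id):
        seen = self.last_seen.get(user_id)
        # A user with no recent heartbeat is treated as offline.
        return seen is not None and self.clock() - seen < self.ttl
```

With this design a crashed gateway or a dropped connection needs no explicit "offline" write: the status simply ages out once heartbeats stop.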

Multi‑Device Synchronization

In read‑fan‑out, clients pull the latest conversation list and compare last message IDs to detect gaps, often using Redis AOF for high‑throughput storage. In write‑fan‑out, clients track a sync cursor and request messages after that point.
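For the read‑fan‑out case, the comparison step might be sketched like this. The function name and the dictionary shape are hypothetical; the sketch assumes message IDs increase within a conversation and treats a conversation the device has never seen as having last ID 0.

```python
def conversations_to_pull(server_last_ids, local_last_ids):
    """Map each stale conversation to the local ID to pull after.

    Both arguments map conversation_id -> last message ID.
    """
    stale = {}
    for conv_id, server_last in server_last_ids.items():
        local_last = local_last_ids.get(conv_id, 0)
        if server_last > local_last:
            stale[conv_id] = local_last
    return stale
```

A device that reconnects would run this against the freshly pulled conversation list and then request only the messages after each returned ID.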

Unread Count Management

Read‑fan‑out keeps unread counters in the backend, requiring atomic updates via Redis transactions or Lua scripts. Write‑fan‑out may offload unread counting to the client, risking inconsistency.
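The need for atomicity comes from keeping two counters in step: the per-conversation count and the user's total. The sketch below uses an in-process lock as a stand-in for a Redis transaction or Lua script; the class and method names are hypothetical.

```python
import threading
from collections import defaultdict


class UnreadCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self.per_conversation = defaultdict(int)  # (user_id, conv_id) -> count
        self.total = defaultdict(int)             # user_id -> count

    def on_new_message(self, user_id, conv_id):
        with self._lock:  # both counters change together
            self.per_conversation[(user_id, conv_id)] += 1
            self.total[user_id] += 1

    def mark_read(self, user_id, conv_id):
        with self._lock:
            cleared = self.per_conversation.pop((user_id, conv_id), 0)
            self.total[user_id] -= cleared
```

If the two updates were not grouped, a reader arriving between them could see a total that disagrees with the sum of the conversation badges.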

Historical Message Storage and Hot‑Warm‑Cold Separation

Read‑fan‑out stores a single copy per conversation; write‑fan‑out stores both per‑user and per‑conversation timelines. Older messages are migrated from hot storage (Redis) to warm and then cold storage, following a hot‑warm‑cold (HWC) architecture.
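A tiering rule based on message age could be sketched as below. The 7-day and 90-day thresholds and the function name are hypothetical; the source does not specify when messages move between tiers.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=7)    # hypothetical threshold
WARM_WINDOW = timedelta(days=90)  # hypothetical threshold


def storage_tier(sent_at, now):
    """Pick the storage tier for a message based on its age."""
    age = now - sent_at
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"
```

A background job could apply this rule periodically and move messages whose tier has changed.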

Access Layer Design

Load balancing options include hardware LBs (F5, A10), DNS‑based LB, DNS + 4‑layer + 7‑layer (e.g., DNS + DPVS + Nginx), and DNS + 4‑layer for long‑connection stability. A scheduling service can add gray‑release, proximity, or least‑connection routing.
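The least-connection option for a scheduling service can be sketched as follows. The `Scheduler` class and the gateway addresses are hypothetical; a real service would also track gateway health and handle gray-release and proximity rules.

```python
class Scheduler:
    """Hands out the gateway that currently holds the fewest connections."""

    def __init__(self, gateways):
        self.connections = {address: 0 for address in gateways}

    def assign(self):
        # On a tie, the gateway listed first wins.
        address = min(self.connections, key=self.connections.get)
        self.connections[address] += 1
        return address

    def release(self, address):
        self.connections[address] -= 1
```

A client would ask the scheduler for an address before opening its long connection, and the gateway would report back when the connection closes.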

Architectural Takeaways

Key practices: extensive gray‑release testing, monitoring, alerting, caching, rate‑limiting, circuit‑breaking, low coupling, stateless services, regular evaluation, and thorough performance testing.
