WeChat Backend Architecture: Synchronization Protocol, RPC Framework, and Multi-IDC Design
This article outlines WeChat's backend architecture. It covers the demanding product requirements (low latency, power efficiency), the challenge of synchronizing diverse data to terminals, and the solutions: a minimal sync protocol, an efficient notification mechanism, a three-layer backend, a unified RPC framework, coroutine-based high-concurrency RPC, and multi-IDC distribution with eventual consistency and disaster-recovery strategies.
Problem
Extreme Business Features
Smooth message sending and receiving
Timely notifications
Power saving
Bandwidth saving
Thin client
Challenging Backend‑Terminal Synchronization
Synchronizing diverse data: account info, contacts, messages, Moments, etc.
Timely notification and sync
Reliable sync over mobile networks
Saving bandwidth and power
Solution
Minimal Synchronization Protocol
The backend and the terminal need to exchange only a single number, from which the backend can determine all the data the terminal is missing.
Change sequence number / version number:
Each change to a user's data is assigned a monotonically increasing global sequence number.
Every data batch sent from the backend includes the maximum sequence number of that batch.
The terminal includes the highest sequence number it has already received in each request.
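The exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not WeChat's implementation: the class names (`Backend`, `Terminal`, `Change`), the in-memory log, and the `limit` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    seq: int        # sequence number assigned by the backend
    payload: str    # hypothetical change body (message, contact update, ...)


@dataclass
class Backend:
    """Per-user change log; names and structure are illustrative only."""
    next_seq: int = 1
    log: list = field(default_factory=list)

    def record(self, payload: str) -> int:
        # Assign the next monotonically increasing sequence number.
        change = Change(self.next_seq, payload)
        self.log.append(change)
        self.next_seq += 1
        return change.seq

    def sync(self, client_seq: int, limit: int = 100):
        # Everything above client_seq is missing on the terminal.
        batch = [c for c in self.log if c.seq > client_seq][:limit]
        # Each batch carries its maximum sequence number.
        max_seq = batch[-1].seq if batch else client_seq
        return batch, max_seq


@dataclass
class Terminal:
    last_seq: int = 0
    received: list = field(default_factory=list)

    def pull(self, backend: Backend, limit: int = 100) -> int:
        # The request carries only the highest sequence number already received.
        batch, max_seq = backend.sync(self.last_seq, limit)
        self.received.extend(c.payload for c in batch)
        self.last_seq = max_seq
        return len(batch)
```

Because the terminal advances `last_seq` only after it has stored a batch, a request lost on a flaky mobile network can simply be repeated with the same number.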
Efficient Notification Mechanism
iOS: Apple Push Notification service (APNs)
Android and others: long-lived connections
GPRS/EDGE signaling storm optimization
Adaptive heartbeat interval adjustment
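The source does not say how the heartbeat interval is adapted. One plausible scheme, shown here purely as an assumption, is to lengthen the interval while heartbeats keep succeeding and shorten it after a failure. The class name and all constants below are hypothetical.

```python
class AdaptiveHeartbeat:
    """Grow the interval while heartbeats succeed; shrink it after a failure.

    All constants are illustrative assumptions, not WeChat's actual values.
    """

    def __init__(self, min_s=60, max_s=600, step_s=30):
        self.min_s = min_s
        self.max_s = max_s
        self.step_s = step_s
        self.interval_s = min_s

    def on_success(self):
        # The connection survived this interval, so probe a longer one.
        self.interval_s = min(self.interval_s + self.step_s, self.max_s)
        return self.interval_s

    def on_failure(self):
        # Assume the interval was too long for this network:
        # halve it, but never go below the floor.
        self.interval_s = max(self.interval_s // 2, self.min_s)
        return self.interval_s
```

A longer interval means fewer radio wake-ups, which serves the power-saving and signaling-reduction goals listed earlier; the floor keeps notifications timely on networks that drop idle connections quickly.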
Three‑Layer Backend Architecture
Unified RPC Framework
Generate server and client code from Protocol Buffer definitions
Server: developers implement the defined interfaces
Client: applications call the generated client APIs locally
Hide network details
Support TCP/UDP based calls
Support long and short connections
Rich features
Sharding‑based SET distribution
Stateless storage using consistent hashing
Transparent service redirection
Comprehensive automated monitoring (QPS, response time, queue time, per‑interface call frequency and status code distribution, service call topology)
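The consistent hashing mentioned in the feature list above can be illustrated with a small hash ring. This is a generic textbook sketch, not the framework's own code: the class name, the use of SHA-256, and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node (assumed value)
        self._keys = []       # sorted hash positions on the ring
        self._owner = {}      # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if h not in self._owner:
                bisect.insort(self._keys, h)
            self._owner[h] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if self._owner.get(h) == node:
                del self._owner[h]
                self._keys.remove(h)

    def lookup(self, key: str) -> str:
        if not self._keys:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._owner[self._keys[idx]]
```

The property that matters for storage routing is that removing a node only remaps the keys that node owned; every other key keeps its placement.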
High‑Concurrency Coroutine RPC
A server-side synchronous call model is easier to learn, use, and debug than an asynchronous one, but a single server can host only a limited number of processes and threads.
RPC based on user‑space threads (coroutines)
A single machine can support tens of thousands to a hundred thousand user‑space threads, limited only by CPU and memory.
Improves concurrency and performance.
Implementation of user‑space thread RPC
Based on makecontext / getcontext / swapcontext
Hook network calls: read / write / epoll
User‑space thread scheduling
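The three ingredients above (context switching, hooked network calls, and a user-space scheduler) can be imitated in Python. In this sketch, generators stand in for the contexts that `makecontext` / `swapcontext` would provide in C, and the `selectors` module stands in for epoll. The names `Scheduler`, `read_wait`, and `echo_upper` are hypothetical, and the code shows the general technique rather than WeChat's implementation.

```python
import selectors
import socket
from collections import deque


def read_wait(sock):
    """Yield value that stands in for a hooked read(): park until readable."""
    return ("read", sock)


class Scheduler:
    def __init__(self):
        self.ready = deque()                         # runnable coroutines
        self.selector = selectors.DefaultSelector()  # epoll on Linux
        self.waiting = 0                             # coroutines parked on I/O

    def spawn(self, coro):
        self.ready.append(coro)

    def run(self):
        while self.ready or self.waiting:
            while not self.ready:
                # Nothing runnable: block until some socket becomes readable.
                for key, _ in self.selector.select():
                    self.selector.unregister(key.fileobj)
                    self.waiting -= 1
                    self.ready.append(key.data)
            coro = self.ready.popleft()
            try:
                op, sock = next(coro)  # resume until the next yield
            except StopIteration:
                continue               # coroutine finished
            if op == "read":
                self.selector.register(sock, selectors.EVENT_READ, coro)
                self.waiting += 1
            else:
                self.ready.append(coro)  # any other yield: just reschedule


def echo_upper(conn, log):
    # Written in synchronous style; the yield only parks this coroutine.
    yield read_wait(conn)
    data = conn.recv(1024)
    conn.sendall(data.upper())
    log.append(data)
```

The handler reads like blocking code, which is the point made above about synchronous models being easier to write and debug, while the scheduler multiplexes many such handlers on one OS thread.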
Nearby Access
Clients connect to a geographically close IDC
Nearby network entry points covering major carriers
CDN for image upload/download
Tencent self‑built CDN
Akamai
Multi‑IDC Distribution Improves User Experience
Complex domestic network environment
Over 100 million overseas users distributed globally, facing diverse network conditions
Each IDC provides full functionality and all required data
Data shared across IDCs and data independent to each IDC
Globally consistent account information
User data isolated per IDC: each user belongs to one IDC, which holds the user's attributes, relationship graph, and messages; SNS data such as photos, comments, and likes is shared selectively to reduce bandwidth
Highly Reliable Eventual Consistency for Data Distributed Across IDCs
Primary‑backup model for account and SNS data
The IDC where the user resides is the primary IDC
All other IDCs act as backups for that user
Updates propagate from primary to backups
Cross-IDC updates without strict real-time requirements go through a ZooKeeper-mediated primary-backup task queue
Consolidate cross‑IDC access interfaces
Redo mechanisms ensure reliable cross‑IDC updates
Data sequence numbers guarantee eventual consistency during redo
Relationship‑graph cross‑IDC updates
Privacy control requires real‑time updates
Direct cross‑IDC network calls
Backend batch processing retries failed requests
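Why sequence numbers make redo safe can be shown with a small sketch. If a backup applies an update only when its sequence number is newer than what it already holds, then replayed, duplicated, or reordered updates cannot corrupt the final state. Everything here is illustrative: `BackupStore`, `drain`, and the `send` callback are hypothetical names, and a plain `deque` stands in for the ZooKeeper-mediated task queue.

```python
from collections import deque


class BackupStore:
    """Backup IDC replica; keeps the newest sequence number seen per key."""

    def __init__(self):
        self.data = {}  # key -> (seq, value)

    def apply(self, key, seq, value):
        current = self.data.get(key)
        if current is not None and current[0] >= seq:
            return False  # stale or duplicate update: ignore it
        self.data[key] = (seq, value)
        return True


def drain(queue, backup, send, max_attempts=5):
    """Deliver queued (key, seq, value) tasks, re-queuing failures for redo.

    `send` is a hypothetical transport callback returning True on success.
    Returns the tasks that were abandoned after max_attempts.
    """
    attempts = {}
    abandoned = []
    while queue:
        task = queue.popleft()
        if send(backup, *task):
            continue
        attempts[task] = attempts.get(task, 0) + 1
        if attempts[task] < max_attempts:
            queue.append(task)  # redo later
        else:
            abandoned.append(task)
    return abandoned
```

Note that a failed send may still have reached the backup (for example when only the acknowledgement was lost), so redo will deliver the same update twice; the sequence check in `apply` is what makes that harmless.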
Fault Tolerance and Disaster Recovery Mechanisms
Single IDC
Users are distributed by SET; each SET is independent
…
High‑availability remote disaster recovery
Each service’s primary IDC has a disaster‑recovery IDC
Challenges: keeping client connections seamless during a switch between the primary and disaster-recovery IDCs, and keeping data consistent between the two
Source: http://blog.xiayf.cn/2013/10/23/learning-in-tencent-backend-arch-of-weixin/
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
