WeChat’s Backend Journey: From Zero to Billions with Scalable Architecture
This article chronicles how WeChat’s backend evolved from a simple messaging prototype to a globally distributed, multi‑data‑center system, detailing its message model, unified sync protocol, three‑layer architecture, platformization, disaster‑recovery design, performance tuning, and emerging resource‑scheduling challenges.
From Zero to One
WeChat was officially launched on 2011‑01‑21, just two months after the project started. In that period the team focused on three core tasks: defining the message model, establishing a unified data‑sync protocol, and solidifying the backend architecture.
Message Model
The model mirrors email's store‑and‑forward design: the server temporarily stores each message, sends the receiver a push notification, and the receiving client then pulls the message.
Unified Data‑Sync Protocol
All user data (account, contacts, messages) is synchronized through a single protocol built around a lightweight snapshot of three key‑value pairs, one per data type. The server compares the snapshot with its own state, computes the diff, and sends only the changes. The client never has to compute a diff itself, which reduces both traffic and client CPU overhead.
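A minimal sketch of snapshot-based sync may make the flow concrete. It assumes each snapshot value is the last sequence number the client has seen for that data type; the article does not describe the real encoding, and `ServerStore`, `append`, and `sync` are hypothetical names.

```python
# Hypothetical sketch: the snapshot maps each data type to the last
# sequence number the client has seen (an assumption, not the real format).
KINDS = ("account", "contacts", "messages")


class ServerStore:
    def __init__(self):
        # Per data type: an append-only list of changes. The sequence number
        # of a change is its 1-based position in the list.
        self.logs = {kind: [] for kind in KINDS}

    def append(self, kind, change):
        """Record a change on the server and return its sequence number."""
        self.logs[kind].append(change)
        return len(self.logs[kind])

    def sync(self, snapshot):
        """Return (changes newer than the snapshot, the new snapshot)."""
        changes = {}
        new_snapshot = {}
        for kind in KINDS:
            last_seen = snapshot.get(kind, 0)
            newer = self.logs[kind][last_seen:]
            if newer:
                changes[kind] = newer
            new_snapshot[kind] = len(self.logs[kind])
        return changes, new_snapshot


store = ServerStore()
store.append("messages", "hello")
store.append("contacts", "add:alice")

# A fresh client has seen nothing, so it receives everything.
changes, snapshot = store.sync({})

# Later syncs send the saved snapshot and receive only what is new.
store.append("messages", "world")
changes_2, snapshot_2 = store.sync(snapshot)
```

The client only stores and echoes the snapshot; all comparison work stays on the server, which is the property the protocol is described as providing.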
Backend Architecture
WeChat adopts a three‑layer architecture: an access layer, a logic layer, and a storage layer.
The access layer provides long‑connection (bidirectional) and short‑connection (client‑initiated) services. The logic layer separates business APIs from common base services. The storage layer consists of data‑access and data‑storage services built on MySQL and the proprietary SDB key‑table system; each data type (account, message, contact) has its own dedicated access and storage modules.
The backend is primarily written in C++ and built on the Svrkit RPC framework, which powers thousands of services and handles tens of trillions of RPC calls daily.
Asynchronous Queues and Group Chat
Features such as group chat and external integrations introduced the need for asynchronous queues to buffer variable processing times. Group messages are written to each member’s inbox (write‑side fan‑out) to keep sync logic simple and efficient.
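The combination of an asynchronous queue and write‑side fan‑out can be sketched as follows. This is an illustration under stated assumptions, not WeChat's implementation: the class and method names are hypothetical, the queue is an in‑process `deque` rather than a real queue service, and skipping the sender's own inbox is an assumption.

```python
from collections import defaultdict, deque


class GroupChat:
    """Hypothetical in-memory model of queued, write-side fan-out."""

    def __init__(self):
        self.members = defaultdict(set)    # group id -> member ids
        self.inboxes = defaultdict(list)   # user id -> delivered messages
        self.queue = deque()               # pending (group, sender, text)

    def join(self, group, user):
        self.members[group].add(user)

    def send(self, group, sender, text):
        # The sender's request only enqueues the message, so its latency
        # does not depend on the size of the group.
        self.queue.append((group, sender, text))

    def drain(self):
        """Worker step: copy each queued message into every member's inbox."""
        delivered = 0
        while self.queue:
            group, sender, text = self.queue.popleft()
            for member in sorted(self.members[group]):
                if member == sender:
                    continue  # assumption: the sender needs no inbox copy
                self.inboxes[member].append((group, sender, text))
                delivered += 1
        return delivered


chat = GroupChat()
for user in ("ann", "bob", "cat"):
    chat.join("g1", user)
chat.send("g1", "ann", "hi")
copies_written = chat.drain()
```

Because every member ends up with an ordinary inbox entry, a group message can be synced to the client exactly like a one‑to‑one message, at the cost of one write per member.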
Micro‑service Evolution (Logicsvr)
Initially a monolithic mmweb CGI host, the system was refactored into multiple Logicsvr services compiled statically with Svrkit, allowing independent deployment and scaling. Today dozens of Logicsvr binaries provide hundreds of CGI APIs across thousands of servers.
Platformization
WeChat’s backend gave rise to separate platforms such as the public account platform, payment platform, and hardware platform, each evolving from specialized handling in the core system.
International Expansion and Multi‑Data‑Center Design
Starting with version 3.0, WeChat added multilingual support and launched its first overseas data center. A master‑master storage architecture was adopted: each user’s data is written to its home data center (master) and asynchronously replicated to the other center, achieving eventual consistency while preserving strong consistency for critical operations like unique WeChat ID allocation.
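The home‑data‑center write path with asynchronous replication might look roughly like the sketch below. The data center names, the `DataCenter` class, and the in‑memory outbox are all hypothetical, and the strongly consistent path used for operations such as WeChat ID allocation is deliberately left out.

```python
from collections import deque


class DataCenter:
    """Hypothetical data center holding a key-value copy of user data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # local writes not yet shipped to the peer
        self.peer = None

    def write(self, user, key, value):
        # Commit locally first; replication happens later.
        self.data[(user, key)] = value
        self.outbox.append((user, key, value))

    def replicate(self):
        """Ship pending writes to the peer; returns how many were applied."""
        shipped = 0
        while self.outbox:
            user, key, value = self.outbox.popleft()
            self.peer.data[(user, key)] = value
            shipped += 1
        return shipped


dc_a, dc_b = DataCenter("dc-a"), DataCenter("dc-b")
dc_a.peer, dc_b.peer = dc_b, dc_a

# Each user is pinned to one home data center, so writes for a given user
# never originate from two masters at once.
home = {"alice": dc_a, "bob": dc_b}

home["alice"].write("alice", "nickname", "Al")
stale = dc_b.data.get(("alice", "nickname"))   # not replicated yet
dc_a.replicate()
fresh = dc_b.data.get(("alice", "nickname"))   # now visible on the peer
```

Pinning each user to a home data center is what keeps this master‑master setup free of write conflicts; the window between `write` and `replicate` is where the two copies differ, which is the eventual consistency the text refers to.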
Three‑Zone Disaster Recovery
After a massive outage in 2013, WeChat redesigned its data center topology to deploy services across three physically isolated zones. Each zone runs a full set of services, and data is replicated with at least two copies across zones, enabling automatic failover without service interruption.
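To see why two copies spread over three zones tolerate the loss of any single zone, consider the following sketch. The zone names and the round‑robin placement rule are assumptions made for illustration; the article does not say how replicas are actually placed.

```python
ZONES = ("zone-a", "zone-b", "zone-c")  # hypothetical zone names


def replica_zones(shard_id, copies=2):
    """Place `copies` replicas of a shard in distinct, consecutive zones."""
    start = shard_id % len(ZONES)
    return [ZONES[(start + i) % len(ZONES)] for i in range(copies)]


def pick_zone(shard_id, healthy):
    """Return the first healthy zone holding the shard, or None."""
    for zone in replica_zones(shard_id):
        if zone in healthy:
            return zone
    return None


# With zone-a down, shard 0 (placed in zone-a and zone-b) is served by zone-b.
serving_zone = pick_zone(0, healthy={"zone-b", "zone-c"})
```

Since every shard lives in two different zones, any one zone can fail and each shard still has a reachable copy; losing two zones at once is outside what this layout covers.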
Performance Optimizations
Key improvements include adding coroutine support to Svrkit (allowing asynchronous handling without code changes) and a FastReject QoS mechanism to protect services from overload‑induced cascading failures.
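The article does not describe how FastReject decides what to drop, so the sketch below shows only the general idea of fast rejection: refuse new work immediately once a service is at capacity instead of letting requests queue up and time out. The `FastRejectGate` class and its in‑flight limit are hypothetical, and the counter is not thread‑safe.

```python
class FastRejectGate:
    """Hypothetical admission gate that sheds load once a limit is reached."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.rejected = 0

    def try_enter(self):
        # Rejecting is a cheap counter check, so an overloaded service
        # spends almost nothing on requests it cannot finish in time.
        if self.in_flight >= self.max_in_flight:
            self.rejected += 1
            return False
        self.in_flight += 1
        return True

    def leave(self):
        if self.in_flight == 0:
            raise RuntimeError("leave() called without a matching try_enter()")
        self.in_flight -= 1


gate = FastRejectGate(max_in_flight=2)
results = [gate.try_enter() for _ in range(3)]  # third request is rejected
gate.leave()                                    # one request completes
admitted_again = gate.try_enter()               # capacity is available again
```

A caller that is rejected quickly can retry elsewhere or fail fast itself, which is how this kind of mechanism keeps one overloaded service from dragging down the services that depend on it.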
Security Hardening
A ticket‑based authentication system was introduced, where every client request carries a server‑issued ticket that is validated at each backend hop, preventing unauthorized data access.
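One common way to build such a ticket is an HMAC‑signed token, sketched below. This is an assumption for illustration: the article does not say how WeChat's tickets are constructed, and the ticket layout, the shared `SECRET`, and both function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key shared by the backend services


def issue_ticket(user_id, expires_at, secret=SECRET):
    """Issue a ticket of the form '<user>:<expiry>:<signature>'."""
    payload = f"{user_id}:{expires_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"


def validate_ticket(ticket, user_id, now, secret=SECRET):
    """Check signature, owner, and expiry; every backend hop would call this."""
    try:
        ticket_user, expires, sig = ticket.rsplit(":", 2)
        expires_at = int(expires)
    except ValueError:
        return False  # malformed ticket
    payload = f"{ticket_user}:{expires_at}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return ticket_user == user_id and now < expires_at


ticket = issue_ticket("alice", expires_at=1000)
ok_for_owner = validate_ticket(ticket, "alice", now=999)
ok_for_other = validate_ticket(ticket, "bob", now=999)
```

Because each hop checks that the ticket's owner matches the user whose data is being requested, a compromised or buggy upstream service cannot read another user's data simply by asking for it.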
New Challenges
WeChat is building a resource‑scheduling system (Yard) to automate service deployment and elastic resource allocation, and developing high‑availability storage solutions such as PhxSQL (Paxos‑based MySQL) alongside the existing Quorum‑based KVSvr.
