WeChat Backend Architecture: High Availability, Strong Consistency, and Scalable Microservices
This article summarizes the design of WeChat's massive‑scale backend, covering its evolution from early storage systems to a multi‑master PaxosStore architecture that delivers six‑nine availability, strong data consistency, rapid iteration, and a unified microservice framework for billions of daily operations.
WeChat's backend supports a wide range of services such as instant messaging, social networking, and financial payments, serving over one billion active users and handling billions of requests per day.
The architecture has evolved from a first‑generation storage system in 2011 to a custom, highly available storage and computation framework called PaxosStore, which separates consensus, compute, and storage layers and supports both multi‑master and traditional failover designs.
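To make the multi‑master idea concrete, here is a minimal sketch of textbook single‑decree Paxos, in which any replica may propose a write and a value counts as chosen once a majority accepts it. It is not the protocol as implemented in PaxosStore; the names `Acceptor` and `propose` are hypothetical, and networking, persistence, and ballot generation are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """In-memory replica state (hypothetical; a real replica persists this)."""
    promised: int = -1                    # highest ballot this replica has promised
    accepted_ballot: int = -1             # ballot of the value it accepted, if any
    accepted_value: Optional[str] = None
    alive: bool = True                    # False simulates a crashed replica

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        # Phase 1: promise to ignore lower ballots and report any accepted value.
        if not self.alive or ballot <= self.promised:
            return None
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot has been promised meanwhile.
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run one Paxos round; return the chosen value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any replica already accepted a value, the proposer must adopt the one
    # with the highest ballot instead of its own value.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    chosen = prior_value if prior_ballot >= 0 else value
    acks = sum(a.accept(ballot, chosen) for a in acceptors)
    return chosen if acks >= majority else None


# Three replicas, one of them down: a majority (2 of 3) is still reachable.
replicas = [Acceptor(), Acceptor(), Acceptor(alive=False)]
first = propose(replicas, ballot=1, value="A")   # chosen: "A"
second = propose(replicas, ballot=2, value="B")  # still "A": a chosen value is never replaced
```

Because no replica holds a special leader role in this sketch, losing one of the three nodes does not trigger a switchover; the remaining majority keeps serving writes.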
Four major challenges drive the design: fault‑tolerant storage at massive scale, strong consistency for the data of over one billion users, absorbing traffic spikes (e.g., holidays and viral events), and supporting over one hundred billion data accesses per minute.
To meet these goals, WeChat targets five‑nine (99.999%) availability for financial services and six‑nine (99.9999%) availability overall, relying on multi‑master replication, cross‑data‑center load balancing, and avoiding failover switchovers where possible.
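To put these targets in perspective, the arithmetic below converts a number of nines into a yearly downtime budget. It is a small illustrative calculation, assuming a 365‑day year; the function name `downtime_budget_seconds` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability with `nines` nines."""
    unavailability = 10 ** -nines      # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (5, 6):
    print(f"{n} nines: about {downtime_budget_seconds(n):.1f} seconds of downtime per year")
```

Five nines allows roughly 5.3 minutes of downtime per year, while six nines allows only about 31.5 seconds, which leaves little room for a manual or slow failover.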
Key technical components include:
Multi‑master PaxosStore, providing high‑throughput, ACID‑compliant storage with support for several data models (key‑value, FIFO queues, 2‑D tables).
A microservice framework offering service definition, discovery, error retry, monitoring, gray‑release, and configuration management.
Libco, a coroutine library built on epoll and a timing wheel, which lets developers write synchronous‑style code that executes asynchronously and is simpler to reason about than traditional event‑driven, callback‑based code.
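The coroutine style described for Libco can be illustrated with an analogy in Python's `asyncio`. This is not Libco's API (Libco is a C/C++ library), and `fake_rpc` and `handle_request` are hypothetical stand‑ins for real network calls. The point of the sketch is only that the handler reads top to bottom like blocking code, while the waits overlap on a single thread.

```python
import asyncio
from typing import List


async def fake_rpc(name: str, delay: float) -> str:
    """Stand-in for a network call: sleeping yields control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def handle_request() -> List[str]:
    # Written in a straight-line, synchronous style with no callbacks,
    # yet the three simulated calls run concurrently on one thread.
    profile, contacts, inbox = await asyncio.gather(
        fake_rpc("profile", 0.03),
        fake_rpc("contacts", 0.02),
        fake_rpc("inbox", 0.01),
    )
    # gather() returns results in argument order, not completion order.
    return [profile, contacts, inbox]


result = asyncio.run(handle_request())
```

One difference worth noting: Python marks each suspension point explicitly with `await`, so the analogy covers the programming model rather than the mechanics of how a C/C++ coroutine library switches contexts.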
In production, the core data storage achieves six‑nine availability; the design and algorithms were published in a VLDB paper, and the PaxosStore code is open‑sourced on GitHub.
Overall, the system demonstrates how a large‑scale, high‑availability backend can be built using custom consensus protocols, multi‑master replication, and coroutine‑based concurrency to support continuous, low‑latency service for billions of users.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
