Inside WeChat Pay: Scaling MySQL for Millions of Payments per Second
Zhou Tang, head of WeChat Pay operations at Tencent, shares how his team built a massive MySQL‑based payment platform handling up to 150 k transactions per second, covering background, DB‑CMDB design, change management, monitoring, security, high availability, and why Golang became their core development language.
Background
According to a chart in the 2016 Internet Report, WeChat Pay leads in user payment frequency, with over 50 payments per user per month, surpassing even U.S. debit cards. During the 2016 Lunar New Year, WeChat Pay processed 80 billion RMB, peaking at 150 k transactions per second and 500 k red-packet splits per second, which makes it one of the world's largest payment and settlement systems.
The underlying infrastructure runs MySQL on commodity PC servers rather than traditional mainframes. Hardware failure rates are around 2%. The MySQL instances are customized builds of the open-source version with no commercial support, so troubleshooting and optimization fall entirely to the internal team. The fleet comprises over a thousand servers, more than a thousand MySQL instances, and 500+ database services, all managed by only three DBAs, which puts the team under significant pressure.
Key goals under this background are:
High performance: Achieve million-level TPS at the database layer.
High reliability: Ensure data consistency to avoid payment errors and compensation.
High availability: Prevent user-visible failures and handle complaints promptly.
Security: Protect sensitive payment data from internal and external threats.
DBCMDB
DBCMDB is considered the foundation of database operations management and extends the traditional CMDB concept into three layers:
Basic CMDB – stores IDC fundamentals such as hardware, IP, and physical location.
Operations CMDB – contains program, port, task, and service information for logical deployment and scaling.
DBCMDB – records business‑specific database instances, ports, schemas, and tables.
Only with this information can effective DB operation management be performed.
Why build a DBCMDB?
Rapid business growth leads to thousands of servers and complex configurations (e.g., multiple instances per machine, thousands of tables with varying sharding rules). Manual management cannot keep up, making it difficult to locate the correct machine, instance, or table for schema changes, and to synchronize adjustments, monitoring, and backups.
How to implement DBCMDB?
Core configuration items such as IP, port, business, owner, table count, and master-slave relationships are stored, and a user-friendly web UI is provided for management. The UI presents the relationship model, the physical deployment tree, and a per-business table view.
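The configuration items listed above can be pictured as a small record type plus a per-business index. The following is only a minimal in-memory sketch: the type names, field names, and lookup methods are hypothetical and are not taken from the team's actual system.

```go
package main

import "fmt"

// Role is the replication role of an instance (illustrative).
type Role string

const (
	Master Role = "master"
	Slave  Role = "slave"
)

// Instance is a hypothetical DBCMDB record holding the core
// configuration items named in the text.
type Instance struct {
	IP         string
	Port       int
	Business   string
	Owner      string
	TableCount int
	Role       Role
	MasterAddr string // "ip:port" of the master; empty on a master
}

// Addr returns the "ip:port" form used as a connection target.
func (i Instance) Addr() string { return fmt.Sprintf("%s:%d", i.IP, i.Port) }

// CMDB indexes instances by business name.
type CMDB struct {
	byBusiness map[string][]Instance
}

// NewCMDB returns an empty index.
func NewCMDB() *CMDB { return &CMDB{byBusiness: make(map[string][]Instance)} }

// Add registers an instance under its business.
func (c *CMDB) Add(inst Instance) {
	c.byBusiness[inst.Business] = append(c.byBusiness[inst.Business], inst)
}

// MasterOf returns the master of a business, the kind of lookup an
// online service might perform to obtain its write connection.
func (c *CMDB) MasterOf(business string) (Instance, bool) {
	for _, inst := range c.byBusiness[business] {
		if inst.Role == Master {
			return inst, true
		}
	}
	return Instance{}, false
}

// Slaves returns all slave instances of a business.
func (c *CMDB) Slaves(business string) []Instance {
	var out []Instance
	for _, inst := range c.byBusiness[business] {
		if inst.Role == Slave {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	c := NewCMDB()
	c.Add(Instance{IP: "10.0.0.1", Port: 3306, Business: "orders", Owner: "dba1", TableCount: 128, Role: Master})
	c.Add(Instance{IP: "10.0.0.2", Port: 3306, Business: "orders", Owner: "dba1", TableCount: 128, Role: Slave, MasterAddr: "10.0.0.1:3306"})
	if m, ok := c.MasterOf("orders"); ok {
		fmt.Println("orders master:", m.Addr(), "slaves:", len(c.Slaves("orders")))
	}
}
```

Keeping the master-slave relationship in the record itself is what lets deployment, change, and monitoring tools derive their targets from one place instead of from hand-maintained lists.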
What can DBCMDB achieve?
With DBCMDB, a complete DB operation toolchain can be built, including automated deployment, change management, and monitoring. Online services retrieve DB connection configurations directly from this source.
DB Changes
Changes come in many types, including switchovers, failure handling, and scaling. Mobile payment services iterate dozens to hundreds of times a day, far more often than traditional banking systems change, yet they must maintain bank-level stability throughout.
Early DB changes
When manpower was scarce, developers directly modified databases, leading to inconsistent practices, lack of standards, and frequent faults such as accidental data loss or performance degradation.
Mid‑term DB changes
A request-ticket system introduced DBA approval, improving professionalism and stability, but execution was still manual, time-consuming, and often required overnight work.
Current DB changes
A self‑service change system built on DBCMDB allows developers to submit tickets with selected business, instance, and table information plus the SQL statement. DBAs evaluate the SQL, create an execution plan, and schedule changes during low‑traffic periods. The system controls gray‑scale rollout, concurrency, and intervals, enabling unattended overnight changes with precise impact control.
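The rollout control described above (gray-scale stages, bounded concurrency, pauses between stages) can be sketched as a small scheduler. This is an illustration under assumptions: `Plan`, `Run`, and `ErrRejected` are hypothetical names, and the `apply` callback stands in for whatever actually executes the SQL on a target.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrRejected is a sample error an apply function might return.
var ErrRejected = errors.New("change rejected by target")

// Plan is a hypothetical execution plan for one change ticket.
type Plan struct {
	Batches     [][]string    // gray-scale stages, smallest first
	Concurrency int           // max targets changed at once within a stage
	Interval    time.Duration // pause between stages to observe impact
}

// Run applies the change stage by stage. Within a stage at most
// Concurrency targets run in parallel. If any target in a stage
// fails, later stages are not started.
func Run(p Plan, apply func(target string) error) (done []string, err error) {
	limit := p.Concurrency
	if limit < 1 {
		limit = 1
	}
	for i, batch := range p.Batches {
		sem := make(chan struct{}, limit)
		var (
			wg       sync.WaitGroup
			mu       sync.Mutex
			batchErr error
		)
		for _, target := range batch {
			wg.Add(1)
			sem <- struct{}{} // blocks while the stage is at its concurrency limit
			go func(stage int, t string) {
				defer wg.Done()
				defer func() { <-sem }()
				e := apply(t)
				mu.Lock()
				defer mu.Unlock()
				if e != nil {
					if batchErr == nil {
						batchErr = fmt.Errorf("stage %d, target %s: %w", stage, t, e)
					}
					return
				}
				done = append(done, t)
			}(i, target)
		}
		wg.Wait()
		if batchErr != nil {
			return done, batchErr
		}
		if i < len(p.Batches)-1 {
			time.Sleep(p.Interval)
		}
	}
	return done, nil
}

func main() {
	plan := Plan{
		Batches:     [][]string{{"db-01"}, {"db-02", "db-03"}, {"db-04", "db-05", "db-06"}},
		Concurrency: 2,
		Interval:    10 * time.Millisecond,
	}
	done, err := Run(plan, func(t string) error {
		fmt.Println("applying change on", t)
		return nil
	})
	fmt.Println("applied:", len(done), "error:", err)
}
```

Starting with a single-target stage keeps the blast radius of a bad statement small, and halting on the first failed stage is what makes unattended execution tolerable.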
DB Monitoring
Early monitoring
Initially Nagios and Zabbix were used, providing comprehensive metrics but struggling with scale (thousands of instances) and configuration drift, leading to missed alerts and incidents.
Current monitoring
A custom agent‑based monitoring platform, integrated with DBCMDB, automatically adapts to deployment changes. Alert policies are defined via templates applied to business groups, and an event‑tracking system ensures timely response (e.g., five‑minute response for critical alerts).
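One way to read "templates applied to business groups" is that a template holds threshold rules, and the set of instances it covers is resolved from DBCMDB at evaluation time. The sketch below assumes that reading; the types, metric names, and thresholds are hypothetical.

```go
package main

import "fmt"

// Rule is a hypothetical threshold rule: alert when Metric exceeds Max.
type Rule struct {
	Metric   string
	Max      float64
	Critical bool
}

// Template groups the rules applied to every instance of a business group.
type Template struct {
	Name  string
	Rules []Rule
}

// Alert records one rule violation on one instance.
type Alert struct {
	Instance string
	Rule     Rule
	Value    float64
}

// Evaluate checks every instance against every rule in the template.
// instances is assumed to come from a DBCMDB lookup, so newly deployed
// instances are covered without editing any alert configuration.
// Metrics with no sample for an instance are skipped.
func Evaluate(tpl Template, instances []string, samples map[string]map[string]float64) []Alert {
	var alerts []Alert
	for _, inst := range instances {
		for _, r := range tpl.Rules {
			v, ok := samples[inst][r.Metric]
			if ok && v > r.Max {
				alerts = append(alerts, Alert{Instance: inst, Rule: r, Value: v})
			}
		}
	}
	return alerts
}

func main() {
	tpl := Template{
		Name: "payment-core",
		Rules: []Rule{
			{Metric: "replication_lag_seconds", Max: 30, Critical: true},
			{Metric: "connections", Max: 2000},
		},
	}
	instances := []string{"10.0.0.1:3306", "10.0.0.2:3306"}
	samples := map[string]map[string]float64{
		"10.0.0.1:3306": {"replication_lag_seconds": 2, "connections": 350},
		"10.0.0.2:3306": {"replication_lag_seconds": 120, "connections": 410},
	}
	for _, a := range Evaluate(tpl, instances, samples) {
		fmt.Printf("%s: %s=%.0f critical=%v\n", a.Instance, a.Rule.Metric, a.Value, a.Rule.Critical)
	}
}
```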
Data Security
Security measures include jump‑host‑only login, full audit of DBA actions, real‑time illegal‑connection alerts, fine‑grained on‑demand authorization, protection of critical fields (e.g., account balance), encrypted backups, and strict access controls to prevent data exfiltration.
Authorization system
Authorization is managed at the business level, automatically mapping to IPs via CMDB data. Integration with the release system enables automatic revocation of permissions after deployment, eliminating manual errors and stale privileges.
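The recalculation step can be illustrated as a diff between the grants that should exist (derived from the current deployment) and the grants that do exist. This is a sketch, not the team's implementation: `Grant` and `Reconcile` are hypothetical names, and turning the result into actual MySQL GRANT and REVOKE statements is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Grant says that clients of a business at ClientIP may access its databases.
type Grant struct {
	Business string
	ClientIP string
}

// Reconcile compares the deployment (business -> current client IPs, as
// CMDB data would provide) with the existing grants and returns what to
// add and what to revoke. Results are sorted for stable output.
func Reconcile(deployed map[string][]string, existing []Grant) (add, revoke []Grant) {
	want := make(map[Grant]bool)
	for biz, ips := range deployed {
		for _, ip := range ips {
			want[Grant{Business: biz, ClientIP: ip}] = true
		}
	}
	have := make(map[Grant]bool, len(existing))
	for _, g := range existing {
		if have[g] {
			continue // ignore duplicate entries
		}
		have[g] = true
		if !want[g] {
			revoke = append(revoke, g) // stale: host no longer runs this business
		}
	}
	for g := range want {
		if !have[g] {
			add = append(add, g)
		}
	}
	sortGrants(add)
	sortGrants(revoke)
	return add, revoke
}

// sortGrants orders grants by business, then by client IP string.
func sortGrants(gs []Grant) {
	sort.Slice(gs, func(i, j int) bool {
		if gs[i].Business != gs[j].Business {
			return gs[i].Business < gs[j].Business
		}
		return gs[i].ClientIP < gs[j].ClientIP
	})
}

func main() {
	deployed := map[string][]string{"orders": {"10.0.0.1", "10.0.0.3"}}
	existing := []Grant{
		{Business: "orders", ClientIP: "10.0.0.1"},
		{Business: "orders", ClientIP: "10.0.0.2"},
	}
	add, revoke := Reconcile(deployed, existing)
	fmt.Println("add:", add)
	fmt.Println("revoke:", revoke)
}
```

Running the diff on every release, rather than on request, is what removes stale privileges without anyone having to remember them.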
High Availability
Database HA
Master‑slave automatic failover is achieved with HAProxy TCP forwarding and per‑node agents monitoring master health. When all agents detect a master failure, etcd decides the switch, and agents adjust topology. The solution supports three‑node read/write and multi‑region disaster recovery. Additionally, PhxSQL provides a strongly consistent queue for SQL synchronization, enhancing consistency and fast failover.
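The decision flow can be sketched in two steps: require every agent to report the master as failed, then let exactly one switch win through a compare-and-swap on the recorded master. The sketch below replaces etcd with an in-memory stand-in, so `Store`, `AllAgree`, and `TryFailover` are hypothetical and no real etcd client call is shown; how the new master is chosen is also outside its scope.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is an in-memory stand-in for the coordination service. It keeps
// the address of the current master and offers compare-and-swap.
type Store struct {
	mu     sync.Mutex
	master string
}

// NewStore records the initial master.
func NewStore(master string) *Store { return &Store{master: master} }

// Master returns the currently recorded master.
func (s *Store) Master() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.master
}

// CompareAndSwap sets the master to next only if it is still old.
func (s *Store) CompareAndSwap(old, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.master != old {
		return false
	}
	s.master = next
	return true
}

// AllAgree reports whether every agent saw the master fail.
// reports maps agent name -> "this agent detected a failure".
func AllAgree(reports map[string]bool) bool {
	if len(reports) == 0 {
		return false
	}
	for _, sawFailure := range reports {
		if !sawFailure {
			return false
		}
	}
	return true
}

// TryFailover promotes candidate if all agents agree that failed is down
// and failed is still the recorded master. A second, stale attempt loses
// the compare-and-swap and changes nothing.
func TryFailover(s *Store, failed, candidate string, reports map[string]bool) bool {
	if !AllAgree(reports) {
		return false
	}
	return s.CompareAndSwap(failed, candidate)
}

func main() {
	s := NewStore("10.0.0.1:3306")
	reports := map[string]bool{"agent-a": true, "agent-b": true, "agent-c": true}
	switched := TryFailover(s, "10.0.0.1:3306", "10.0.0.2:3306", reports)
	fmt.Println("switched:", switched, "master:", s.Master())
}
```

Requiring unanimity trades some failover speed for protection against a single agent with a broken network path triggering a needless switch.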
Business HA
A “jump‑order” scheme for ID generation ensures that if a DB group fails, the business layer skips the faulty segment and continues processing new orders, minimizing impact on user experience.
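The source does not spell out the ID layout, so the following is only one plausible reading of the jump-order idea: each new order ID encodes the DB group that will store it, and the generator rotates across groups while skipping any group marked as failed. The `Generator` type, the round-robin policy, and the two-digit group prefix are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoGroup is returned when every DB group is marked down.
var ErrNoGroup = errors.New("no healthy DB group available")

// Generator hands out order IDs whose prefix names the DB group
// that will store the order (hypothetical layout).
type Generator struct {
	mu     sync.Mutex
	groups int
	cursor int
	seq    uint64
	down   map[int]bool
}

// NewGenerator creates a generator over the given number of groups.
func NewGenerator(groups int) *Generator {
	return &Generator{groups: groups, down: make(map[int]bool)}
}

// MarkDown flags a group as failed (true) or recovered (false).
func (g *Generator) MarkDown(group int, isDown bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.down[group] = isDown
}

// Next returns the chosen group and a new order ID, rotating across
// groups and jumping over any group currently marked down.
func (g *Generator) Next() (int, string, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for tries := 0; tries < g.groups; tries++ {
		grp := g.cursor
		g.cursor = (g.cursor + 1) % g.groups
		if g.down[grp] {
			continue // jump over the faulty segment
		}
		g.seq++
		return grp, fmt.Sprintf("%02d%010d", grp, g.seq), nil
	}
	return 0, "", ErrNoGroup
}

func main() {
	gen := NewGenerator(3)
	gen.MarkDown(1, true) // simulate a failed DB group
	for i := 0; i < 4; i++ {
		grp, id, err := gen.Next()
		fmt.Println(grp, id, err)
	}
}
```

Only new orders are rerouted; orders already written to the failed group stay unavailable until it recovers, which is why the scheme limits rather than eliminates user impact.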
Golang
Golang, developed by Google, offers concise syntax, high performance, built‑in concurrency, automatic garbage collection, static compilation, and cross‑platform binaries. It powers many of the team’s tools (e.g., external network monitoring, DB monitoring, rapid DB switching) and is favored for distributed backend services over Java or C/C++ due to its ease of deployment and comparable performance.
Q&A
Q: How does Golang achieve better cross-platform support than Java? A: Java compiles to bytecode that runs on a JVM, so every target machine needs a platform-specific JVM installed. Go compiles directly to native machine code for each target platform, so the resulting binary can be copied to the machine and run without a separately installed runtime, which makes deployment simpler.
Q: How to handle stale permissions after business decommission? A: Permissions are granted at the business level, not per‑IP; when deployment changes, the system recalculates and revokes unnecessary permissions automatically.
Q: Which IDEs are recommended for Golang development? A: Any text editor works. Commonly used editors and IDEs include Vim, Eclipse, Visual Studio, and the dedicated LiteIDE, which runs on both Windows and Linux.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.