How LeTV Scaled Its Payment System to 100k Orders per Second with Sharding and HA
This article explains how LeTV's payment platform handled a hundred‑fold surge in traffic by redesigning its architecture with database sharding, a custom order‑ID scheme, asynchronous consistency, high‑availability clustering, data tiering, and a coarse‑fine request pipeline to sustain up to 100,000 orders per second.
As hardware‑driven flash sales intensified, LeTV Pay needed to support a hundred‑to‑thousand‑fold increase in request volume, prompting a complete architecture overhaul in November 2015 to reliably process 100,000 orders per second.
1. Sharding (Database Partitioning)
To achieve the required write throughput, the order table is split across 8 databases (DB1‑DB8) and each database contains 10 tables (order_0‑order_9), using the user ID ( uid) as the sharding key. The sharding algorithm is:
Database number = (uid / 10) % 8 + 1
Table number = uid % 10For example, uid=9527 maps to DB1 and table order_7. The system employs a binary‑tree expansion strategy, doubling the number of databases during scaling, which simplifies data synchronization.
2. Order ID Generation
Because generating 100,000 auto‑increment IDs per second via a database is infeasible, a Snowflake‑style algorithm is used to create globally unique IDs in memory. The ID consists of a timestamp (millisecond precision), a machine identifier, and a per‑millisecond sequence number. To enable direct lookup, the sharding information (database and table numbers) is embedded in the ID, and a version byte is reserved for future use.
3. Final Consistency
To support queries by both user ID and merchant ID ( bid), a duplicate order table cluster is maintained for the bid dimension. Asynchronous replication via a message queue ensures eventual consistency without incurring the latency of distributed transactions. Real‑time monitoring detects and reconciles any data drift.
4. Database High Availability
Both master‑slave and read‑only replicas are protected with automatic failover. Load balancers (LVS) route read traffic, while a virtual IP with KeepAlive handles master failover, allowing the system to recover from a primary outage within seconds.
5. Data Tiering
Data is classified into three tiers: (1) core order and transaction data accessed directly from the database, (2) user‑related data cached in Redis, and (3) configuration data cached in local memory with a push‑based update mechanism to keep caches consistent.
6. Coarse‑Fine Pipeline
A coarse‑fine request pipeline, implemented via Nginx commercial features, caps incoming traffic at 1 million requests per second, discarding excess, while allowing only 100,000 requests per second to reach the web cluster, queuing the remainder to prevent overload.
All diagrams referenced in the original article are omitted for brevity.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
