How LeTV Scaled Its Payment System to 100k Orders per Second
LeTV upgraded its payment architecture in 2015 to reliably handle up to 100,000 orders per second. The upgrade sharded the databases, introduced a Snowflake‑based globally unique order ID, used asynchronous replication for eventual consistency, built high‑availability master‑slave clusters with LVS and Keepalived, tiered data caches, and added a coarse‑fine traffic pipeline.
Database Sharding
To achieve 100,000 orders per second, the order table is split across multiple databases and tables. The user ID (uid) is used as the sharding key. A binary‑tree expansion strategy doubles the number of databases each time (1 → 2 → 4 → 8 …). Eight databases are deployed; each database contains ten tables named order_0 … order_9, yielding 80 shards in total.
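Sharding formulas:
Database number = (uid / 10) % 8 + 1
Table number = uid % 10
Example: with integer division, uid = 9527 gives (9527 / 10) % 8 + 1 = 952 % 8 + 1 = 1 and 9527 % 10 = 7, so the order maps to database 1, table order_7.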
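The routing formulas can be sketched in a few lines of Java. This is a minimal illustration, not LeTV's code: the class and method names are hypothetical, and uid is assumed to be non-negative. The second printout shows why doubling is convenient: after going from 8 to 16 databases, a uid either stays in its old database or moves to old + 8.

```java
// Hypothetical helper illustrating the sharding formulas; names are not from the source.
final class ShardRouter {
    static final int TABLES_PER_DB = 10;

    /** Database number in 1..dbCount, computed as (uid / 10) % dbCount + 1. */
    static int dbNumber(long uid, int dbCount) {
        return (int) ((uid / TABLES_PER_DB) % dbCount) + 1;
    }

    /** Table index in 0..9, computed as uid % 10. */
    static int tableNumber(long uid) {
        return (int) (uid % TABLES_PER_DB);
    }

    /** Physical table name such as order_7. */
    static String tableName(long uid) {
        return "order_" + tableNumber(uid);
    }

    public static void main(String[] args) {
        long uid = 9527L;
        System.out.println("db=" + dbNumber(uid, 8) + " table=" + tableName(uid));
        // With 16 databases the same uid lands in its old database or in old + 8.
        System.out.println("db after doubling=" + dbNumber(uid, 16));
    }
}
```

Because the new database number is always the old one or the old one plus the previous database count, each existing database only has to hand off part of its rows to one new database during an expansion.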
Order ID Generation
Auto‑increment IDs cannot sustain the required throughput, so an in‑memory Snowflake‑style algorithm generates globally unique IDs. The simplified ID consists of:
Timestamp – millisecond precision obtained via System.currentTimeMillis().
Machine ID – a unique identifier assigned to each order‑processing server.
Sequence – a per‑millisecond counter that resets each millisecond.
To enable direct lookup without a uid, the sharding information (database and table numbers) is embedded in the ID. A two‑character sharding segment (e.g., "17" for DB 1‑table 7) and a version byte are prefixed to the Snowflake payload.
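A rough sketch of such a generator follows. Several details are assumptions made for illustration, since the article does not give them: the bit widths (10 machine bits, 12 sequence bits), the custom epoch, the use of a single character for the version, the order of the prefix (version, then database digit, then table digit), and all class and method names.

```java
// Sketch only: layout, epoch, and names are assumptions, not LeTV's actual format.
final class OrderIdGenerator {
    private static final String VERSION = "1";              // assumed one-character version
    private static final long EPOCH_MS = 1420070400000L;    // 2015-01-01T00:00:00Z, assumed
    private static final int MACHINE_BITS = 10;             // assumed width
    private static final int SEQUENCE_BITS = 12;            // assumed width
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long machineId;
    private long lastMs = -1L;
    private long sequence = 0L;

    OrderIdGenerator(long machineId) {
        if (machineId < 0 || machineId >= (1L << MACHINE_BITS)) {
            throw new IllegalArgumentException("machineId out of range");
        }
        this.machineId = machineId;
    }

    /** Timestamp, machine ID, and per-millisecond sequence packed into one long. */
    synchronized long nextPayload() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                while (now <= lastMs) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L; // new millisecond, counter resets
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }

    /** Prefixes the version and the two-character sharding segment to the payload. */
    String nextId(long uid) {
        int db = (int) ((uid / 10) % 8) + 1; // 1..8, always one digit
        int table = (int) (uid % 10);        // 0..9, always one digit
        return VERSION + db + table + nextPayload();
    }

    /** Recovers the database number from an ID without knowing the uid. */
    static int dbOf(String id) {
        return id.charAt(1) - '0';
    }

    /** Recovers the table number from an ID without knowing the uid. */
    static int tableOf(String id) {
        return id.charAt(2) - '0';
    }
}
```

With this layout, a lookup by order ID alone can call dbOf and tableOf to pick the shard before issuing the query.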
Eventual Consistency
In addition to uid‑based queries, orders must be retrievable by business line ID (bid). A second cluster sharded by bid is kept as a replica of the uid‑sharded cluster. Writes are applied to the primary cluster and propagated asynchronously via a message queue. A monitoring service continuously compares the two clusters and triggers corrective synchronization when discrepancies are detected.
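The write path and the corrective check can be modeled in memory as follows. This is a deliberately simplified sketch: plain maps stand in for the two clusters, an in-process queue stands in for the message queue, sharding is omitted, and every name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// In-memory model of primary write, asynchronous propagation, and reconciliation.
final class DualClusterSketch {
    final Map<String, String> primary = new HashMap<>(); // stand-in for the uid-sharded cluster
    final Map<String, String> replica = new HashMap<>(); // stand-in for the bid-sharded cluster
    final Queue<String> queue = new ArrayDeque<>();       // stand-in for the message queue

    /** Writes go to the primary first; the replica is updated later via the queue. */
    void write(String orderId, String payload) {
        primary.put(orderId, payload);
        queue.add(orderId);
    }

    /** Consumer side: applies queued changes to the replica, returns how many were applied. */
    int drain() {
        int applied = 0;
        String orderId;
        while ((orderId = queue.poll()) != null) {
            replica.put(orderId, primary.get(orderId));
            applied++;
        }
        return applied;
    }

    /** Monitor side: repairs replica rows that are missing or stale, returns the repair count. */
    int reconcile() {
        int repaired = 0;
        for (Map.Entry<String, String> e : primary.entrySet()) {
            if (!e.getValue().equals(replica.get(e.getKey()))) {
                replica.put(e.getKey(), e.getValue());
                repaired++;
            }
        }
        return repaired;
    }

    public static void main(String[] args) {
        DualClusterSketch clusters = new DualClusterSketch();
        clusters.write("o1", "paid");
        clusters.write("o2", "created");
        clusters.queue.poll(); // simulate a lost message for o1
        System.out.println("applied=" + clusters.drain());
        System.out.println("repaired=" + clusters.reconcile());
    }
}
```

The point of the monitor is that the system does not depend on the queue being perfectly reliable: a dropped or delayed message leaves the replica temporarily stale, and the comparison pass brings it back in line.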
Database High Availability
The system assumes any database node may fail. A classic master‑slave topology (one master, two slaves) is combined with LVS load balancing across the read‑only slaves and a Keepalived‑managed virtual IP for the master. Automated scripts detect master failure, promote a backup slave to master, and reassign the virtual IP, achieving recovery within seconds.
Data Tiering
To reduce load on the core databases, data is classified into three tiers:
Tier 1: Order and transaction data – accessed directly from the database without caching.
Tier 2: User‑related data – cached in Redis.
Tier 3: Configuration data – cached in local memory; updates are pushed via a high‑availability message‑push platform.
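The three tiers translate into three different read paths. The sketch below assumes a read-through pattern for the Redis tier, which the article does not spell out, and uses a plain Map as a stand-in for Redis rather than a real client. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of per-tier read paths; the Map and Function are stand-ins, not real clients.
final class TieredReader {
    private final Function<String, String> database;  // stand-in for the core database
    private final Map<String, String> remoteCache;    // stand-in for Redis
    private final Map<String, String> localConfig = new ConcurrentHashMap<>();

    TieredReader(Function<String, String> database, Map<String, String> remoteCache) {
        this.database = database;
        this.remoteCache = remoteCache;
    }

    /** Tier 1: orders always come straight from the database. */
    String readOrder(String orderId) {
        return database.apply("order:" + orderId);
    }

    /** Tier 2: user data is served from the remote cache, falling back to the database. */
    String readUser(String uid) {
        String key = "user:" + uid;
        String cached = remoteCache.get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = database.apply(key);
        if (fresh != null) {
            remoteCache.put(key, fresh);
        }
        return fresh;
    }

    /** Tier 3: configuration is read from local memory only. */
    String readConfig(String name) {
        return localConfig.get(name);
    }

    /** Called when the push platform delivers a configuration update. */
    void onConfigPush(String name, String value) {
        localConfig.put(name, value);
    }
}
```

Only the Tier 1 path touches the core database on every call, which is what keeps cacheable traffic away from the order tables.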
Coarse‑Fine Traffic Pipeline
A “coarse‑fine” pipeline protects the backend from traffic spikes. The coarse side caps inbound traffic at 1 million requests per second and discards the excess. The fine side throttles traffic to the web cluster at 100,000 requests per second and queues the surplus. The pipeline is implemented with the commercial version of Nginx using the max_conns parameter of the upstream server directive; the value must be tuned based on expected TPS and average response time.
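One way to arrive at a starting value for max_conns is Little's law: the number of requests in flight is roughly throughput multiplied by average response time. The calculation below is a back-of-the-envelope sketch, not the authors' tuning procedure. The 20 ms response time and the 20 upstream servers are hypothetical figures chosen for illustration, and the class name is invented.

```java
// Rough sizing via Little's law: in-flight requests = TPS x average response time.
final class MaxConnsEstimate {
    /**
     * Estimated concurrent connections per upstream server, rounded up.
     * Integer arithmetic in milliseconds avoids floating-point rounding surprises.
     */
    static int perServer(long targetTps, long avgResponseMs, int upstreamServers) {
        long numerator = targetTps * avgResponseMs;   // in-flight requests x 1000
        long denominator = 1000L * upstreamServers;
        return (int) ((numerator + denominator - 1) / denominator);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 100,000 TPS, 20 ms average response, 20 web servers.
        System.out.println(perServer(100_000L, 20L, 20));
    }
}
```

With these inputs about 2,000 requests are in flight at once, or 100 per server. A real deployment would likely add headroom and then adjust the value under load testing.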
Supporting Tools
The open‑source data‑access framework Mango (mango.jfaster.org) natively supports the described sharding strategy. Source code: https://github.com/jfaster/mango.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
