How iQIYI Scaled Its Payment System with Full‑Link Load Testing
This article details iQIYI's end‑to‑end load‑testing methodology for its payment platform, covering problem identification, core‑link mapping, environment setup, realistic traffic modeling, execution safeguards, results from capacity verification and stress testing, and future plans for a unified testing solution.
Problem Background
iQIYI serves billions of video users daily and also offers sports, live streaming, and literature services, generating massive traffic that can spike unpredictably during marketing activities. The payment team must ensure stable, accurate payment processing while handling sudden traffic surges, making precise capacity assessment and planning essential.
Before adopting full‑link load testing, the team faced several issues:
Complex production traffic made single‑machine tests ineffective for capacity estimation.
Traffic conversion models did not match real user behavior, leading to inaccurate plans.
Bottlenecks in shared resources rarely surfaced when services were tested in isolation.
Capacity mismatches between services on the same link capped overall throughput and wasted resources.
Exploration and Methodology
The team adopted a four‑step practice:
Core link mapping: Identify the primary and branch links across business lines, mock external services, and decide how to handle side dependencies. For example, a ticket‑booking scenario involved six core services (login, activity, ticketing, checkout, payment, notification), while side services such as risk control and recommendation were mocked or degraded.
Environment preparation: Ensure the test identifier propagates through the entire stack; isolate test data in shadow tables (via ShardingJDBC or MyBatis); tag messages (e.g., with a RocketMQ UserProperty); and create shadow resources for Redis, MongoDB, Elasticsearch, and log directories.
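The idea of carrying a test identifier through a request and routing tagged writes to a shadow table can be sketched as follows. This is a minimal illustration, not the team's implementation (which the article says relies on ShardingJDBC or MyBatis): the `X-Load-Test` header, the `shadow_` prefix, and the `orders` table are all assumptions made for the example, and SQLite stands in for the real database.

```python
import contextvars
import sqlite3

# Hypothetical per-request flag marking load-test traffic.
IS_LOAD_TEST = contextvars.ContextVar("is_load_test", default=False)
SHADOW_PREFIX = "shadow_"  # assumed naming convention for shadow tables


def table_for(logical_name: str) -> str:
    """Return the shadow table for tagged traffic, otherwise the real table."""
    return SHADOW_PREFIX + logical_name if IS_LOAD_TEST.get() else logical_name


def handle_request(headers: dict, conn: sqlite3.Connection, order_id: str) -> None:
    """Read the (assumed) test header, then write the order to the routed table."""
    token = IS_LOAD_TEST.set(headers.get("X-Load-Test") == "1")
    try:
        # The table name comes from a fixed logical name, never from user input.
        conn.execute(
            f"INSERT INTO {table_for('orders')} (id) VALUES (?)", (order_id,)
        )
    finally:
        # Always clear the flag so it cannot leak into the next request.
        IS_LOAD_TEST.reset(token)


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT)")
    conn.execute("CREATE TABLE shadow_orders (id TEXT)")
    handle_request({}, conn, "real-1")
    handle_request({"X-Load-Test": "1"}, conn, "test-1")
    print(count_rows(conn, "orders"), count_rows(conn, "shadow_orders"))
```

Resetting the flag in a `finally` block matters in practice: a marker that leaks past its request would send live writes to shadow tables, or test writes to real ones.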
Traffic construction: Build a realistic traffic model by analyzing production logs to determine the proportion of each payment method and the call ratios between services, then adjust user purchase indices to reflect business strategies.
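One way to turn log analysis into a traffic model is to count payment methods in a log sample, normalize the counts into proportions, optionally reweight them for a planned campaign, and then draw synthetic requests from the resulting mix. The sketch below assumes a hypothetical log record shape (`{"pay_method": ...}`) and illustrative method names and multipliers; none of these come from the original article.

```python
import random
from collections import Counter


def payment_mix(log_records):
    """Proportion of each payment method observed in the log sample."""
    counts = Counter(record["pay_method"] for record in log_records)
    total = sum(counts.values())
    return {method: n / total for method, n in counts.items()}


def apply_strategy(mix, multipliers):
    """Scale selected methods by a business multiplier, then renormalize."""
    weighted = {m: p * multipliers.get(m, 1.0) for m, p in mix.items()}
    total = sum(weighted.values())
    return {m: w / total for m, w in weighted.items()}


def build_requests(mix, n, seed=0):
    """Draw n payment methods according to the mix (seeded for repeatability)."""
    rng = random.Random(seed)
    methods = list(mix)
    return rng.choices(methods, weights=[mix[m] for m in methods], k=n)


if __name__ == "__main__":
    # Illustrative log sample; real input would come from production logs.
    logs = (
        [{"pay_method": "wechat"}] * 6
        + [{"pay_method": "alipay"}] * 3
        + [{"pay_method": "virtual"}] * 1
    )
    mix = payment_mix(logs)
    promo_mix = apply_strategy(mix, {"virtual": 2.0})  # assumed campaign boost
    print(mix, promo_mix, build_requests(promo_mix, 5))
```

Seeding the generator keeps a test run repeatable, which makes it easier to compare results before and after a fix.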
Execution and protection: Run a pre‑validation pass, monitor key metrics at three levels (service: call volume, success rate, latency; middleware and storage: message backlog, Redis hit rate, DB load; host: machine health), and keep degradation switches and safety buffers in place so the test cannot affect live traffic.
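The protection side of execution can be pictured as a guard that watches a sliding window of recent results and tells the load generator to stop when the success rate or latency crosses a threshold. The class below is a sketch under assumed thresholds (99% success, 500 ms average latency); the article does not state the actual limits or the mechanism the team uses.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    ok: bool
    latency_ms: float


class LoadGuard:
    """Signals an abort when recent results breach the assumed thresholds."""

    def __init__(self, min_success_rate=0.99, max_avg_latency_ms=500.0,
                 window=100, min_samples=20):
        self.min_success_rate = min_success_rate
        self.max_avg_latency_ms = max_avg_latency_ms
        self.min_samples = min_samples
        self.samples = deque(maxlen=window)  # keeps only the newest results

    def record(self, sample: Sample) -> None:
        self.samples.append(sample)

    def should_abort(self) -> bool:
        # Avoid reacting to noise before enough samples have arrived.
        if len(self.samples) < self.min_samples:
            return False
        n = len(self.samples)
        success_rate = sum(s.ok for s in self.samples) / n
        avg_latency = sum(s.latency_ms for s in self.samples) / n
        return (success_rate < self.min_success_rate
                or avg_latency > self.max_avg_latency_ms)


if __name__ == "__main__":
    guard = LoadGuard(window=10, min_samples=5)
    for _ in range(10):
        guard.record(Sample(ok=True, latency_ms=50.0))
    print(guard.should_abort())
    for _ in range(3):
        guard.record(Sample(ok=False, latency_ms=50.0))
    print(guard.should_abort())
```

A load generator would call `record` after each response and check `should_abort` before sending the next batch, ramping down or stopping as soon as it returns true.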
Practice and Results
The full‑link testing was applied in two main dimensions:
Business capacity verification: Teams from payment, live streaming, and ticketing ran load tests for marketing and flash‑sale events, using mocked services to simulate realistic delays and success rates. The tests identified bottleneck services, which were then scaled out, made asynchronous, or degraded to reach the target capacity, and they validated the rate‑limiting and fallback strategies.
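A mocked third‑party service that reproduces realistic delays and success rates can be as simple as the sketch below. The class name, the response fields, and the choice of an exponential delay distribution are assumptions for illustration; the article does not describe how the team's mocks are built.

```python
import random
import time


class MockChannel:
    """Stand-in for a third-party payment channel (hypothetical interface)."""

    def __init__(self, success_rate, mean_delay_s, seed=None, sleep=time.sleep):
        self.success_rate = success_rate
        self.mean_delay_s = mean_delay_s
        self._rng = random.Random(seed)
        self._sleep = sleep  # injectable so tests can skip real waiting

    def pay(self, order_id: str, amount_cents: int) -> dict:
        # Exponentially distributed delay around the configured mean (assumed shape).
        if self.mean_delay_s > 0:
            delay = self._rng.expovariate(1.0 / self.mean_delay_s)
        else:
            delay = 0.0
        self._sleep(delay)
        ok = self._rng.random() < self.success_rate
        return {
            "order_id": order_id,
            "amount_cents": amount_cents,
            "status": "SUCCESS" if ok else "FAILED",
            "delay_s": delay,
        }


if __name__ == "__main__":
    channel = MockChannel(success_rate=0.98, mean_delay_s=0.05, seed=42)
    print(channel.pay("order-1", 1500))
```

Taking the success rate and mean delay from production measurements of the real channel keeps upstream timeouts, retries, and fallback paths exercised the way they would be under live traffic.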
System limit testing: Two scenarios were explored:
• Virtual‑currency stress test: The payment system was tested in isolation, with no third‑party channels involved. The test revealed that generating order IDs through Redis caused latency spikes because of RDB fork pauses; after RDB was disabled and the degradation path optimized, the system approached its design TPS limit.
• Mixed‑mode stress test: Multiple payment methods were combined in realistic ratios. The test uncovered mismatched service capacities, which were resolved through horizontal scaling and code optimization.
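Mismatched service capacities can be reasoned about with simple arithmetic: if each order triggers a known number of calls to each service, the order throughput a link can sustain is bounded by the service with the smallest ratio of capacity to calls per order. The sketch below uses made‑up capacities and call ratios purely for illustration; it is not data from the tests described above.

```python
def link_capacity(capacity_qps, calls_per_order):
    """Return (bottleneck service, max order TPS, per-service TPS limits)."""
    limits = {
        svc: capacity_qps[svc] / ratio
        for svc, ratio in calls_per_order.items()
        if ratio > 0
    }
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck], limits


def extra_capacity_needed(capacity_qps, calls_per_order, target_tps):
    """Additional QPS each service needs to sustain the target order TPS."""
    return {
        svc: max(0.0, target_tps * ratio - capacity_qps[svc])
        for svc, ratio in calls_per_order.items()
    }


if __name__ == "__main__":
    # Hypothetical numbers: per-service capacity and calls made per order.
    capacity = {"checkout": 3000, "payment": 2000, "notification": 1000}
    calls = {"checkout": 2, "payment": 1, "notification": 1}
    print(link_capacity(capacity, calls))
    print(extra_capacity_needed(capacity, calls, target_tps=1800))
```

With these example numbers, notification limits the link to 1000 orders per second even though payment alone could handle 2000, so scaling payment further would only add idle capacity.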
Each test required thorough pre‑planning (objectives, metrics, traffic strategy, contingency) and post‑mortem analysis to ensure controlled, repeatable execution.
Future Planning
The team aims to consolidate these practices into a one‑stop solution that lowers entry barriers and implementation costs, automate traffic diversification, and provide detailed test reports for upstream services and accounting teams.
21CTO