Full-Chain Load Testing Practices for iQIYI Payment System
iQIYI’s payment team built a full‑chain load‑testing framework that isolates test data, mocks external dependencies, constructs realistic multi‑service traffic, and executes protected test runs. The framework exposes bottlenecks, guides scaling and optimization, and ultimately ensures reliable payment services during traffic spikes; a unified automation platform is planned next.
iQIYI provides video, sports, live streaming, literature and other services to billions of users. The payment team must ensure stable payment services while handling unpredictable traffic spikes caused by marketing activities across multiple business lines. Accurate capacity assessment and full‑chain load testing are essential to guarantee service reliability.
Full‑chain load testing simulates massive requests in a production‑like environment to evaluate system capacity and guide performance tuning. Prior to adopting full‑chain testing, the team faced several issues:
Complex production traffic made single‑machine test results unreliable for capacity estimation.
Traffic conversion models did not match real user behavior, leading to ineffective mitigation plans.
Bottlenecks in shared resources and services could not be exposed by isolated tests, so validation required true peak traffic.
Misaligned link capacities caused overall performance to be limited by weak services and resulted in resource waste.
To address these problems, the team explored and practiced full‑chain load testing centered on the payment scenario.
Problem Exploration and Method Practice
The practice focused on four main aspects:
Core link sorting: Identify the target and branch links for testing, eliminate risks, and define the full‑chain test scope.
Load‑test environment preparation: Ensure test identifiers are propagated and traffic is correctly handled.
Traffic construction: Build realistic traffic models based on real data and business strategies.
Execution and protection: Perform pre‑validation, execute tests according to plan, and monitor and review results.
Core Link Sorting
Multiple business lines were combined to form a unified core link. Each line clarified downstream dependencies and side‑track dependencies. For payment, third‑party channels were mocked to simulate payment requests, success rates, callback delays, and notifications, creating a closed‑loop payment chain.
Side‑track dependencies such as risk control and accounting services were also mocked or degraded as needed.
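A mocked third‑party channel of this kind can be sketched as follows. This is an illustrative stand‑in, not iQIYI’s actual implementation; the class and parameter names are invented, and the success rate and callback delay are the configurable knobs the article describes:

```java
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for a third-party payment channel: instead of calling
// the real provider, it reports success/failure at a configured rate and fires
// the asynchronous callback after a configured delay, closing the payment loop.
public class MockPaymentChannel {
    private final double successRate;     // fraction of requests that "succeed"
    private final long callbackDelayMs;   // simulated provider callback latency
    private final Random random;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public MockPaymentChannel(double successRate, long callbackDelayMs, long seed) {
        this.successRate = successRate;
        this.callbackDelayMs = callbackDelayMs;
        this.random = new Random(seed);   // seeded for reproducible test runs
    }

    /** Simulates a payment request; the callback later receives true on success. */
    public CompletableFuture<Boolean> pay(String orderId, Consumer<Boolean> callback) {
        boolean success = random.nextDouble() < successRate;
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        scheduler.schedule(() -> {        // delayed asynchronous notification
            callback.accept(success);
            result.complete(success);
        }, callbackDelayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    public void shutdown() { scheduler.shutdown(); }
}
```

Driving the whole order lifecycle through such a mock is what makes the payment chain "closed loop": load tests exercise order creation, callback handling, and notification without ever touching a real payment provider.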
Load‑Test Environment Preparation
Key considerations included:
Test identifier propagation: Inject a predefined marker into inbound traffic; APM instrumentation was leveraged to trace and propagate the marker across HTTP, RPC, thread pools, and middleware.
Data isolation: Shadow tables mirroring the original schemas were created for relational databases; routing to shadow tables was achieved via ShardingJDBC and MyBatis interceptors.
Message handling: Instead of creating new topics, a test flag was added to message bodies (e.g., RocketMQ UserProperty) and consumers filtered accordingly.
Watermark (data volume): Shadow tables were pre‑filled to match production data volume, so results would not be skewed by unrealistically small datasets.
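The trickiest part of marker propagation is the thread-pool hop, where a plain ThreadLocal is lost. A minimal hand-rolled sketch of that step is below; in practice an APM agent does this transparently, and all names here are illustrative:

```java
// Minimal sketch of test-marker propagation across a thread-pool boundary.
// Idea: capture the marker on the submitting thread, restore it inside the
// pooled worker thread, and reset it afterwards so it never leaks.
public class StressTestContext {
    private static final ThreadLocal<Boolean> STRESS_FLAG =
            ThreadLocal.withInitial(() -> false);

    public static void markStressTraffic() { STRESS_FLAG.set(true); }
    public static boolean isStressTraffic() { return STRESS_FLAG.get(); }
    public static void clear() { STRESS_FLAG.remove(); }

    /** Wraps a task so the caller's marker survives the thread-pool hop. */
    public static Runnable wrap(Runnable task) {
        boolean captured = isStressTraffic();   // snapshot on submitting thread
        return () -> {
            boolean previous = isStressTraffic();
            STRESS_FLAG.set(captured);          // restore in worker thread
            try {
                task.run();
            } finally {
                STRESS_FLAG.set(previous);      // avoid leaking across pooled tasks
            }
        };
    }
}
```

The same flag, once visible on the executing thread, is what downstream code consults to route storage to shadow resources and to stamp outgoing messages.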
Similar isolation techniques were applied to Redis (shadow keys or clusters), MongoDB (shadow collections), Elasticsearch (shadow indices), and log directories.
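The routing decision behind all of these shadow resources is the same naming rule, selected by the traffic marker. A sketch of that rule is below; the real routing at iQIYI goes through ShardingJDBC and MyBatis interceptors, and the suffix/prefix conventions here are illustrative:

```java
// Illustrative naming rule for shadow resources: identical schema,
// different physical name, chosen by the load-test traffic marker.
public final class ShadowRouter {
    private ShadowRouter() {}

    /** Route a relational table: stress traffic reads/writes the shadow table. */
    public static String table(String table, boolean stressTraffic) {
        return stressTraffic ? table + "_shadow" : table;
    }

    /** Route a Redis key: stress traffic uses a dedicated key prefix. */
    public static String redisKey(String key, boolean stressTraffic) {
        return stressTraffic ? "shadow:" + key : key;
    }
}
```

The same pattern extends naturally to MongoDB collection names, Elasticsearch index names, and log directory paths.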
Traffic Construction
Rather than using a single static dataset, the team analyzed production logs to derive realistic traffic distributions. For payment, the model reflected the proportion of payment methods, service call ratios, and adjusted conversion rates based on business strategies, resulting in a multi‑service call mix for checkout, channel notification, order query, etc.
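A traffic model of this kind boils down to sampling call types according to measured proportions. The sketch below shows the sampling step; the weights and type names are made up for illustration, not iQIYI's real distribution:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch of a traffic mix derived from production proportions: each generated
// request draws a call type from the measured distribution, instead of
// replaying a single static request shape.
public class TrafficMix {
    private final String[] types;
    private final double[] cumulative;
    private final Random random;

    public TrafficMix(Map<String, Double> weights, long seed) {
        types = new String[weights.size()];
        cumulative = new double[weights.size()];
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        double acc = 0;
        int i = 0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            acc += e.getValue() / total;      // normalize weights into a CDF
            types[i] = e.getKey();
            cumulative[i++] = acc;
        }
        random = new Random(seed);
    }

    /** Draws the call type for the next simulated request. */
    public String next() {
        double r = random.nextDouble();
        for (int i = 0; i < cumulative.length; i++) {
            if (r < cumulative[i]) return types[i];
        }
        return types[types.length - 1];       // guard against rounding at 1.0
    }
}
```

For example, a mix of 50% checkout, 30% order query, and 20% channel notification (hypothetical numbers) would be expressed as a weight map and sampled per request; adjusting a weight is how business-strategy changes to conversion rates are folded in.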
Execution and Protection
Before each test, business and environment validation ensured that side‑track services were degraded and data isolated, preventing impact on normal traffic. Monitoring covered core service call volume, success rate, latency; message backlog, Redis hit rate, DB load; and host‑level metrics. Teams also prepared rate‑limit and degradation plans, reserving safety buffers for regular traffic.
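One common shape for such a protection plan is a token bucket that caps load-test traffic at a rate leaving headroom for regular users, rejecting (rather than queuing) the excess. The sketch below is a generic limiter under that assumption, not iQIYI's actual mechanism:

```java
// Simple token-bucket limiter for the protection step: stress-test requests
// are admitted only up to a configured rate, preserving a safety buffer of
// capacity for regular user traffic; excess requests are rejected outright.
public class StressRateLimiter {
    private final double permitsPerSecond;  // sustained admission rate
    private final double maxBurst;          // bucket capacity (burst allowance)
    private double available;
    private long lastRefillNanos;

    public StressRateLimiter(double permitsPerSecond, double maxBurst) {
        this.permitsPerSecond = permitsPerSecond;
        this.maxBurst = maxBurst;
        this.available = maxBurst;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Returns true if a stress-test request may proceed right now. */
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double refill = (now - lastRefillNanos) / 1e9 * permitsPerSecond;
        available = Math.min(maxBurst, available + refill);  // top up the bucket
        lastRefillNanos = now;
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;                       // reject: safety buffer exhausted
    }
}
```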
Testing Practice and Results
Two main testing scenarios were executed:
Business capacity verification: Conducted full‑chain tests with live‑streaming and ticket‑sale activities, identified bottleneck services, and applied scaling, async processing, or degradation to meet capacity goals.
System limit testing: Performed virtual‑currency payment tests (no third‑party channels) and mixed‑payment tests. Virtual‑currency tests revealed a Redis‑dependent order‑ID generation issue caused by RDB fork pauses; after disabling RDB persistence and optimizing the degradation path, TPS approached the system’s theoretical limit. Mixed‑payment tests uncovered mismatched service capacities, which were resolved through scaling and code optimization.
Overall, the full‑chain load‑testing capability enabled cross‑team collaboration, systematic capacity planning, and continuous improvement of the payment system’s performance.
Future Planning
The team aims to consolidate scattered requirements (data models, traffic scheduling, monitoring, performance analysis) into a unified, one‑stop solution to lower entry barriers and implementation costs. Plans include automating multi‑payment‑type traffic generation, providing detailed test reports for upstream services, and further refining the end‑to‑end testing platform.
iQIYI Technical Product Team