Online Load‑Testing Practices for Baidu Nuomi Marketing Activities

This article presents a comprehensive case study of Baidu Nuomi's online load‑testing methodology for high‑traffic marketing events, covering capacity estimation, test planning, execution, anti‑attack measures, platform architecture, and lessons learned to ensure system reliability and performance under peak loads.

Baidu Intelligent Testing

Apr 16, 2018

During large‑scale marketing events, Baidu Nuomi sees a sharp rise in page views (PV) and unique visitors (UV). If the system cannot absorb that load, user experience, brand reputation, and revenue all suffer. The system may also come under malicious attack during these events, so capacity assessment and performance testing have to be carried out in production.

Initial attempts to conduct offline stress tests proved inadequate due to mismatched topology, heterogeneous machine configurations, limited third‑party environments, and the inability to detect cross‑datacenter, load‑balancing, or auto‑scaling issues, prompting a shift to production‑environment testing.

The online testing challenges addressed include estimating activity capacity by converting UV forecasts to QPS, ensuring testability without impacting real users, handling diverse traffic patterns and protocols, and implementing risk‑control mechanisms to monitor system health during tests.

Two prerequisites make this possible: a closed test loop confined to a single datacenter, so that test pressure does not affect other zones, and system‑wide anomaly detection that captures performance problems across the full stack.
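The risk‑control idea can be illustrated with a small abort guard. This is a minimal sketch, not the mechanism Baidu Nuomi used: the metric names, the threshold values, and the "three consecutive breaches" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values depend on the service under test.
    max_error_rate: float = 0.01
    max_p99_ms: float = 800.0
    max_cpu_util: float = 0.85


@dataclass(frozen=True)
class Sample:
    # One monitoring interval collected while the test is running.
    requests: int
    errors: int
    p99_ms: float
    cpu_util: float


def breaches(sample: Sample, limits: Thresholds) -> List[str]:
    """Return the names of the limits that this sample exceeds."""
    reasons = []
    error_rate = sample.errors / sample.requests if sample.requests else 0.0
    if error_rate > limits.max_error_rate:
        reasons.append("error_rate")
    if sample.p99_ms > limits.max_p99_ms:
        reasons.append("p99_ms")
    if sample.cpu_util > limits.max_cpu_util:
        reasons.append("cpu_util")
    return reasons


def should_abort(samples: Sequence[Sample], limits: Thresholds,
                 consecutive: int = 3) -> bool:
    """Abort when the most recent `consecutive` samples all breach a limit."""
    if len(samples) < consecutive:
        return False
    return all(breaches(s, limits) for s in samples[-consecutive:])
```

Requiring several consecutive breaches is one way to avoid stopping a test because of a single noisy interval; a stricter guard could abort on the first breach.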

The testing workflow consists of traffic estimation, test plan design, test execution, issue follow‑up, and regression verification, with detailed steps illustrated in the accompanying flow diagram.

Traffic estimation involves translating expected UV into subsystem QPS using daily monitoring data and reverse‑engineering coupling relationships to build accurate capacity models.
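The estimation step can be sketched as follows. The article does not give its formulas, so everything here is an assumption: the parameters `pv_per_uv`, `peak_hour_share`, and `peak_factor` are hypothetical, and the coupling ratio is simply taken as a subsystem's baseline QPS divided by the entry baseline QPS observed in daily monitoring.

```python
from typing import Dict


def estimate_entry_peak_qps(expected_uv: float, pv_per_uv: float,
                            peak_hour_share: float, peak_factor: float) -> float:
    """Convert a UV forecast into a peak QPS figure at the entry layer.

    peak_hour_share: fraction of the day's page views that fall in the busiest hour.
    peak_factor: ratio of the peak second to the average second within that hour.
    """
    peak_hour_pv = expected_uv * pv_per_uv * peak_hour_share
    return peak_hour_pv / 3600.0 * peak_factor


def coupling_ratios(baseline_entry_qps: float,
                    baseline_subsystem_qps: Dict[str, float]) -> Dict[str, float]:
    """Derive calls-per-entry-request ratios from everyday monitoring data."""
    if baseline_entry_qps <= 0:
        raise ValueError("baseline_entry_qps must be positive")
    return {name: qps / baseline_entry_qps
            for name, qps in baseline_subsystem_qps.items()}


def project_subsystem_qps(entry_peak_qps: float,
                          ratios: Dict[str, float]) -> Dict[str, float]:
    """Scale each subsystem by its coupling ratio to get its target QPS."""
    return {name: entry_peak_qps * ratio for name, ratio in ratios.items()}


if __name__ == "__main__":
    entry = estimate_entry_peak_qps(3_600_000, pv_per_uv=5,
                                    peak_hour_share=0.2, peak_factor=2.0)
    ratios = coupling_ratios(400.0, {"order": 100.0, "search": 600.0})
    print(project_subsystem_qps(entry, ratios))
```

A linear projection like this assumes the traffic mix during the event resembles the everyday mix; a promotion that drives users to one page would need its ratios adjusted by hand.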

Test plans cover four main scenarios: three‑end full‑link testing (PC, WAP, native app), order‑flow testing for transaction‑critical paths, subsystem‑specific testing for special pages, and anti‑attack testing to validate DDoS mitigation strategies.

Execution has two phases: preparation (mock data, scripts, and tool configuration) and the large‑scale pressure tests themselves, which run at 20k to 50k RPS. A mathematical model calculates how many load‑generator machines are needed from the target QPS, the number of CPU cores, thread constants, and the response time.
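The article names the inputs to the machine‑count model but not the formula itself. The sketch below is one plausible reading, assuming each load‑generator thread sends requests back to back (so a thread sustains roughly one request per response time) and that the thread count per machine is the core count times a fixed constant. The `headroom` parameter is an added assumption that keeps generators from running at their own limit.

```python
import math


def generators_needed(target_qps: float, cpu_cores: int, threads_per_core: int,
                      avg_response_ms: float, headroom: float = 0.8) -> int:
    """Estimate how many load-generator machines a target QPS requires.

    Assumed model (not taken from the source):
      threads per machine = cpu_cores * threads_per_core
      QPS per thread      = 1000 / avg_response_ms
      usable machine QPS  = threads * QPS per thread * headroom
    """
    if min(target_qps, cpu_cores, threads_per_core, avg_response_ms) <= 0:
        raise ValueError("all inputs must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    threads = cpu_cores * threads_per_core
    per_machine_qps = threads * 1000.0 / avg_response_ms * headroom
    return math.ceil(target_qps / per_machine_qps)


if __name__ == "__main__":
    # 50k QPS target, 8-core generators, 50 threads per core, 100 ms responses.
    print(generators_needed(50_000, 8, 50, 100.0))
```

Because the per‑thread rate depends on response time, the estimate should be recomputed if latency rises under load: slower responses mean each thread sends fewer requests, and more machines are needed to hold the same QPS.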

Issues discovered during testing are tracked collaboratively across QA, RD, and operations, and regression tests are performed after fixes to confirm that capacity targets are met.

Typical test schemes are detailed: three‑end full‑link testing using log‑driven traffic replay, order testing with passport stubs and data white‑listing, and anti‑attack testing using reconstructed attack traffic profiles.
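Log‑driven traffic replay can be sketched as turning access‑log lines into a list of requests for the load tool. The log format (a quoted `METHOD path HTTP/x.y` request field), the restriction to GET requests, and the `X-Load-Test` marker header are all assumptions made for this example; the article does not describe how its replay traffic was built or tagged.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple

# Matches the quoted request field found in common access-log formats.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


class ReplayRequest(NamedTuple):
    method: str
    path: str
    headers: Dict[str, str]


def build_replay(lines: Iterable[str],
                 marker_header: str = "X-Load-Test") -> Iterator[ReplayRequest]:
    """Yield replayable requests extracted from access-log lines.

    Only GET requests are kept so that replay does not repeat writes;
    lines that do not contain a request field are skipped.
    """
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("method") != "GET":
            continue
        # The marker header (hypothetical) lets downstream services
        # recognise and separately account for test traffic.
        yield ReplayRequest("GET", match.group("path"), {marker_header: "1"})


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [16/Apr/2018:10:00:00 +0800] "GET /deal/123?from=app HTTP/1.1" 200 512',
        '10.0.0.2 - - [16/Apr/2018:10:00:01 +0800] "POST /order/create HTTP/1.1" 200 87',
    ]
    for request in build_replay(sample):
        print(request)
```

Write paths such as order creation are excluded here on purpose; in the article they are covered by the separate order test, which relies on passport stubs and data white‑listing.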

The Perf testing platform, developed from these practices, provides scalable load generation, configurable metric collection, automated result storage, and API integration, supporting tools such as JMeter and Attila.

Since deployment, the platform has supported more than 10 major marketing events and identified over 100 performance issues. It has also reduced manual monitoring effort, improved test accuracy, and supported elastic capacity scaling across more than 10 product lines.

Common problems uncovered include CPU/memory saturation, mis‑configured timeouts leading to cascading failures, unbalanced instance deployment, and code‑level inefficiencies such as excessive DB/Redis calls.
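One of these problems, timeouts that let failures cascade, lends itself to a simple configuration check. The rule below is a common rule of thumb rather than something stated in the article: a downstream call's worst‑case duration (its timeout multiplied by the number of attempts) should stay below the timeout of the caller above it. The service names and values are hypothetical.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    service: str     # the service making the downstream call
    timeout_ms: int  # timeout it applies to that call
    retries: int     # extra attempts after the first one


def timeout_violations(chain: List[Hop]) -> List[str]:
    """Check a call chain ordered from outermost caller to innermost callee.

    A violation means the upstream hop can time out while the downstream hop
    is still retrying, so work continues for a request nobody is waiting on.
    """
    problems = []
    for upstream, downstream in zip(chain, chain[1:]):
        worst_case = downstream.timeout_ms * (downstream.retries + 1)
        if worst_case >= upstream.timeout_ms:
            problems.append(
                f"{downstream.service}: worst case {worst_case} ms "
                f">= {upstream.service} timeout {upstream.timeout_ms} ms"
            )
    return problems


if __name__ == "__main__":
    chain = [
        Hop("gateway", 1000, 0),
        Hop("order", 800, 1),      # 800 ms * 2 attempts exceeds the gateway's 1000 ms
        Hop("inventory", 300, 0),  # 300 ms fits inside order's 800 ms
    ]
    for problem in timeout_violations(chain):
        print(problem)
```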

Authors Peng Yaoming and Tang Huayi are senior testing engineers at Baidu with extensive experience in online performance testing, capacity planning, and large‑scale system reliability.
