
Cloud Load Testing: Strategies, Scenarios, and Practice Cases for High‑Traffic Events

Tencent’s cloud load‑testing platform simulates massive Chinese‑New‑Year traffic by offering concurrency and RPS modes, multi‑language test authoring, realistic data generation, and unified OpenTelemetry reporting, enabling early bottleneck detection, proactive scaling, and successful high‑load drills such as Mobile QQ and video services.

Tencent Cloud Developer

During the Chinese New Year period, most of Tencent's services face a massive traffic surge, with peak loads rising to five to ten times the normal level. To ensure service stability, proactive load testing (cloud‑based pressure testing) is used to uncover bottlenecks and risks before they impact production.

1. Background and Challenges

The rapid growth of user traffic on applications such as Mobile QQ and video services creates sharp spikes in both read and write paths. Traditional monitoring only reacts to incidents, whereas load testing actively simulates high‑load scenarios to verify throughput, detect capacity limits, and validate fallback strategies.

Key challenges identified:

Most existing load tests are single‑machine, single‑service simulations, which lead to inaccurate capacity estimates.

Micro‑service architectures introduce upstream/downstream dependencies that cannot be fully mocked.

Traffic generation often loses fidelity due to simplified parameter construction.

Data scale and I/O‑intensive back‑ends (MySQL, Redis, Kafka) can cause avalanche effects under high load.

2. Solution Overview

The cloud load testing platform focuses on the pressure‑generation side, providing:

Full‑link capacity assessment for scaling decisions.

Early detection of performance bottlenecks.

Support for both concurrency‑based (VU) and request‑per‑second (RPS) modes.

2.1 Load‑Test Mode Selection

Two primary modes are offered:

Concurrency mode: simulates a configurable number of virtual users (VUs) that generate requests in parallel.

RPS mode: directly drives a target request‑per‑second rate; internally it still relies on VUs to generate traffic.

The relationship follows Little's Law: VU = RPS × RT, with RT (response time) measured in seconds. As VU increases, RT stays stable until the system reaches saturation, after which RT spikes and RPS drops.
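Applying Little's Law makes sizing a test concrete. A minimal sketch (the numbers are illustrative, not from the article): to hold 5,000 RPS against a service with a 200 ms response time, the generator needs roughly 1,000 VUs.

```go
package main

import "fmt"

// requiredVUs applies Little's Law for load testing: VU = RPS × RT.
// Given a target request rate and the service's typical response time
// in seconds, it estimates how many virtual users are needed.
func requiredVUs(targetRPS, responseTimeSec float64) float64 {
	return targetRPS * responseTimeSec
}

func main() {
	// 5,000 RPS at a 200 ms response time needs about 1,000 VUs.
	fmt.Println(requiredVUs(5000, 0.2)) // 1000
}
```

The same relation explains the saturation behavior: once RT spikes, the VUs needed to hold a target RPS grow proportionally, so a fixed VU pool delivers falling RPS.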

2.2 Test‑Case Authoring

Three authoring approaches are provided to accommodate different skill sets:

2.2.1 JS Script Mode

// Send an HTTP GET request
import http from 'pts/http'; // protocol adapter
import { check, sleep } from 'pts'; // common utilities

export default function () {
    // simple GET request
    const resp1 = http.get('http://httpbin.org/get'); // execute the request
    console.log(resp1.body); // log the raw body for debugging
    console.log(resp1.json()); // parse the JSON body
    check('status is 200', () => resp1.statusCode === 200); // assertion
}

This mode offers quick composition using familiar JavaScript syntax but requires developers to learn the platform‑specific APIs.

2.2.2 Go Plugin Mode

// Stdlib imports only; the `plugin` and `assert` packages are supplied
// by the load-testing platform's Go SDK (import paths omitted here).
import (
    "context"
    "net/http"
)

var Init = plugin.Init

func Run(ctx context.Context) error {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://httpbin.org/get", nil)
    if err != nil { return err }
    resp, err := http.DefaultClient.Do(req)
    if err != nil { return err }
    defer resp.Body.Close()
    assert.True(ctx, "status code is 200", func() bool { return resp.StatusCode == http.StatusOK })
    return nil
}

Go provides native performance, low‑level control, and direct access to compiled protocol libraries, making it suitable for complex scenarios and large‑scale traffic.

2.2.3 Low‑Code / JMeter GUI Mode

A drag‑and‑drop UI converts HAR recordings into test scripts, supporting HTTP, gRPC, and WebSocket protocols. For legacy JMeter users, the platform can import JMX files and enrich them with unified metrics, traces, and logs.

2.3 Test Data Construction

Data generation is critical to avoid distortion. The platform supports:

Online traffic recording → binary payload conversion.

CSV‑based parameterization with automatic merging and round‑robin distribution across load‑generator pods.

Shard‑based file splitting to ensure each generator works on a unique data slice, minimizing memory copy overhead.
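The round‑robin distribution above can be sketched in a few lines. This is a toy in‑memory version for illustration; the platform's actual splitter works on file shards rather than loaded rows:

```go
package main

import "fmt"

// shard distributes CSV rows round-robin: row i goes to pod i % pods,
// so every load-generator pod works on a disjoint slice of the
// parameter file and no rows are duplicated across generators.
func shard(rows []string, pods int) [][]string {
	out := make([][]string, pods)
	for i, r := range rows {
		out[i%pods] = append(out[i%pods], r)
	}
	return out
}

func main() {
	rows := []string{"u1,t1", "u2,t2", "u3,t3", "u4,t4", "u5,t5"}
	for pod, slice := range shard(rows, 2) {
		fmt.Println("pod", pod, slice)
	}
}
```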

2.4 Report Analysis

Unified observability is achieved via OpenTelemetry, delivering consistent metrics, traces, and logs regardless of the underlying engine. Users can filter by custom status codes, drill down into request logs, and apply trace‑based sampling for detailed root‑cause analysis.
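Percentile latencies such as p50 and p99 are the core of such reports. A minimal nearest‑rank sketch (the platform presumably uses streaming histograms at scale; this in‑memory version just shows the computation):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the latency at or below which p percent of the
// samples fall, using the nearest-rank method on a sorted copy.
func percentile(samplesMs []float64, p float64) float64 {
	s := append([]float64(nil), samplesMs...)
	sort.Float64s(s)
	rank := int(p/100*float64(len(s))+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(s) {
		rank = len(s) - 1
	}
	return s[rank]
}

func main() {
	// Note how two slow outliers dominate the tail percentile.
	latencies := []float64{12, 15, 11, 200, 14, 13, 16, 18, 17, 500}
	fmt.Println("p50:", percentile(latencies, 50))
	fmt.Println("p99:", percentile(latencies, 99))
}
```

Tail percentiles, not averages, are what reveal the RT spike at saturation described in section 2.1.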

3. Practice Cases

3.1 Mobile QQ Spring‑Festival Activity

Background: QQ experiences massive spikes at midnight on New Year’s Day. Write‑path traffic (feeds, comments, likes) is hard to simulate due to complex data construction.

Solution: Migrated test scripts to Go Plugin, reusing existing protocol libraries. The Go implementation achieved a 90% throughput improvement at 1,000 concurrent users compared with the previous JS approach, reducing hardware cost.

Results:

Detected upstream overload under 6‑8× normal traffic, enabling proactive scaling.

Validated fallback and retry strategies to prevent snowball failures.

Supported multi‑region load tests (Shanghai, Nanjing, Guangzhou) with up to 100 k concurrent users and 1 M RPS, verifying 100 Gbps bandwidth capacity.

3.2 Video Service Disaster‑Recovery Drill

Goal: Verify the homepage’s resilience under various failure scenarios, including chaos‑engineering injections, cache fallback, service degradation, and overload protection.

Outcome: Integrated SLA‑based monitoring, automatic traffic throttling, and circuit‑breaker triggers, ensuring graceful degradation during sustained high load.

4. Summary and Outlook

The cloud load‑testing platform provides end‑to‑end capabilities for high‑traffic event preparation, covering model selection, test‑case authoring, data generation, and comprehensive reporting. Future work will focus on deeper server‑side metric integration, automated performance‑testing pipelines, and AI‑driven bottleneck detection to further reduce manual effort.

Tags: JavaScript · Microservices · observability · Go · load testing · performance engineering · cloud testing
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
