How Tencent Scales Its Services for Chinese New Year: Inside Cloud Load‑Testing Strategies
This article details Tencent's cloud load‑testing approach for handling massive traffic spikes during Chinese New Year, covering background challenges, model selection, script authoring options, data construction, report analysis, and real‑world case studies that demonstrate capacity planning and performance optimization.
Background and Challenges
During Chinese New Year, Tencent services experience traffic spikes up to five‑to‑ten times normal, especially QQ and video streaming. Traditional monitoring is passive; cloud load testing provides a proactive way to discover bottlenecks, verify capacity, and ensure service stability.
Solution Overview
The cloud load‑testing platform offers end‑to‑end capabilities: model selection, test‑case authoring, test‑data construction, and report analysis.
2.1 Load‑test Model Selection
Two modes are supported: concurrent‑user (VU) mode and request‑per‑second (RPS) mode. The two are related by Little's law: sustainable RPS = VU ÷ mean response time. In the linear region latency stays stable while throughput rises; beyond saturation latency grows sharply and throughput drops.
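The relationship between the two modes can be sketched in a few lines of Go. This is an illustrative calculation only; the function name `littleLawRPS` is mine, not part of the platform.

```go
package main

import "fmt"

// littleLawRPS converts a concurrent-user (VU) count into the maximum
// throughput it can sustain, per Little's law: RPS = VU / latency.
// latencySec is the mean response time in seconds.
func littleLawRPS(vus int, latencySec float64) float64 {
	return float64(vus) / latencySec
}

func main() {
	// 1000 virtual users against a service answering in 50 ms can
	// drive at most 1000 / 0.05 = 20000 requests per second.
	fmt.Printf("max sustainable RPS: %.0f\n", littleLawRPS(1000, 0.05))
}
```

This also shows why VU mode alone under-specifies a test: the same VU count produces very different RPS as the target's latency changes, which is exactly what happens past the saturation point.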
2.2 Test Case Authoring
Three scripting options are provided:
JS script mode – high‑level language, easy to compose but requires JavaScript familiarity and incurs extra adaptation work.
Go plugin mode – native Go, hot‑loadable, low overhead, better for complex protocols and high concurrency.
Low‑code / JMeter GUI mode – drag‑and‑drop interface for non‑developers, supports HAR→JS conversion and JMeter extensions.
Example JS script:

// Send an HTTP GET request
import http from 'pts/http';
import { check, sleep } from 'pts';

export default function () {
    const resp1 = http.get('http://httpbin.org/get');
    console.log(resp1.body);
    console.log(resp1.json());
    check('status is 200', () => resp1.statusCode === 200);
}

Example Go plugin snippet:
// plugin and assert are provided by the PTS Go-plugin SDK;
// context and net/http are from the Go standard library.
var Init = plugin.Init

func Run(ctx context.Context) error {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://httpbin.org/get", nil)
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    assert.True(ctx, "status code is 200", func() bool { return resp.StatusCode == http.StatusOK })
    return nil
}

2.3 Test Data Construction
The platform provides traffic recording → test‑case conversion, CSV merge, and automatic sharding across load‑generator pods to avoid data duplication and GC pressure. Recorded binary packets are transformed into a cloud‑compatible archive format.
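The sharding step above can be sketched as a simple modulo split. The function and parameter names (`shardRows`, `podIndex`) are illustrative, not the platform's API; the real implementation also streams rows to bound memory and GC pressure.

```go
package main

import "fmt"

// shardRows assigns test-data rows to one load-generator pod so that
// no two pods replay the same record. podIndex is zero-based; every
// row lands on exactly one of podCount pods.
func shardRows(rows []string, podCount, podIndex int) []string {
	var out []string
	for i, r := range rows {
		if i%podCount == podIndex {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []string{"user1", "user2", "user3", "user4", "user5"}
	for pod := 0; pod < 2; pod++ {
		fmt.Printf("pod %d replays %v\n", pod, shardRows(rows, 2, pod))
	}
}
```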
2.4 Report Analysis
Unified observability is built on OpenTelemetry, delivering metrics, traces, and logs. Users can filter by custom status codes, drill into error logs, and perform trace‑based traffic coloring. Sampling strategies balance gauge, counter, and histogram metrics while limiting log volume.
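The histogram half of that sampling strategy amounts to bucketing each latency sample instead of storing it, which is what keeps report data bounded at high RPS. A minimal sketch, with names of my own choosing; the platform's real aggregation runs on OpenTelemetry histogram instruments:

```go
package main

import "fmt"

// bucketLatency returns the index of the histogram bucket an observed
// latency (in ms) falls into, given sorted upper bounds; samples above
// the last bound go to the overflow bucket.
func bucketLatency(ms float64, bounds []float64) int {
	for i, b := range bounds {
		if ms <= b {
			return i
		}
	}
	return len(bounds)
}

func main() {
	bounds := []float64{10, 50, 100} // bucket upper bounds in ms
	counts := make([]int, len(bounds)+1)
	for _, ms := range []float64{3, 42, 42, 250} {
		counts[bucketLatency(ms, bounds)]++
	}
	fmt.Println(counts) // [1 2 0 1]
}
```

Only the per-bucket counters travel from load generators to the report backend, so the cost is fixed regardless of how many requests the test fires.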
Practical Cases
3.1 Mobile QQ (手Q) Spring Festival Protection
Spring Festival peaks on read/write paths required a new Go‑plugin test suite. Switching from JS to Go raised throughput at 1,000 concurrent users by roughly 90% and reduced memory pressure, enabling a scale‑out to 100,000 concurrent users across regions including Shanghai, Nanjing, and Guangzhou.
Outcomes: early detection of overload, refined retry/timeout policies, and successful capacity expansion.
3.2 Video Service Disaster‑Recovery Drill
Chaos‑engineering style drills validated fallback, rate‑limiting, and circuit‑breaker mechanisms. Integrated SLA monitoring and automatic traffic degradation ensured the system could sustain 10 k RPS per region without cascading failures.
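The rate-limiting side of such a drill can be illustrated with a toy token bucket; this is a sketch of the mechanism under test, not the production limiter, and the type name is mine.

```go
package main

import "fmt"

// tokenBucket admits at most `capacity` requests per refill interval;
// excess requests are rejected instead of queuing, which is what stops
// an overload in one region from cascading.
type tokenBucket struct {
	capacity, tokens int
}

// allow consumes one token if available; a false return means the
// request should be shed or degraded.
func (b *tokenBucket) allow() bool {
	if b.tokens > 0 {
		b.tokens--
		return true
	}
	return false
}

// refill is called once per interval (e.g. every second for an RPS cap).
func (b *tokenBucket) refill() { b.tokens = b.capacity }

func main() {
	b := &tokenBucket{capacity: 2, tokens: 2}
	fmt.Println(b.allow(), b.allow(), b.allow()) // third call is rejected
}
```

A drill then verifies the end-to-end behavior: when the limiter starts rejecting, the fallback path must absorb the shed traffic without tripping its own circuit breaker.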
Summary and Outlook
The cloud load‑testing platform now supports HTTP, gRPC, WebSocket, and custom protocols via Go plugins. Future work includes tighter server‑side metric integration, automated capacity‑estimation, and AI‑driven scenario generation to further reduce manual effort and improve test reliability.