
Ensuring System Stability for High‑Scale Services: Full‑Link Load Testing at Gaode

This article describes how Gaode keeps its services stable while serving more than 100 million daily active users. It covers capacity planning, traffic control, disaster recovery, monitoring, and rehearsal, followed by a self‑built full‑link load‑testing platform that simulates realistic traffic, manages test resources, and produces detailed reports.

Architecture Digest

In 2018, Gaode's daily active users (DAU) surpassed 100 million, a milestone that also brought significant stability challenges for the engineering teams responsible for keeping the service reliable.

Business Scale : Gaode operates thousands of online applications across dozens of data centers and tens of thousands of machines, forming a highly complex core service graph.

Stability Measures : Five fundamental methods are employed:

Capacity Planning : Estimate future traffic based on historical data and calculate the required resources. Formula: MachineCount = EstimatedCapacity / SingleMachineCapacity + Buffer.

Traffic Control : Apply rate limiting and service degradation when traffic exceeds design capacity.

Disaster Recovery : Switch traffic to standby data centers during catastrophic failures.

Monitoring : Real‑time, end‑to‑end monitoring with early warning for anomalies.

Rehearsal : Conduct comprehensive pre‑emptive drills using realistic traffic patterns.
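
The capacity-planning formula above can be sketched in a few lines of Python. This is a minimal illustration, not Gaode's tooling: the function name, the choice of QPS as the capacity unit, rounding the quotient up to a whole machine, and the sample numbers are all assumptions made for the example.

```python
import math


def machine_count(estimated_capacity_qps: float,
                  single_machine_capacity_qps: float,
                  buffer_machines: int) -> int:
    """MachineCount = EstimatedCapacity / SingleMachineCapacity + Buffer."""
    if single_machine_capacity_qps <= 0:
        raise ValueError("single-machine capacity must be positive")
    # Assumption: round up, because a fractional machine still needs a whole host.
    base = math.ceil(estimated_capacity_qps / single_machine_capacity_qps)
    return base + buffer_machines


# Hypothetical numbers, chosen only for illustration.
print(machine_count(120_000, 800, 20))
```

With these made-up inputs, 120,000 QPS divided by 800 QPS per machine gives 150 machines, and the buffer of 20 brings the total to 170. The formula is only as good as the traffic estimate and the measured single-machine capacity that feed it.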

Two real‑world incidents (Chinese New Year and May Day) demonstrated that even with thorough planning, unexpected alerts can arise when actual traffic does not match predictions, highlighting the need for continuous simulation and model refinement.

Full‑Link Load Testing : Defined as testing the complete request path from user login to payment, following the request flow from the top of the call chain downward. It has three requirements: realistic traffic volume and characteristics, execution in the real production environment, and completion before traffic peaks.

Challenges :

Distributed System Characteristics : Uncertainty, jitter, and queueing behavior cause non‑linear throughput and sudden performance degradation near saturation.

Gaode Business Specifics : Factors such as region, terrain, road conditions, network density, distance, season, weather, and government activities affect navigation services, making simple traffic scaling insufficient.
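
To illustrate why queueing makes performance degrade suddenly near saturation, the sketch below uses the textbook M/M/1 queue, where mean response time is R = S / (1 - ρ) for service time S and utilization ρ. The M/M/1 model is an assumption introduced here for illustration only; it is a simplification and not a description of how Gaode's services actually behave. The 10 ms service time is likewise a made-up value.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


# Hypothetical 10 ms service time; watch latency grow as utilization rises.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilization {rho:.2f}: {mm1_response_time(10.0, rho):7.1f} ms")
```

In this model, going from 50% to 90% utilization multiplies latency by five, and the last few percent before saturation multiply it again by ten. That is why linearly extrapolating from a low-load measurement tends to underestimate the resources needed at peak.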

Why Build a Custom Platform : Alibaba's existing “Amazon” platform could not meet Gaode's requirements for cost, flexibility, visualization, and short cycles, which prompted the development of an in‑house solution.

Platform Design Goals :

Ensure scenario realism (protocol support and user‑behavior modeling).

Isolate test traffic from production users.

Generate ultra‑high traffic using a distributed JMeter cluster.

Reduce usage and resource costs through self‑service, rapid scaling, and efficient data management (OSS for corpora).
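
The article does not say how test traffic is isolated from production users. One common approach is to tag load-test requests and route their writes to shadow storage; the sketch below shows that idea only. The header name `X-Load-Test`, the `shadow_` table prefix, and both function names are hypothetical and are not taken from Gaode's platform.

```python
TEST_HEADER = "X-Load-Test"   # hypothetical header name
SHADOW_PREFIX = "shadow_"     # hypothetical naming convention for shadow tables


def is_test_request(headers: dict) -> bool:
    """Return True if the request carries the load-test marker."""
    # HTTP header names are case-insensitive, so compare in lower case.
    return any(
        name.lower() == TEST_HEADER.lower() and value == "1"
        for name, value in headers.items()
    )


def storage_table(base_table: str, headers: dict) -> str:
    """Send test writes to a shadow table so production data stays clean."""
    if is_test_request(headers):
        return SHADOW_PREFIX + base_table
    return base_table


print(storage_table("route_log", {"x-load-test": "1"}))
print(storage_table("route_log", {"User-Agent": "app"}))
```

For this to work in a real call chain, the marker has to be propagated through every downstream hop, which is a large part of the engineering effort in any full-link setup.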

Key Features :

Fast test creation (one‑click configuration for simple cases).

Two debugging modes: shield (validation without hitting real services) and service (real request sampling).

Detailed error tracing with request/response logs.

Automatic report generation with QPS, RT, error rates, and baseline comparison.
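
The report metrics listed above (QPS, RT, error rate, baseline comparison) can be sketched as follows. This is an illustrative assumption about how such a summary might be computed, not the platform's actual report format: the `Sample` type, the field names, and the choice of mean RT are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float  # response time (RT) of one request
    ok: bool           # False if the request failed


def summarize(samples, duration_s: float) -> dict:
    """Compute QPS, mean RT, and error rate for one test run."""
    if not samples or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    total = len(samples)
    errors = sum(1 for s in samples if not s.ok)
    return {
        "qps": total / duration_s,
        "avg_rt_ms": sum(s.latency_ms for s in samples) / total,
        "error_rate": errors / total,
    }


def compare_to_baseline(current: dict, baseline: dict) -> dict:
    """Relative change per metric; 0.25 means 25% above the baseline."""
    # Metrics with a missing or zero baseline are skipped to avoid dividing by zero.
    return {
        key: (current[key] - baseline[key]) / baseline[key]
        for key in current
        if baseline.get(key)
    }


run = [Sample(10, True), Sample(20, True), Sample(30, False), Sample(40, True)]
report = summarize(run, duration_s=2.0)
print(report)
print(compare_to_baseline(report, {"qps": 2.0, "avg_rt_ms": 20.0, "error_rate": 0.25}))
```

A production report would normally add latency percentiles as well, since a mean can hide the tail latency that users actually notice.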

Architecture : Separate business and technical layers, with visual dashboards, resource scheduling, and integration with continuous delivery pipelines.

Future Work : Planned items include automated full‑link monitoring, simplified corpus generation, richer pressure models (step, jitter, pulse), confidence evaluation of test scenarios, and broader support for write‑heavy workloads.
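
The step, jitter, and pulse pressure models are named but not defined in the article. The sketch below shows one plausible reading of each as a per-second target-QPS schedule. All function names, parameters, and the interpretations themselves are assumptions for illustration.

```python
import random


def step_profile(start_qps, step_qps, step_seconds, total_seconds):
    """Raise the target by step_qps every step_seconds."""
    return [start_qps + step_qps * (t // step_seconds) for t in range(total_seconds)]


def pulse_profile(base_qps, peak_qps, period_seconds, pulse_seconds, total_seconds):
    """Hold base_qps, with a burst to peak_qps at the start of each period."""
    return [
        peak_qps if t % period_seconds < pulse_seconds else base_qps
        for t in range(total_seconds)
    ]


def jitter_profile(base_qps, max_deviation, total_seconds, seed=0):
    """Fluctuate randomly around base_qps by at most max_deviation."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    return [
        max(0, base_qps + rng.randint(-max_deviation, max_deviation))
        for _ in range(total_seconds)
    ]


print(step_profile(100, 50, 2, 6))        # [100, 100, 150, 150, 200, 200]
print(pulse_profile(100, 500, 4, 1, 8))   # [500, 100, 100, 100, 500, 100, 100, 100]
print(jitter_profile(100, 20, 5))
```

A load generator would read one value per second from such a schedule and adjust its request rate accordingly. Step profiles help locate the saturation point, while pulse and jitter profiles probe how the system reacts to sudden or noisy traffic.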

Written by

Architecture Digest

