
Full-Link Load Testing Platform TestPG: Architecture, Corpus Production, and Intelligent Features

Gaode’s TestPG platform solves full‑link load‑testing bottlenecks by unifying traffic capture with Iflow, converting logs into standardized corpora via a Flink pipeline, and applying corpus‑intelligence that extracts seasonal feature statistics and predicts distributions for precise, feature‑level throttling, enabling faster, more reliable testing and future autonomous optimization.

Amap Tech

Nov 6, 2020


Gaode Map, a national-level travel service platform with over 100 million daily active users, relies on a massive backend cluster. Failures can have a huge impact, and the cross-region deployment across three data centers makes the online environment and service call chains complex. To protect users and to support capacity planning, disaster recovery, and rapid incident response, Gaode needs a proactive validation method, full-link load testing, that brings real traffic into the test environment before problems occur.

Full-link load testing is a crucial means of guaranteeing online service stability at Gaode. Since its inception, the TestPG platform has matured into a system that supports both routine and full-link load testing quickly and accurately. Corpus production (traffic processing) is a key step, and this article focuses on it.

Related article: Gaode Full‑Link Load Testing Platform TestPG Architecture and Practice

A full-link test can be summarized in three steps: pre-test traffic processing (corpus production), activation of the pressure model during the test, and post-test result analysis and issue localization. Pre-test traffic processing is the most time-consuming part. Previously, operations staff collected logs and handed them to testers to turn into scripts, which was costly, slow, and prone to request expiration. TestPG standardized the corpus format and the traffic processing flow, but two main problems remained:

Lack of unified control over the corpus production process. Although the format is standardized, each business line handled traffic independently, leading to high production cost.

Interface-level throttling is not precise enough. Traffic varies with weather, geography, holidays, and so on. For example, long-distance navigation during holidays puts a non-linear load on backend compute, but the platform only supports throttling per interface, not per traffic feature.

To address these issues, Gaode’s full‑link testing team launched a corpus intelligence project.

Solution Approach

Standardized Traffic Capture

Gaode’s full‑link testing had already integrated most business lines, but traffic sources (logs, ODPS, raw traffic) were heterogeneous. The team decided to unify traffic capture by using a low‑cost, high‑efficiency method: traffic recording. Existing recording solutions were fragmented, causing instability. Therefore, a unified traffic capture platform (Iflow) was built.

Platform‑Based Corpus Production

After standardizing traffic capture, the next step is to convert raw traffic into a platform-compliant corpus. Because each business has unique requirements, building a fully custom solution for each would be costly. The team chose Apache Flink to implement a flexible pipeline: business-specific logic is expressed via user-defined functions (UDFs), while common processing (e.g., feature extraction, storage) is handled by Flink sink plugins.
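The split between per-business UDFs and shared sink processing can be illustrated with a small, framework-free Python sketch. It is not the Flink API: `navigation_udf`, `CommonSink`, `run_pipeline`, and the raw record fields (`path`, `query`, `status`, `city`) are hypothetical names chosen for illustration. Each business line supplies only its UDF, and feature counting and storage live in one shared sink.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical shape of one captured request (field names are assumptions).
RawRecord = Dict[str, str]


@dataclass
class CorpusLine:
    interface: str
    request: str
    features: Dict[str, str] = field(default_factory=dict)


# Business-specific step: map a raw record to a corpus line, or None to drop it.
BusinessUdf = Callable[[RawRecord], Optional[CorpusLine]]


class CommonSink:
    """Shared step for every business line: count feature values, then store."""

    def __init__(self, feature_keys: List[str]) -> None:
        self.feature_keys = feature_keys
        self.stored: List[CorpusLine] = []
        self.feature_counts: Dict[str, Dict[str, int]] = {k: {} for k in feature_keys}

    def write(self, line: CorpusLine) -> None:
        for key in self.feature_keys:
            value = line.features.get(key)
            if value is not None:
                counts = self.feature_counts[key]
                counts[value] = counts.get(value, 0) + 1
        self.stored.append(line)


def run_pipeline(records: Iterable[RawRecord], udf: BusinessUdf, sink: CommonSink) -> CommonSink:
    for record in records:
        line = udf(record)
        if line is not None:
            sink.write(line)
    return sink


# Example UDF for a hypothetical navigation business line.
def navigation_udf(record: RawRecord) -> Optional[CorpusLine]:
    if record.get("status") != "200":
        return None  # failed requests are not useful as load-test corpus
    return CorpusLine(
        interface=record["path"],
        request=record["query"],
        features={"city": record.get("city", "unknown")},
    )
```

Under this design, onboarding a new business line means writing one more UDF. The sink, and therefore the feature statistics, stay identical across lines.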

Corpus Intelligence

Gaode’s traffic exhibits strong seasonal patterns (e.g., longer navigation routes during holidays). To achieve feature‑level precise throttling, the platform must provide realistic traffic feature distributions. Corpus intelligence follows three stages:

1. Traffic Feature Statistics: Identify parameters that affect traffic (weather, holidays, etc.) and collect their distributions.

2. Traffic Feature Extraction: Extract these parameters during the corpus production stage, leveraging the unified Flink pipeline.

3. Intelligent Prediction & Machine Learning: Use historical peak-traffic data (e.g., past National Day and Spring Festival holidays) combined with current trends to predict feature distributions for upcoming peaks. Future work includes automated discovery of influencing parameters via machine learning.
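To illustrate stages 1 and 3, the sketch below buckets one feature into a distribution. It uses route distance, which the article names as holiday-sensitive. It then predicts a peak-day distribution with a deliberately simple uplift heuristic: each bucket's current share is multiplied by its historical peak-to-baseline ratio, and the result is renormalized. The bucket boundaries, function names, and the heuristic itself are assumptions, not TestPG's model.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical route-distance buckets in kilometres.
BUCKETS = [(0.0, 10.0, "short"), (10.0, 100.0, "medium"), (100.0, float("inf"), "long")]


def bucket_distance(km: float) -> str:
    for low, high, name in BUCKETS:
        if low <= km < high:
            return name
    raise ValueError(f"distance out of range: {km}")


def distribution(distances_km: List[float]) -> Dict[str, float]:
    """Stage 1: share of requests falling into each distance bucket."""
    if not distances_km:
        raise ValueError("no samples")
    counts = Counter(bucket_distance(d) for d in distances_km)
    total = len(distances_km)
    return {name: counts[name] / total for _, _, name in BUCKETS}


def predict_peak(current: Dict[str, float],
                 hist_baseline: Dict[str, float],
                 hist_peak: Dict[str, float]) -> Dict[str, float]:
    """Stage 3 heuristic: scale today's mix by how much each bucket grew at the last peak."""
    raw = {}
    for name, share in current.items():
        base = hist_baseline.get(name, 0.0)
        uplift = hist_peak.get(name, 0.0) / base if base > 0 else 1.0
        raw[name] = share * uplift
    total = sum(raw.values())
    if total == 0:
        raise ValueError("prediction has no mass")
    return {name: value / total for name, value in raw.items()}
```

A real predictor would also weigh trend and other features. The uplift ratio only shows where historical peak data plugs in.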

Overall Architecture

The capture platform (Iflow) records traffic, caches it in Kafka, and writes it to ODPS. The corpus production service reads from ODPS, processes traffic with Flink, and stores the resulting corpus in OSS.
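The data flow above can be mimicked in-process to show why a buffer sits between recording and storage. In this hedged sketch, `queue.Queue` stands in for Kafka, a dict of date partitions stands in for the ODPS table, and a plain dict of keys stands in for OSS. The class names, the `ds` partition field, the corpus key layout, and the drop-when-full policy are assumptions for illustration, not Iflow's actual behavior.

```python
import queue
from collections import defaultdict
from typing import Dict, List


class CaptureBuffer:
    """Bounded in-memory stand-in for the Kafka topic between capture and storage."""

    def __init__(self, capacity: int) -> None:
        self._q: "queue.Queue[dict]" = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def publish(self, record: dict) -> None:
        # Assumed policy: drop rather than block, so capture never slows the online request.
        try:
            self._q.put_nowait(record)
        except queue.Full:
            self.dropped += 1

    def drain(self, max_batch: int) -> List[dict]:
        batch: List[dict] = []
        while len(batch) < max_batch:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


class PartitionedTable:
    """Stand-in for an ODPS table partitioned by capture date (ds)."""

    def __init__(self) -> None:
        self.partitions: Dict[str, List[dict]] = defaultdict(list)

    def insert(self, rows: List[dict]) -> None:
        for row in rows:
            self.partitions[row["ds"]].append(row)


def produce_corpus(table: PartitionedTable, ds: str, store: Dict[str, str]) -> str:
    """Read one partition and write a corpus object (stand-in for an OSS key)."""
    rows = table.partitions.get(ds, [])
    key = f"corpus/{ds}/part-0.txt"
    store[key] = "\n".join(f"{r['path']}\t{r['query']}" for r in rows)
    return key
```

The buffer decouples the rate at which traffic is copied from the rate at which it is persisted. Corpus production then reads a whole, stable partition rather than a live stream.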

Core Components

Iflow Capture Platform

Iflow manages traffic capture tasks. Traffic is copied via plugins, cached in Kafka, and persisted to ODPS. Tasks require approval, reducing the risk of accidental incidents.
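The approval requirement can be modeled as a small state machine in which recording cannot start until a task has been approved. This is a minimal sketch: the state names, `CaptureTask`, and the `sample_rate` parameter are hypothetical and do not describe Iflow's real task model.

```python
from enum import Enum
from typing import Dict, Optional, Set


class TaskState(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    RUNNING = "running"
    FINISHED = "finished"


# Legal transitions; a task cannot start recording until someone approves it.
ALLOWED: Dict[TaskState, Set[TaskState]] = {
    TaskState.DRAFT: {TaskState.PENDING_APPROVAL},
    TaskState.PENDING_APPROVAL: {TaskState.APPROVED, TaskState.REJECTED},
    TaskState.APPROVED: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.FINISHED},
}


class CaptureTask:
    def __init__(self, name: str, sample_rate: float) -> None:
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.name = name
        self.sample_rate = sample_rate  # hypothetical: fraction of online requests to copy
        self.state = TaskState.DRAFT
        self.approver: Optional[str] = None

    def _move(self, target: TaskState) -> None:
        if target not in ALLOWED.get(self.state, set()):
            raise RuntimeError(f"{self.name}: {self.state.value} -> {target.value} not allowed")
        self.state = target

    def submit(self) -> None:
        self._move(TaskState.PENDING_APPROVAL)

    def approve(self, approver: str) -> None:
        self._move(TaskState.APPROVED)
        self.approver = approver  # recorded only after a legal transition

    def reject(self) -> None:
        self._move(TaskState.REJECTED)

    def start(self) -> None:
        self._move(TaskState.RUNNING)

    def finish(self) -> None:
        self._move(TaskState.FINISHED)
```

Encoding the rule in the transition table, rather than in scattered checks, makes it impossible to start an unapproved capture through any code path.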

TestPG Corpus Intelligence

The system consists of three modules: business line management, test list management, and interface ratio management. Each business line represents a testing chain from capture to corpus production and feature analysis.

Key functions include:

Linking capture tasks to corpus production.

Automatic triggering of corpus production after capture completes.

Generating corpus paths for each run.

Managing HTTP headers for test requests.

Manual or automatic triggering of corpus production.
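The business-line functions listed above can be sketched as one record that links capture tasks to corpus production. The sketch covers the automatic trigger on capture completion, a manual trigger, per-run corpus paths, and headers attached to test requests. `BusinessLine`, the `producer` callback, the path layout, and the `X-Load-Test` header in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Optional

# Hypothetical producer callback: (corpus_path, headers) -> None.
Producer = Callable[[str, Dict[str, str]], None]


@dataclass
class BusinessLine:
    name: str
    capture_task_ids: List[str]
    headers: Dict[str, str] = field(default_factory=dict)  # attached to every test request
    auto_trigger: bool = True
    runs: List[str] = field(default_factory=list)

    def corpus_path(self, started_at: datetime) -> str:
        # Assumed layout: one directory per business line and per run.
        return f"corpus/{self.name}/{started_at:%Y%m%d-%H%M%S}/"

    def produce(self, started_at: datetime, producer: Producer) -> str:
        """Manual trigger: run corpus production and record the run's path."""
        path = self.corpus_path(started_at)
        producer(path, self.headers)
        self.runs.append(path)
        return path

    def on_capture_finished(self, task_id: str, finished_at: datetime,
                            producer: Producer) -> Optional[str]:
        """Automatic trigger: fires only for capture tasks linked to this line."""
        if not self.auto_trigger or task_id not in self.capture_task_ids:
            return None
        return self.produce(finished_at, producer)
```

For example, a line named `nav` linked to `task-1` with headers `{"X-Load-Test": "1"}` produces `corpus/nav/20201001-080000/` when that task finishes at 08:00 on 2020-10-01, and it ignores completions of unlinked tasks.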

Test list management automatically registers newly discovered interfaces during capture, categorizing them as testable, exempt, or pending improvement, thus driving full‑link coverage.
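Automatic registration plus a coverage figure can be sketched as follows. The three statuses come from the article. The class name, the choice to start new interfaces as "pending improvement", and the coverage formula (testable share of non-exempt interfaces) are assumptions.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Status(Enum):
    TESTABLE = "testable"
    EXEMPT = "exempt"
    PENDING = "pending improvement"


class InterfaceRegistry:
    def __init__(self) -> None:
        self.entries: Dict[str, Status] = {}

    def register_captured(self, interfaces: Iterable[str]) -> List[str]:
        """Add interfaces seen during capture; returns the newly discovered ones."""
        new: List[str] = []
        for iface in interfaces:
            if iface not in self.entries:
                # Assumption: new interfaces start as pending until someone reviews them.
                self.entries[iface] = Status.PENDING
                new.append(iface)
        return new

    def classify(self, iface: str, status: Status) -> None:
        if iface not in self.entries:
            raise KeyError(iface)
        self.entries[iface] = status

    def coverage(self) -> float:
        """Share of non-exempt interfaces that are testable."""
        in_scope = [s for s in self.entries.values() if s is not Status.EXEMPT]
        if not in_scope:
            return 1.0
        return sum(s is Status.TESTABLE for s in in_scope) / len(in_scope)
```

Because every captured interface lands in the registry, uncovered interfaces show up as a falling coverage number instead of going unnoticed.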

Interface ratio management initially uses BI‑provided ratios; later, extracted traffic features will feed intelligent predictions to produce realistic ratio data for testing.
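Whichever source the ratios come from, they eventually have to be turned into per-interface request rates. The following is a minimal sketch, assuming integer QPS targets and the largest-remainder method for rounding. Both are assumptions, not documented TestPG behavior, chosen so that the per-interface rates always add up to the total.

```python
import math
from typing import Dict


def allocate_qps(total_qps: int, ratios: Dict[str, float]) -> Dict[str, int]:
    """Split total_qps across interfaces in proportion to ratios (which need not sum to 1)."""
    weight = sum(ratios.values())
    if weight <= 0:
        raise ValueError("ratios must sum to a positive value")
    exact = {k: total_qps * v / weight for k, v in ratios.items()}
    alloc = {k: math.floor(x) for k, x in exact.items()}
    leftover = total_qps - sum(alloc.values())
    # Largest remainder method: hand the remaining units to the biggest fractional parts.
    for k in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc
```

With ratios of 5:3:2 and 1,000 QPS, the split is 500/300/200. With three equal ratios and 10 QPS, one interface gets 4 and the others 3, so the sum stays exactly 10.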

Platform Advantages

Corpus Platformization

The production pipeline integrates the capture platform and Flink, supporting both custom business logic (via UDFs) and common processing (via Flink sink). This greatly improves efficiency and quality of corpus generation, moving from format standardization to end‑to‑end process standardization.

Corpus Intelligence

Flink plugins collect and aggregate feature parameters during production. The accumulated statistics enable downstream intelligent analysis and prediction, achieving feature‑level precise throttling for truly realistic full‑link testing.
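One way to read feature-level throttling is as resampling. Instead of scaling each interface up or down, the load generator picks corpus lines so that the mix of a feature (here a route-distance bucket) matches a target distribution, such as a predicted holiday mix. The function name, the (bucket, request) input shape, and sampling with replacement are assumptions in this sketch.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def shape_corpus(lines: List[Tuple[str, str]], target: Dict[str, float],
                 total: int, seed: int = 0) -> List[str]:
    """Resample (feature_bucket, request) pairs so the output follows `target`.

    Per-bucket quotas are rounded, so the result can differ from `total`
    by a request or two when shares do not divide evenly.
    """
    by_bucket: Dict[str, List[str]] = defaultdict(list)
    for bucket, request in lines:
        by_bucket[bucket].append(request)

    rng = random.Random(seed)  # fixed seed keeps the shaped corpus reproducible
    out: List[str] = []
    for bucket, share in target.items():
        quota = round(total * share)
        if quota == 0:
            continue
        pool = by_bucket.get(bucket)
        if not pool:
            raise ValueError(f"no corpus lines for feature bucket {bucket!r}")
        # Sample with replacement so a small bucket can still fill a large quota.
        out.extend(rng.choices(pool, k=quota))
    rng.shuffle(out)
    return out
```

Consider a raw corpus that is one-third long routes and a target of 75% long routes. Shaping 100 requests yields 75 long-route requests, a shift that interface-level ratios alone cannot express.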

Future Outlook

Future work will apply machine‑learning models to historical feature data to predict traffic characteristics for upcoming peak periods, automatically discover influencing parameters, and provide confidence‑evaluation systems that compare predicted vs. real traffic features, continuously improving accuracy. Combined with precise throttling, pressure models, and monitoring, the platform aims for fully autonomous, intelligent full‑link testing.
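The article does not say how confidence will be measured. One possible choice is the total variation distance between the predicted and the observed feature distributions, which lies between 0 and 1. The sketch below uses 1 minus that distance as a hypothetical confidence score.

```python
from typing import Dict


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two discrete distributions over feature buckets."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def confidence(predicted: Dict[str, float], observed: Dict[str, float]) -> float:
    """1.0 means identical distributions; 0.0 means they share no buckets."""
    return 1.0 - total_variation(predicted, observed)
```

For example, predicting 25% short and 75% long routes when the real mix turns out to be 50/50 gives a confidence of 0.75. Tracking this score across successive peaks would show whether predictions are improving.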



