How We Engineered a Scalable Regression Testing Pipeline for a High‑Frequency C++ Search Engine

This article details the motivation, design, and implementation of a systematic regression testing framework for a high‑frequency C++ search engine, covering traffic recording, DIFF testing, one‑click load testing, pipeline integration, and future automation plans to match rapid iteration cycles.

DeWu Technology

Why Build This System?

In a large‑scale search system, the C++ engine serves as the core infrastructure—performance‑sensitive, complex, and frequently changed. Rapid product iteration exposed the need to upgrade regression capabilities to keep pace with development velocity.

Problems We Needed to Solve

Existing testing relied on manual DIFF and load tests, which suffered from low automation, noisy results, fragmented tooling, and limited reusability when multiple features were developed in parallel. The goal was to make regression a systematic, repeatable part of the release process.

Overall Solution Overview

The solution is split into five key directions:

Traffic recording as the foundation for all tests.

Standardized DIFF and load‑test environments.

Enhanced DIFF tooling that provides analyzable, attributable results.

One‑click load‑test capability to lower execution barriers.

Integration with the index platform and a unified reporting index.

Each direction is described in detail below.

1. Traffic Recording: The Regression Infrastructure

We first built a traffic‑recording pipeline because both DIFF and load tests require stable, reproducible traffic. Recording is triggered from the index platform, configured via the ARK configuration center, and consumed by the C++ engine in real time.

Recording configuration is centralized in dsearch3#test.properties, supporting a global switch, app/group selection, a recording deadline, IP targeting, and a sampling rate (0–100%). This keeps recording behavior controllable, reversible, and finely scoped.
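As a minimal sketch of how the engine might evaluate these switches per request: the key names below mirror the properties described above but are illustrative assumptions, not the actual ARK configuration schema.

```python
import random
from datetime import datetime

# Hypothetical mirror of the dsearch3#test.properties recording switches.
# Key names and types are illustrative, not the real configuration schema.
def should_record(cfg: dict, app: str, group: str, now: datetime) -> bool:
    """Decide whether one incoming request is recorded."""
    if not cfg.get("enabled", False):      # global switch
        return False
    if app not in cfg.get("apps", []):     # app selection
        return False
    if group not in cfg.get("groups", []): # group selection
        return False
    if now > cfg["deadline"]:              # recording deadline
        return False
    # sampling rate is an integer percentage, 0-100
    return random.randrange(100) < cfg.get("sample_rate", 0)

cfg = {
    "enabled": True,
    "apps": ["dsearch3"],
    "groups": ["main"],
    "deadline": datetime(2030, 1, 1),
    "sample_rate": 100,
}
```

With `sample_rate` at 100 every matching request is recorded; setting the global switch to false or letting the deadline pass disables recording regardless of the other fields.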

Recorded traffic is streamed as Kafka messages, shared across business scenarios, and stored in ODPS with daily partitions. Each record includes request body, traffic scenario, experiment info, and environment (prod/pre‑release), providing a unified data source for downstream DIFF, load testing, and issue reproduction.

The record schema is:

request_type: traffic tag (original C++ request type)
app_name: C++ engine appName
group_name: C++ engine groupName
request_body: recorded request body
env: traffic environment (pre‑release/production)
graph_name: graph name
experiments: list of experiments (new search features)
pt: ODPS partition (daily)
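A hypothetical Python mirror of that record, for consumers of the Kafka stream or the ODPS table; the field names follow the schema above, while the types and the derivation of `pt` from a recording date are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical mirror of the recorded-traffic schema listed above.
# Column names follow the article; types are assumptions.
@dataclass
class RecordedRequest:
    request_type: str   # traffic tag (original C++ request type)
    app_name: str       # C++ engine appName
    group_name: str     # C++ engine groupName
    request_body: str   # raw recorded request body
    env: str            # "pre-release" or "production"
    graph_name: str     # graph name
    experiments: list = field(default_factory=list)  # new search features
    recorded_at: date = field(default_factory=date.today)

    @property
    def pt(self) -> str:
        """Daily ODPS partition key derived from the recording date."""
        return self.recorded_at.strftime("%Y%m%d")
```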

2. DIFF Testing: From “None” to “Attributable”

DIFF execution follows a unified entry in the index platform: select traffic, configure parameters, trigger DIFF, and view the report. The backend service handles traffic selection, transformation, request forwarding, noise reduction, and report generation.

Two comparison modes are supported:

Cluster‑level: separate full clusters for control (master branch) and experiment (pre‑release branch).

Line‑level: precise binding to specific search/rank IPs.

DIFF strategies are divided into:

Response DIFF: field‑by‑field response comparison and funnel operator comparison.

Metric DIFF: similarity distribution (with/without ranking), funnel operator consistency, counts of added/deleted/modified fields, and custom metrics.

Noise reduction focuses on AA DIFF (unstable sorting), ignorable fields, minor numeric fluctuations, and timeout‑induced anomalies, ensuring that reported DIFFs correspond to real issues.
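A minimal sketch of the field‑by‑field comparison with two of those noise‑reduction rules applied, an ignore list and a numeric tolerance; the function name, the flat‑dict response shape, and the tolerance value are all assumptions for illustration.

```python
# Sketch of field-by-field response DIFF with noise reduction:
# ignorable fields are skipped, and small numeric fluctuations
# within rel_tol are not reported as DIFFs.
def diff_fields(control: dict, experiment: dict,
                ignore: set = frozenset(), rel_tol: float = 1e-6) -> dict:
    added, deleted, modified = [], [], []
    for key in set(control) | set(experiment):
        if key in ignore:                      # ignorable field
            continue
        if key not in control:
            added.append(key)                  # field only in experiment
        elif key not in experiment:
            deleted.append(key)                # field only in control
        else:
            a, b = control[key], experiment[key]
            if isinstance(a, float) and isinstance(b, float):
                # tolerate minor numeric fluctuation
                if abs(a - b) > rel_tol * max(abs(a), abs(b), 1.0):
                    modified.append(key)
            elif a != b:
                modified.append(key)
    return {"added": added, "deleted": deleted, "modified": modified}
```

Anything still reported after these filters should correspond to a real behavioral difference, which is the property the noise‑reduction step is after.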

Reports are generated in two layers:

Summary report includes application, cluster, request interface, traffic tag, routing info, comparison counts, DIFF counts, overall consistency, average recall, and score statistics, plus similarity distribution and funnel consistency statistics.

Detail report lists traceId, consistency rate, field changes, request bodies, and detailed DIFF entries for both responses and funnel operators.
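The overall‑consistency figure in the summary report can be thought of as the share of compared requests that show no DIFF after noise reduction; the helper below is an illustrative sketch, not the platform's actual computation.

```python
# Illustrative summary-report statistic: overall consistency is the
# fraction of compared requests with no remaining DIFF.
def summarize(compared: int, diffed: int) -> dict:
    consistency = 1.0 - diffed / compared if compared else 0.0
    return {"compared": compared, "diff": diffed,
            "consistency": round(consistency, 4)}
```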

Reports are automatically posted to relevant chat groups with links for quick review.

3. Load Testing: One‑Click Performance Regression

The load‑test flow mirrors DIFF: the index platform initiates the test, selects traffic, fills parameters, triggers execution, and records results. The testing service creates test files, launches tasks on the load‑test platform, and collects reports—all without manual intervention.

Execution uses master branch as the control group and pre‑release branch as the experiment group, with feature switches toggled to compare performance curves under step‑wise load increase.
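The step‑wise load increase can be sketched as a QPS ladder that both groups replay identically, so their performance curves are directly comparable; the generator below and its step sizes are illustrative assumptions.

```python
# Sketch of a step-wise load schedule: both the control (master) and
# experiment (pre-release) groups are driven through the same QPS ladder.
def load_steps(start_qps: int, step: int, n_steps: int, hold_s: int):
    """Yield (qps, hold_seconds) pairs for one load-test run."""
    for i in range(n_steps):
        yield (start_qps + i * step, hold_s)

# e.g. 100 -> 500 QPS in five steps, holding each level for 60 s
schedule = list(load_steps(start_qps=100, step=100, n_steps=5, hold_s=60))
```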

Load‑test reports are displayed on the platform and similarly notified to teams.

4. Release Pipeline and Gate Mechanism

The regression capabilities are integrated into the release pipeline, making DIFF and load testing mandatory gate checks. If regression fails, the build is blocked from deployment. Future plans include automatic scaling of regression resources, auto‑generation of gate reports, and full automation of the gate pipeline.
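In spirit, the gate reduces to a boolean check over the regression results; the function below is a hedged sketch, and the thresholds are invented placeholders, not the platform's actual criteria.

```python
# Illustrative release-gate check: deployment is blocked unless both the
# DIFF and load-test results pass. Threshold values are assumptions.
def gate_passes(diff_consistency: float, load_p99_ms: float,
                min_consistency: float = 0.999,
                max_p99_ms: float = 100.0) -> bool:
    return diff_consistency >= min_consistency and load_p99_ms <= max_p99_ms
```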

5. Future Plans and Summary

Upcoming work aims for 100% regression execution to eliminate missed tests, fully automated gate pipelines, broader coverage of search scenarios (traffic control, commercialization, international search), and a unified SOP for releases.

In summary, this regression capability upgrade is not just a tooling update but an engineering governance effort: it turns experience into process, voluntary practice into enforced constraints, and shifts risk discovery ahead of production, making every search engine upgrade more controllable and trustworthy.

Tags: CI/CD, load testing, pipeline, regression testing, DIFF testing, C++ search engine
DeWu Technology
Written by

DeWu Technology

A platform for sharing and discussing tech knowledge, helping you scale the heights of technology.
