Traffic‑Based Quality Assurance Framework for Elasticsearch Search Service

This article presents a traffic‑driven quality assurance framework for an Elasticsearch‑based search service, detailing active and inactive code protection strategies, automated scenario generation from Dubbo logs, template fingerprinting, de‑duplication, expected result pools, and validation rules to ensure comprehensive test coverage and reliable regression.

政采云技术

Aug 17, 2023

Background: The GovProcurement Cloud search service is built on Elasticsearch and provides structured and unstructured multi‑condition search for PC, app, and mini‑program clients.

Because the service is delivered as middleware, the team has no direct visibility into how external applications call it. Scenarios therefore have to be identified by hand, and interface test coverage stays incomplete. The main challenges are:

The 171 searchable conditions combine into a large number of code branches, so the impact of a change on regression is hard to predict.

Scenarios that are not yet covered by tests are difficult to identify.

Uncovered scenarios need to be converted into test cases automatically.

Those test cases need to be executed automatically.

To address these issues, a traffic‑based quality assurance scheme was implemented.

Overall Assurance Scheme: Code is split into active and inactive code according to whether production traffic exercises it. For active code, scenario calculation identifies scenarios and constructs test cases automatically; for inactive code, test cases are written manually to fill the gaps. Both sets of cases feed an automated regression pipeline.

Active Code Protection

Strategy: Collect Dubbo call logs, clean them into scenario data, compute scenarios, and generate test cases.

The process consists of three nodes:

Log collection node – external applications invoke the search via Dubbo, generating logs containing request parameters.

Data cleaning node – parses logs into structured scenario data.

Scenario calculation node – the core component, performing three functions: parameter templating, template fingerprint generation, and traffic deduplication.

Parameter Templating: Two methods are used.

1. Field‑based templating – replace field values with placeholders while keeping the JSON skeleton.

// Example (illustrative only)
{
  "bussinessScope": "浙江",
  "status": true,
  "keywords": "打印机"
}

// Generated template
{
  "bussinessScope": "@",
  "status": @,
  "keywords": "@"
}
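Field‑based templating can be sketched in a few lines of Python. This is a minimal illustration rather than the team's implementation: the function name to_template, the sorting of keys, and the handling of nested lists are assumptions that the article does not spell out.

```python
import json

PLACEHOLDER = "@"

def to_template(value) -> str:
    """Render a parsed JSON value as template text with every leaf value masked."""
    if isinstance(value, dict):
        # Assumption: keys are sorted so that field order does not affect the template.
        fields = (
            f"{json.dumps(key)}:{to_template(child)}"
            for key, child in sorted(value.items())
        )
        return "{" + ",".join(fields) + "}"
    if isinstance(value, list):
        # Assumption: elements are masked one by one, so list length is part of the skeleton.
        return "[" + ",".join(to_template(item) for item in value) + "]"
    if isinstance(value, str):
        return f'"{PLACEHOLDER}"'  # strings keep their quotes, as in the example above
    return PLACEHOLDER  # numbers, booleans and null become a bare placeholder

request = json.loads('{"bussinessScope": "浙江", "status": true, "keywords": "打印机"}')
print(to_template(request))
# {"bussinessScope":"@","keywords":"@","status":@}
```

The output is template text rather than valid JSON, because non‑string values are replaced by a bare placeholder.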

2. Field‑and‑value templating – keep business‑meaningful values (e.g., attribute) and replace others with placeholders.

// Example (illustrative only)
{
  "attribute": "颜色:黑色",
  "status": true,
  "keywords": "数据线"
}

// Generated template
{
  "attribute": "颜色:黑色",
  "status": @,
  "keywords": "@"
}
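A possible Python sketch of field‑and‑value templating follows. The set KEEP_VALUE_FIELDS and the function names are hypothetical, and the sketch handles flat requests only; which fields count as business‑meaningful would be decided by the service owners.

```python
import json

PLACEHOLDER = "@"

# Hypothetical whitelist: fields whose values carry business meaning and are kept verbatim.
KEEP_VALUE_FIELDS = frozenset({"attribute"})

def mask(value) -> str:
    """Mask a single leaf value; strings keep their quotes."""
    return f'"{PLACEHOLDER}"' if isinstance(value, str) else PLACEHOLDER

def to_template(request: dict, keep=KEEP_VALUE_FIELDS) -> str:
    """Render a flat request as template text, preserving values of whitelisted fields."""
    parts = []
    for key in sorted(request):  # assumption: sort keys so field order is irrelevant
        value = request[key]
        if key in keep:
            rendered = json.dumps(value, ensure_ascii=False)  # keep the real value
        else:
            rendered = mask(value)
        parts.append(f"{json.dumps(key)}:{rendered}")
    return "{" + ",".join(parts) + "}"

request = {"attribute": "颜色:黑色", "status": True, "keywords": "数据线"}
print(to_template(request))
# {"attribute":"颜色:黑色","keywords":"@","status":@}
```

With this method, two requests that differ only in the attribute value yield different templates and are therefore treated as different scenarios.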

Template Fingerprint: After a template is generated, its MD5 hash is computed and used as the template's fingerprint. Two requests with the same fingerprint share the same template and are treated as the same scenario, which allows the large volume of daily traffic to be deduplicated quickly.

// Example (illustrative only)
Template fingerprint: D8AD32393C65D62C8658A9D699A8C190
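Computing such a fingerprint needs only the standard library. The sketch below assumes that the template is hashed as UTF‑8 text and that the digest is rendered as uppercase hexadecimal, matching the format of the example above; the article does not state either detail.

```python
import hashlib

def fingerprint(template: str) -> str:
    """Return the uppercase hex MD5 digest of a template string."""
    return hashlib.md5(template.encode("utf-8")).hexdigest().upper()

template_a = '{"keywords":"@","status":@}'
template_b = '{"attribute":"颜色:黑色","keywords":"@","status":@}'

print(fingerprint(template_a) == fingerprint(template_a))  # True: same template, same fingerprint
print(fingerprint(template_a) == fingerprint(template_b))  # False: different templates
print(len(fingerprint(template_a)))                        # 32 hex characters
```

MD5 serves here as a compact key for comparing templates, not as a security measure.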

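Deduplication: Each incoming request is templated and fingerprinted. If the fingerprint matches one that has already been recorded, the request is treated as a duplicate and skipped; otherwise it is recorded as a new scenario. New scenarios are thus discovered quickly, while repeated traffic is recognized and ignored.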
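The deduplication step amounts to a membership check against the set of known fingerprints. The class below is a hypothetical in‑memory sketch; the article does not say where fingerprints are stored, and a real deployment would presumably need a persistent store.

```python
class ScenarioRegistry:
    """Tracks which template fingerprints have already been seen."""

    def __init__(self):
        self._seen = set()

    def is_new(self, fp: str) -> bool:
        """Record fp and return True if it was not seen before, else return False."""
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True

registry = ScenarioRegistry()
# Placeholder strings stand in for real MD5 fingerprints.
traffic = ["fp-A", "fp-B", "fp-A", "fp-C", "fp-B"]
new_scenarios = [fp for fp in traffic if registry.is_new(fp)]
print(new_scenarios)
# ['fp-A', 'fp-B', 'fp-C']
```

Only the requests behind new_scenarios would go on to become test cases; the repeated fingerprints are dropped.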

Result of traffic calculation: 1 353 generated test cases (618 P1, 516 P2, 58 P3, 161 P4).

Inactive Code Protection

Code not exercised by traffic (e.g., feature switches, exception paths, unused methods) is covered manually by creating test cases for each category.

Automated Regression

Test cases generated from both active and inactive code are executed automatically. Two key components are built:

Expected Result Pool: The first time a query is executed, its online result is recorded as the expected result. Later runs are compared against this recorded result, so expectations stay stable even when the underlying data changes. The pool is a lightweight index of tens of thousands of records, which avoids slow scans of the full index.
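The record‑then‑compare behaviour can be illustrated as follows. This is a sketch under assumptions: a Python dict stands in for the lightweight index, and the class name, the case_id key, and the result shape are all hypothetical.

```python
class ExpectedResultPool:
    """Stores the first observed result per test case and compares later runs against it."""

    def __init__(self):
        self._expected = {}  # stand-in for the lightweight index described above

    def check(self, case_id: str, actual) -> bool:
        """Record the baseline on first sight; afterwards report whether actual matches it."""
        if case_id not in self._expected:
            self._expected[case_id] = actual
            return True  # the first execution defines the expectation
        return self._expected[case_id] == actual

pool = ExpectedResultPool()
print(pool.check("case-1", {"total": 2, "ids": [11, 12]}))  # True: baseline recorded
print(pool.check("case-1", {"total": 2, "ids": [11, 12]}))  # True: matches the baseline
print(pool.check("case-1", {"total": 3, "ids": [11, 12, 13]}))  # False: result changed
```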

Validation Rules: Validation rules are tailored to the scenario. Interface tests check the total count and the accuracy of returned fields; refactor tests require the result set and its order to match exactly. In total, 37 validation rules were defined: 36 for interface tests across priority levels P1 to P4, and 1 for refactor tests.
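The difference between the two kinds of rule can be shown with a pair of small checks. The result shape (a dict with "total" and a list of "hits" keyed by "id") and both function names are assumptions made for illustration; the article does not list the 37 actual rules.

```python
def _fields_by_id(result: dict, fields: list) -> dict:
    """Map each hit id to the tuple of field values under test."""
    return {hit["id"]: tuple(hit.get(f) for f in fields) for hit in result["hits"]}

def validate_interface(expected: dict, actual: dict, fields: list) -> bool:
    """Interface-style rule: same total and same field values, order ignored."""
    return (
        expected["total"] == actual["total"]
        and _fields_by_id(expected, fields) == _fields_by_id(actual, fields)
    )

def validate_refactor(expected: dict, actual: dict) -> bool:
    """Refactor-style rule: identical hits in identical order."""
    return expected["total"] == actual["total"] and expected["hits"] == actual["hits"]

expected = {"total": 2, "hits": [{"id": 1, "name": "cable"}, {"id": 2, "name": "printer"}]}
reordered = {"total": 2, "hits": [{"id": 2, "name": "printer"}, {"id": 1, "name": "cable"}]}

print(validate_interface(expected, reordered, ["name"]))  # True: content matches
print(validate_refactor(expected, reordered))             # False: order differs
```

A result that merely comes back in a different order passes the interface‑style check but fails the refactor‑style check.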

Project Implementation

In 2023, the search code was refactored, requiring full scenario coverage. Over two rounds of refactoring, the following were achieved:

Generated 4 128 pre‑release, 6 322 protocol, and 4 174 full‑service test cases automatically.

Discovered 7 bugs through automation. One of them exposed a case that earlier testing had missed: changing a field's type in the index from long to keyword changed the order of the results.
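The article does not say exactly how the long → keyword change altered result order. One plausible mechanism, assumed here, is that Elasticsearch sorts keyword fields lexicographically while long fields sort numerically. The effect can be reproduced in plain Python with made‑up values:

```python
# Made-up values for a hypothetical numeric field.
values_as_long = [9, 10, 100, 25]
# The same values as they would be compared once stored as keyword (string) terms.
values_as_keyword = [str(v) for v in values_as_long]

print(sorted(values_as_long))     # [9, 10, 25, 100]
print(sorted(values_as_keyword))  # ['10', '100', '25', '9']
```

A check that compares only total counts and field values would not notice this kind of change, which is why the refactor tests also require the order of results to match.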

Future Plans

Deploy traffic recording and replay in offline environments.

Combine mock‑based replay with real‑call approaches.

Explore code‑coverage‑driven scenario generation.
