
How We Built a Configurable Online Test Monitoring System for Real‑Time CI/CD Alerts

This article details the design, evolution, and implementation of an online test‑monitoring platform that transforms CI/CD pipelines into proactive alerting systems, covering the initial Spring‑based prototype, its shortcomings, the 2.0 configurable and visual redesign, plugin architecture, and future distributed deployment plans.

Youzan Coder

Sep 7, 2018


Background and Motivation

Continuous Integration (CI) and Continuous Delivery (CD) are core practices in modern software development, but they stop short of detecting issues once code reaches production. To fill this gap, the team built an online test‑monitoring system that proactively warns of production problems, quickly discovers bugs under low traffic, and clarifies the impact scope during emergencies.

1.0 Prototype Architecture

The first version (1.0) was a simple Spring Web application, referred to internally as the "online robot check". It consisted of three modules:

Task scheduling – wraps test cases as Quartz jobs and triggers them via an API after each release.

Test case module – defines business access, assertions, and alerts; requires test engineers to write code for each business line.

Alert module – integrates with the internal alerting platform.

System diagrams (omitted) showed the flow of basic and scenario test cases, with execution strategies configurable per case.

Problems with the Basic Version

Increasing business lines raised test‑case development cost.

Maintenance overhead grew with case count.

Every code change required a new deployment.

Running status and business coverage were not visible.

All cases executed together, regardless of relevance.

Redundant code reduced efficiency.

2.0 Redesign: Configurability and Visualization

The second version addressed the above issues by introducing:

Configuration‑driven test cases and scenarios managed through a web console.

Standardized case structure and assertion policies.

Case changes take effect immediately, without redeployment.

Front‑end dashboards displaying execution health and coverage.

Integration with the release platform to select which cases run per application.

Reusable core execution framework.

New architecture diagrams (omitted) illustrate the separation of execution flow from data flow, allowing test case design without coding and storing cases in a database.

Test Case Model

Each test case is defined by fields such as:

Case Name (required) – format: "type:service:method".

Case Type (required) – either http or dubbo.

Description (required) – scenario description.

Business (required) – business line identifier.

Request URL (optional) – for HTTP calls.

Headers (optional) – HTTP headers.

Parameters (optional) – request payload, supports dynamic injection for inter‑case dependencies.

Service Name (optional) – Dubbo interface name.

Method (required) – HTTP method or Dubbo method name.

Assertions (required) – one or more validation rules.

Various switches (enabled, login, retry, pre/post checks) to control execution.

Some internal fields are omitted here for brevity.
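The field list above could be modeled roughly as follows. This is a minimal sketch, not the platform's actual schema: the class name TestCaseConfig, the Java field names, and the validate() helper are all hypothetical, and the rule that HTTP cases need a URL while Dubbo cases need a service name is an assumption inferred from the "optional" markers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory form of one configured test case. */
final class TestCaseConfig {
    enum CaseType { HTTP, DUBBO }

    final String name;             // expected format: "type:service:method"
    final CaseType type;           // http or dubbo
    final String description;      // scenario description
    final String business;         // business line identifier
    final String requestUrl;       // HTTP cases only, may be null
    final String serviceName;      // Dubbo cases only, may be null
    final String method;           // HTTP method or Dubbo method name
    final List<String> assertions; // one or more validation rules

    TestCaseConfig(String name, CaseType type, String description, String business,
                   String requestUrl, String serviceName, String method,
                   List<String> assertions) {
        this.name = name;
        this.type = type;
        this.description = description;
        this.business = business;
        this.requestUrl = requestUrl;
        this.serviceName = serviceName;
        this.method = method;
        this.assertions = assertions;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** Returns the list of problems found; an empty list means the config is acceptable. */
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (isBlank(name) || name.split(":").length != 3) {
            errors.add("name must look like type:service:method");
        }
        if (type == null) errors.add("type is required");
        if (isBlank(description)) errors.add("description is required");
        if (isBlank(business)) errors.add("business is required");
        if (isBlank(method)) errors.add("method is required");
        if (assertions == null || assertions.isEmpty()) {
            errors.add("at least one assertion is required");
        }
        // Assumption: the optional fields become mandatory for the matching case type.
        if (type == CaseType.HTTP && isBlank(requestUrl)) {
            errors.add("requestUrl is required for http cases");
        }
        if (type == CaseType.DUBBO && isBlank(serviceName)) {
            errors.add("serviceName is required for dubbo cases");
        }
        return errors;
    }
}
```

Validating at save time in the console, rather than at execution time, would keep malformed cases from ever being scheduled.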

Visualization

The front‑end now provides charts that show real‑time health metrics and business coverage for each test case, making it easier for developers, testers, and operations staff to monitor the system.

Inter‑Case Dependency Implementation

Two mechanisms enable case dependencies:

Configure pre‑ and post‑execution relationships in the management console.

Inject parameters using a special placeholder syntax $#a,b,c#$, where a is the ID of the case being depended on, b is the key of the field to read from that case's response, and c (optional) is the array index when the value is a list.

Example injection payload:

{"code":"$#8,data,0#$","type":"$#10,type#$"}

The injection workflow parses the dependent case's response, creates a Groovy binding, and evaluates the expression with GroovyShell. The original Java implementation, not reproduced here, handles JSON arrays, JSON objects, and error reporting.
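To make the $#a,b,c#$ syntax concrete, here is a minimal sketch of placeholder substitution. It deliberately swaps the GroovyShell evaluation described above for a plain java.util.regex pass. It also assumes each earlier response has already been parsed into a Map keyed by case ID. The class name PlaceholderResolver and the shape of the responses map are hypothetical, not the platform's code.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical resolver for $#caseId,key#$ and $#caseId,key,index#$ placeholders. */
final class PlaceholderResolver {
    // group 1: case ID, group 2: response field key, group 3: optional list index
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$#(\\d+),([^,#]+)(?:,(\\d+))?#\\$");

    /**
     * Replaces every placeholder in the template with the referenced value.
     * The responses map (case ID to parsed top-level response fields) is an assumed input.
     */
    static String resolve(String template, Map<Integer, Map<String, Object>> responses) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int caseId = Integer.parseInt(m.group(1));
            String key = m.group(2).trim();
            Map<String, Object> response = responses.get(caseId);
            if (response == null || !response.containsKey(key)) {
                throw new IllegalArgumentException("No value for placeholder " + m.group());
            }
            Object value = response.get(key);
            if (m.group(3) != null) {
                // An index was given, so the field must hold a list.
                int index = Integer.parseInt(m.group(3));
                if (!(value instanceof List) || index >= ((List<?>) value).size()) {
                    throw new IllegalArgumentException("Bad list index in " + m.group());
                }
                value = ((List<?>) value).get(index);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Failing fast on a missing case or key means a broken dependency surfaces as a configuration error rather than as a request sent with an unresolved placeholder.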

Assertion Module Design

Four generic assertion types were created:

Containment – true if the response contains the expected string.

Non‑null – true if the response is not null/empty.

JSON value equality – compares a value at a specific JSON path with an expected value.

Pseudo‑code – arbitrary Groovy expressions evaluated against the parsed JSON, useful for complex checks such as list size.

Example pseudo‑code for checking a list size:

getJSONObject("data").getJSONObject("list").size()>0

The assertion engine wraps the expression, executes it, and sets the result status and message accordingly.
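The four assertion types can be sketched in plain Java as below. This is an illustration under assumptions, not the platform's engine: the parsed response is represented as nested Maps instead of a JSON library's objects, a Predicate stands in for the Groovy expression, and the names AssertionSketch and Result are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical stand-ins for the four generic assertion types. */
final class AssertionSketch {
    /** Outcome of one assertion: status plus a human-readable message. */
    static final class Result {
        final boolean passed;
        final String message;

        Result(boolean passed, String message) {
            this.passed = passed;
            this.message = message;
        }
    }

    /** Containment: passes if the raw response contains the expected string. */
    static Result contains(String body, String expected) {
        boolean ok = body != null && body.contains(expected);
        return new Result(ok, ok ? "ok" : "response does not contain: " + expected);
    }

    /** Non-null: passes if the raw response is neither null nor blank. */
    static Result notEmpty(String body) {
        boolean ok = body != null && !body.trim().isEmpty();
        return new Result(ok, ok ? "ok" : "response is null or empty");
    }

    /** Value equality: walks a dotted path such as "data.status" through nested maps. */
    static Result valueEquals(Map<String, Object> json, String dottedPath, Object expected) {
        Object actual = json;
        for (String key : dottedPath.split("\\.")) {
            actual = (actual instanceof Map) ? ((Map<?, ?>) actual).get(key) : null;
        }
        boolean ok = Objects.equals(expected, actual);
        return new Result(ok, ok ? "ok"
                : "expected " + expected + " at " + dottedPath + " but found " + actual);
    }

    /** Expression: a Predicate replaces the Groovy script; any runtime error counts as a failure. */
    static Result expression(Map<String, Object> json, Predicate<Map<String, Object>> check) {
        try {
            boolean ok = check.test(json);
            return new Result(ok, ok ? "ok" : "expression evaluated to false");
        } catch (RuntimeException e) {
            return new Result(false, "expression failed: " + e);
        }
    }
}
```

Catching runtime errors inside expression() mirrors the behavior described above, where a failing expression yields a failed status and a message rather than aborting the run.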

Plugin Architecture (3.0)

To support more complex scenarios, a plugin mechanism was added:

Provide a standard test‑case interface (AbstractTestCase), shipped as a JAR, for external developers to implement.

Decouple test cases from the platform; cases can be configured without code changes.

Enable hot‑plugging of new cases without restarting the platform.

The interface definition:

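public interface AbstractTestCase {
    // Runs before the case and returns a result.
    CaseResult before();
    // Executes the case itself and returns its result.
    CaseResult run();
    // Runs after the case; returns nothing.
    void after();
}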

After packaging the implementation into a JAR, the platform dynamically loads it, discovers classes implementing the interface, invokes the lifecycle methods according to the configured strategy, and reports the results. The class‑discovery logic uses reflection to scan all classes in the JAR and filter those assignable to AbstractTestCase.
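The class-discovery step described above might look roughly like this. It is a sketch under assumptions: PluginScanner and its method names are hypothetical, the contract interface is passed in as a parameter so the example stays self-contained, and the real platform may manage class loaders and hot-plugging differently.

```java
import java.io.IOException;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical scanner that finds implementations of a contract interface inside a JAR. */
final class PluginScanner {
    /** Maps a JAR entry such as "com/x/Foo.class" to "com.x.Foo"; returns null for non-class entries. */
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class") || entryName.endsWith("module-info.class")) {
            return null;
        }
        return entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
    }

    /** True if the candidate is an instantiable class assignable to the contract type. */
    static boolean isConcreteImplementation(Class<?> candidate, Class<?> contract) {
        return contract.isAssignableFrom(candidate)
                && !candidate.isInterface()
                && !Modifier.isAbstract(candidate.getModifiers());
    }

    /** Loads every class in the JAR without initializing it and keeps the implementations. */
    static List<Class<?>> findImplementations(Path jarPath, Class<?> contract) throws IOException {
        List<Class<?>> found = new ArrayList<>();
        URL[] urls = { jarPath.toUri().toURL() };
        // The loader is left open on purpose: the returned classes may load
        // further classes from the same JAR when they are instantiated later.
        URLClassLoader loader = new URLClassLoader(urls, contract.getClassLoader());
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String className = toClassName(entries.nextElement().getName());
                if (className == null) {
                    continue;
                }
                try {
                    Class<?> candidate = Class.forName(className, false, loader);
                    if (isConcreteImplementation(candidate, contract)) {
                        found.add(candidate);
                    }
                } catch (ClassNotFoundException | LinkageError e) {
                    // Skip classes whose dependencies are missing from the plugin JAR.
                }
            }
        }
        return found;
    }
}
```

A caller would pass AbstractTestCase.class as the contract, instantiate each result (for example via getDeclaredConstructor().newInstance()), and then invoke before(), run(), and after() according to the configured strategy. Using one class loader per JAR is one common way to let a plugin be replaced without restarting the host process.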

Future Directions

Planned enhancements include more flexible execution strategies, more frequent execution of core cases, customizable alert policies per business line, multi‑data‑center deployment, and a distributed architecture to eliminate single points of failure.

Overall, the system now gives test engineers immediate visibility into production health, and future improvements will broaden its coverage and reliability.
