How We Cut Risk‑Control Test Regression from Hours to Minutes with a Global Feature Graph
This article details how a risk‑control team built a global feature relationship graph, precise testing platform, and full‑domain interception to dramatically reduce regression testing time, improve data quality, and boost overall testing efficiency across the organization.
1. Background and Challenges
As Huolala’s business expands rapidly, identifying and avoiding risks becomes critical. The risk‑control system monitors billions of requests daily, relying on feature data sourced from upstream services and enriched through cleaning and completion. Ensuring high‑quality features is essential to prevent mis‑judgments.
Figure: Source of features and risk‑control dependency
Note ①: When evaluating a strategy, risk control assembles the feature fields, matches them against the strategy's conditions, and decides the follow‑up action.
Note ②: Features supply the information that strategy conditions test. They come either directly from request fields or from cleaning and completion.
Note ③: Completion enriches existing fields through service calls, code functions, statistics, or condition matching.
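The following minimal Java sketch makes the strategy/feature/completion relationship concrete. The `Completion` and `Strategy` types, the `order_count_24h` feature, and the stubbed lookup are hypothetical and do not come from Huolala's system. Request fields seed a feature map, completions add derived features in order, and the first strategy whose condition matches decides the action.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class FeaturePipelineSketch {

    /** A completion derives one new feature from the features already present. */
    record Completion(String outputFeature, Function<Map<String, Object>, Object> derive) {}

    /** A strategy condition is a predicate over the assembled feature map. */
    record Strategy(String name, Predicate<Map<String, Object>> condition, String action) {}

    /** Copy request fields, then run completions in order so later ones can use earlier outputs. */
    static Map<String, Object> assembleFeatures(Map<String, Object> request, List<Completion> completions) {
        Map<String, Object> features = new HashMap<>(request);
        for (Completion c : completions) {
            features.put(c.outputFeature(), c.derive().apply(features));
        }
        return features;
    }

    /** Return the action of the first strategy whose condition matches, or "pass". */
    static String decide(Map<String, Object> features, List<Strategy> strategies) {
        for (Strategy s : strategies) {
            if (s.condition().test(features)) {
                return s.action();
            }
        }
        return "pass";
    }

    public static void main(String[] args) {
        // Hypothetical completion: a stubbed statistics lookup for the last 24 hours
        Completion orderCount = new Completion("order_count_24h",
                f -> "u1".equals(f.get("user_id")) ? 12 : 0);
        Strategy tooManyOrders = new Strategy("too_many_orders",
                f -> (Integer) f.get("order_count_24h") > 10, "review");

        Map<String, Object> features = assembleFeatures(Map.of("user_id", "u1"), List.of(orderCount));
        System.out.println(decide(features, List.of(tooManyOrders))); // review
    }
}
```

Completions run in a fixed order, so a later completion can use features produced by an earlier one. For the same reason, a single changed completion can ripple into many strategies.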
1.1 Feature Quality Assurance Focus
Risk‑control requirements fall into three types, each with a different testing emphasis.
1. Risk‑control code changes
When code changes, feature quality assurance focuses on the impact range of feature modifications and the correctness of feature logic processing.
2. No code change, configuration change
When only configuration changes, assurance centers on impact range and data‑fetch accuracy.
3. No code or config change, new business integration
Here, assurance primarily ensures accurate data fetching for the new business.
1.2 Feature Quality Assurance Difficulties
Even with these focus areas defined, three main challenges remain:
Identifying the impact range lacks a practical testing handle
Complex dependencies among strategies, features, and completions are hard to visualise, so impact analysis is time‑consuming and error‑prone.
Every feature change triggers extensive, inefficient regression work.
Validating feature logic is inefficient
Features depend on a variety of external data sources, so testers must learn each source before they can construct valid test data.
Risk control sits downstream, so tests must be triggered from upstream services.
The large number of feature configurations means testers must re‑learn them for every regression.
Ensuring the accuracy of collected feature data is reactive
The system does not validate upstream data quality, and core production strategies are not configured in test environments, so data issues surface only after release.
Because risk control sits downstream of many upstream partners, field errors are hard to detect in time.
2. Solution and Goals
Figure: Risk‑control feature quality assurance system
We adopted three solutions:
Identify impact range: map data dependencies, visualise them, and improve testing efficiency.
Validate logic correctness: standardise feature data construction to lower data‑creation cost.
Ensure data collection accuracy: intercept non‑compliant upstream data, shifting from reactive discovery to proactive detection.
To achieve these goals, we built a Feature Test Platform. It integrates the real‑time risk‑control system, the data factory, mock services, and requirement data, and it provides precise, scenario‑based, and full‑domain testing, improving quality and efficiency across the development lifecycle.
3. Precise Testing for Risk Control
Risk‑control decisions depend heavily on feature fields, which may reference other features or completions. When features are added or changed, we must assess the impact on all dependent data. Since the risk‑control system lacks direct visualisation of these dependencies, manual statistics or complex SQL are usually required.
3.1 Global Relationship Graph
We built a global relationship graph by parsing all strategies, features, and completions via API calls, establishing bidirectional references stored in a database.
Figure: One‑way parsing and two‑way referencing
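The core idea behind bidirectional references fits in a few lines of Java. This is an in-memory illustration under our own assumptions, not the platform's database schema. Each one-way reference parsed from a strategy, feature, or completion is stored together with its reverse, so both "what does X use" and "what uses X" become single lookups. The `strategy:`/`feature:`/`completion:` id prefixes are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ReferenceGraph {

    // outgoing: node -> nodes it references (right subtree)
    private final Map<String, Set<String>> outgoing = new TreeMap<>();
    // incoming: node -> nodes that reference it (left subtree)
    private final Map<String, Set<String>> incoming = new TreeMap<>();

    /** Record one parsed one-way reference together with its reverse. */
    public void addReference(String from, String to) {
        outgoing.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        incoming.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
    }

    /** Nodes that the given node references. */
    public Set<String> referencesOf(String node) {
        return outgoing.getOrDefault(node, Set.of());
    }

    /** Nodes that reference the given node. */
    public Set<String> referencedBy(String node) {
        return incoming.getOrDefault(node, Set.of());
    }

    public static void main(String[] args) {
        ReferenceGraph g = new ReferenceGraph();
        // Hypothetical parse results: strategy -> feature -> completion
        g.addReference("strategy:S1", "feature:order_count");
        g.addReference("strategy:S2", "feature:order_count");
        g.addReference("feature:order_count", "completion:order_stats");
        System.out.println(g.referencedBy("feature:order_count")); // [strategy:S1, strategy:S2]
        System.out.println(g.referencesOf("feature:order_count")); // [completion:order_stats]
    }
}
```

Storing both directions at write time costs a little extra space. In exchange, the left-subtree query becomes a direct lookup instead of a scan over every reference.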
3.1.1 Bidirectional Reference Tree Rendering
We render a three‑part tree on the frontend:
- the centre node: the searched item;
- the left subtree: incoming references;
- the right subtree: outgoing references.
We use the relation-graph library (which supports Vue2, Vue3, and React) to build nodes and lines arrays, assign each node a hierarchy level, and render the graph.
// Vue component method (declared inside `methods`)
/**
 * Append a related node and the line linking it to the expanded node.
 * @param id         id of the node being expanded
 * @param hierarchy  hierarchy level of that node
 * @param nodeType   node type: dataCompletion, feature, strategy
 * @param relateType query direction: 0 = child, 1 = parent
 * @param item       related record returned by the query (uses item.id)
 */
appendNode(id, hierarchy, nodeType, relateType, item) {
  // Children are placed one level to the right, parents one level to the left
  const currentHierarchy = relateType === 0 ? hierarchy + 1 : hierarchy - 1;
  // Prefix with the node type and separate the parts so ids cannot collide
  const nodeId = nodeType + '-' + currentHierarchy + '-' + item.id;
  const node = { id: nodeId, data: { hierarchy: currentHierarchy } };
  // Lines run from parent to child; isReverse flags lines added while expanding toward parents
  const line = {
    from: relateType === 0 ? id : nodeId,
    to: relateType === 0 ? nodeId : id,
    isReverse: relateType !== 0
  };
  const newJsonData = { nodes: [node], lines: [line] };
  this.$refs.seeksRelationGraph.appendJsonData(newJsonData, (seeksRGGraph) => {});
}
3.1.2 Retrieve Relationship Graph
After building the global graph, any strategy, feature, or completion can be queried, displaying its related nodes and allowing rapid impact identification. This reduces regression case analysis from 4 hours to 5 minutes.
Figure: Retrieved relationship graph
3.2 Precise Testing
Even with the global graph, change detection remains challenging. Unexpected configuration changes or unnoticed code modifications can escape testing. We therefore combine data‑change monitoring and code‑change monitoring to achieve precise testing.
3.2.1 Data Change Monitoring
We poll the risk‑control decision system’s data interface, compare updated_at timestamps, and determine whether a record is new, unchanged, or modified.
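import java.util.Objects;

/**
 * Determine whether a record fetched from the Zeus API has changed.
 * Returns true when the stored copy has the same update timestamp (unchanged),
 * false when the record is new or has been modified.
 */
public boolean isUnchanged(ZeusApiModel zeusApiModel) {
    Integer zeusId = zeusApiModel.getZeusId();
    ZeusApiModel findZeusApi = findByZeusId(zeusId);
    if (findZeusApi != null && findZeusApi.getId() != null) {
        // Same update timestamp: no change
        if (Objects.equals(zeusApiModel.getZeusUpdate(), findZeusApi.getZeusUpdate())) {
            return true;
        }
        // Modified: carry over the stored id so the existing record can be updated
        zeusApiModel.setId(findZeusApi.getId());
        return false;
    }
    // New record: insert it
    addZeusApi(zeusApiModel);
    return false;
}
3.2.2 Code Change Monitoring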
We reuse Huolala’s existing JaCoCo‑based incremental code scan platform. It lists the classes and methods touched by a change, and we map those to the corresponding completions.
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/**
 * Recursively crawl a JaCoCo HTML report and return fully qualified class names.
 * sendGet(url) is an HTTP helper (not shown) that returns the page body.
 */
public List<String> getClassPathByUrl(String url) {
    List<String> classList = new ArrayList<>();
    Document doc = Jsoup.parse(sendGet(url));
    // The report root page is recognised by "release" in its title and has no package prefix;
    // on a package page the title is the package name
    String packageName = doc.title().contains("release") ? "" : doc.title();
    for (Element classNode : doc.select("a.el_class")) {
        classList.add(packageName + "." + classNode.text());
    }
    // Descend into every package listed on this page
    for (Element packageNode : doc.select("a.el_package")) {
        String urlNext = url + "/" + packageNode.text() + "/index.html";
        classList.addAll(getClassPathByUrl(urlNext));
    }
    return classList;
}
3.2.3 Traversing the Reference Tree
We perform a breadth‑first search on the left (referenced) subtree to collect all features or completions affected by a change.
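Breadth‑first search visits nodes level by level. Recording visited nodes ensures each one is processed only once, even when dependencies are shared or cyclic.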
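import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Queue;

// Breadth-first traversal from the changed node; the LinkedHashMap keeps
// each visited node once, in the order it was first reached
LinkedHashMap<String, RiskNode> visited = new LinkedHashMap<>();
Queue<RiskNode> queue = new ArrayDeque<>();
visited.put(startNode.getName(), startNode);
queue.add(startNode);
while (!queue.isEmpty()) {
    RiskNode node = queue.poll();
    // getNextNodes (helper not shown) returns the neighbours of node along `lines`
    for (RiskNode nextNode : getNextNodes(lines, node)) {
        // Enqueue only unseen nodes so shared dependencies and cycles are not re-walked
        if (!visited.containsKey(nextNode.getName())) {
            visited.put(nextNode.getName(), nextNode);
            queue.add(nextNode);
        }
    }
}
The traversal reduces impact analysis time from 4 hours to 10 minutes.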
4. Scenario‑Based Testing
Risk‑control testing proceeds in three stages: standalone risk‑control testing, integration testing, and regression testing. To streamline them, we built direct testing, scenario‑based testing, historical replay, and a requirement scenario library.
4.1 Direct Risk‑Control Testing
4.1.1 Quick Test
We fetch the risk‑control configuration and parse the input parameters it requires. We then invoke the risk engine directly, without any upstream traffic, and check the result with automatic assertions.
Figure: Quick test
Figure: Quick test page showing the test result
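A quick test of this kind can be sketched as follows. Everything here is an assumption for illustration:
- the `RiskEngine` interface stands in for the real engine call;
- field names are pulled from a condition string with a simple regular expression;
- the `order_amount` field and the `reject`/`pass` decisions are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuickTestSketch {

    /** Hypothetical engine entry point: takes request fields, returns a decision code. */
    interface RiskEngine {
        String evaluate(Map<String, Object> request);
    }

    // Assumes field names are snake_case identifiers on the left of a comparison operator
    private static final Pattern FIELD = Pattern.compile("([a-z_][a-z0-9_]*)\\s*(==|!=|>=|<=|>|<)");

    /** Parse the input parameters a strategy needs from its condition expression. */
    static Set<String> requiredFields(String condition) {
        Set<String> fields = new LinkedHashSet<>();
        Matcher m = FIELD.matcher(condition);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    /** Call the engine directly (no upstream traffic) and assert on the decision. */
    static void quickTest(RiskEngine engine, String condition, Map<String, Object> input, String expected) {
        Set<String> missing = new LinkedHashSet<>(requiredFields(condition));
        missing.removeAll(input.keySet());
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("missing input fields: " + missing);
        }
        String actual = engine.evaluate(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        String condition = "user_type == 2 && order_amount > 5000";
        // Stub standing in for the real risk engine
        RiskEngine stub = req -> ((Integer) req.get("order_amount")) > 5000 ? "reject" : "pass";
        System.out.println(requiredFields(condition)); // [user_type, order_amount]
        quickTest(stub, condition, Map.of("user_type", 2, "order_amount", 8000), "reject");
        System.out.println("quick test passed");
    }
}
```

Checking for missing inputs before calling the engine separates "the test data is incomplete" from "the strategy decided wrongly", which are easy to confuse when both show up as a failed assertion.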
4.1.2 Completion Mock
We mock completion calls with Huolala’s Java mock platform, which is built on JVM‑Sandbox bytecode enhancement, so tests no longer depend on external data sources.
Figure: Mock configuration
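The platform itself intercepts calls at the bytecode level through JVM-Sandbox. A simpler in-process analogue, which is not a description of that platform, is a decorator. It returns a configured response for mocked completions and falls through to the real service for everything else. The `CompletionService` interface and the `driver_credit_score` completion are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CompletionMockSketch {

    /** Hypothetical completion: a named lookup against an external data source. */
    interface CompletionService {
        Object complete(String completionName, Map<String, Object> features);
    }

    /** Returns a configured mock value when one exists, otherwise delegates to the real service. */
    static class MockingCompletionService implements CompletionService {
        private final CompletionService real;
        private final Map<String, Function<Map<String, Object>, Object>> mocks = new HashMap<>();

        MockingCompletionService(CompletionService real) {
            this.real = real;
        }

        /** Register a canned response for one completion. */
        void mock(String completionName, Function<Map<String, Object>, Object> response) {
            mocks.put(completionName, response);
        }

        @Override
        public Object complete(String completionName, Map<String, Object> features) {
            Function<Map<String, Object>, Object> m = mocks.get(completionName);
            return m != null ? m.apply(features) : real.complete(completionName, features);
        }
    }

    public static void main(String[] args) {
        // The "real" service fails, as an unreachable external source would in a test environment
        CompletionService real = (name, f) -> {
            throw new IllegalStateException("external source unavailable: " + name);
        };
        MockingCompletionService svc = new MockingCompletionService(real);
        svc.mock("driver_credit_score", f -> 42);
        System.out.println(svc.complete("driver_credit_score", Map.of("driver_id", "d1"))); // 42
    }
}
```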
4.2 Scenario‑Based Testing
We combine component‑based, keyword‑driven, and data‑driven automation to construct reusable test actions. Components from the data factory are wrapped as keywords, so scenarios can be composed quickly.
Figure: Keyword configuration
4.2.1 Components and Keywords
External system functionalities and risk‑control actions are encapsulated as components and then exposed as keywords for scenario building.
Figure: Keyword configuration
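Wrapping components as keywords amounts to a name-to-function registry. The minimal sketch below assumes a hypothetical `Keyword` interface that takes step parameters and returns output values. The `create_user` keyword stands in for a data-factory component.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeywordRegistry {

    /** A keyword takes step parameters and returns output values for later steps. */
    @FunctionalInterface
    interface Keyword {
        Map<String, Object> run(Map<String, Object> params);
    }

    private final Map<String, Keyword> keywords = new LinkedHashMap<>();

    /** Expose a component under a human-readable keyword name; names must be unique. */
    public void register(String name, Keyword keyword) {
        if (keywords.putIfAbsent(name, keyword) != null) {
            throw new IllegalArgumentException("duplicate keyword: " + name);
        }
    }

    /** Look up a keyword by name, failing loudly if it was never registered. */
    public Keyword lookup(String name) {
        Keyword k = keywords.get(name);
        if (k == null) {
            throw new IllegalArgumentException("unknown keyword: " + name);
        }
        return k;
    }

    public static void main(String[] args) {
        KeywordRegistry registry = new KeywordRegistry();
        // Hypothetical data-factory component wrapped as a keyword
        registry.register("create_user", p -> Map.of("user_id", "u-" + p.get("phone")));
        System.out.println(registry.lookup("create_user").run(Map.of("phone", "138")));
    }
}
```

Rejecting duplicate names keeps keyword lookups unambiguous when many components are contributed to the same registry.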
4.2.2 Scenario Orchestration
We add the required steps (keywords), adjust their order, and provide input parameters for each step.
Figure: Scenario orchestration
4.2.3 Scenario Execution
During execution, scenario inputs become global variables; each step reads and updates these variables, achieving data‑driven flow.
Figure: Scenario execution
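Orchestration and data-driven execution can be combined in one small sketch, under our own assumptions:
- a step is a keyword name plus parameters;
- a parameter written as `${name}` is read from the global variable map;
- each step's outputs are merged back into that map.

The `create_order` and `call_risk` keywords are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ScenarioRunner {

    /** One orchestrated step: a keyword name plus its input parameters. */
    record Step(String keyword, Map<String, String> params) {}

    /** Run steps in order, resolving ${name} references from, and merging outputs into, the globals. */
    static Map<String, Object> run(List<Step> steps,
                                   Map<String, Function<Map<String, Object>, Map<String, Object>>> actions,
                                   Map<String, Object> scenarioInputs) {
        // Scenario inputs become the initial global variables
        Map<String, Object> globals = new HashMap<>(scenarioInputs);
        for (Step step : steps) {
            Function<Map<String, Object>, Map<String, Object>> action = actions.get(step.keyword());
            if (action == null) {
                throw new IllegalArgumentException("unknown keyword: " + step.keyword());
            }
            Map<String, Object> resolved = new HashMap<>();
            step.params().forEach((k, v) -> resolved.put(k,
                    v.startsWith("${") && v.endsWith("}") ? globals.get(v.substring(2, v.length() - 1)) : v));
            // Outputs overwrite or add global variables for the following steps
            globals.putAll(action.apply(resolved));
        }
        return globals;
    }

    public static void main(String[] args) {
        Map<String, Function<Map<String, Object>, Map<String, Object>>> actions = new HashMap<>();
        actions.put("create_order", p -> Map.of("order_id", "o-" + p.get("user_id")));
        actions.put("call_risk", p -> Map.of("decision", "o-u1".equals(p.get("order_id")) ? "pass" : "reject"));
        List<Step> steps = List.of(
                new Step("create_order", Map.of("user_id", "${user_id}")),
                new Step("call_risk", Map.of("order_id", "${order_id}")));
        System.out.println(run(steps, actions, Map.of("user_id", "u1")).get("decision")); // pass
    }
}
```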
4.3 Scenario Replay
4.3.1 Historical Scenario Replay
Failed cases are recorded with their inputs and steps; replaying them validates bug fixes.
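Replaying recorded failures reduces to re-running each stored case and keeping the ones whose result still differs from the expectation. In this sketch the `RecordedCase` fields and the stub executor are hypothetical. A real executor would re-run the recorded steps against the fixed build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class HistoricalReplay {

    /** A recorded failure: its inputs, the steps that were run, and the expected decision. */
    record RecordedCase(String id, Map<String, Object> inputs, List<String> steps, String expected) {}

    /** Re-run every recorded case and return the ids that still fail. */
    static List<String> replay(List<RecordedCase> cases, Function<RecordedCase, String> executor) {
        List<String> stillFailing = new ArrayList<>();
        for (RecordedCase c : cases) {
            if (!c.expected().equals(executor.apply(c))) {
                stillFailing.add(c.id());
            }
        }
        return stillFailing;
    }

    public static void main(String[] args) {
        List<RecordedCase> cases = List.of(
                new RecordedCase("bug-1", Map.of("amount", 9000), List.of("create_order", "call_risk"), "reject"),
                new RecordedCase("bug-2", Map.of("amount", 100), List.of("create_order", "call_risk"), "reject"));
        // Stub executor standing in for re-running the recorded steps
        Function<RecordedCase, String> executor =
                c -> (Integer) c.inputs().get("amount") > 5000 ? "reject" : "pass";
        System.out.println(replay(cases, executor)); // [bug-2]
    }
}
```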
4.3.2 Requirement Scenario Library
Requirement‑level configurations (strategies, features, completions) are stored, enabling impact queries and reuse across regressions.
Figure: Requirement scenario library
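An impact query over the requirement library can be served by an inverted index from configuration item to requirement. This is a minimal sketch with assumed identifiers (`REQ-101`, `feature:order_count`, and so on). The article does not describe how the library is stored.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RequirementLibrary {

    // config item (strategy / feature / completion id) -> requirements that touched it
    private final Map<String, Set<String>> byConfigItem = new TreeMap<>();

    /** Store which configuration items a requirement changed. */
    public void addRequirement(String requirementId, List<String> configItems) {
        for (String item : configItems) {
            byConfigItem.computeIfAbsent(item, k -> new TreeSet<>()).add(requirementId);
        }
    }

    /** Requirements whose stored scenarios are candidates for reuse when these items change. */
    public Set<String> impactedRequirements(List<String> changedItems) {
        Set<String> result = new TreeSet<>();
        for (String item : changedItems) {
            result.addAll(byConfigItem.getOrDefault(item, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        RequirementLibrary lib = new RequirementLibrary();
        lib.addRequirement("REQ-101", List.of("strategy:S1", "feature:order_count"));
        lib.addRequirement("REQ-102", List.of("feature:order_count", "completion:order_stats"));
        lib.addRequirement("REQ-103", List.of("strategy:S9"));
        System.out.println(lib.impactedRequirements(List.of("feature:order_count"))); // [REQ-101, REQ-102]
    }
}
```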
5. Global Interception
Risk‑control accuracy depends on the completeness of upstream fields. To catch non‑compliant data proactively, we introduced a full‑domain interception mechanism in the pre‑release environment.
5.1 Global Interception Solution
All changes first pass through the pre‑release environment. There, a risk‑control policy applied to all traffic validates fields against compliance rules, returns intercept codes for violations, and notifies testers.
Figure: Global interception solution
5.2 Global Risk‑Control Strategy
Compliance conditions are derived from interface documentation and written as violation conditions (e.g., user_type == 2 && ep_id is empty). These are then combined into interception policies that block non‑compliant requests.
Figure: Strategy rollout process
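A compliance rule such as `user_type == 2 && ep_id is empty` can be modelled as a predicate that describes a violation, paired with the intercept code it returns. The `Rule` record and the `INTERCEPT_001` code are hypothetical. Treating null and empty strings alike as "empty" is also our assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class InterceptionPolicy {

    /** A violation rule derived from an interface doc, with the intercept code it returns. */
    record Rule(String code, String description, Predicate<Map<String, Object>> violated) {}

    /** Treat a missing field and an empty string the same way. */
    static boolean isEmpty(Object v) {
        return v == null || v.toString().isEmpty();
    }

    /** Return the codes of every rule the request violates; an empty list means compliant. */
    static List<String> check(Map<String, Object> request, List<Rule> rules) {
        List<String> codes = new ArrayList<>();
        for (Rule r : rules) {
            if (r.violated().test(request)) {
                codes.add(r.code());
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // The example condition from the article: user_type == 2 && ep_id is empty
        Rule missingEpId = new Rule("INTERCEPT_001", "user_type 2 without ep_id",
                req -> Integer.valueOf(2).equals(req.get("user_type")) && isEmpty(req.get("ep_id")));
        List<Rule> rules = List.of(missingEpId);
        System.out.println(check(Map.of("user_type", 2), rules));                // [INTERCEPT_001]
        System.out.println(check(Map.of("user_type", 2, "ep_id", "e9"), rules)); // []
    }
}
```

Collecting every violated code, rather than stopping at the first, gives testers the full list of field problems for a request in one pass.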
5.3 Issue Tracking
Intercepted flows trigger Feishu notifications; testers create follow‑up tickets to investigate and resolve data quality defects.
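The article does not say how the Feishu notifications are sent. One common route, assumed here, is a Feishu custom-bot webhook, which accepts a JSON text message of the form `{"msg_type":"text","content":{"text":"..."}}`. Several details are hypothetical:
- the message wording;
- the `INTERCEPT_001` code;
- the `/order/create` path.

The webhook URL must be supplied by the caller. Bots with signature verification enabled may also require signed timestamp fields, which this sketch omits.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FeishuNotifier {

    /** Minimal JSON string escaping for characters that can appear in alert text. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    // Other control characters become hex unicode escapes
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** Build a text-message payload for a custom-bot webhook. */
    static String buildPayload(String interceptCode, String api, String detail) {
        String text = "[Interception] " + interceptCode + " on " + api + "\n" + detail;
        return "{\"msg_type\":\"text\",\"content\":{\"text\":\"" + jsonEscape(text) + "\"}}";
    }

    /** POST the payload to the webhook and return the HTTP status code. */
    static int send(String webhookUrl, String payload) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only builds the payload; call send(...) with a real webhook URL to deliver it
        System.out.println(buildPayload("INTERCEPT_001", "/order/create", "ep_id is empty"));
    }
}
```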
Figure: Interception follow‑up ticket
5.4 Intercept Bug Tracking
Long‑standing issues are catalogued and periodically pushed to product owners for resolution.
Figure: Pending bug list
6. Results and Benefits
6.1 End‑to‑End Efficiency Gains
Risk‑control scenario testing: 100+ data‑bound scenarios and 2000+ executions, saving 600+ hours of effort.
Full‑process improvement: total effort reduced from 26 hours to 2.58 hours.
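The full-process figures correspond to the following reduction, computed directly from the two reported totals:

```latex
\frac{26 - 2.58}{26} = \frac{23.42}{26} \approx 0.901 \;(\approx 90\%\ \text{less effort}), \qquad \frac{26}{2.58} \approx 10.1
```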
6.2 Data Quality Improvement
Global interception: 60+ policies blocked more than 70k abnormal flows, generated 80+ tickets, and uncovered 40+ data defects.
Product documentation standards: identified 20+ documentation issues, improving upstream data contracts.
Online data quality: mitigated misjudgment risk for 8.71% of traffic.
7. Future Outlook
We plan to further enhance risk‑control testing with AI‑driven capabilities.
Automated labeling: apply deep learning to label abnormal traffic automatically, reducing manual effort.
Test generation: auto‑generate functional and automation test cases from the strategy graph.
AI‑assisted validation: bind test cases to scenarios and combine them with traffic replay for AI‑assisted validation.