How Huolala Built a Scalable Big Data Testing Platform to Cut Cycle Time by 70%

Huolala’s data testing platform tackles massive data volume, complexity, and quality challenges by automating test case generation, execution, monitoring, and alerting across multiple storage systems, dramatically reducing testing cycles from five days to 1.5 days and saving over 800 person-days.

Huolala Tech

Jun 25, 2024

Background and Challenges

As Huolala’s business grows rapidly, big data increasingly drives user behavior analysis, ad targeting, risk control, and decision support. With daily interactions in the hundreds of billions, data testing has to be high‑quality, efficient, highly available, and proactive.

Skill Requirements

Massive data testing : Proficiency in SQL across various databases to process large volumes.

Data‑warehouse layer testing : Understanding of multi‑layer data structures and business logic.

Flexible acceptance criteria : Ability to adapt standards to business needs.

Data insight : Combining testing with business analysis.

Data Challenges

Difficulty discovering issues : Huge data volume and mixed structured/unstructured formats make manual inspection infeasible.

Quality assurance difficulty : Ensuring accuracy, completeness, consistency, and timeliness requires comprehensive technical and managerial measures.

Efficiency Challenges

Test case management : Frequent data changes force repeated test case updates, with no supporting platform to manage them.

Low regression efficiency : Manual regression is time‑consuming and cannot keep up with expanding scope.

Solution and Goals

Simplified testing process : Configurable templates and automatic script generation reduce manual effort.

Data test automation : One‑click conversion of test cases to automated scripts.

Quality monitoring and alerts : Scheduled online inspections and proactive alerts.

A dedicated big‑data testing platform automates each step, allowing users with no SQL expertise to operate it, while also providing test case persistence and a closed loop from generation to execution, analysis, monitoring, and automation.

Capability Building

Platform Architecture

The platform is built on Spring Boot and uses a hybrid engine for data computation. It offers data‑quality model construction, execution, task management, anomaly detection, and alerting, with resource and permission isolation for security. It is designed for high concurrency, high performance, and high availability.

Architecture layers:

Application layer : Templates, case management, protocol management, scheduled tasks.

Service layer : Rule calculation, data rule definition, result analysis.

Data layer : Source data and metadata from data warehouses and business systems.

Storage layer : Supports MySQL, Hive, Doris, Phoenix, HBase, etc.

Core Capabilities

Test Management

Two modules: case‑template management (configurable templates for empty‑value checks, numeric checks, etc.) and case‑management (auto‑generation, persistence, and conversion to automated cases).

Template example “field empty‑rate”:

    select
    [count(case when {{emptyFields}} is null or {{emptyFields}} = '' then 1 end)]/count(1) as {{emptyFields}}_rate
    from {{tableName}}
    where 1=1 and [{{partField}} = date_sub(current_date(),1) $and;partField$];

Core code for converting a template to a rule object:

    // Parse template into rule configuration class
    MonitorRuleTemple monitorRuleTemple = covertFromModule(config, MonitorAgreement.class);

    // Retrieve SQL from rule configuration
    String sql = monitorRuleTemple.getSql();

    if (Objects.nonNull(sql)) {
        sql = sqlProcess(sql, paramsMap);
        monitorRuleTemple.setSql(sql);
    }

    // Replace {{param}} placeholders
    ruleProcess(monitorRuleTemple, paramsMap);

    return monitorRuleTemple;
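The article does not show the bodies of `sqlProcess` or `ruleProcess`. As a minimal sketch of the `{{param}}` substitution step only, assuming plain string substitution and that a missing parameter should fail loudly, something like the following could work. The class `TemplateSqlRenderer` and its `render` method are hypothetical names, and the `[...]` blocks and `$and;partField$` markers from the template are deliberately not handled because their semantics are not defined in the article.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not the platform's actual sqlProcess implementation. */
final class TemplateSqlRenderer {
    // Matches {{name}} where name is a simple identifier.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    private TemplateSqlRenderer() {}

    /** Replaces every {{name}} with params.get(name); throws if a name has no value. */
    static String render(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing template parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because table and column names cannot be bound as JDBC parameters, a substitution like this inserts values directly into the SQL text, so the parameter values would need to come from trusted template configuration rather than free-form user input.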

Rule Execution

Combines rule configuration and computation, supporting multiple data sources (MySQL, Hive, Doris, Phoenix, HBase, and curl/HTTP interfaces) via a hybrid engine.

    // Determine client based on data source type
    DataSourceType sourceType = DataSourceType.findByTypeByte(dataSourceType);
    switch (sourceType) {
        case INTERFACE_CURL: queryTask = curlClient; break;
        case INTERFACE_HTTP: queryTask = httpClient; break;
        case HIVE: queryTask = idpClient; break;
        case MYSQL: queryTask = mySqlClient; break;
        case HBASE: queryTask = hbaseClient; break;
        case PHOENIX: queryTask = phoenixClient; break;
        default: queryTask = null;
    }
    return queryTask;
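The dispatch above returns `null` for an unmatched type, which pushes the failure onto the caller. One possible alternative, sketched here under assumptions, is a small registry that fails fast when no client is registered. `SourceType`, `QueryClient`, and `QueryClientRegistry` are hypothetical stand-ins for the article's `DataSourceType` and client classes, whose real interfaces are not shown. The enum only lists the types visible in the excerpt, so it does not guess how Doris is routed.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical stand-in for the article's DataSourceType. */
enum SourceType { INTERFACE_CURL, INTERFACE_HTTP, HIVE, MYSQL, HBASE, PHOENIX }

/** Hypothetical stand-in for the article's query clients. */
interface QueryClient {
    String query(String statement);
}

final class QueryClientRegistry {
    private final Map<SourceType, QueryClient> clients = new EnumMap<>(SourceType.class);

    /** Associates a client with a source type, replacing any earlier registration. */
    void register(SourceType type, QueryClient client) {
        clients.put(type, client);
    }

    /** Fails fast instead of returning null for an unregistered source type. */
    QueryClient clientFor(SourceType type) {
        QueryClient client = clients.get(type);
        if (client == null) {
            throw new IllegalArgumentException("No query client registered for " + type);
        }
        return client;
    }
}
```

With a registry, adding a new storage system means one `register` call rather than another `case` branch, and an unsupported type surfaces as a descriptive exception at lookup time.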

Result Analysis

Supports preset expectations and various comparison types (>, >=, <, <=, =, !=, same‑period, range). Handles raw and derived metrics, applying expressions before rule evaluation.

    // Example: range comparison implementation
    // Reject non-numeric actual results before comparing
    RuleCheckTypeStrategy.checkNotSupportNonNumeric(ruleResultContext.getActualResult(), this);
    Double expectRange = Double.valueOf(ruleResultContext.getExpectRange());
    Double actualRangeBoundary = RuleCheckTypeStrategy.computeThresholds(ruleResultContext.getActualResult(), this);
    return compareRange(expectRange, actualRangeBoundary);
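The simple comparison types listed above map naturally onto an enum of predicates. The following is a sketch under assumptions: `CheckType` and `RangeCheck` are hypothetical names, and because the article does not show how `computeThresholds` or `compareRange` work, the range check here assumes the simplest reading, an absolute tolerance around the expected value. The same-period comparison is omitted because it needs historical results.

```java
import java.util.function.BiPredicate;

/** Hypothetical enum covering the six simple comparison operators. */
enum CheckType {
    GT((actual, expected) -> actual > expected),
    GE((actual, expected) -> actual >= expected),
    LT((actual, expected) -> actual < expected),
    LE((actual, expected) -> actual <= expected),
    EQ((actual, expected) -> Double.compare(actual, expected) == 0),
    NE((actual, expected) -> Double.compare(actual, expected) != 0);

    private final BiPredicate<Double, Double> rule;

    CheckType(BiPredicate<Double, Double> rule) {
        this.rule = rule;
    }

    /** Returns true when the actual value satisfies this comparison against the expectation. */
    boolean passes(double actual, double expected) {
        return rule.test(actual, expected);
    }
}

final class RangeCheck {
    private RangeCheck() {}

    /** Assumed semantics: passes when |actual - expected| <= tolerance. */
    static boolean withinRange(double actual, double expected, double tolerance) {
        return Math.abs(actual - expected) <= tolerance;
    }
}
```

For example, an empty-rate rule with the expectation "at most 0.05" would be evaluated as `CheckType.LE.passes(actualRate, 0.05)`.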

Monitoring and Alerts

Supports cron‑based scheduling and configurable alert notifications (phone, SMS, Feishu) for failures or abnormal results.
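To illustrate the "alert on failure or abnormal result" behavior, here is a minimal sketch of a single inspection run. Everything in it is an assumption: `InspectionTask` is a hypothetical class, the check is modeled as a `BooleanSupplier`, and each notification channel (phone, SMS, Feishu) is reduced to a `Consumer<String>` that receives the alert text. The article does not say which scheduler triggers these runs.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical inspection task: runs one check and alerts every channel on a problem. */
final class InspectionTask implements Runnable {
    private final String name;
    private final BooleanSupplier check;
    private final List<Consumer<String>> channels;

    InspectionTask(String name, BooleanSupplier check, List<Consumer<String>> channels) {
        this.name = name;
        this.check = check;
        this.channels = channels;
    }

    @Override
    public void run() {
        String problem = null;
        try {
            if (!check.getAsBoolean()) {
                problem = "abnormal result";
            }
        } catch (RuntimeException e) {
            // An execution failure is reported the same way as an abnormal result.
            problem = "execution failed: " + e.getMessage();
        }
        if (problem != null) {
            String message = name + ": " + problem;
            channels.forEach(channel -> channel.accept(message));
        }
    }
}
```

Since the task is a plain `Runnable`, it could be triggered by whatever cron mechanism the platform uses. In a Spring Boot application, one common option is a method annotated with `@Scheduled(cron = "...")` that calls `run()`.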

Platform Practice and Results

Practice Stages

Functional Testing

Configure data templates to auto‑generate test cases.

Smart analysis and data probing for downstream table validation.

Automatic result judgment and case persistence for regression.

Automated Regression

One‑click conversion of reusable cases to automated scripts.

Parallel execution reduces test time by ~70%.
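The article does not describe how cases are run in parallel. A minimal sketch, assuming each automated case can be modeled as a `Callable<Boolean>` that returns whether it passed, is a fixed-size thread pool. `ParallelCaseRunner` and `runAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical runner; each case returns true when it passes. */
final class ParallelCaseRunner {
    private ParallelCaseRunner() {}

    /** Runs every case on a fixed-size pool and returns results in submission order. */
    static List<Boolean> runAll(List<Callable<Boolean>> cases, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Boolean> results = new ArrayList<>();
            // invokeAll blocks until all cases finish and preserves the input order.
            for (Future<Boolean> future : pool.invokeAll(cases)) {
                // get() rethrows a case's exception wrapped in ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size bounds how many queries hit the underlying storage systems at once, so in practice it would be tuned to what those systems can absorb rather than to the number of cases.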

Online Monitoring

Continuous quality monitoring for real‑time and offline tasks, covering data‑metric, horizontal, vertical, business‑logic, and data‑application‑chain monitoring.

Outcomes

Full‑process efficiency: testing cycle reduced from 5 days to 1.5 days; extra effort per requirement dropped from 22 h to 3.58 h.

Supported 200+ requirements, 1,000+ executions, saving over 800 person‑days.

Discovered 300+ issues; monitoring alerts detected 1,000+ anomalies, enabling rapid response.

3,000+ automated/monitoring cases covering 100% of core tables and data‑application links.
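The relative savings follow directly from the cycle-time and per-requirement effort figures listed above. The second percentage is derived here and is not stated in the article:

```latex
\begin{align*}
\text{cycle-time reduction} &= \frac{5 - 1.5}{5} = 0.70 = 70\% \\
\text{extra-effort reduction} &= \frac{22 - 3.58}{22} = \frac{18.42}{22} \approx 83.7\%
\end{align*}
```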

Future Outlook

Intelligent modeling : Adaptive test‑scenario modeling for orders, users, drivers, risk, marketing, billing, etc.

Intelligent diagnosis : Large‑model predictions of impact and risk for precise testing.

Intelligent testing : Model‑driven automatic case generation, automation, and online monitoring.
