How Huolala Built a Scalable Big Data Testing Platform to Cut Cycle Time by 70%
Huolala’s data testing platform tackles massive data volume, complexity, and quality challenges by automating test case generation, execution, monitoring, and alerting across multiple storage systems, dramatically reducing testing cycles from five days to 1.5 days and saving over 800 person-days.
Background and Challenges
With Huolala’s rapid business growth, big data is increasingly used for user behavior analysis, ad targeting, risk control, and decision support. Daily interactions reach hundreds of billions, demanding high quality, efficiency, availability, and proactivity in data testing.
Skill Requirements
Massive data testing: Proficiency in SQL across various databases to process large volumes.
Data‑warehouse layer testing: Understanding of multi‑layer data structures and business logic.
Flexible acceptance criteria: Ability to adapt standards to business needs.
Data insight: Combining testing with business analysis.
Data Challenges
Difficulty discovering issues: Huge data volume and mixed structured/unstructured formats make manual inspection infeasible.
Quality assurance difficulty: Ensuring accuracy, completeness, consistency, and timeliness requires comprehensive technical and managerial measures.
Efficiency Challenges
Test case management: Frequent data changes force repeated test case updates, and there was no platform to support them.
Low regression efficiency: Manual regression is time‑consuming and cannot keep up with expanding scope.
Solution and Goals
Simplified testing process: Configurable templates and automatic script generation reduce manual effort.
Data test automation: One‑click conversion of test cases to automated scripts.
Quality monitoring and alerts: Scheduled online inspections and proactive alerts.
A dedicated big‑data testing platform automates each step, allowing users with no SQL expertise to operate it, while also providing test case persistence and a closed loop from generation to execution, analysis, monitoring, and automation.
Capability Building
Platform Architecture
The platform, built on Spring Boot, uses a hybrid engine for data computation and offers data‑quality model construction, execution, task management, anomaly detection, and alerting, with resource and permission isolation for security. It is designed for high concurrency, high performance, and high availability.
Architecture layers:
Application layer: Templates, case management, protocol management, scheduled tasks.
Service layer: Rule calculation, data rule definition, result analysis.
Data layer: Source data and metadata from data warehouses and business systems.
Storage layer: Supports MySQL, Hive, Doris, Phoenix, HBase, etc.
Core Capabilities
Test Management
Two modules: case‑template management (configurable templates for empty‑value checks, numeric checks, etc.) and case‑management (auto‑generation, persistence, and conversion to automated cases).
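As a rough illustration of what filling a configurable template involves, the sketch below substitutes `{{name}}` placeholders (the syntax used in the platform's template example) from a parameter map. The class `TemplateFiller` and its method are hypothetical names, not the platform's code, and the sketch deliberately ignores the bracketed optional segments that the platform's templates also use.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from the platform.
final class TemplateFiller {
    // Matches {{name}} where name consists of word characters.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    // Replaces every {{name}} with params.get(name); fails fast on a missing parameter.
    static String fill(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing template parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because the values are spliced directly into SQL text, a helper like this is only safe when table and field names come from trusted metadata rather than free-form user input.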
Template example “field empty‑rate”:
    select
    [count(case when {{emptyFields}} is null or {{emptyFields}} = '' then 1 end)]/count(1) as {{emptyFields}}_rate
    from {{tableName}}
    where 1=1 and [{{partField}} = date_sub(current_date(),1) $and;partField$];

Core code for converting a template to a rule object:

    // Parse template into rule configuration class
    MonitorRuleTemple monitorRuleTemple = covertFromModule(config, MonitorAgreement.class);

    // Retrieve SQL from rule configuration
    String sql = monitorRuleTemple.getSql();

    if (Objects.nonNull(sql)) {
        sql = sqlProcess(sql, paramsMap);
        monitorRuleTemple.setSql(sql);
    }

    // Replace {{param}} placeholders
    ruleProcess(monitorRuleTemple, paramsMap);

    return monitorRuleTemple;

Rule Execution
Combines rule configuration and computation, supporting multiple data sources (MySQL, Hive, Doris, Phoenix, HBase, Curl) via a hybrid engine.
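One way to picture per-source dispatch is a small registry keyed by source type. This is a sketch under assumptions: `SourceType`, `QueryClient`, and `QueryClientRegistry` are hypothetical names, the enum lists only a few of the sources named in the article, and the fail-fast behavior for unregistered types is a design choice of this sketch rather than something the article describes.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical types for illustration; the platform's real interfaces are not shown in the article.
enum SourceType { MYSQL, HIVE, HBASE, PHOENIX }

interface QueryClient {
    String query(String statement);
}

final class QueryClientRegistry {
    private final Map<SourceType, QueryClient> clients = new EnumMap<>(SourceType.class);

    void register(SourceType type, QueryClient client) {
        clients.put(type, client);
    }

    // Throws for an unregistered source type instead of handing back null.
    QueryClient resolve(SourceType type) {
        QueryClient client = clients.get(type);
        if (client == null) {
            throw new IllegalArgumentException("No client registered for " + type);
        }
        return client;
    }
}
```

Compared with returning null for an unknown type, failing at lookup time surfaces a misconfigured data source before any rule is executed against it.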
    // Determine client based on data source type
    DataSourceType sourceType = DataSourceType.findByTypeByte(dataSourceType);
    switch (sourceType) {
        case INTERFACE_CURL: queryTask = curlClient; break;
        case INTERFACE_HTTP: queryTask = httpClient; break;
        case HIVE: queryTask = idpClient; break;
        case MYSQL: queryTask = mySqlClient; break;
        case HBASE: queryTask = hbaseClient; break;
        case PHOENIX: queryTask = phoenixClient; break;
        default: queryTask = null;
    }
    return queryTask;

Result Analysis
Supports preset expectations and various comparison types (>, >=, <, <=, =, !=, same‑period, range). Handles raw and derived metrics, applying expressions before rule evaluation.
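The six simple comparison types can be sketched as an enum of predicates over the actual and expected values. `CompareOp` is a hypothetical name, and the sketch assumes numeric results only; same‑period and range checks need a baseline or threshold computation and are not covered here.

```java
import java.util.function.BiPredicate;

// Hypothetical enum for illustration; names are not taken from the platform.
enum CompareOp {
    GT((actual, expected) -> actual > expected),
    GE((actual, expected) -> actual >= expected),
    LT((actual, expected) -> actual < expected),
    LE((actual, expected) -> actual <= expected),
    EQ((actual, expected) -> Double.compare(actual, expected) == 0),
    NE((actual, expected) -> Double.compare(actual, expected) != 0);

    private final BiPredicate<Double, Double> check;

    CompareOp(BiPredicate<Double, Double> check) {
        this.check = check;
    }

    // Returns true when the actual value satisfies this comparison against the expectation.
    boolean passes(double actual, double expected) {
        return check.test(actual, expected);
    }
}
```

For example, a rule expecting an empty rate of at most 0.01 would evaluate `CompareOp.LE.passes(actualRate, 0.01)`.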
    // Example: range comparison implementation
    RuleCheckTypeStrategy.checkNotSupportNonNumeric(ruleResultContext.getActualResult(), this);
    Double expectRange = Double.valueOf(ruleResultContext.getExpectRange());
    Double actualRangeBoundary = RuleCheckTypeStrategy.computeThresholds(ruleResultContext.getActualResult(), this);
    return compareRange(expectRange, actualRangeBoundary);

Monitoring and Alerts
Supports cron‑based scheduling and configurable alert notifications (phone, SMS, Feishu) for failures or abnormal results.
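The alerting half of this can be sketched as a dispatcher that fans a failed rule result out to every configured channel. `AlertChannel` and `AlertDispatcher` are hypothetical names, the message format is an assumption, and cron scheduling is left to whatever scheduler invokes the check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the channel kinds mirror the article (phone, SMS, Feishu),
// but this interface and the dispatch logic are assumptions.
interface AlertChannel {
    void send(String message);
}

final class AlertDispatcher {
    private final List<AlertChannel> channels = new ArrayList<>();

    void addChannel(AlertChannel channel) {
        channels.add(channel);
    }

    // Notifies every configured channel when a rule check fails.
    // Returns the number of channels notified (0 when the rule passed).
    int onRuleResult(String ruleName, boolean passed) {
        if (passed) {
            return 0;
        }
        String message = "Data quality rule failed: " + ruleName;
        for (AlertChannel channel : channels) {
            channel.send(message);
        }
        return channels.size();
    }
}
```

Keeping channels behind one interface lets a new notification type be added without touching the rule evaluation code.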
Platform Practice and Results
Practice Stages
Functional Testing
Configure data templates to auto‑generate test cases.
Smart analysis and data probing for downstream table validation.
Automatic result judgment and case persistence for regression.
Automated Regression
One‑click conversion of reusable cases to automated scripts.
Parallel execution reduces test time by ~70%.
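Parallel execution of independent cases can be sketched with a fixed thread pool from the JDK. `ParallelCaseRunner` is a hypothetical name, and modeling each case as a `Callable<Boolean>` that returns pass or fail is an assumption; the article does not describe how the platform schedules its cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical runner for illustration; not the platform's scheduler.
final class ParallelCaseRunner {
    // Runs all cases on a fixed-size pool and returns pass/fail results in submission order.
    static List<Boolean> runAll(List<Callable<Boolean>> cases, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Boolean> results = new ArrayList<>();
            // invokeAll blocks until every case has finished and preserves input order.
            for (Future<Boolean> future : pool.invokeAll(cases)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running cases", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A test case threw an exception", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The speedup depends on the cases being independent and on the underlying data sources tolerating concurrent queries.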
Online Monitoring
Continuous quality monitoring for real‑time and offline tasks, covering data‑metric, horizontal, vertical, business‑logic, and data‑application‑chain monitoring.
Outcomes
Full‑process efficiency: testing cycle reduced from 5 days to 1.5 days; extra effort per requirement dropped from 22 h to 3.58 h.
Supported 200+ requirements, 1,000+ executions, saving over 800 person‑days.
Discovered 300+ issues; monitoring alerts detected 1,000+ anomalies, enabling rapid response.
3,000+ automated/monitoring cases covering 100% of core tables and data‑application links.
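The headline 70% figure follows directly from the reported cycle times, and the same calculation applied to the per‑requirement effort figures gives roughly 84%:

```latex
\[
\frac{5 - 1.5}{5} = 0.70 \quad (70\%\ \text{shorter testing cycle}),
\qquad
\frac{22 - 3.58}{22} \approx 0.837 \quad (\approx 84\%\ \text{less extra effort per requirement})
\]
```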
Future Outlook
Intelligent modeling: Adaptive test‑scenario modeling for orders, users, drivers, risk, marketing, billing, etc.
Intelligent diagnosis: Large‑model predictions of impact and risk for precise testing.
Intelligent testing: Model‑driven automatic case generation, automation, and online monitoring.
