How to Efficiently Test BI Reports in a Hive‑StarRocks Data Warehouse
This article details practical methods for testing BI reports built on Hive and StarRocks, covering the report creation workflow, testing characteristics, SQL writing techniques, impact analysis, data warehouse simplification, and the application of data quality tools to ensure accurate and efficient reporting.
1. Introduction
As the business grows and merchants demand richer data visualization, BI reporting has become a widely used tool. Because BI reports draw data from multiple internal business teams and external sources, the complexity of the underlying data processing makes it challenging to guarantee data accuracy while keeping testing efficient.
2. Abstract
Testing BI reports differs from business testing. This article shares practices from Weimob's BI report testing, outlining methods for offline Hive reports: the BI report creation process, testing characteristics, test-SQL writing, impact-scope analysis, and data-quality tool usage.
3. BI Report Creation Process
BI development syncs required MySQL tables to Hive, cleans the data, stores results in StarRocks, and uses BiStudio to link StarRocks tables to create datasets for report controls. Product teams then drag‑and‑drop to develop report pages, and data operators publish versions in the WOS system.
4. Characteristics of BI Testing
4.1 Test Objects
Report pages contain little complex logic; they mainly display dimensional metrics, so testing focuses on data verification.
4.2 Test Methods
Compared with business testing, BI testing combines business understanding with product requirement analysis and verifies accuracy through test SQL.
5. Offline Report Testing Methods
5.1 Requirement Analysis
Based on product requirements and metric definitions, the report needs are abstracted, identifying common dimensions such as date granularity, node types, all nodes, current node, and sub‑node aggregation.
5.2 Test SQL Writing
1. Query scope: bring metrics to the same granularity before querying them together (e.g., aggregate order-item metrics up to order level); keep the SQL simple, since overly complex SQL slows execution.
2. Query metrics: include as many metrics as possible in the SELECT aggregates to reduce the number of queries.
3. Query dimensions: include multiple dimensions so that a single query covers more scenarios.
<code>-- Aggregate order data: total receivable, total actual, and payment count
SELECT
    o.source_channel,
    o.access_channel,
    SUM(o.receivable_amount) AS total_receivable,
    SUM(o.actual_amount) AS total_actual,
    SUM(od.payment_count) AS total_payment_count
FROM orders o
JOIN (
    -- Roll order-item metrics up to order granularity before joining
    SELECT order_id, bos_id, SUM(payment_count) AS payment_count
    FROM order_details
    GROUP BY order_id, bos_id
) od ON o.order_id = od.order_id AND o.bos_id = od.bos_id
WHERE o.dd = "${dd}"
  AND o.bos_id = "${bos_id}"
  AND o.vid = ${vid}
GROUP BY o.source_channel, o.access_channel;</code>
5.3 Impact Scope Analysis
For iterated reports, upstream and downstream impact is assessed using data-platform functions:
Use the version check-out feature of offline tasks to compare the current SQL with a specified historical version.
Analyze downstream impact from the task's output-table details, then sample the affected downstream tables for regression testing.
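Where the platform's check-out feature is not at hand, the same version comparison can be approximated in a few lines of Python. This is a minimal sketch; the two SQL strings are hypothetical stand-ins for the released and current versions of an offline task:

```python
import difflib

# Hypothetical SQL body of a previously released task version.
released_sql = """SELECT source_channel, SUM(receivable_amount) AS total_receivable
FROM dwd.order_info
GROUP BY source_channel;"""

# Hypothetical SQL body of the current (iterated) task version.
current_sql = """SELECT source_channel, access_channel, SUM(receivable_amount) AS total_receivable
FROM dwd.order_info
GROUP BY source_channel, access_channel;"""

# A line-level unified diff shows exactly what changed between versions,
# helping the tester judge which downstream metrics may be affected.
diff = list(difflib.unified_diff(
    released_sql.splitlines(),
    current_sql.splitlines(),
    fromfile="released", tofile="current", lineterm=""))

for line in diff:
    print(line)
```

Lines prefixed with `+` or `-` mark the changed logic; here the diff would surface the newly added `access_channel` dimension, pointing the regression sample at reports grouped by that dimension.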
5.4 Reducing Complexity Within the Warehouse
Warehouse model tables provide common dimensions and rarely change. Because DWD/DWS model tables have already been tested, building test SQL on them avoids complex joins over ODS tables and makes the SQL both simpler and faster. Example converting the previous query to use a DWD table:
<code>SELECT
    source_channel,
    access_channel,
    SUM(receivable_amount) AS total_receivable,
    SUM(actual_amount) AS total_actual,
    SUM(payment_count) AS total_payment_count
FROM dwd.order_info
WHERE dd = "${dd}"
  AND bos_id = "${bos_id}"
  AND vid = ${vid}
GROUP BY source_channel, access_channel;</code>
5.5 Functionality Outside the Warehouse
Report configuration tasks are also critical. Common functions such as table sorting, download, chart export, node-filter correctness, hover-data accuracy, and metric grouping should be verified with lightweight checks rather than exhaustive functional testing.
6. Data Quality Tool Usage
Report logic is usually executed in the warehouse; the tool focuses on table quality and result accuracy. Quality rules linked to scheduled tasks trigger daily checks of output tables.
6.1 Applicable Scenarios
Configure table data quality rules to detect business report or task anomalies.
Custom rules enable automated regression of report metrics on each task schedule.
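A minimal sketch of such a custom rule is shown below. The table name `dws_order_summary`, the rule function, and the use of sqlite3 as a stand-in for the warehouse connection are all illustrative assumptions, not the platform's actual API:

```python
import sqlite3

# Stand-in for the warehouse connection: in production this would point at
# StarRocks; sqlite3 is used here only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dws_order_summary (dd TEXT, bos_id TEXT, total_receivable REAL)")
conn.execute("INSERT INTO dws_order_summary VALUES ('2024-01-01', 'b1', 120.0)")

def partition_row_count_rule(conn, table, dd):
    """Custom quality rule (hypothetical): fail if the day's partition is empty."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE dd = ?", (dd,)).fetchone()
    return count > 0

# Triggered after each task schedule: an empty output partition usually
# signals an upstream task failure rather than a genuine lack of data.
ok = partition_row_count_rule(conn, "dws_order_summary", "2024-01-01")
print("rule passed" if ok else "rule failed")
```

Hooking a rule like this to the task schedule turns a one-off test query into a daily automated regression of the output table.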
6.2 Evolution of Test SQL
6.2.1 Data Scope
As requirements iterate, the test SQL evolves from querying a single store to querying an entire daily partition, covering more data and more scenarios and catching production issues that store-level sampling would miss.
6.2.2 Custom Comparison SQL
Custom comparison SQL aggregates the expected result (recomputed from the source tables) and the actual result (read from the report's output table), then unions the two aggregate rows and counts the distinct rows. Because UNION deduplicates identical rows, a consistent pair collapses to a single row, so any count other than 1 indicates inconsistent data.
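The union-and-count comparison can be illustrated with a small runnable example. Here sqlite3 stands in for StarRocks, and the `orders` and `report_output` tables and their columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (bos_id TEXT, receivable_amount REAL);
CREATE TABLE report_output (bos_id TEXT, total_receivable REAL);
INSERT INTO orders VALUES ('b1', 100.0), ('b1', 50.0);
INSERT INTO report_output VALUES ('b1', 150.0);
""")

# Expected result is recomputed from the source table; actual result is read
# from the report's output table.  UNION deduplicates identical rows, so a
# consistent pair collapses to exactly one row per key.
(row_count,) = conn.execute("""
SELECT COUNT(*) FROM (
    SELECT bos_id, SUM(receivable_amount) AS total_receivable
    FROM orders GROUP BY bos_id
    UNION
    SELECT bos_id, total_receivable FROM report_output
)
""").fetchone()

print("consistent" if row_count == 1 else "inconsistent")
```

If the report output drifted (say `149.0` instead of `150.0`), the two rows would no longer deduplicate and the count would become 2, flagging the inconsistency.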
Weimob Technology Center
Official platform of the Weimob Technology Center