How to Efficiently Test BI Reports in a Hive‑StarRocks Data Warehouse
This article details practical methods for testing BI reports built on Hive and StarRocks, covering the report creation workflow, testing characteristics, SQL writing techniques, impact analysis, data warehouse simplification, and the application of data quality tools to ensure accurate and efficient reporting.
1. Introduction
As the business grows and merchants demand richer data visualization, BI reporting has become a widely used tool. Because BI reports draw data from multiple internal business teams and external sources, the complexity of the underlying data processing makes it challenging to guarantee data accuracy while keeping testing efficient.
2. Abstract
Testing BI reports differs from business testing. This article shares practices from Weimob's BI report testing, outlining methods for offline Hive reports: the BI report creation process, testing characteristics, test-SQL writing, impact-scope analysis, and data-quality tool usage.
3. BI Report Creation Process
BI development syncs required MySQL tables to Hive, cleans the data, stores results in StarRocks, and uses BiStudio to link StarRocks tables to create datasets for report controls. Product teams then drag‑and‑drop to develop report pages, and data operators publish versions in the WOS system.
4. Characteristics of BI Testing
4.1 Test Objects
Report pages contain little complex logic; they mainly display dimensional metrics, so testing focuses on data verification.
4.2 Test Methods
Compared with business testing, BI testing combines business understanding with product requirement analysis and verifies accuracy through test SQL.
5. Offline Report Testing Methods
5.1 Requirement Analysis
Based on product requirements and metric definitions, the report needs are abstracted, identifying common dimensions such as date granularity, node types, all nodes, current node, and sub‑node aggregation.
5.2 Test SQL Writing
1. Query scope: bring metrics to the same granularity before querying them together (e.g., aggregate order-item metrics up to order level); keep the SQL simple, since overly complex SQL slows execution.
2. Query metrics: include as many metrics as possible in the SELECT aggregates to reduce the number of queries.
3. Query dimensions: include multiple dimensions so that a single query covers more scenarios.
<code>-- Aggregate order data: total receivable, total actual, and payment count
SELECT
    o.source_channel,
    o.access_channel,
    SUM(o.receivable_amount) AS total_receivable,
    SUM(o.actual_amount) AS total_actual,
    SUM(od.payment_count) AS total_payment_count
FROM orders o
JOIN (
    -- Roll order-item metrics up to order granularity before joining
    SELECT order_id, bos_id, SUM(payment_count) AS payment_count
    FROM order_details
    GROUP BY order_id, bos_id
) od ON o.order_id = od.order_id AND o.bos_id = od.bos_id
WHERE o.dd = "${dd}"
  AND o.bos_id = "${bos_id}"
  AND o.vid = ${vid}
GROUP BY o.source_channel, o.access_channel;</code>
5.3 Impact Scope Analysis
For iterated reports, upstream and downstream impact is assessed using data-platform functions:
Use the version check-out feature of offline tasks to compare the current SQL with a specified historical version.
Analyze downstream impact from the task's output-table details, then sample the affected downstream tables for regression testing.
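Where the platform's check-out feature is not at hand, the same version comparison can be approximated in a few lines of Python. This is a minimal sketch; the two SQL strings are hypothetical stand-ins for the released and current versions of an offline task:

```python
import difflib

# Hypothetical SQL body of a previously released task version.
released_sql = """SELECT source_channel, SUM(receivable_amount) AS total_receivable
FROM dwd.order_info
GROUP BY source_channel;"""

# Hypothetical SQL body of the current (iterated) task version.
current_sql = """SELECT source_channel, access_channel, SUM(receivable_amount) AS total_receivable
FROM dwd.order_info
GROUP BY source_channel, access_channel;"""

# A line-level unified diff shows exactly what changed between versions,
# helping the tester judge which downstream metrics may be affected.
diff = list(difflib.unified_diff(
    released_sql.splitlines(),
    current_sql.splitlines(),
    fromfile="released", tofile="current", lineterm=""))

for line in diff:
    print(line)
```

Lines prefixed with `+` or `-` mark the changed logic; here the diff would surface the newly added `access_channel` dimension, pointing the regression sample at reports grouped by that dimension.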
5.4 Reducing Complexity Within the Warehouse
Warehouse model tables provide common dimensions and rarely change. Because DWD/DWS model tables have already been tested, building test SQL on them avoids complex joins over ODS tables and makes the SQL both simpler and faster. Example converting the previous query to use a DWD table:
<code>SELECT
    source_channel,
    access_channel,
    SUM(receivable_amount) AS total_receivable,
    SUM(actual_amount) AS total_actual,
    SUM(payment_count) AS total_payment_count
FROM dwd.order_info
WHERE dd = "${dd}"
  AND bos_id = "${bos_id}"
  AND vid = ${vid}
GROUP BY source_channel, access_channel;</code>
5.5 Functionality Outside the Warehouse
Report configuration tasks are also critical. Common functions such as table sorting, download, chart export, node-filter correctness, hover-data accuracy, and metric grouping should be verified with lightweight checks rather than exhaustive functional testing.
6. Data Quality Tool Usage
Report logic is usually executed in the warehouse; the tool focuses on table quality and result accuracy. Quality rules linked to scheduled tasks trigger daily checks of output tables.
6.1 Applicable Scenarios
Configure table data quality rules to detect business report or task anomalies.
Custom rules enable automated regression of report metrics on each task schedule.
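A minimal sketch of such a custom rule is shown below. The table name `dws_order_summary`, the rule function, and the use of sqlite3 as a stand-in for the warehouse connection are all illustrative assumptions, not the platform's actual API:

```python
import sqlite3

# Stand-in for the warehouse connection: in production this would point at
# StarRocks; sqlite3 is used here only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dws_order_summary (dd TEXT, bos_id TEXT, total_receivable REAL)")
conn.execute("INSERT INTO dws_order_summary VALUES ('2024-01-01', 'b1', 120.0)")

def partition_row_count_rule(conn, table, dd):
    """Custom quality rule (hypothetical): fail if the day's partition is empty."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE dd = ?", (dd,)).fetchone()
    return count > 0

# Triggered after each task schedule: an empty output partition usually
# signals an upstream task failure rather than a genuine lack of data.
ok = partition_row_count_rule(conn, "dws_order_summary", "2024-01-01")
print("rule passed" if ok else "rule failed")
```

Hooking a rule like this to the task schedule turns a one-off test query into a daily automated regression of the output table.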
6.2 Evolution of Test SQL
6.2.1 Data Scope
As requirements iterate, the test SQL evolves from querying a single store to querying an entire daily partition, covering more data and more scenarios and catching production issues that store-level sampling would miss.
6.2.2 Custom Comparison SQL
Custom comparison SQL aggregates the expected result (recomputed from the source tables) and the actual result (read from the report's output table), then unions the two aggregate rows and counts the distinct rows. Because UNION deduplicates identical rows, a consistent pair collapses to a single row, so any count other than 1 indicates inconsistent data.
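The union-and-count comparison can be illustrated with a small runnable example. Here sqlite3 stands in for StarRocks, and the `orders` and `report_output` tables and their columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (bos_id TEXT, receivable_amount REAL);
CREATE TABLE report_output (bos_id TEXT, total_receivable REAL);
INSERT INTO orders VALUES ('b1', 100.0), ('b1', 50.0);
INSERT INTO report_output VALUES ('b1', 150.0);
""")

# Expected result is recomputed from the source table; actual result is read
# from the report's output table.  UNION deduplicates identical rows, so a
# consistent pair collapses to exactly one row per key.
(row_count,) = conn.execute("""
SELECT COUNT(*) FROM (
    SELECT bos_id, SUM(receivable_amount) AS total_receivable
    FROM orders GROUP BY bos_id
    UNION
    SELECT bos_id, total_receivable FROM report_output
)
""").fetchone()

print("consistent" if row_count == 1 else "inconsistent")
```

If the report output drifted (say `149.0` instead of `150.0`), the two rows would no longer deduplicate and the count would become 2, flagging the inconsistency.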
Weimob Technology Center
Official platform of the Weimob Technology Center