Big Data 20 min read

How Youzan’s BI Platform Turns Massive Data into Interactive Visual Insights

This article explains the design, features, and technical implementation of Youzan’s BI platform, covering its target users, visualization workflow, supported chart types, filtering, permissions, drill‑down, calculated fields, SQL generation logic, and future development directions.

Youzan Coder
Youzan Coder
Youzan Coder
How Youzan’s BI Platform Turns Massive Data into Interactive Visual Insights

Overview

Youzan is a SaaS and big‑data company that needed an efficient way to extract value from massive datasets and provide visual analysis and reporting. The Youzan BI platform was created to meet this need, offering fast, convenient data analysis and visualization for various business scenarios. It currently serves over 700 UV, 30 k+ PV, and hosts more than 5 k reports.

Target Users and Application Scenarios

Report developers – typically BI analysts and data engineers.

Report consumers – operations, service, product teams and anyone needing visual data insights.

Other internal applications – products that require data display or analysis.

Typical usage scenarios are illustrated in the accompanying diagram.

Data Visualization

Terminology

Dimension : the perspective for observing data, usually categorical (e.g., date, store). It can be a row dimension (similar to Excel row headers) or a column dimension (multiple row headers above numeric columns).

Metric : the numeric value to be aggregated, such as sales amount or profit.

Dimension types
Dimension types

Quick Start

Step 1 – Add Dataset : Add dataset → select connection account → custom SQL → preview data → submit. The dataset is a logical SQL that serves as the source table for subsequent charts.

Step 2 – Add Report : Create chart → choose the previously created dataset → confirm to enter the report editor.

Step 3 – Add Permissions : Go to the dashboard → click the gear icon → add permissions, or manage permissions via the resource manager. This ensures only authorized users can operate on the data.

Tips: mark field date types in the upper‑right gear to enable time‑grain operations; add calculated fields for secondary calculations; use the report copy function to duplicate similar reports.

Chart Types

The platform supports more than ten chart types covering most daily visualization needs, including KPI cards, line charts, dual‑axis charts, area and stacked‑area charts, bar and stacked‑bar charts, pie charts, funnel charts, Sankey diagrams, radar charts, and word clouds. Each type is suited to specific scenarios, as shown in the example images.

Chart type examples
Chart type examples

Filtering and Sorting

Users can filter fields or metrics using operators such as >, >=, <, <=, =, <>, IS NULL, BETWEEN, LIKE, IN, etc., and can filter by time dimensions (year, month, quarter, week) after marking a field as a date type. Sorting can be applied to rows or metrics in ascending or descending order. Pagination is handled differently per database (e.g., LIMIT for MySQL, ROW_NUMBER() for Presto).

Date types and formats
Date types and formats

Calculated Fields

Users may write custom SQL snippets to generate virtual fields. Non‑aggregate expressions are treated as regular fields, while aggregate expressions are used directly without an extra aggregation step. When a calculated field coexists with a column dimension, the platform wraps the aggregate with a try() function and adds conditional IF clauses to handle each column‑dimension value.

Calculated field examples
Calculated field examples

Drill‑Down and Linking

Drill‑down allows users to view finer‑granularity data (e.g., from province‑level to city‑level) by applying the same filter logic across linked reports. Linking ensures that when a filter changes in one report, related reports update automatically.

Linking example
Linking example

Row/Column Permissions

Column permissions hide specific fields from certain users or groups, while row permissions restrict visible rows based on conditions. Two modes are supported: condition mode (predefined expressions) and free mode, where users write SQL conditions (e.g., [dep] IN {db.table.businessId.value}).

Row/Column permission example
Row/Column permission example

Chart Push and External Integration

The platform can schedule report pushes so users receive data snapshots without logging in. External products can embed BI dashboards, reports, or components via the provided SDK and pass parameters for flexible display.

Implementation Principles

SQL Generation Process

The front‑end drag‑and‑drop actions are translated into SQL components: row dimensions become GROUP BY, metrics become aggregation functions, filters become WHERE, sorting becomes ORDER BY, and pagination is handled per database. Example generated SQL:

SELECT *
FROM (
  SELECT *, row_number() OVER () AS bi_rownum
  FROM (
    SELECT order_type AS bi_rowdim0,
           COUNT(buyer_id) AS bi_metric0,
           SUM(gmv) AS bi_metric1
    FROM dm_zbk.da_self_analysis_demo
    WHERE the_date >= '2020-01-01' AND the_date <= '2020-12-31'
    GROUP BY order_type
    ORDER BY bi_metric1 ASC
  )
) 
WHERE bi_rownum > 0 AND bi_rownum <= 10;

The diagram below visualizes this flow.

SQL generation flow
SQL generation flow

Column Dimension Handling

Three approaches exist: (1) treat column dimensions as additional GROUP BY fields, (2) use database‑specific pivot functions, (3) generate conditional aggregation with IF statements. The platform adopts the third method, producing SQL such as:

SELECT SUM(IF(order_type='订单类型1', gmv, 0)) AS bi_metric0,
       SUM(IF(order_type='订单类型2', gmv, 0)) AS bi_metric1,
       SUM(IF(order_type='订单类型3', gmv, 0)) AS bi_metric2,
       SUM(IF(order_type='订单类型4', gmv, 0)) AS bi_metric3
FROM dm_zbk.da_self_analysis_demo;

When many column values exist, distinct values are fetched first and pagination can be applied.

Calculated Field Processing

Aggregate calculated fields are wrapped with try() to avoid runtime errors (e.g., division by zero). The platform uses ANTLR to parse the user‑provided SQL, inserts try() around aggregate functions, and adds conditional IF clauses based on column‑dimension values.

SELECT try(SUM(IF(order_type='订单类型3', gmv, 0))) AS bi_metric0,
       try(SUM(IF(order_type='订单类型4', gmv, 0))) AS bi_metric1,
       try(SUM(IF(order_type='订单类型2', gmv, 0))) AS bi_metric2,
       try(SUM(IF(order_type='订单类型1', gmv, 0))) AS bi_metric3
FROM dm_zbk.da_self_analysis_demo;

Conclusion and Outlook

The Youzan BI platform consolidates the data‑visualization capabilities of the Youzan data middle‑platform, offering efficient internal data analysis. Future plans include supporting richer chart types, adding advanced features such as subtotals, period‑over‑period comparisons, alerts, large‑screen dashboards, intelligent analysis, and further reducing user effort while improving interaction experience.

Analyticsbig dataplatform architectureData VisualizationSQL GenerationBI
Youzan Coder
Written by

Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.