Dashiang Cube: A Multi‑Source BI Reporting Tool with Custom Join Algorithms

Meituan‑Dianping’s Dashiang Cube is a multi‑source BI reporting platform that unifies MySQL, Kylin, Elasticsearch and plain‑text data via a common SQL layer, generates dialect‑specific queries, performs custom back‑tracking inner and left outer joins across heterogeneous sources, supports scripted metric calculations, permission controls, and a reusable UI component library for self‑service reporting.

Meituan Technology Team

Nov 2, 2017

Modern internet data warehouses store massive cube and semi-cube datasets in their data centers. Uncovering the relationships within this data usually requires a great deal of hand-written code and SQL, which means repetitive work, little validation of the results, and no charts or visual UI to present them. To address this, Meituan-Dianping’s hotel-travel tech team built “Dashiang Cube”, a cube-based BI reporting tool that supports multidimensional queries, roll-up, drill-down, and more.

Key challenges include:

Unified data source access

SQL generation for heterogeneous back‑ends

Cross‑data‑source aggregation

Custom metric calculations

Data permission management

Standardized UI components for self‑service report creation

The solution architecture (shown as a diagram in the original article) consists of four engines, which handle aggregation, slicing, calculation, and rendering. Dashiang Cube exposes a unified query service: relational sources such as MySQL and Kylin are accessed through a common SQL layer, Elasticsearch through its native API, and plain-text sources through custom APIs.
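
One way to picture a unified query service is as a single interface that every source type implements, so that downstream code sees the same row shape regardless of back-end. The sketch below is an assumption for illustration only: the names SourceAdapterSketch and PlainTextAdapterSketch, the comma-separated input format, and the upper-casing of keys are hypothetical and do not come from the Dashiang Cube code base. The row type, List<Map<String,String>>, is chosen to match the contents parameter of the join code later in the article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common interface: every source returns rows as column-name -> value maps.
interface SourceAdapterSketch {
    List<Map<String, String>> query(List<String> columns);
}

// Hypothetical adapter for a plain-text source holding comma-separated lines with a header row.
final class PlainTextAdapterSketch implements SourceAdapterSketch {
    private final String[] header;
    private final List<String[]> rows = new ArrayList<>();

    PlainTextAdapterSketch(String text) {
        String[] lines = text.trim().split("\\R");   // \R matches any line break
        header = lines[0].split(",");
        for (int i = 1; i < lines.length; i++) {
            rows.add(lines[i].split(",", -1));       // -1 keeps trailing empty cells
        }
    }

    @Override
    public List<Map<String, String>> query(List<String> columns) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String[] row : rows) {
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < header.length && c < row.length; c++) {
                String name = header[c].trim();
                if (columns.contains(name)) {
                    // Keys are upper-cased here (an assumption) so all sources share one naming form.
                    record.put(name.toUpperCase(), row[c].trim());
                }
            }
            out.add(record);
        }
        return out;
    }
}
```

A MySQL or Elasticsearch adapter would implement the same interface with its own client library, which is what lets the aggregation step treat all sources alike.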

SQL generation tackles two problems: differing dialects across data sources and dynamic construction of SQL based on user‑selected dimensions, metrics, and filters. The team defined SQL templates per dialect and performed placeholder substitution using a custom Java regex‑based string‑operation library, rejecting heavyweight frameworks like Apache Calcite or Alibaba Druid for their complexity and limited customizability.
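
The template-plus-substitution approach can be sketched roughly as follows. This is a minimal illustration, not the team's library: the ${name} placeholder syntax, the template text, the class name SqlTemplateSketch, and the choice of identifier quoting as the only dialect difference are all assumptions. It uses only java.util.regex from the standard library.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: placeholder syntax, template text, and dialect keys are assumptions.
final class SqlTemplateSketch {
    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // One template per dialect; the only difference shown here is how the table name is quoted.
    static final Map<String, String> TEMPLATES = Map.of(
        "mysql", "SELECT ${dimensions}, ${metrics} FROM `${table}` WHERE ${filters} GROUP BY ${dimensions}",
        "kylin", "SELECT ${dimensions}, ${metrics} FROM \"${table}\" WHERE ${filters} GROUP BY ${dimensions}");

    static String render(String dialect, Map<String, String> values) {
        String template = TEMPLATES.get(dialect);
        if (template == null) {
            throw new IllegalArgumentException("unknown dialect: " + dialect);
        }
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sql = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for placeholder: " + m.group(1));
            }
            // quoteReplacement stops '$' and '\' in the value from being read as group references.
            m.appendReplacement(sql, Matcher.quoteReplacement(value));
        }
        m.appendTail(sql);
        return sql.toString();
    }
}
```

The values are substituted verbatim, so in a sketch like this any user-supplied filter text would need to be validated or bound as parameters before it reaches render.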

Cross‑data‑source aggregation required custom join logic because many engines cannot perform joins across heterogeneous sources. The team implemented both inner join and left outer join using a back‑tracking algorithm, allowing an arbitrary number of data sources.

Inner join core code:

private void join(List<Map<String,String>>[] contents,List<Project> sharedList,final int n,int[] rowsStatus,LinkedList<MatchRow> result){
    // cubeJoin == 1 selects left-join mode, which is handled by leftJoin() rather than this method.
    if(this.cubeJoin==1){
        throw new java.lang.IllegalArgumentException("left join must call leftJoin(), not join()");
    }
    if(n<contents.length){
        List<Map<String,String>> list = contents[n];
        for(int k=0;k<list.size();k++) {
            boolean equal = true;
            if(n!=0) {
                Map<String, String> prev = contents[n - 1].get(rowsStatus[n - 1]);
                Map<String, String> cur = list.get(k);
                for (Project proj : sharedList) {
                    String key = proj.fieldName.toUpperCase();
                    if (key.matches("^\\d+$") || key.equals("*")) {
                        key = "_";
                    }
                    key = proj.isCompanion() ? key + proj.getFactId() : key;
                    String prevValue = prev.get(key);
                    String curValue = cur.get(key);
                    // Same reference or both null: this key matches, so check the next one.
                    if (prevValue == curValue) {
                        continue;
                    }
                    if (prevValue == null || curValue == null || !prevValue.equals(curValue)) {
                        equal = false;
                        break;
                    }
                }
            }
            if (equal) {
                rowsStatus[n] = k;
                if(n==contents.length-1){
                    MatchRow mr = new MatchRow();
                    List<MatchRow.DatasetRow> tmp = new ArrayList<>();
                    for(int i=0;i<rowsStatus.length;i++){
                        MatchRow.DatasetRow dr = new MatchRow.DatasetRow();
                        dr.setDatasetIndex(i);
                        dr.setRowIndex(rowsStatus[i]);
                        tmp.add(dr);
                    }
                    mr.addMatchRow(tmp);
                    result.add(mr);
                } else {
                    join(contents,sharedList,n+1,rowsStatus,result);
                }
            }
        }
    }
}

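The parameters are as follows. contents holds the result set from each data source. sharedList is the list of fields used for joining. n and rowsStatus carry the recursion state for back-tracking: n is the index of the data source currently being matched, and rowsStatus[i] is the index of the row currently selected from source i. result collects the row combinations that satisfy the join condition.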
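
To see the back-tracking idea on small data, here is a stripped-down, self-contained sketch. It is a simplification, not the production method: the class InnerJoinSketch, its method names, and the sample rows are hypothetical, join keys are plain strings instead of Project objects, and each match is returned as an int[] of row indices instead of a MatchRow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified sketch of a back-tracking inner join; all names are hypothetical.
final class InnerJoinSketch {
    // Returns one int[] per matched combination: element i is the row index taken from source i.
    static List<int[]> join(List<List<Map<String, String>>> sources, List<String> keys) {
        List<int[]> result = new ArrayList<>();
        extend(sources, keys, 0, new int[sources.size()], result);
        return result;
    }

    private static void extend(List<List<Map<String, String>>> sources, List<String> keys,
                               int n, int[] chosen, List<int[]> result) {
        if (n == sources.size()) {          // a row has been chosen from every source
            result.add(chosen.clone());
            return;
        }
        List<Map<String, String>> rows = sources.get(n);
        for (int k = 0; k < rows.size(); k++) {
            // The first source has nothing to compare against; later sources are
            // compared with the row chosen from the preceding source.
            if (n == 0 || sameKeys(sources.get(n - 1).get(chosen[n - 1]), rows.get(k), keys)) {
                chosen[n] = k;
                extend(sources, keys, n + 1, chosen, result);
            }
        }
    }

    // Two rows match when every join key has equal values (two nulls count as equal).
    private static boolean sameKeys(Map<String, String> a, Map<String, String> b, List<String> keys) {
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                return false;
            }
        }
        return true;
    }

    // Made-up sample data: orders from one source, visits from another.
    static List<List<Map<String, String>>> sampleSources() {
        return List.of(
            List.of(Map.of("CITY", "Beijing", "ORDERS", "10"),
                    Map.of("CITY", "Shanghai", "ORDERS", "7")),
            List.of(Map.of("CITY", "Shanghai", "VISITS", "90"),
                    Map.of("CITY", "Beijing", "VISITS", "120"),
                    Map.of("CITY", "Chengdu", "VISITS", "30")));
    }
}
```

Calling InnerJoinSketch.join(InnerJoinSketch.sampleSources(), List.of("CITY")) yields two combinations, [0, 1] for Beijing and [1, 0] for Shanghai; the Chengdu row is dropped because it has no partner in the first source.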

Left outer join core code:

private boolean leftJoin(List<Map<String,String>>[] contents,List<Project> sharedList,final int n,int[] rowsStatus,LinkedList<MatchRow> result){
    boolean leftJoinMatch = false;
    if(n<contents.length){
        List<Map<String,String>> list = contents[n];
        for(int k=0;k<list.size();k++) {
            boolean equal = true;
            if(n!=0) {
                // In a left join, always compare against the row chosen from the first (left) dataset.
                Map<String, String> prev = contents[0].get(rowsStatus[0]);
                Map<String, String> cur = list.get(k);
                for (Project proj : sharedList) {
                    String key = proj.fieldName.toUpperCase();
                    if (key.matches("^\\d+$") || key.equals("*")) {
                        key = "_";
                    }
                    key = proj.isCompanion() ? key + proj.getFactId() : key;
                    String prevValue = prev.get(key);
                    String curValue = cur.get(key);
                    // Same reference or both null: this key matches, so check the next one.
                    if (prevValue == curValue) {
                        continue;
                    }
                    if (prevValue == null || curValue == null || !prevValue.equals(curValue)) {
                        equal = false;
                        break;
                    }
                }
            }
            if (equal) {
                leftJoinMatch = true;
                rowsStatus[n] = k;
                if(n==contents.length-1){
                    MatchRow mr = new MatchRow();
                    List<MatchRow.DatasetRow> tmp = new ArrayList<>();
                    for(int i=0;i<rowsStatus.length;i++){
                        MatchRow.DatasetRow dr = new MatchRow.DatasetRow();
                        dr.setDatasetIndex(i);
                        dr.setRowIndex(rowsStatus[i]);
                        tmp.add(dr);
                    }
                    mr.addMatchRow(tmp);
                    result.add(mr);
                } else {
                    // If the next dataset has no match, move on to the one after it, and so on.
                    for(int loopFlag=n+1;loopFlag<rowsStatus.length;loopFlag++){
                        boolean match = leftJoin(contents,sharedList,loopFlag,rowsStatus,result);
                        if(match){
                            break;
                        }
                        // -1 records that this dataset has no row matching the left-side row.
                        rowsStatus[loopFlag]=-1;
                        if(loopFlag==contents.length-1){
                            MatchRow mr = new MatchRow();
                            List<MatchRow.DatasetRow> tmp = new ArrayList<>();
                            for(int i=0;i<rowsStatus.length;i++){
                                MatchRow.DatasetRow dr = new MatchRow.DatasetRow();
                                dr.setDatasetIndex(i);
                                dr.setRowIndex(rowsStatus[i]);
                                tmp.add(dr);
                            }
                            mr.addMatchRow(tmp);
                            result.add(mr);
                        }
                    }
                }
            }
        }
    }
    return leftJoinMatch;
}

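Left outer join differs from inner join in two ways. First, when a data source yields no match, the algorithm moves on to the next source instead of abandoning the current row. Second, left-side rows are retained even when no right-side match exists; the unmatched source is recorded with a row index of -1. Note also that the code compares every source against the current row of the first (left) dataset, whereas the inner join compares each source against the one before it.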
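
The left-join behaviour can be restated in a smaller form. The sketch below is a simplified reading of the semantics, not a copy of the method above: the class LeftJoinSketch, its helpers, and the sample rows are hypothetical, and the recursion is arranged so that an unmatched source is simply recorded as -1 before continuing with the next source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified sketch of a back-tracking left outer join; all names are hypothetical.
final class LeftJoinSketch {
    // Element i of each int[] is the row index from source i, or -1 if source i had no match.
    static List<int[]> leftJoin(List<List<Map<String, String>>> sources, List<String> keys) {
        List<int[]> result = new ArrayList<>();
        int[] chosen = new int[sources.size()];
        for (int i = 0; i < sources.get(0).size(); i++) {
            chosen[0] = i;                      // every left-side row is kept
            extend(sources, keys, 1, chosen, result);
        }
        return result;
    }

    private static void extend(List<List<Map<String, String>>> sources, List<String> keys,
                               int n, int[] chosen, List<int[]> result) {
        if (n == sources.size()) {
            result.add(chosen.clone());
            return;
        }
        Map<String, String> left = sources.get(0).get(chosen[0]);
        List<Map<String, String>> rows = sources.get(n);
        boolean matched = false;
        for (int k = 0; k < rows.size(); k++) {
            // Always compare against the left (first) source, not the preceding one.
            if (sameKeys(left, rows.get(k), keys)) {
                matched = true;
                chosen[n] = k;
                extend(sources, keys, n + 1, chosen, result);
            }
        }
        if (!matched) {
            chosen[n] = -1;                     // no match here: mark it and try the next source
            extend(sources, keys, n + 1, chosen, result);
        }
    }

    private static boolean sameKeys(Map<String, String> a, Map<String, String> b, List<String> keys) {
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                return false;
            }
        }
        return true;
    }

    // Made-up sample data: orders (left), visits, refunds.
    static List<List<Map<String, String>>> sampleSources() {
        return List.of(
            List.of(Map.of("CITY", "Beijing", "ORDERS", "10"),
                    Map.of("CITY", "Shanghai", "ORDERS", "7")),
            List.of(Map.of("CITY", "Beijing", "VISITS", "120")),
            List.of(Map.of("CITY", "Shanghai", "REFUNDS", "2"),
                    Map.of("CITY", "Beijing", "REFUNDS", "1")));
    }
}
```

With the sample data, joining on CITY gives [0, 0, 1] for Beijing and [1, -1, 0] for Shanghai: the Shanghai row has no visits record, so that slot is -1, yet matching still continues into the refunds source.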

Custom metric calculation is implemented with Java’s ScriptEngine, which allows operations across columns of mixed types. Special metrics, such as same-period comparison, are implemented through a dedicated interface.
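
The article does not show the dedicated interface for special metrics, so the following is only a guess at its general shape. SpecialMetricSketch, SamePeriodGrowthSketch, the method signature, and the growth-rate formula (current minus baseline, divided by baseline) are all assumptions made for illustration. The ScriptEngine part is deliberately left out.

```java
import java.util.Map;

// Hypothetical interface for metrics that need more than one period's data.
interface SpecialMetricSketch {
    // Returns null when the metric cannot be computed from the given rows.
    Double compute(String metric, Map<String, Double> currentPeriod, Map<String, Double> samePeriodBefore);
}

// Same-period comparison, assumed here to mean relative growth against the earlier period.
final class SamePeriodGrowthSketch implements SpecialMetricSketch {
    @Override
    public Double compute(String metric, Map<String, Double> currentPeriod, Map<String, Double> samePeriodBefore) {
        Double current = currentPeriod.get(metric);
        Double baseline = samePeriodBefore.get(metric);
        if (current == null || baseline == null || baseline == 0.0) {
            return null;                        // missing data or division by zero
        }
        return (current - baseline) / baseline;
    }
}
```

For example, 120 orders this period against 100 in the same period earlier gives a growth of 0.2, and a baseline of zero gives null rather than an infinite value.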

Data permission handling uses UPM for report‑view rights, a default‑allow mechanism for dimensions/metrics, and an approval workflow for self‑service permission requests.
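
A default-allow check for dimensions and metrics might look like the sketch below, assuming "default-allow" means a field is visible to everyone unless it has been explicitly restricted. The class FieldPermissionSketch and its methods are hypothetical; the sketch does not model UPM or the approval workflow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical default-allow permission check for dimensions and metrics.
final class FieldPermissionSketch {
    // Only restricted fields are listed; anything absent from this map is open to all users.
    private final Map<String, Set<String>> restricted = new HashMap<>();

    void restrict(String field, Set<String> allowedUsers) {
        restricted.put(field, Set.copyOf(allowedUsers));
    }

    boolean canView(String user, String field) {
        Set<String> allowed = restricted.get(field);
        return allowed == null || allowed.contains(user);
    }

    // Filters a requested field list down to what the user may see, preserving order.
    List<String> visibleFields(String user, List<String> requested) {
        List<String> visible = new ArrayList<>();
        for (String field : requested) {
            if (canView(user, field)) {
                visible.add(field);
            }
        }
        return visible;
    }
}
```

Under this reading, a successful approval request would amount to adding the user to the allowed set of a restricted field.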

Standardized UI components provide a component library, layout system, and templates enabling WYSIWYG report creation.

Summary: Most of Dashiang Cube's functionality is already in production; a few items (standard UI components, advanced star-schema support, metadata sync) remain under development. The tool has been live for nearly a year and serves many internal services. Future work is planned on UI usability, star-model enhancements, configuration simplification, and metadata synchronization.

Recruitment: The team is hiring developers interested in BI tool development (contact: [email protected]).
