
Big Data OLAP Applications and Practices: Insights from Xiaomi and 58.com

This article reviews the 2018 58 Group technology salon on big-data OLAP. It summarizes Xiaomi's one-stop OLAP architecture built around UnionSQL, the OLAP challenges 58.com faces and its solutions based on Kylin and Druid, and the optimizations and platform work behind 58.com's Druid deployment.

58 Tech

Background

On November 13, 2018, the 58 Group Technology Salon (Session 3) titled “Big Data OLAP Application and Practice” was held at the Beijing headquarters, organized jointly by the 58 Group Technical Engineering Platform and the HR Magic Academy. Speakers from Xiaomi’s AI & Cloud Platform big‑data team, 58 TEG Data Intelligence team, and related business‑line developers shared their OLAP experiences.

Key Takeaways

1. Xiaomi One‑Stop OLAP Solution

1.1 Xiaomi Big‑Data Architecture Overview

As a company‑level big‑data R&D team, Xiaomi organizes internal business data into a layered “data pyramid”, which forms the core of its big‑data platform architecture.

(Figure provided by Xiaomi)

The raw data layer aggregates original business data, which is then cleaned and transformed into a data warehouse (middle layer). Aggregated wide tables (summary layer) are built per business domain, and finally data is loaded into various engines for application use. Unified data management and job scheduling platforms handle data governance and task orchestration.

On top of the data pyramid, Xiaomi provides a unified data service layer that supports OLAP queries, point queries, behavior analysis, and model capabilities, offering a consistent interface, authentication, compliance auditing, quality monitoring, caching, and cross‑engine, cross‑datacenter transparent query capabilities.

1.2 One‑Stop OLAP Solution

The solution consists of data governance tools and a query engine called UnionSQL. UnionSQL exposes a unified SQL interface backed by a self-developed Query Router, which parses each query, splits it across engines, and merges the results. It also supports a Lambda architecture and cross-datacenter queries. Apache Kylin serves as the batch layer and Elasticsearch/Kudu as the speed layer, which enables real-time analytics.

(Figure provided by Xiaomi)

Unified SQL interface

Self‑developed Query Router with SQL parsing, splitting, result merging, Lambda and cross‑datacenter support

Batch layer powered by Apache Kylin, speed layer by Elasticsearch/Kudu for Lambda capabilities and real‑time analysis
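The split-and-merge behavior of a Lambda-style query router can be sketched as follows. This is a minimal illustration, not Xiaomi's implementation: the function names, the `batch_ready_until` boundary, and the two stand-in engine functions are all assumptions made for the example. The idea is that the part of the time range already covered by the batch layer goes to one engine, the remainder goes to the speed layer, and the partial aggregates are summed.

```python
from collections import defaultdict
from datetime import date


def split_by_boundary(start, end, batch_ready_until):
    """Split the half-open range [start, end) at the batch/speed boundary.

    Returns (batch_range, speed_range); either part may be None.
    """
    batch = (start, min(end, batch_ready_until)) if start < batch_ready_until else None
    speed = (max(start, batch_ready_until), end) if end > batch_ready_until else None
    return batch, speed


def merge_partials(*partials):
    """Sum partial aggregates (dimension value -> count) from several engines."""
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


def route(start, end, batch_ready_until, query_batch, query_speed):
    """Send each sub-range to the matching engine and merge the results."""
    batch_range, speed_range = split_by_boundary(start, end, batch_ready_until)
    partials = []
    if batch_range:
        partials.append(query_batch(*batch_range))
    if speed_range:
        partials.append(query_speed(*speed_range))
    return merge_partials(*partials)


# Hypothetical stand-ins for the batch engine and the speed engine.
def query_batch(start, end):
    return {"beijing": 100, "shanghai": 40}


def query_speed(start, end):
    return {"beijing": 7, "wuhan": 3}


totals = route(date(2018, 11, 10), date(2018, 11, 14),
               date(2018, 11, 13), query_batch, query_speed)
```

Summing works here because counts and sums are additive across time ranges. Metrics such as distinct counts would need a mergeable intermediate representation instead.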

1.3 OLAP Application Case: Xiaomi Intelligent Data Analysis Decision System

Step 1: Built a company‑wide BI system that aggregates key data and, together with the user‑profile platform, provides natural‑language data query and visualization.

Step 2: Implemented automated dimension splitting using pre‑computation engines like Apache Kylin, allowing users to drill down into dimensions that contribute most to metric changes.

Step 3: Integrated anomaly detection and other mathematical models to proactively discover issues, not just answer queries.
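The dimension-splitting idea in Step 2 can be illustrated with a small sketch. The talk does not describe Xiaomi's scoring method, so the heuristic below is an assumption chosen for simplicity: for each dimension, compute how much each value changed between two periods, then rank dimensions by how much of the total absolute change is concentrated in a single value. The row layout and column names are hypothetical.

```python
from collections import defaultdict


def contribution_by_value(before_rows, after_rows, dimension, metric):
    """Return {dimension value: change in metric} between two periods."""
    delta = defaultdict(float)
    for row in before_rows:
        delta[row[dimension]] -= row[metric]
    for row in after_rows:
        delta[row[dimension]] += row[metric]
    return dict(delta)


def rank_dimensions(before_rows, after_rows, dimensions, metric):
    """Rank dimensions by the share of change explained by their top value."""
    scores = []
    for dim in dimensions:
        deltas = contribution_by_value(before_rows, after_rows, dim, metric)
        if not deltas:
            continue
        total_abs = sum(abs(d) for d in deltas.values())
        top_value, top_delta = max(deltas.items(), key=lambda kv: abs(kv[1]))
        share = abs(top_delta) / total_abs if total_abs else 0.0
        scores.append((dim, top_value, top_delta, share))
    return sorted(scores, key=lambda s: s[3], reverse=True)


# Hypothetical data: orders drop in one city across both channels.
before = [
    {"city": "beijing", "channel": "app", "orders": 100},
    {"city": "beijing", "channel": "web", "orders": 100},
    {"city": "shanghai", "channel": "app", "orders": 100},
    {"city": "shanghai", "channel": "web", "orders": 100},
]
after = [
    {"city": "beijing", "channel": "app", "orders": 60},
    {"city": "beijing", "channel": "web", "orders": 60},
    {"city": "shanghai", "channel": "app", "orders": 100},
    {"city": "shanghai", "channel": "web", "orders": 100},
]
ranked = rank_dimensions(before, after, ["city", "channel"], "orders")
```

In this example the whole drop sits in one city while it is spread evenly across channels, so `city` ranks first. A pre-computation engine helps because the per-dimension aggregates needed for each comparison are already materialized.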

2. 58.com OLAP Technology Application and Practice

58.com runs large-scale OLAP workloads across its business lines (real estate, recruitment, classifieds, automotive, etc.). These generate hundreds of terabytes of new data daily and demand high-performance analytical queries.

Challenges:

Data scale: daily addition of billions of rows and rapidly expanding dimensions (from dozens to hundreds).

Development efficiency and cost: traditional MR/Hive/Spark pipelines require many manual steps to build cubes, leading to high maintenance overhead.

Query speed: most interactive queries must return results within 10 seconds.

Real‑time processing: A/B testing, new‑business tracking, advertising effectiveness, etc., increasingly rely on real‑time OLAP.

Ad‑hoc queries: users need flexible, self‑service query capabilities.
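A quick calculation shows why dimension growth drives up the cost of building cubes. A full cube over n dimensions has one cuboid per subset of dimensions, that is 2^n cuboids, so pre-computation engines prune the space, for example by partitioning dimensions into groups that are cubed separately. The helper names below are illustrative, and the grouped count is a simplification that ignores cuboids shared between groups.

```python
def full_cube_cuboids(n_dimensions):
    """Number of cuboids in a full cube: one per subset of dimensions."""
    return 2 ** n_dimensions


def grouped_cuboids(group_sizes):
    """Approximate cuboid count when each dimension group is cubed on its own."""
    return sum(2 ** size for size in group_sizes)


for n in (10, 20, 30):
    print(n, full_cube_cuboids(n))
# 10 1024
# 20 1048576
# 30 1073741824

print(grouped_cuboids([10, 10, 10]))  # 3072
```

Thirty dimensions cubed as three independent groups of ten need about 3,000 cuboids instead of about a billion, at the price of not serving queries that combine dimensions from different groups directly from the cube.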

To address these, 58.com combines Kylin, Druid, SQL‑on‑Hadoop, and a self‑developed real‑time processing engine, delivering services through the “Cloud Window” self‑service analytics platform, which reduces OLAP adoption barriers and cuts repetitive development costs.

For the WMDA user‑behavior analysis platform, which involves hundreds of dimensions, long time spans, and diverse query patterns, a Druid‑based OLAP solution is employed.

These implementations illustrate the evolution of OLAP practice at 58.com and highlight the ongoing challenges of building a modern, virtualized, and democratized data warehouse.

3. Druid Technical Practice at 58.com

3.1 Druid Introduction

Druid is a high-performance, low-latency real-time OLAP engine. 58.com runs the community release v0.9.2 and has added extensive functional optimizations and platform-level enhancements on top of it, improving performance and stability for the multiple departments that use it.
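For readers who have not used Druid, its native queries are JSON documents posted to the Broker's `/druid/v2/` endpoint. The sketch below only builds such a document for a `groupBy` query and does not send it. The data source name `page_views`, the dimension names, and the metric `pv` are hypothetical and do not come from 58.com's deployment.

```python
import json


def build_group_by(data_source, interval, dimensions, metric, granularity="day"):
    """Build a Druid native groupBy query that sums one long metric."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "intervals": [interval],          # ISO 8601 interval, start/end
        "granularity": granularity,
        "dimensions": list(dimensions),
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
    }


query = build_group_by(
    data_source="page_views",             # hypothetical data source
    interval="2018-11-12/2018-11-13",
    dimensions=["city", "platform"],      # hypothetical dimensions
    metric="pv",                          # hypothetical metric column
)
payload = json.dumps(query, indent=2)
```

The resulting `payload` would be sent as the body of an HTTP POST with `Content-Type: application/json`.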

3.2 Functional Optimizations

Multi‑tenant architecture for building, storage, and query to ensure stability of critical services.

Replaced default cache with CaffeineCache for better read/write performance.

Developed segment merge functionality, saving about 60% storage space and boosting query speed.

Added SQL query capability and fixed several bugs (memory leaks, OOM, slow‑query‑induced pending tasks).
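The article does not describe how 58.com's segment merge works internally. As a rough illustration of why merging can shrink storage, the sketch below combines several small segments for the same interval and rolls up rows that share a timestamp and dimension values, so duplicate keys collapse into one row. The row layout and function name are assumptions for the example, and the savings it shows are unrelated to the 60% figure above.

```python
from collections import defaultdict


def merge_segments(segments, dimensions, metrics):
    """Merge segments, summing metrics for rows with identical keys."""
    rolled = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for segment in segments:
        for row in segment:
            key = (row["timestamp"],) + tuple(row[d] for d in dimensions)
            for m in metrics:
                rolled[key][m] += row[m]

    merged = []
    for key in sorted(rolled):
        out = {"timestamp": key[0]}
        out.update(zip(dimensions, key[1:]))   # restore dimension columns
        out.update(rolled[key])                # add summed metrics
        merged.append(out)
    return merged


# Two hypothetical small segments covering the same hour.
segment_a = [
    {"timestamp": "2018-11-13T10", "city": "beijing", "pv": 3},
    {"timestamp": "2018-11-13T10", "city": "shanghai", "pv": 1},
]
segment_b = [
    {"timestamp": "2018-11-13T10", "city": "beijing", "pv": 2},
]
merged = merge_segments([segment_a, segment_b], ["city"], ["pv"])
```

Three input rows become two after the merge. Fewer, larger segments also mean fewer files for a query to scan, which is consistent with the query speedup reported above.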

3.3 Platformization

Simplified onboarding: users can connect through a web form, monitor storage usage and task status, and manage the lifecycle of their business on the platform.

Built comprehensive monitoring and alerting, added metrics, and enhanced daily statistics and task analysis.

3.4 Typical Cases

Druid now serves WMDA, Lego, Sundial, DSP effect evaluation, whole‑site real‑time multi‑dimensional analysis, etc., handling over 650 data sources, more than 25,000 daily build tasks, ingesting over 600 billion raw records, and supporting up to 110 dimensions.

4. Summary

OLAP technology is a vital component of the big‑data ecosystem, widely applied in both 58 Group and Xiaomi Group. The salon highlighted concrete use cases, current challenges, and one‑stop solutions, fostering knowledge exchange and encouraging further innovation in OLAP practice.

Future salons will continue to explore breakthroughs and integration of underlying technologies.
