Big Data 23 min read

How Vipshop Scaled Real‑Time OLAP: From GreenPlum to Presto, Kylin, and Redis

Vipshop faced massive data growth that broke traditional RDBMS, causing slow OLAP queries, inefficient ETL, and long development cycles, so it iteratively rebuilt its analytics stack—adding Hadoop/Hive, a self‑service UI, Presto, Kylin, and Redis—to achieve sub‑second query responses, higher concurrency, and a flexible, low‑latency BI solution.

dbaplus Community
dbaplus Community
dbaplus Community
How Vipshop Scaled Real‑Time OLAP: From GreenPlum to Presto, Kylin, and Redis

Challenges of Massive Real‑Time OLAP

Vipshop’s traffic grew from millions to hundreds of millions of users, increasing data volume by more than 100×. Traditional RDBMS could no longer store or compute the data, leading to three critical problems:

Slow OLAP query response (often >100 s).

ETL pipelines became bottlenecks, taking hours instead of minutes.

Feature development cycles lengthened because new dimensions or metrics required large data refreshes.

Evolution of the OLAP Architecture

Phase 0 – Greenplum + Tableau

Greenplum was used as an MPP data warehouse and Tableau for BI. This worked for modest data sizes but horizontal scaling limits of Greenplum caused storage and compute exhaustion as data grew.

Phase 1 – Hadoop/Hive + Greenplum

Batch preprocessing moved to Hadoop/Hive; results were synchronized to Greenplum for ad‑hoc analysis. This eliminated the storage bottleneck but introduced data‑sync complexity and highlighted Tableau’s inability to support flexible self‑service analytics.

Phase 2 – Self‑Service UI + SQL Parser

A drag‑and‑drop UI let users compose dimensions and metrics. The UI generated a logical data description that a custom SQL parser translated into executable SQL. The parser exposed the need for a faster, more flexible query engine.

Phase 3 – Adoption of Presto

Presto (Facebook’s open‑source MPP OLAP engine) was introduced because it:

Executes queries in‑memory across a distributed cluster.

Supports joins on HDFS metadata without moving data.

Provides low‑latency query execution.

Benchmarking on identical hardware showed Presto reduced typical query time from >100 s (Greenplum) to ~10 s – a 70 % performance gain.

Phase 3.5 – Redis In‑Memory Cache

Redis was added as a first‑level cache. Query flow:

Check Redis for an exact SQL‑to‑result match.

If miss, fall back to Kylin (MOLAP) or Presto (ROLAP).

Overall cache hit rate reached 15 % (up to 60 % for hot topics), freeing compute resources.

Phase 4 – Hybrid Presto + Kylin Architecture

Kylin (eBay’s open‑source MOLAP engine) handles high‑cardinality cube queries, while Presto serves the remaining ROLAP workload. The routing logic (Redis → Kylin → Presto) covers ~90 % of analysis scenarios and reduces core metric latency to a few seconds.

Engine‑Specific Enhancements

Presto Improvements

Added /*+ HINT */ syntax to let users choose join strategies (replicated vs distributed) at compile time.

Implemented join‑skew detection and mitigation to avoid data‑skew hotspots.

Enabled multi‑cluster load balancing for horizontal scaling.

Rewrote the Kafka connector to support hot‑topic updates and offset‑based reads, including protobuf payload handling.

Developed a Kylin connector that automatically pushes sub‑queries matching existing Kylin cubes to Kylin, reducing the amount of data processed by Presto.

Kylin Improvements

Integrated Presto as a fallback for massive dimension lookups that previously caused OOM in Kylin.

Extended Kylin with Vipshop‑specific business logic (e.g., custom dimension hierarchies).

Implemented a “CUBE Advisor” that analyses dimension‑metric combinations and recommends optimal cube designs, improving cube hit rates.

Provided APIs for ETL scheduler integration to refresh cube data and manage cache lifecycles.

Future Directions

Planned upgrades focus on finer‑grained data pruning and real‑time capabilities:

Replace ORC row‑group indexes with row‑ID level indexes to prune data at scan time and increase query concurrency.

Explore streaming cubing in Kylin to ingest real‑time data into cubes.

Investigate Lambda architecture‑style “streaming + batch” cubing for seamless real‑time and offline cube fusion.

Research next‑generation engines that tightly couple raw and aggregated data while scaling compute horizontally without stateful nodes.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Real-time analyticsredisData WarehouseOLAPPrestoKylin
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.