How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink
This article traces the evolution of OLAP at Vipshop: how Presto, Kylin, and ClickHouse fit together, how the deployment is fronted by HAProxy and chproxy, how Presto is containerized on Kubernetes, and how a Flink‑ClickHouse pipeline enables self‑service analysis of hundred‑billion‑row datasets. It also covers the performance challenges the team encountered and the future roadmap.
1. OLAP Evolution at Vipshop
Vipshop's real‑time platform OLAP team, led by Wang Yu, manages open‑source modifications and optimizations for Presto, ClickHouse, Kylin, and Kudu, supporting over 500 high‑performance servers that handle up to 5 million queries per day and process roughly 3 PB of data.
2. OLAP Deployment Architecture
The data layer includes Hadoop and Spark. Data is ingested from Hive into ClickHouse via Waterdrop (Spark‑based in older versions, Flink‑based in newer versions), while Kafka and Flink write real‑time data to ClickHouse or HBase. A proxy layer uses chproxy (an open‑source project) for user‑level permission control, with HAProxy providing high availability.
Clients connect to HAProxy, which balances load across the Presto clusters. The team also built an internal cross‑service OLAP tool called Nebula (formerly Spider), which handles monitoring and alerting via Prometheus and Presto’s own APIs.
3. Presto Business Architecture
Presto serves as the main OLAP engine, connecting to multiple data sources (Hive, Kudu, MySQL, Alluxio) through its connector framework. Multiple Presto clusters are deployed per business line, each monitored by Spider/Nebula, which records query metadata in MySQL; these records are later loaded into Hive via ETL for reporting. Load balancing is achieved by scoring clusters on query volume and node health.
4. Presto Improvements
The team modified the Presto server and connector source code and built a management tool that extracts node and query information via Presto’s API, stores query logs in MySQL, and loads them into Hive via ETL for analysis. Scoring and load balancing route queries to less‑loaded clusters, providing high availability without user impact.
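The scoring idea can be illustrated with a minimal sketch. The article does not give the actual formula, so everything here is an assumption for illustration: the `ClusterStats` fields, the weight of 2 on queued queries, and the `score` and `pick_cluster` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterStats:
    """Snapshot of one Presto cluster (hypothetical fields)."""
    name: str
    running_queries: int
    queued_queries: int
    active_nodes: int
    total_nodes: int


def score(c: ClusterStats) -> float:
    """Return a routing score; higher means a better target.

    Assumed formula: node health divided by (1 + load per active node),
    with queued queries counted twice as heavily as running ones.
    """
    if c.total_nodes == 0 or c.active_nodes == 0:
        return 0.0  # a cluster with no live workers is never chosen
    health = c.active_nodes / c.total_nodes
    load_per_node = (c.running_queries + 2 * c.queued_queries) / c.active_nodes
    return health / (1.0 + load_per_node)


def pick_cluster(clusters: List[ClusterStats]) -> ClusterStats:
    """Route the next query to the highest-scoring healthy cluster."""
    candidates = [c for c in clusters if score(c) > 0]
    if not candidates:
        raise RuntimeError("no healthy Presto cluster available")
    return max(candidates, key=score)
```

With this formula, a lightly loaded cluster that has lost a couple of workers can still outrank a fully healthy but saturated one, which is the behavior the routing description implies.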
5. Presto Containerization
Presto is deployed on Kubernetes, which enables automatic scaling and resource scheduling. New containers are added to the Presto service, and the same admission tools provide intelligent routing. At night, resources are released back to the Spark clusters, which improves overall resource utilization.
6. ClickHouse Introduction
As OLAP latency requirements tightened (sub‑second response for hundred‑billion‑row joins), Vipshop introduced ClickHouse as a faster alternative. ClickHouse stores table data in its own local directories and keeps replication metadata in ZooKeeper; tables come in local and distributed forms. Benchmarks show ClickHouse can be up to 10× faster than Presto for specific query patterns.
Key use cases include massive‑scale joins, complex mathematical aggregations, and bitmap‑based audience calculations.
7. ClickHouse Advantages over Traditional OLAP Engines
ClickHouse excels at wide‑table aggregation, delivering query times of 1–2 seconds over tens of billions of rows on a 10‑shard cluster whose nodes each have 24 CPUs and 128 GB of memory. It provides efficient bitmap algorithms (RoaringBitmap) and supports high‑performance joins through partitioning and sharding strategies.
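To make the bitmap‑based audience calculation concrete, here is a small sketch that uses plain Python integers as bitmaps. This is only a stand‑in for illustration: it is not RoaringBitmap (which compresses sparse ranges) and it is not ClickHouse code. The tag names and user IDs are made up.

```python
from typing import Iterable


def to_bitmap(user_ids: Iterable[int]) -> int:
    """Set bit i for every user id i (ids must be non-negative)."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits


def cardinality(bitmap: int) -> int:
    """Count the users present in a bitmap."""
    return bin(bitmap).count("1")


def members(bitmap: int) -> list:
    """List the user ids present in a bitmap, in ascending order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical audiences keyed by tag.
bought_shoes = to_bitmap([1, 2, 3, 5])
opened_app_today = to_bitmap([2, 3, 8])
opted_out = to_bitmap([3])

# "Bought shoes AND opened the app today, but NOT opted out."
target = bought_shoes & opened_app_today & ~opted_out
```

Set operations on audiences reduce to bitwise AND, OR, and AND‑NOT, so the cost depends on bitmap size rather than on a row‑level join. ClickHouse exposes the same kind of operations through bitmap functions such as `bitmapAnd` and `bitmapCardinality`.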
8. Experiment Platform: Flink + ClickHouse Self‑Service Analysis
The experiment platform processes A/B‑test logs through a Flink SQL pipeline that sinks dimension data to Redis and finally writes wide tables to ClickHouse. The pipeline uses a custom ClickHouse connector with hash‑based sharding (MurmurHash3_64), so that rows sharing a join key land on the same shard and joins can run locally.
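A rough sketch of hash‑based shard routing follows. One important assumption: Python's standard library has no MurmurHash3, so the code uses a 64‑bit BLAKE2b digest as a stand‑in hash. It shows the routing principle only and will not reproduce the shard placement of MurmurHash3_64. The function names and the `user_id` field are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def stable_hash64(key: str) -> int:
    """Deterministic 64-bit hash (stand-in for MurmurHash3_64)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return stable_hash64(key) % num_shards


def route(rows: Iterable[dict], key_field: str, num_shards: int) -> Dict[int, List[dict]]:
    """Group rows into per-shard batches by hashing the sharding key."""
    batches: Dict[int, List[dict]] = {i: [] for i in range(num_shards)}
    for row in rows:
        batches[shard_for(str(row[key_field]), num_shards)].append(row)
    return batches
```

Because the mapping depends only on the key and the shard count, two tables routed with the same function place matching `user_id` values on the same shard. That is what makes shard‑local joins possible, and it is also why changing the shard count requires redistributing the data.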
Step‑by‑step data flow:
Query system.tables to obtain the target table’s engine.
Parse engine metadata to retrieve cluster and local table names.
Query system.clusters to get shard node information.
Generate shard‑specific URLs and flush data to ClickHouse according to configured intervals.
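The steps above can be sketched without a live cluster. The example assumes the connector reads the `engine_full` column of `system.tables` and the `cluster`, `shard_num`, and `host_name` columns of `system.clusters`. The sample rows, host names, and the choice of JDBC‑style URLs on HTTP port 8123 are illustrative assumptions, not details taken from the article.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches the first three arguments of Distributed(cluster, database, table, ...).
_DISTRIBUTED_RE = re.compile(
    r"Distributed\(\s*'?(?P<cluster>[^',\s]+)'?\s*,"
    r"\s*'?(?P<database>[^',\s]+)'?\s*,"
    r"\s*'?(?P<table>[^',\s)]+)'?"
)


def parse_distributed_engine(engine_full: str) -> Tuple[str, str, str]:
    """Extract (cluster, database, local_table) from an engine definition."""
    match = _DISTRIBUTED_RE.search(engine_full)
    if match is None:
        raise ValueError("not a Distributed engine: " + engine_full)
    return match.group("cluster"), match.group("database"), match.group("table")


def shard_urls(cluster_rows: List[dict], cluster: str, database: str,
               http_port: int = 8123) -> Dict[int, List[str]]:
    """Build per-shard URL lists from rows shaped like system.clusters."""
    hosts = defaultdict(list)
    for row in cluster_rows:
        if row["cluster"] == cluster:
            hosts[row["shard_num"]].append(row["host_name"])
    return {
        shard: ["jdbc:clickhouse://%s:%d/%s" % (host, http_port, database)
                for host in shard_hosts]
        for shard, shard_hosts in sorted(hosts.items())
    }


# Hypothetical query results standing in for system.tables and system.clusters.
engine_full = "Distributed('ab_cluster', 'exp', 'events_local', murmurHash3_64(user_id))"
cluster_rows = [
    {"cluster": "ab_cluster", "shard_num": 1, "host_name": "ch-node-1"},
    {"cluster": "ab_cluster", "shard_num": 2, "host_name": "ch-node-2"},
    {"cluster": "other_cluster", "shard_num": 1, "host_name": "ch-node-9"},
]

cluster, database, local_table = parse_distributed_engine(engine_full)
urls = shard_urls(cluster_rows, cluster, database)
```

Writing directly to each shard's local table, rather than through the distributed table, lets the connector decide row placement itself and batch its flushes per shard.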
9. ClickHouse Challenges and Optimizations
Challenges include handling hundred‑billion‑row joins, memory limits in Presto, and ensuring consistent sharding when tables are repartitioned. Solutions involve bucketed joins using hash‑based distribution, avoiding sub‑queries on the left side of joins, and careful management of materialized views (which require pausing writes during back‑fill).
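The reasoning behind bucketed joins can be checked with a small in‑memory sketch: if both sides are distributed by a hash of the join key, joining bucket by bucket yields the same rows as one global join, while each bucket only needs its own slice in memory. The hash below is a BLAKE2b stand‑in (an assumption, not MurmurHash3_64), and all function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def bucket_of(key, num_buckets: int) -> int:
    """Assign a join key to a bucket with a deterministic stand-in hash."""
    digest = hashlib.blake2b(str(key).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def partition(rows: List[dict], key: str, num_buckets: int) -> Dict[int, List[dict]]:
    """Split rows into buckets by the hash of the join key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets


def hash_join(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Plain inner hash join: build an index on the right side, probe with the left."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def bucketed_join(left: List[dict], right: List[dict], key: str,
                  num_buckets: int) -> List[dict]:
    """Join each bucket independently, as a shard-local join would."""
    left_buckets = partition(left, key, num_buckets)
    right_buckets = partition(right, key, num_buckets)
    result = []
    for b in range(num_buckets):
        result.extend(hash_join(left_buckets.get(b, []), right_buckets.get(b, []), key))
    return result
```

The equivalence holds only while both tables use the same hash function and bucket count, which is why consistent sharding after repartitioning appears among the challenges.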
Materialized views accelerate query speed but increase write overhead and storage consumption; they are best used for stable, pre‑computed reporting.
10. Q&A Highlights
The Q&A covered ClickHouse ingestion performance (up to 600k TPS on a 10‑node cluster), storage‑space optimizations (compression, HyperLogLog, partitioning), differences between ClickHouse and Kylin/Kudu, consistency guarantees between Flink and ClickHouse, and strategies for compute‑storage separation using JuiceFS.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
