
ClickHouse Practice at Youzan: Architecture, Deployment, and Future Plans

This article details Youzan's adoption of ClickHouse for real-time analytics, covering its evolution from Presto, Druid, and Kylin, the system's architecture, deployment strategies, use cases, performance characteristics, limitations, and future roadmap, including integration with Apache Doris and emerging big‑data trends.

DataFunTalk

Youzan is a merchant service platform that has evolved its OLAP stack over the years, first introducing Presto in 2018 for interactive offline queries, then Druid in early 2019 for real‑time analysis, and Kylin later in 2019 for high‑precision offline analytics.

In 2020 ClickHouse was adopted to provide low‑latency, detail‑level aggregation for use cases such as SCRM, DMP, live‑stream analysis, and log metrics, complementing existing tools with dynamic aggregation and materialized views.

The article compares Presto, Druid, Kylin, and ClickHouse on technical architecture, latency, SQL support, production cost, join capabilities, and deduplication methods. ClickHouse stands out for flexible queries over detail-level data, columnar storage, a vectorized execution engine, code generation, data sorted by primary key, and secondary (data-skipping) indexes such as Bloom filters.
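To make the idea of a Bloom-filter secondary index concrete, here is a minimal Python sketch, not ClickHouse's actual implementation. It assumes one small Bloom filter per block of rows and uses it only to decide which blocks can be skipped for an equality lookup. The class and function names (`BloomFilter`, `build_skip_index`, `granules_to_read`) and the filter sizes are hypothetical choices for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not ClickHouse defaults."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_skip_index(granules):
    """Build one filter per block of rows (a hypothetical 'granule')."""
    filters = []
    for granule in granules:
        bloom = BloomFilter()
        for value in granule:
            bloom.add(value)
        filters.append(bloom)
    return filters


def granules_to_read(filters, value):
    """Return the block numbers that cannot be skipped for `column = value`."""
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(value)]
```

A Bloom filter never produces false negatives, so a block that really holds the value is always read; false positives only cost an unnecessary read, which is why such an index helps selective lookups but cannot replace the primary-key ordering.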

The article also outlines ClickHouse's limitations: lack of fast row‑level updates/deletes, sub‑optimal point‑lookup performance, no native transaction support, and limited join strategies.

Typical application scenarios include user‑behavior analysis, real‑time log monitoring, and various business intelligence dashboards.

The article describes ClickHouse's internal design: a MergeTree storage engine that stores each partition as a set of immutable parts, a sparse primary-key index over the sorted data, and background merges that combine parts in a manner similar to an LSM-Tree. Data is ingested in batch mode via Spark and in streaming mode via Flink. Because every insert creates a new part, writes must be batched to avoid creating an excessive number of parts.
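The sparse primary-key index can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not ClickHouse's code: rows are already sorted by key, the index keeps only the first key of every granule, and a range query uses binary search over those marks to pick the granules it must scan. The granule size of 4 is purely illustrative (MergeTree's default `index_granularity` is 8192), and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; far smaller than a real granule


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep one entry (a 'mark') per granule: the first key in that granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_range(index, lo, hi):
    """Return the granule numbers that may contain keys in [lo, hi].

    The result is conservative: it can include a granule with no matching
    row, but it never omits a granule that has one.
    """
    # Start one granule before the first mark >= lo, because that earlier
    # granule may still end with keys equal to or greater than lo.
    first = max(bisect_left(index, lo) - 1, 0)
    # The last candidate is the granule whose mark is the largest one <= hi.
    last = bisect_right(index, hi) - 1
    if last < first:
        return []
    return list(range(first, last + 1))


def scan(sorted_keys, index, lo, hi, granule_size=GRANULE_SIZE):
    """Read only the selected granules, then filter rows inside them."""
    result = []
    for g in granules_for_range(index, lo, hi):
        start = g * granule_size
        block = sorted_keys[start:start + granule_size]
        result.extend(k for k in block if lo <= k <= hi)
    return result
```

Because the index holds one entry per granule rather than one per row, it stays small enough to keep in memory, at the price of always reading whole granules. That trade-off is consistent with the limitation noted earlier: range scans are cheap, while single-row point lookups are comparatively expensive.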

The deployment architecture comprises a primary master with a standby master for high availability, LVS load balancing, Apache APISIX as the API gateway, distributed tables over replicated shards, ZooKeeper for coordination, and a custom CHSegmentPusher/Puller workflow that moves data from temporary Kubernetes clusters to HDFS and then into the production cluster.

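Read/write separation is achieved by performing bulk writes on temporary Kubernetes clusters, pushing the resulting data to HDFS, and then pulling it into the main cluster, so that heavy writes do not compete with queries for resources. Apache Helix manages the task queues and monitors component health.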

Three concrete use cases are highlighted: a DMP audience-portrait system that uses bitmap tables and orthogonal hash sharding; an SCRM merchant-member management system that requires dynamic multi-dimensional queries; and a log-monitoring Top-N system (Tianwang) that migrated from Druid to Flink + ClickHouse, achieving a 60% cost reduction and higher performance.
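The DMP approach can be sketched in Python as follows. This is an illustration under assumptions rather than Youzan's implementation: it interprets "orthogonal hash sharding" as assigning each user to exactly one shard by user ID, so that per-shard bitmaps never overlap and audience counts can be computed on each shard independently and summed. Python sets stand in for compressed bitmaps, and `NUM_SHARDS`, `shard_of`, `build_tag_bitmaps`, and `audience_size` are hypothetical names.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(user_id):
    """Every user lands on exactly one shard, so shards are disjoint."""
    return user_id % NUM_SHARDS


def build_tag_bitmaps(rows):
    """rows: iterable of (user_id, tag). Returns one {tag: bitmap} per shard."""
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for user_id, tag in rows:
        shards[shard_of(user_id)][tag].add(user_id)
    return shards


def audience_size(shards, include, exclude=()):
    """Count users that have all `include` tags and none of the `exclude` tags."""
    if not include:
        raise ValueError("at least one tag must be included")
    total = 0
    for tag_bitmaps in shards:
        # Bitmap AND across the included tags, evaluated locally on the shard.
        selected = set.intersection(*(tag_bitmaps.get(t, set()) for t in include))
        # Bitmap AND-NOT for each excluded tag.
        for tag in exclude:
            selected -= tag_bitmaps.get(tag, set())
        # Shards hold disjoint users, so local counts can simply be added.
        total += len(selected)
    return total
```

The point of the disjoint sharding is that no bitmap ever has to be shipped between nodes: only one integer per shard travels to the coordinator, which keeps audience estimation cheap even when the bitmaps themselves are large.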

Future plans include containerizing ClickHouse for better elasticity, expanding its adoption across more business lines, improving platform features such as multi‑tenant isolation, rate limiting, and monitoring, and exploring dual‑path architectures that combine Druid and ClickHouse.

Current pain points are the manual nature of ClickHouse operations (lack of automatic rebalancing), limited join performance, and poor row‑level update/delete capabilities, prompting investigations into storage‑compute separation and exchange‑join implementations.

A proof-of-concept integrates ClickHouse with Apache Doris by embedding Doris's storage layer (StorageDorisOLAP) and its exchange node into ClickHouse. The goal is to combine ClickHouse's query speed with Doris's automatic data balancing and shuffle-join support.
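For readers unfamiliar with shuffle joins, the following Python sketch shows the general technique; it does not reproduce Doris's or ClickHouse's code. Both inputs are repartitioned by a hash of the join key so that matching rows meet on the same node, and each node then runs an ordinary local hash join. In-memory lists stand in for nodes, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Deterministically map a join key to a node."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes


def shuffle(rows, key_index, num_nodes):
    """The 'exchange' step: route each row to the node that owns its key."""
    buckets = [[] for _ in range(num_nodes)]
    for row in rows:
        buckets[node_for(row[key_index], num_nodes)].append(row)
    return buckets


def local_hash_join(left_rows, right_rows, left_key, right_key):
    """Build a hash table on the right side, then probe it with the left side."""
    table = defaultdict(list)
    for row in right_rows:
        table[row[right_key]].append(row)
    return [l + r for l in left_rows for r in table.get(l[left_key], [])]


def shuffle_join(left, right, left_key, right_key, num_nodes=3):
    """Inner equi-join: shuffle both sides by key, then join node by node."""
    left_buckets = shuffle(left, left_key, num_nodes)
    right_buckets = shuffle(right, right_key, num_nodes)
    joined = []
    for n in range(num_nodes):
        joined.extend(
            local_hash_join(left_buckets[n], right_buckets[n], left_key, right_key)
        )
    return joined
```

Since each node only builds a hash table over its own slice of the right-hand table, neither input has to fit in the memory of a single node, which is the main advantage over broadcasting one whole table to every node when both tables are large.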

The article concludes with observations on big-data trends (cloud-native deployment, multi-model databases, and hardware acceleration with GPUs and FPGAs), followed by a short Q&A covering how the different indexes relate to each other, ORDER BY versus the primary key, the queues used for read/write separation, and data deduplication strategies.

Tags: data-platform, ClickHouse, OLAP, Youzan
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
