Latest ClickHouse Technologies and Practical Applications
ClickHouse, born from Yandex.Metrica and now in the DB-Engines top 50, achieves exceptional speed through a vectorized compute engine, a column-store architecture, and an active community. It powers real-time workloads at companies such as Tencent Music, Sina, Bilibili, and Suning, and is adding features such as column merging, projections, and storage-compute separation for future scalability.
ClickHouse has become a breakout star among big-data analytics engines in recent years, climbing into the top 50 of the DB-Engines ranking and, on single-table workloads, running queries several times faster than competing engines.
The following summary is based on Guo Wei’s talk “ClickHouse Latest Technology Practice and Application” at the Techo TVP Developer Summit.
1. The Origin and Rise of ClickHouse
ClickHouse originated from Yandex.Metrica, the web-analytics service of Russia's Yandex, and was open-sourced in 2016. Since arriving in China in 2017, it has been adopted by major companies such as Tencent, ByteDance, and Sina. Its rapid climb in the DB-Engines ranking (up 71 places to 50th) reflects its strong performance and growing ecosystem.
2. Why ClickHouse Is So Fast
Three main reasons are highlighted:
• Vectorized computation engine: ClickHouse uses a vectorized execution model, similar to Snowflake's, combined with low-level C/C++ optimizations, hash-based aggregation, and fine-grained memory management.
• Column-store design: Each column is stored in a separate file and can use its own compression algorithm. Features such as projections enable pre-aggregation, further accelerating queries.
• Active open-source community: The community quickly incorporates performance improvements, often shipping multiple releases per month.
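The first two points can be illustrated with a small Python sketch. It is not how ClickHouse is implemented (ClickHouse is written in C++ and defaults to the LZ4 codec); it only shows the shape of the idea under stated assumptions: data is stored per column, each column is compressed with its own codec (stdlib `zlib` and `lzma` as stand-ins), a query decompresses only the columns it touches, and the hash-aggregation operator consumes a block of values per call instead of one row at a time. The helper names and `BLOCK_SIZE` are hypothetical. Pure Python cannot reproduce the CPU-level gains of real vectorization (SIMD, amortized per-row overhead).

```python
import lzma
import zlib
from array import array
from collections import defaultdict

# Per-column codecs. zlib/lzma are stdlib stand-ins, not ClickHouse codecs.
CODECS = {
    "user_id": (zlib.compress, zlib.decompress),
    "plays": (lzma.compress, lzma.decompress),
    "country": (zlib.compress, zlib.decompress),
}

def write_column(name, values):
    """Serialize one column as 64-bit ints and compress it with its own codec."""
    compress, _ = CODECS[name]
    return compress(array("q", values).tobytes())

def read_column(store, name):
    """Decompress a single column back into an int array."""
    _, decompress = CODECS[name]
    col = array("q")
    col.frombytes(decompress(store[name]))
    return col

# A toy table laid out column by column, each column stored separately.
store = {
    "user_id": write_column("user_id", [1, 2, 1, 3, 2, 1, 3, 3]),
    "plays": write_column("plays", [10, 5, 7, 1, 3, 2, 4, 6]),
    "country": write_column("country", [86, 86, 7, 86, 1, 7, 86, 1]),
}

BLOCK_SIZE = 4  # hypothetical; real engines process blocks of thousands of rows

def aggregate_block(totals, key_block, value_block):
    """Hash aggregation over one block: SUM(plays) GROUP BY user_id."""
    for key, value in zip(key_block, value_block):
        totals[key] += value

def sum_plays_by_user(store):
    # Only the two columns the query needs are decompressed; "country" is never read.
    keys = read_column(store, "user_id")
    values = read_column(store, "plays")
    totals = defaultdict(int)
    for start in range(0, len(keys), BLOCK_SIZE):
        end = start + BLOCK_SIZE
        aggregate_block(totals, keys[start:end], values[start:end])
    return dict(totals)

print(sorted(sum_plays_by_user(store).items()))  # → [(1, 19), (2, 8), (3, 11)]
```

The block-at-a-time loop is the part that matters: in a compiled engine, handing an operator a whole block lets the inner loop run tight over contiguous memory.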
3. Real‑World Use Cases
• Tencent Music: ClickHouse powers a real-time data warehouse for interactive analytics on music-streaming data, handling both batch and streaming workloads through Kafka → Flink → ClickHouse pipelines.
• Sina: Processes 300 billion events and 8 million queries per day. ClickHouse's fast ingestion and querying enable real-time dashboards and Grafana visualizations.
• Ximalaya: Uses ClickHouse for user-behavior analysis, audience segmentation, and machine-log queries, loading data through a mix of Spark Streaming and batch jobs.
• Qutoutiao: Handles trillions of rows and 21,000 queries per day, using a hybrid ClickHouse + Presto architecture to work around ClickHouse's limitations with joins.
• Bilibili: Deploys ClickHouse for large-scale user-behavior analytics, ingesting data from Kafka/Flink and Spark and serving queries to BI tools over JDBC.
• Suning: Uses ClickHouse as the final query layer for user-profile data, relying on materialized views to reduce computation cost.
• Jinshuju: Pairs MongoDB with ClickHouse to deliver fast statistical reports over large volumes of form data.
• Huya Live: Ingests log data into ClickHouse from multiple Kafka clusters for fast log queries.
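Several of these deployments (Tencent Music, Bilibili, Huya Live) feed ClickHouse from Kafka-based streams. Because each INSERT into a MergeTree table creates a new data part that must later be merged in the background, streaming sinks usually buffer rows and write them in larger batches. The sketch below is a minimal, hypothetical batching sink: `flush_fn` stands in for whatever client actually issues the INSERT, and the thresholds are illustrative assumptions, not figures from the talk.

```python
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple[object, ...]

class BatchingSink:
    """Buffers streamed rows and hands them to `flush_fn` in batches.

    `flush_fn` is a hypothetical callback; in a real pipeline it would send
    one INSERT per batch through a ClickHouse client.
    """

    def __init__(self, flush_fn: Callable[[List[Row]], None], max_rows: int = 10_000,
                 max_age_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn
        self.max_rows = max_rows      # flush when this many rows are buffered
        self.max_age_s = max_age_s    # or when the oldest buffered row is this old
        self.clock = clock
        self.buffer: List[Row] = []
        self.first_row_at: Optional[float] = None

    def add(self, row: Row) -> None:
        if not self.buffer:
            self.first_row_at = self.clock()
        self.buffer.append(row)
        too_many = len(self.buffer) >= self.max_rows
        too_old = self.clock() - self.first_row_at >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand the current buffer to flush_fn and start a fresh one."""
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.first_row_at = None


batches: List[List[Row]] = []
sink = BatchingSink(batches.append, max_rows=3, max_age_s=60.0)
for i in range(7):
    sink.add((i, f"event-{i}"))
sink.flush()  # drain the remainder, e.g. on shutdown
print([len(b) for b in batches])  # → [3, 3, 1]
```

The age check only runs when a row arrives; a production sink would also flush on a timer so a quiet stream does not hold rows indefinitely.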
4. Latest Features and Roadmap
• Column merging: Reduces the effective number of columns (e.g., from 2,000 to 100) to improve query speed.
• Projections: Support pre-aggregation for all functions, not only aggregate functions.
• Storage-compute separation: Demonstrated on Tencent Cloud with S3 integration, allowing storage and compute resources to scale independently.
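The pre-aggregation idea behind projections (and behind the materialized views Suning uses) can be sketched as a table that maintains an aggregate alongside its raw rows at write time, so a matching query reads the small aggregate instead of scanning every row. This is a toy model with hypothetical names, not ClickHouse's projection mechanism, which stores projection data alongside each data part.

```python
from collections import defaultdict

class TableWithProjection:
    """Toy table: raw rows plus a pre-aggregated copy maintained at insert time.

    Hypothetical design for illustration only.
    """

    def __init__(self):
        self.rows = []                        # raw (day, song, plays) rows
        self.plays_by_day = defaultdict(int)  # "projection": day -> SUM(plays)

    def insert(self, rows):
        for day, song, plays in rows:
            self.rows.append((day, song, plays))
            self.plays_by_day[day] += plays   # aggregate updated on write

    def total_plays_scan(self, day):
        # Without pre-aggregation: scan every raw row.
        return sum(plays for d, _, plays in self.rows if d == day)

    def total_plays(self, day):
        # With pre-aggregation: a single lookup, independent of row count.
        return self.plays_by_day.get(day, 0)


table = TableWithProjection()
table.insert([
    ("2021-06-01", "song_a", 3),
    ("2021-06-01", "song_b", 4),
    ("2021-06-02", "song_a", 5),
])
print(table.total_plays("2021-06-01"), table.total_plays_scan("2021-06-01"))  # → 7 7
```

The trade-off is extra work and storage on every insert in exchange for much cheaper reads, which suits read-heavy analytical workloads.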
The community encourages users to follow its official WeChat account and Bilibili channel for deeper technical details and meetup videos.
5. Future Outlook
ClickHouse aims to cover more scenarios, address the “last mile” of data delivery, and may eventually evolve into a commercial offering in partnership with the Russian core team.
Speaker Biography
Guo Wei – CTO of Analysys, Tencent Cloud TVP, Apache Software Foundation Member, initiator of Apache DolphinScheduler, founder of the ClickHouse China community, and veteran of major data-engineering roles at Lenovo, Wanda, IBM, and Teradata.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.