
ClickHouse Overview and the Top 5 Features Released in 2021

This article gives an overview of ClickHouse, covering its origins, core characteristics, and the five most important features introduced in 2021: JIT acceleration, Lambda-based UDFs, native window functions, zero-copy replication for S3/HDFS, and the Projection mechanism. Together they show why it remains a leading high-performance OLAP database for big-data analytics.

DataFunTalk

Guest Speaker: Zhu Kai, Chief Expert of Mingyuan Cloud Big Data Platform (DataFunTalk). Edited by Xiao Peng (VIVO).

Introduction: ClickHouse has been known for its speed since it was open-sourced in 2016, and its rapid release cadence continued in 2021 with a large number of new features. This article briefly introduces ClickHouse and then details five key features released in 2021.

01 ClickHouse Overview

1. Why the name ClickHouse – The name combines “Click Stream” (click‑stream data) and “Data Warehouse”, reflecting its original goal of supporting click‑stream‑based data warehouses.

2. Background – ClickHouse originated at Yandex, the Russian internet company, to power its web analytics platform Yandex.Metrica, which processes billions of events daily. In 2021 the founding team spun off a commercial company, ClickHouse, Inc., which raised $50 M (Series A) and $250 M (Series B) to focus on ClickHouse Cloud services.

3. Notable Characteristics

① Easy to start – an OLAP database with full DBMS capabilities, supporting SQL, DDL, DML, ROLAP and MOLAP models, and projections.

② Everything is a table – dozens of table engines, including ones that expose external resources (ZooKeeper, HDFS, files), built-in engines that follow the MySQL binlog and the PostgreSQL replication stream, and even a system table listing the project's contributors.

③ Rich interfaces – TCP/HTTP low‑level access, JDBC, CLI, and client libraries for Java, Python, Node.js, plus hundreds of built‑in functions.

④ Online query – real‑time responses without pre‑processing, with optional cube pre‑aggregation.

⑤ Distributed architecture – MPP, cluster mode, data partitioning, sharding, and replication.

⑥ High performance – columnar storage, high compression, vectorized engine, delivering high speed even on a single node.

⑦ Security and reliability – circuit‑breaker and safe‑delete mechanisms.

⑧ Comprehensive permission system – RBAC, client, resource, operation, and row‑level permissions.

⑨ Active open‑source community – Apache‑2.0 license, >850 contributors, 21.1 K+ stars, 4.1 K forks, with a release pace matching its performance.

02 2021 Top 5 Features

1. JIT‑driven query acceleration – ClickHouse combines vectorized execution with runtime code generation (JIT), improving cache reuse and CPU instruction utilization. Depending on the workload, JIT can provide 1.5‑3× speedups, with special cases reaching up to 20×.

Since version 21.6, JIT compilation time is around 15 ms and scales linearly with query complexity. It accelerates both expression evaluation in SELECT and aggregation functions.
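The difference between operator-by-operator evaluation and runtime code generation can be sketched in a few lines. This is a toy illustration in Python, not how ClickHouse implements JIT (ClickHouse generates native code); the expression `(a * b) + c` and the helper names `interpret`, `to_source`, and `jit_compile` are assumptions made for the example.

```python
# Toy illustration only: it mimics the idea of fusing an expression into one
# generated function; it does not reflect ClickHouse internals.
import operator

# Expression tree for (a * b) + c: a node is (op, left, right) or a column name.
EXPR = ("+", ("*", "a", "b"), "c")
OPS = {"+": operator.add, "*": operator.mul}


def interpret(expr, columns):
    """Evaluate node by node, building an intermediate column per operator."""
    if isinstance(expr, str):
        return columns[expr]
    op, left, right = expr
    lhs = interpret(left, columns)
    rhs = interpret(right, columns)
    return [OPS[op](x, y) for x, y in zip(lhs, rhs)]


def to_source(expr):
    """Render the tree as a single Python expression string."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_source(left)} {op} {to_source(right)})"


def jit_compile(expr, names):
    """Generate one fused per-row function at runtime from the tree."""
    source = f"lambda {', '.join(names)}: {to_source(expr)}"
    row_fn = eval(compile(source, "<jit>", "eval"))

    def run(columns):
        # A single pass over the rows; no intermediate columns are created.
        return [row_fn(*row) for row in zip(*(columns[n] for n in names))]

    return run


columns = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
interpreted_result = interpret(EXPR, columns)
fused_result = jit_compile(EXPR, ["a", "b", "c"])(columns)
print(interpreted_result, fused_result)
```

Both paths return the same values; the fused version simply avoids materializing the intermediate `a * b` column, which is the kind of saving the talk attributes to JIT.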

2. Lambda‑based UDF support – Starting with version 21.10, users can define custom functions using Lambda expressions, stored under a user_defined directory, and invoke them directly in queries, with support for nested calls.
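As a rough sketch of what this looks like in practice, the snippet below holds the SQL text one might send to a 21.10+ server and mirrors the same logic in plain Python so the expected values can be checked without a server. The function names `linear_equation` and `double_linear`, their bodies, and the query are hypothetical examples, not taken from the talk.

```python
# SQL text for a ClickHouse 21.10+ server. Names and bodies are hypothetical.
CREATE_STATEMENTS = [
    "CREATE FUNCTION linear_equation AS (x, k, b) -> k * x + b",
    # A UDF whose body calls another UDF (the nested-call case).
    "CREATE FUNCTION double_linear AS (x) -> "
    "linear_equation(linear_equation(x, 2, 1), 2, 1)",
]
QUERY = "SELECT number, double_linear(number) FROM numbers(3)"


# Pure-Python mirror of the two lambdas, used only to reason about results.
def linear_equation(x, k, b):
    return k * x + b


def double_linear(x):
    return linear_equation(linear_equation(x, 2, 1), 2, 1)


# What the query above would be expected to return, row by row.
expected = [(n, double_linear(n)) for n in range(3)]
print(expected)
```

The statements are kept as strings on purpose: any client that can run SQL text could submit them, so no particular driver API is assumed here.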

3. Native window functions – From version 21.3 (initially as an experimental feature), ClickHouse includes built-in window and analytical functions, simplifying year-over-year and month-over-month analyses that previously required complex ARRAY JOIN workarounds.
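To make the month-over-month case concrete, here is a minimal Python emulation of what a lag-by-one window expression computes when partitioned by one column and ordered by another. The sample rows, the column names, and the `month_over_month` helper are assumptions for illustration; in ClickHouse the same logic would be written in SQL with an `OVER (PARTITION BY ... ORDER BY ...)` clause.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (region, month, revenue).
rows = [
    ("north", "2021-01", 100.0),
    ("north", "2021-02", 120.0),
    ("north", "2021-03", 90.0),
    ("south", "2021-01", 50.0),
    ("south", "2021-02", 75.0),
]


def month_over_month(rows):
    """Emulate a lag-by-one window: PARTITION BY region ORDER BY month."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for region, partition in groupby(ordered, key=itemgetter(0)):
        prev = None  # no previous row at the start of each partition
        for _, month, revenue in partition:
            growth = None if prev is None else round((revenue - prev) / prev, 4)
            out.append((region, month, revenue, growth))
            prev = revenue
    return out


result = month_over_month(rows)
for row in result:
    print(row)
```

The point of the native feature is that this per-partition bookkeeping moves into the query engine, so the analysis becomes a single SELECT rather than a hand-built array manipulation.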

4. Zero-copy replication for S3 and HDFS storage – ClickHouse supports tiered storage, so data can reside on remote storage such as S3, HDFS, or OSS. With zero-copy replication, only metadata is synchronized through ZooKeeper, which eliminates redundant copies of the data and improves availability.

Since version 21.4.1, the zero‑copy mechanism ensures only metadata is replicated, delegating actual data transfer to the underlying cloud storage.
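The metadata-only idea can be illustrated with a small in-memory simulation. This is a conceptual sketch, not ClickHouse code: `SharedObjectStore`, `Replica`, the part name, and the key layout are all hypothetical, and the real coordination through ZooKeeper is reduced to a direct method call.

```python
class SharedObjectStore:
    """Stands in for S3/HDFS: one copy of each blob, shared by all replicas."""

    def __init__(self):
        self.blobs = {}
        self.bytes_written = 0

    def put(self, key, data):
        self.blobs[key] = data
        self.bytes_written += len(data)

    def get(self, key):
        return self.blobs[key]


class Replica:
    """Keeps only metadata: a mapping from part name to object-store key."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.parts = {}

    def insert(self, part_name, data):
        key = f"data/{part_name}"  # hypothetical key layout
        self.store.put(key, data)
        self.parts[part_name] = key
        return part_name, key

    def fetch_metadata(self, part_name, key):
        # Zero-copy step: record a pointer to the existing blob; no bytes move.
        self.parts[part_name] = key

    def read(self, part_name):
        return self.store.get(self.parts[part_name])


store = SharedObjectStore()
replica_1 = Replica("r1", store)
replica_2 = Replica("r2", store)

payload = b"column data"
part, key = replica_1.insert("all_1_1_0", payload)
replica_2.fetch_metadata(part, key)

print(store.bytes_written, replica_2.read(part))
```

After replication the store has been written to once, yet both replicas can serve the part, which is the saving the zero-copy mechanism is described as providing.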

5. Projection – Introduced to address the limitations of a single sort order and traditional materialized views. Projections store pre‑processed data at the part level, share metadata with the base table, and act as intelligent indexes that the query planner can automatically match, delivering significant performance gains without the overhead of separate tables.

Projections coexist with materialized views. A projection is scoped to a single table and cannot span tables, but the two can be combined for complex aggregation scenarios.
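A simplified model of part-level projections and automatic matching is sketched below. The table name `visits`, the projection name `p_by_user`, the columns, and the `Part` / `sum_duration` helpers are hypothetical, and the matching rule is far cruder than a real query planner; the DDL string shows the general shape of an `ALTER TABLE ... ADD PROJECTION` statement under those assumed names.

```python
from collections import defaultdict

# Hypothetical DDL: table, projection, and column names are assumptions.
ADD_PROJECTION = (
    "ALTER TABLE visits ADD PROJECTION p_by_user "
    "(SELECT user_id, sum(duration) GROUP BY user_id)"
)


class Part:
    """One data part: base rows plus a projection built from the same rows."""

    def __init__(self, rows):
        self.rows = rows  # tuples of (user_id, page, duration)
        self.by_user = defaultdict(int)  # pre-aggregated projection data
        for user_id, _page, duration in rows:
            self.by_user[user_id] += duration


def sum_duration(parts, group_by):
    """Return (totals, source); source names which data was scanned."""
    totals = defaultdict(int)
    if group_by == "user_id":
        # The query matches the projection: merge small per-part subtotals.
        for part in parts:
            for user_id, subtotal in part.by_user.items():
                totals[user_id] += subtotal
        return dict(totals), "projection"
    # No matching projection: fall back to scanning the base rows.
    column = {"page": 1}[group_by]
    for part in parts:
        for row in part.rows:
            totals[row[column]] += row[2]
    return dict(totals), "base"


parts = [
    Part([(1, "/home", 10), (1, "/docs", 5), (2, "/home", 7)]),
    Part([(2, "/docs", 3), (1, "/home", 4)]),
]
by_user, used_for_user = sum_duration(parts, "user_id")
by_page, used_for_page = sum_duration(parts, "page")
print(by_user, used_for_user)
print(by_page, used_for_page)
```

Because the projection lives inside each part, it is created and dropped together with that part's base data, which is what distinguishes it from a materialized view backed by a separate table.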

Thank you for listening.
