An Overview of ClickHouse: Features, Performance, Use Cases, and Limitations
ClickHouse is an open-source, column-oriented OLAP database originally developed by Yandex. It combines highly compressed columnar storage, vectorized execution, and high read and write throughput, which makes it well suited to large-scale analytics. It also targets specific usage scenarios and has notable limitations, such as the lack of full transactions and secondary indexes.
ClickHouse is a column‑oriented database management system (DBMS) designed for online analytical processing (OLAP). It was open‑sourced by Yandex in Russia and is now used by major Chinese companies such as Tencent, ByteDance, Ctrip, and Kuaishou, with clusters scaling to thousands of nodes; Alibaba Cloud even offers ClickHouse as a cloud service.
ClickHouse Features
The system was built from OLAP requirements and implements a custom high‑efficiency columnar storage engine. Columnar storage means data is stored and scanned by column, resulting in reduced I/O, higher compression ratios, and suitability for analytical workloads.
Columnar Storage: Data is stored per column, enabling smaller I/O and better compression, as illustrated by the accompanying diagram.
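The I/O argument can be made concrete with a small sketch. The table below (user_id, age, revenue), its packed binary layout, and all function names are hypothetical and chosen only for illustration; this is not how ClickHouse lays out data on disk. It simply counts how many bytes must be touched to average one column in each layout.

```python
import struct

# Hypothetical table: (user_id: uint32, age: uint8, revenue: float64)
ROWS = [(i, 20 + i % 50, i * 0.5) for i in range(1000)]

ROW_FORMAT = "<IBd"  # packed without padding: 4 + 1 + 8 = 13 bytes per row


def row_layout(rows):
    """Row-oriented: all fields of one row are stored next to each other."""
    return b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)


def column_layout(rows):
    """Column-oriented: each column is stored contiguously on its own."""
    ids, ages, revenues = zip(*rows)
    n = len(rows)
    return {
        "user_id": struct.pack(f"<{n}I", *ids),
        "age": struct.pack(f"<{n}B", *ages),
        "revenue": struct.pack(f"<{n}d", *revenues),
    }


def avg_age_from_rows(blob):
    """Walks every 13-byte row even though only 1 byte per row is needed."""
    total = count = 0
    for _user_id, age, _revenue in struct.iter_unpack(ROW_FORMAT, blob):
        total += age
        count += 1
    return total / count, len(blob)  # (average, bytes scanned)


def avg_age_from_columns(cols):
    """Reads only the 'age' column; the other columns are never touched."""
    age_bytes = cols["age"]
    return sum(age_bytes) / len(age_bytes), len(age_bytes)


if __name__ == "__main__":
    print(avg_age_from_rows(row_layout(ROWS)))
    print(avg_age_from_columns(column_layout(ROWS)))
```

Both functions return the same average, but the row layout scans 13,000 bytes while the column layout scans 1,000. The gap widens as tables get wider, which is why wide tables with narrow queries favor columnar storage.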
Speed: ClickHouse achieves very fast queries by combining columnar storage, efficient compression, and a vectorized execution engine that makes full use of the CPU. For simple queries, a single server can scan on the order of billions of rows per second. Write throughput is also high, which suits bulk ingestion of large data volumes.
Community benchmarks show it outperforming comparable engines on single-table queries, while multi-table joins are a relative weak spot.
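As a rough illustration of what "vectorized" means here, the sketch below processes columns one block at a time instead of one row at a time. The block size, column names, and helper functions are assumptions made for this example. Pure Python gains no SIMD benefit, so the code shows only the control-flow shape of block-at-a-time execution, not its speed.

```python
from array import array

BLOCK_SIZE = 4  # tiny for illustration; real engines use blocks of thousands of values


def read_blocks(column, block_size=BLOCK_SIZE):
    """Yield consecutive slices (blocks) of a column."""
    for start in range(0, len(column), block_size):
        yield column[start:start + block_size]


def filter_block(values, flags):
    """Keep the values whose flag is non-zero; one call handles a whole block."""
    return array(values.typecode, (v for v, f in zip(values, flags) if f))


def sum_where(values_col, flags_col):
    """SUM(values) WHERE flag, evaluated block by block."""
    total = 0
    for values, flags in zip(read_blocks(values_col), read_blocks(flags_col)):
        total += sum(filter_block(values, flags))
    return total


# Hypothetical columns of a small table.
price = array("q", [10, 20, 30, 40, 50, 60, 70, 80, 90])
is_paid = array("b", [1, 0, 1, 1, 0, 0, 1, 0, 1])

if __name__ == "__main__":
    print(sum_where(price, is_paid))
```

Each operator call works on a contiguous run of same-typed values, so per-row interpretation overhead is paid once per block. In a native engine, that same layout is what allows tight loops and SIMD instructions.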
Rich Functionality
Beyond speed, ClickHouse supports most SQL syntax (with some limitations) and real-time data updates, and it scales well, from single-node deployments to distributed clusters with hundreds or thousands of nodes. Each node can store trillions of rows or over 100 TB of data.
Additional features include a sparse primary-key index, data sharding, partitioning, TTL, and asynchronous multi-master replication.
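To show how a sparse index differs from a per-row index, here is a minimal sketch under simplifying assumptions: keys are plain integers, data is already sorted by key, and one index entry (a "mark") is kept per granule of rows. The function names and the granule size of 4 are hypothetical; ClickHouse's MergeTree engine defaults to an index granularity of 8192 rows.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per granule in this sketch


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of every granule, not one entry per row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def granules_for_range(marks, lo, hi):
    """Half-open range of granule numbers that may contain keys in [lo, hi]."""
    # A granule can hold keys >= lo if the next granule starts at or after lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # A granule can hold keys <= hi only if its own first key is <= hi.
    last = bisect_right(marks, hi)
    return first, last


def scan(sorted_keys, marks, lo, hi, granule=GRANULE):
    """Return matching keys and how many rows actually had to be read."""
    first, last = granules_for_range(marks, lo, hi)
    candidates = sorted_keys[first * granule:last * granule]
    return [k for k in candidates if lo <= k <= hi], len(candidates)


keys = list(range(0, 40, 2))      # 20 sorted keys: 0, 2, ..., 38
marks = build_sparse_index(keys)  # 5 marks instead of 20 index entries

if __name__ == "__main__":
    print(marks)
    print(scan(keys, marks, 10, 18))
```

The index stays small enough to keep in memory even for very large tables. The trade-off is that a lookup always reads whole granules, which is cheap for range scans but wasteful for single-row point lookups.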
Application Scenarios and Constraints
ClickHouse fits workloads with the following characteristics:
1. Most requests are read-only
2. Data is updated in large batches (more than 1,000 rows) rather than single rows
3. Data is primarily appended, not modified
4. Queries read many rows but only a few columns
5. Tables are wide (many columns)
6. Query frequency is relatively low
7. Simple queries tolerate ~50 ms latency
8. Column values are small numbers or short strings
9. High throughput per query (up to billions of rows per second)
10. No need for transactions
11. Low consistency requirements
12. Queries typically involve a single large table plus small auxiliary tables
13. Result set is much smaller than the source (due to filtering/aggregation)
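The batching requirement in the list above is usually met on the client side. The sketch below is one minimal way to do it: buffer rows and pass them on only in large batches. `BatchBuffer` and its `sink` callback are hypothetical names; in a real application the sink would be whatever bulk-insert call your ClickHouse client provides.

```python
class BatchBuffer:
    """Collect rows and pass them to `sink` in large batches."""

    def __init__(self, sink, min_batch=1000):
        self.sink = sink            # callable that receives a list of rows
        self.min_batch = min_batch  # flush once this many rows are buffered
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.min_batch:
            self.flush()

    def flush(self):
        """Send whatever is buffered; call once more at shutdown."""
        if self.rows:
            self.sink(self.rows)
            self.rows = []  # start a fresh list; the sink keeps the old one


# Stand-in sink that just records each batch it receives.
batches = []
buf = BatchBuffer(sink=batches.append, min_batch=1000)
for i in range(2500):
    buf.add((i, "event"))
buf.flush()  # push the final partial batch

if __name__ == "__main__":
    print([len(b) for b in batches])
```

A production version would typically also flush on a timer so that a slow trickle of rows does not sit in memory indefinitely.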
The corresponding limitations are: no true row-level delete or update (changes are applied as heavyweight asynchronous mutations), no built-in transactions (future versions may add them), no secondary indexes, limited SQL support (especially for joins), no window functions, and metadata that must be managed manually.
References
Official documentation: https://clickhouse.tech/docs/en/
Additional articles: https://zhuanlan.zhihu.com/p/98135840 , https://zhuanlan.zhihu.com/p/22165241 , https://zhuanlan.zhihu.com/p/71014268
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies