ClickHouse Overview: Architecture, Performance, Core Concepts, and Enterprise Use Cases

This article provides a comprehensive introduction to ClickHouse, an open‑source column‑oriented OLAP database, covering its high‑performance benchmarks, core architectural components, query processing model, deployment patterns, Java client usage, and real‑world implementations at large enterprises.

Big Data Technology & Architecture

Jan 14, 2021

ClickHouse Overview

ClickHouse is an open‑source, column‑oriented analytical database created by Yandex for OLAP and big‑data workloads. It offers real‑time query processing with a SQL‑like dialect, strong compression, and a vectorized execution engine that makes it suitable for sub‑second analytics.

Why ClickHouse Stands Out

Benchmark results cited for ClickHouse show it outperforming many competitors on identical hardware and data volumes. The reported speedup factors, expressed as how many times faster ClickHouse ran than each system, are:

Vertica: 2.63×

InfiniDB: 17×

MonetDB: 27×

Hive: 126×

MySQL: 429×

Greenplum: 10×

Spark: 1×

Core Concepts and Architecture

ClickHouse uses a distributed, sharded architecture in which ZooKeeper coordinates replication. The key roles are Shards, Nodes (server processes), and the ZooKeeper service. Inside the server, data in memory is represented by Columns (a run of values of one type) and Fields (a single value), with DataTypes handling serialization and deserialization. Queries operate on Blocks, which are containers of named, typed columns that flow through IBlockInputStream/IBlockOutputStream, and tables are represented by IStorage implementations.
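To make the Column, Field, and Block vocabulary more concrete, here is a minimal Java sketch of the same idea. ClickHouse itself is written in C++, so the class names `Column` and `Block` and their methods below are hypothetical stand-ins chosen for illustration, not the real interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BlockSketch {
    /** One column: a name, a type label, and all of its values kept together. */
    static final class Column {
        final String name;
        final String dataType;                          // e.g. "String", "UInt8"
        final List<Object> fields = new ArrayList<>();  // each element plays the role of a Field

        Column(String name, String dataType) {
            this.name = name;
            this.dataType = dataType;
        }
    }

    /** A block: a set of equally long columns that is processed as one unit. */
    static final class Block {
        private final Map<String, Column> columns = new LinkedHashMap<>();

        void addColumn(Column c) { columns.put(c.name, c); }

        Column column(String name) { return columns.get(name); }

        int rows() {
            return columns.isEmpty() ? 0 : columns.values().iterator().next().fields.size();
        }
    }

    /** Builds a three-row block with a String column and a UInt8-like column. */
    static Block sampleBlock() {
        Column name = new Column("name", "String");
        Column age = new Column("age", "UInt8");
        for (int i = 1; i <= 3; i++) {
            name.fields.add("panda_" + i);
            age.fields.add(18);
        }
        Block block = new Block();
        block.addColumn(name);
        block.addColumn(age);
        return block;
    }

    public static void main(String[] args) {
        Block block = sampleBlock();
        // Reading one column touches only that column's values.
        System.out.println(block.rows() + " rows, first name = " + block.column("name").fields.get(0));
    }
}
```

The point of the layout is that an operator asking for `age` never has to look at `name`; a row-oriented structure would hand it both.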

Key Features

Columnar storage with LZ4 compression (≈8:1 ratio)

Vectorized execution using SIMD (SSE4.2)

Full SQL support (GROUP BY, JOIN, IN, etc.)

Multiple table engines (MergeTree, Log, etc.)

Multi‑master architecture, high availability

Online real‑time queries without preprocessing
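The columnar storage and vectorized execution items above can be illustrated with a small Java sketch that computes the same aggregate over a row-oriented and a column-oriented layout. This is only an analogy under stated assumptions: the `Row` class and both method names are hypothetical, and whether the JVM actually emits SIMD instructions for the second loop depends on the JIT compiler, so no speedup is claimed here.

```java
public class ColumnScanSketch {
    /** Row-oriented layout: every record carries all of its attributes. */
    static final class Row {
        final String name;
        final int age;

        Row(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Row-wise sum: the loop walks whole records even though it needs one attribute. */
    static long sumAgesRowWise(Row[] rows) {
        long sum = 0;
        for (Row r : rows) {
            sum += r.age;
        }
        return sum;
    }

    /** Column-wise sum: a tight loop over one contiguous primitive array. */
    static long sumAgesColumnWise(int[] ageColumn) {
        long sum = 0;
        for (int age : ageColumn) {
            sum += age;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000;
        Row[] rows = new Row[n];
        int[] ages = new int[n];
        for (int i = 0; i < n; i++) {
            rows[i] = new Row("panda_" + i, 18);
            ages[i] = 18;
        }
        // Both layouts hold the same data, so both sums agree.
        System.out.println(sumAgesRowWise(rows) + " " + sumAgesColumnWise(ages));
    }
}
```

Both methods return the same value; the difference is that the column-wise loop reads one densely packed array, which is the access pattern that compression and SIMD execution benefit from.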

Java Client Usage

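Two JDBC drivers are available: clickhouse-jdbc, which talks to the server over the HTTP interface (port 8123 by default), and clickhouse-native-jdbc, which uses the native TCP protocol (port 9000 by default):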

<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.2.4</version>
</dependency>

<dependency>
    <groupId>com.github.housepower</groupId>
    <artifactId>clickhouse-native-jdbc</artifactId>
    <version>2.5.2</version>
</dependency>

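Example code that creates a table, inserts a batch of rows, and reads them back using the native driver:

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
public class ClickHouseJdbcExample {
    public static void main(String[] args) throws Exception {
        // Driver class from clickhouse-native-jdbc; it connects to the native TCP port.
        Class.forName("com.github.housepower.jdbc.ClickHouseDriver");
        try (Connection connection = DriverManager.getConnection("jdbc:clickhouse://192.168.60.131:9000");
             Statement statement = connection.createStatement()) {
            // DDL returns no result set, so use execute() rather than executeQuery().
            statement.execute("create table test.jdbc_example(day Date, name String, age UInt8) Engine=Log");
            try (PreparedStatement pstmt = connection.prepareStatement("insert into test.jdbc_example values(?, ?, ?)")) {
                for (int i = 0; i < 10; i++) {
                    pstmt.setDate(1, new Date(System.currentTimeMillis()));
                    pstmt.setString(2, "panda_" + (i + 1));
                    pstmt.setInt(3, 18);
                    pstmt.addBatch();
                }
                pstmt.executeBatch(); // send all ten rows as one batch
            }
            try (ResultSet rs = statement.executeQuery("select * from test.jdbc_example")) {
                while (rs.next()) {
                    System.out.println(rs.getDate(1) + ", " + rs.getString(2) + ", " + rs.getInt(3));
                }
            }
        }
    }
}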

CLI example to list tables and query data:

ck-master :) show tables;
SHOW TABLES
┌─name─────────┐
│ hits         │
│ jdbc_example │
└──────────────┘
ck-master :) select * from jdbc_example;
SELECT * FROM jdbc_example
┌────────day─┬─name─────┬─age─┐
│ 2019-04-25 │ panda_1  │  18 │
│ …          │ …        │ …   │
└────────────┴──────────┴─────┘

Enterprise Deployments

Major companies such as Ctrip, Kuaishou, and QQ Music use ClickHouse for large‑scale analytics, handling petabytes of data with thousands of CPU cores. Their practices include careful partition design, sorting data before ingestion, writing joins with the larger table on the left and the smaller table on the right (the right‑hand table is the one loaded into memory), monitoring CPU and memory, using SSD storage, and separating reads from writes by using temporary nodes.
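As a rough illustration of sorting data before ingestion, the Java sketch below orders a batch on the client side before it would be handed to a batch insert. It assumes a hypothetical table partitioned by `day` and ordered by `name`; the `Event` class and `sortForIngestion` method are invented for this example and are not part of any ClickHouse client API.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PreSortBatch {
    /** Hypothetical row: day stands in for the partition key, name for the sorting key. */
    static final class Event {
        final LocalDate day;
        final String name;

        Event(LocalDate day, String name) {
            this.day = day;
            this.name = name;
        }
    }

    /** Returns a copy of the batch ordered by day, then by name. */
    static List<Event> sortForIngestion(List<Event> batch) {
        List<Event> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparing((Event e) -> e.day).thenComparing(e -> e.name));
        return sorted;
    }

    public static void main(String[] args) {
        List<Event> batch = Arrays.asList(
            new Event(LocalDate.of(2019, 4, 26), "panda_2"),
            new Event(LocalDate.of(2019, 4, 25), "panda_3"),
            new Event(LocalDate.of(2019, 4, 25), "panda_1"));
        // Rows for the same day end up adjacent, already in key order.
        for (Event e : sortForIngestion(batch)) {
            System.out.println(e.day + " " + e.name);
        }
    }
}
```

Grouping rows by partition before sending them keeps each insert from touching many partitions at once, which is also one way to respect the partition-count limits discussed in this section.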

Common challenges and solutions involve ZooKeeper performance, data consistency on write failures, efficient real‑time and batch ingestion via message queues, limiting partition counts, and localizing cross‑table joins through consistent hashing.
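The idea of localizing joins through consistent hashing can be sketched as follows: if rows from both tables are routed to shards by the same hash of the join key, matching rows land on the same shard and the join needs no cross-shard traffic. This is a minimal sketch under assumptions, not how any of the companies above implement it; the `ShardRing` class, the shard names, the choice of CRC32 as the hash, and the number of virtual nodes are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class ShardRing {
    // Ring positions mapped to shard names, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ShardRing(String[] shards, int virtualNodes) {
        for (String shard : shards) {
            // Several positions per shard smooth out the key distribution.
            for (int v = 0; v < virtualNodes; v++) {
                ring.put(hash(shard + "#" + v), shard);
            }
        }
    }

    /** CRC32 is used here only because it ships with the JDK. */
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Picks the first ring position at or after the key's hash, wrapping around at the end. */
    String shardFor(String joinKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(joinKey));
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        ShardRing ring = new ShardRing(new String[] {"shard-1", "shard-2", "shard-3"}, 16);
        // A row from either table with the same join key is routed identically.
        String ordersShard = ring.shardFor("user_42");
        String usersShard = ring.shardFor("user_42");
        System.out.println(ordersShard.equals(usersShard));
    }
}
```

Because only the ring positions owned by an added or removed shard change hands, most keys keep their shard when the cluster is resized, which is the usual reason to prefer a ring over a plain modulo.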

Conclusion

Although ClickHouse is relatively young and has some limitations (e.g., lack of full transaction support), its extreme performance, columnar storage, and flexible architecture make it a compelling foundation for modern OLAP workloads.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
