Mastering ClickHouse: Practical Table Engine Choices and Cluster Best Practices
This guide explains ClickHouse’s core concepts, application scenarios, table engine families, detailed engine configurations, SQL development standards, cluster architecture, ZooKeeper’s role, chproxy usage, client tool options, availability considerations, and performance‑tuning parameters for high‑throughput OLAP workloads.
Why ClickHouse
ClickHouse is an open‑source columnar DBMS optimized for OLAP workloads. It provides vectorized execution, high throughput, and a rich set of features, making it suitable for large‑scale analytics such as user‑behavior analysis, real‑time experiment monitoring, and log processing.
Typical Workload Characteristics
Read‑heavy queries on wide tables; concurrency around 100‑200 QPS per server.
Data is ingested in batches of 5 w‑10 w rows; updates and transactions are not required.
Simple queries target ~50 ms latency; each server must handle billions of rows per second during background merges.
Table Engine Families
ClickHouse offers four engine families: Log , MergeTree , Integration , and Special . The MergeTree family is the most widely used and can be combined with materialized views and distributed tables.
MergeTree Engine
Data is stored in sorted parts that are merged in the background. It supports partitioning, replication, sampling, and TTL.
CREATE TABLE [IF NOT EXISTS] db.table_name (
name1 type1 [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
name2 type2 [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
...,
INDEX index_name expr TYPE type GRANULARITY value,
...
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...];ORDER BY – mandatory sorting key; if omitted, it becomes the primary key.
PARTITION BY – recommended for large tables to limit data scans.
PRIMARY KEY – optional when different from the sorting key.
SAMPLE BY – optional sampling column.
TTL – data expiration, strongly recommended for big tables.
ReplicatedMergeTree
Provides high availability via ZooKeeper‑based replication. Not ideal for very large tables because ZooKeeper can become a bottleneck.
CREATE TABLE [IF NOT EXISTS] db.table_name (
`id` Int64,
`ymd` Int64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/replicated/{shard}/test', '{replica}')
PARTITION BY ymd
ORDER BY id;The ZooKeeper path /clickhouse/tables/replicated/… contains {shard} (shard identifier) and {replica} (unique host name).
ReplacingMergeTree
Deduplicates rows within a partition based on the ORDER BY key. An optional version column [ver] determines which row is kept.
CREATE TABLE db.table_name (
...
) ENGINE = ReplacingMergeTree([ver])
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SETTINGS name=value, ...];Deduplication occurs only inside the same partition.
SummingMergeTree
During merges, numeric columns with identical primary keys are summed, similar to a GROUP BY operation.
CREATE TABLE db.table_name (
...
) ENGINE = SummingMergeTree([columns])
[PARTITION BY expr]
[ORDER BY expr]
[SETTINGS name=value, ...];The primary key must be a prefix of the sorting key.
AggregatingMergeTree
Stores pre‑computed aggregate states using user‑defined aggregate functions, offering more flexibility than SummingMergeTree.
CREATE TABLE db.table_name (
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SETTINGS name=value, ...];Special Engines
Distributed – logical table that routes queries to underlying shards; does not store data.
Dictionary – key‑value lookup tables.
Distributed(cluster_name, database_name, table_name[, sharding_key])Development Guidelines
SQL Writing Rules
Prefer IN over JOIN when only one side’s data is needed.
When joining, keep the smaller table on the right side.
Use dictionary tables for small reference data.
Avoid SELECT *; selecting fewer columns reduces data scanned.
Apply LIMIT to restrict result size.
Include the partition key in predicates to enable partition pruning.
Do not treat ClickHouse as a row‑store RDBMS; point queries are slower.
Use GLOBAL for distributed sub‑queries to prevent exponential growth.
Prefer uniqCombined over distinct for approximate deduplication.
Prefer numeric types over strings for better performance.
Keep the number of distinct partition values reasonable.
Avoid grouping on high‑cardinality columns.
Monitor CPU usage; >70 % may cause query timeouts.
Data Ingestion Tips
Batch inserts of 5 w‑10 w rows; avoid single‑row inserts.
ClickHouse does not support UPDATE / DELETE or transactions.
Always define a partition key for large tables.
Too many partitions increase file‑descriptor usage and degrade performance.
Group data by partition before inserting.
If primary keys may contain duplicates, use ReplacingMergeTree for deduplication.
Table Naming Conventions
Local tables end with _local; distributed tables end with _shard.
Materialized view names end with _mv.
Avoid Nullable columns when possible.
Use date‑based primary keys and partitions.
Specify TTL for tables that do not need to retain full history.
For replicated tables, set use_minimalistic_part_header_in_zookeeper=1.
CREATE TABLE IF NOT EXISTS ads.ads_af_city_complaint_1d_local ON CLUSTER ycdata_3shards_3replicas (
`id` UInt64 COMMENT '序号',
`order_id` UInt64 COMMENT '订单号',
`gross_weight` UInt64 COMMENT '权重',
`create_time` Date COMMENT '创建时间',
`event` String COMMENT '事件'
) ENGINE = ReplicatedMergeTree('/clickhouse/table/{shared}/ads_af_city_complaint_1d_local', '{replica}')
PARTITION BY create_time
ORDER BY id
TTL create_time + toIntervalDay(90)
SETTINGS index_granularity=8192, use_minimalistic_part_header_in_zookeeper=1;Cluster Architecture
A common production layout uses 2‑shard × 2‑replica clusters (4 machines). For tables >100 billion rows, a 4‑shard × 0‑replica design is recommended.
Write to *_local tables; read from *_shard tables.
Keep batch sizes consistent to maintain even data distribution.
Open a new connection for each batch; avoid long‑lived connections.
Cross‑cluster access is not supported.
ZooKeeper Role
ZooKeeper coordinates distributed DDL and synchronizes ReplicatedMergeTree replicas. It can become a bottleneck; a typical ZooKeeper node spec is 3 × 32 GB / 4 CPU machines with 10 GbE and 80‑200 GB disks. Set use_minimalistic_part_header_in_zookeeper=1 on replicated tables to reduce ZooKeeper payload.
chproxy
chproxy is a Go‑based HTTP proxy and load balancer for ClickHouse. It provides request routing, response caching, rate limiting, and automatic SSL renewal.
echo 'show databases;' | curl 'http://10.200.161.49:9009/?user=writeuser&password=xxxx' --data-binary @-Repository: https://github.com/ContentSquare/chproxy
Client Tools
DBeaver – free, open‑source universal database client.
Superset – modern BI web application for dashboards.
Tabi – Superset‑like tool; releases at https://github.com/smi2/tabix.ui/releases
Availability Considerations
High availability relies on replication. In a multi‑shard × multi‑replica cluster, a node failure is masked by other replicas. Without replication, a single node loss affects its shard. ZooKeeper outages halt writes for all configurations.
ClickHouse is intended for secondary analytical data; critical data should have fallback and circuit‑breaker mechanisms.
Performance‑Tuning Parameters
max_concurrent_queries – default 100, recommended 150.
max_bytes_before_external_sort – spill to disk when memory is insufficient for ORDER BY.
background_pool_size – default 16, recommended 32 to speed up merges.
max_memory_usage / max_memory_usage_for_all_queries – set close to physical RAM limits.
max_bytes_before_external_group_by – spill to disk during GROUP BY; typically half of max_memory_usage.
Properly configuring these limits improves stability and prevents out‑of‑memory errors.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
