Optimizing ClickHouse for Log Storage: Cluster Sizing, Table Design, and Performance Tuning
This article summarizes practical experiences with ClickHouse log storage, covering how to size and tune clusters, key table schema design considerations, partitioning strategies, index choices, compression algorithms, and provides a demo CREATE TABLE script for production use.
1. Cluster Scale and Tuning
The migration from Elasticsearch to ClickHouse for log storage prompted a review of cluster sizing, hardware configuration, and performance tuning. A typical deployment uses a hot‑cold separation architecture: each node mounts a 2 TB SSD hot disk and a 5 TB HDD cold disk, with SSDs for hot data and HDDs for cold data. The author’s production cluster consists of 20 nodes.
Four logical clusters were created to isolate different business domains:
Cluster 1 – logs for the storage support domain.
Cluster 2 – non‑transactional business logs.
Cluster 3 – transaction‑related logs.
Cluster 4 – algorithm recommendation logs.
Benefits of this segmentation include targeted governance and the ability to reroute traffic when a cluster experiences issues.
Three hardware configurations were tested:
Configuration 1: 16 CPU / 64 GB RAM. Query latency was unacceptable (30 s for exact lookup, 60 s for fuzzy 5‑hour search), far exceeding the desired sub‑5‑second production threshold.
Configuration 2: 48 CPU / 192 GB RAM. Exact lookups returned in a few hundred milliseconds, Bloom‑filter searches in seconds, and fuzzy 5‑hour searches under 3 s, meeting most business needs but still struggling with high‑throughput recommendation workloads.
Configuration 3: 128 CPU / 256 GB RAM with upgraded SSD tier (PL3). After moving to larger disks, fuzzy 5‑hour queries mostly returned within 3–5 s, 1‑hour queries within 2 s, and exact lookups around 1 s, satisfying production requirements.
Key takeaway: balance cluster count and size to enable flexible tuning without over‑fragmentation.
2. Table Structure Design
Effective schema design dramatically improves query performance. The author iterated over table definitions more than five times, focusing on index creation and partitioning.
2.1 Indexing key fields
Commonly used identifiers are extracted into separate columns and indexed. Example for a trace identifier:
`tracing_id` String,
INDEX tracing_id tracing_id TYPE SET(100) GRANULARITY 2,
INDEX idx_tracing_id tracing_id TYPE tokenbf_v1(512, 2, 0) GRANULARITY 2Example for a timestamp field:
`record_time` DateTime64(3),
INDEX record_time record_time TYPE minmax GRANULARITY 2ClickHouse’s MergeTree engine supports four secondary index types: minmax, set, ngrambf_v1, and tokenbf_v1. Using appropriate indexes on frequently filtered columns can cut query latency dramatically.
2.2 Choosing partition keys
Partitioning directly impacts I/O and query speed. Three strategies were evaluated:
Application‑by‑day: each application gets a daily partition. This creates hundreds of partitions per day, causing high write IOPS (30‑50% of total IOPS) and degrading query performance.
Global daily partition: all applications share a single daily partition. This reduces IOPS pressure and simplifies cost accounting.
Hourly partition: used for high‑volume recommendation logs. While it isolates write spikes, migrating partitions at night can saturate the cluster’s IOPS, so careful scheduling is required.
Examples:
PARTITION BY (toDate(log_time), log_save_time) PARTITION BY (toStartOfHour(log_time), log_save_time)2.3 MergeTree engine and order‑by fields
The author chose the standard MergeTree engine (not ReplacingMergeTree) to avoid extra merge overhead. Ordering columns should align with common query predicates to leverage primary‑key indexing:
ORDER BY (application, environment, log_time, ip, file_offset)2.4 Compression algorithm selection
Two codecs were compared: LZ4 (faster queries, ~30% quicker, but 2.5× larger storage) and ZSTD (higher compression ratio 1:10 vs 1:4, slower queries). For most workloads the author prefers ZSTD for its storage efficiency.
`message` String CODEC(ZSTD),
INDEX idx_message message TYPE tokenbf_v1(512, 2, 0) GRANULARITY 2,3. Additional Design Points
A simplified demo table creation script illustrates the recommended schema, indexes, partitioning, ordering, and settings for a production ClickHouse log store:
CREATE TABLE demo_log_local ON CLUSTER default (
`app` String,
`keyArr` Array(Nullable(String)),
`valArr` Array(Nullable(String)),
`msgBody` String CODEC(ZSTD),
`record_time` DateTime64(3),
`tracing_id` String,
`storage_time` Int32,
`msg_size` Int64,
INDEX msg_size msg_size TYPE SET(100) GRANULARITY 2,
INDEX app app TYPE SET(100) GRANULARITY 2,
INDEX msgBody msgBody TYPE tokenbf_v1(32768, 2, 0) GRANULARITY 2,
INDEX idx_msgBody msgBody TYPE tokenbf_v1(512, 2, 0) GRANULARITY 2,
INDEX tracing_id idx_tracing_id TYPE SET(100) GRANULARITY 2,
INDEX idx_tracing_id idx_tracing_id TYPE tokenbf_v1(512, 2, 0) GRANULARITY 2,
INDEX record_time record_time TYPE minmax GRANULARITY 2
)
ENGINE = MergeTree
PARTITION BY (toDate(record_time), storage_time)
ORDER BY (application, record_time)
SETTINGS allow_nullable_key = 1, storage_policy = 'ttl', index_granularity = 8192;When ingesting logs via Flink, reserve about 20% headroom to avoid bottlenecks during traffic spikes. ClickHouse also offers built‑in monitoring, alerting, and visual query consoles.
Overall, systematic cluster sizing, thoughtful schema design, and appropriate compression choices enable ClickHouse to serve as a reliable, cost‑effective log storage backend ready for production deployment.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
