How We Replaced Elasticsearch with ClickHouse for High‑Performance Log Storage
Facing rapid growth, our team evaluated ClickHouse's hot-cold storage and tiered-disk policies as a replacement for Elasticsearch. We designed partitioning, TTL, and multi-level storage strategies (hot, cold, and archive disks, custom storage policies, and OSS integration) that delivered higher write throughput, better compression, and a cost reduction of more than 50%.
2.1 ClickHouse Overview
ClickHouse is a column-oriented DBMS designed for OLAP workloads. For analytical queries it is reported to be up to 100× faster than row-oriented databases, and it achieves high compression ratios (roughly 1:4 with lz4 and 1:10 with zstd). This article focuses on practical usage rather than basic features.
2.2 ClickHouse Storage Strategy
ClickHouse supports table-level TTL policies that can automatically delete expired data during merges or move it to other disks or volumes. The initial plan to use table-level TTL was abandoned due to a bug; instead, a scheduled task moves table parts according to custom TTL logic.
Storage configuration example
<path>/data1/ClickHouse/data/</path>
<storage_configuration>
  <disks>
    <hot>
      <path>/data1/ClickHouse/hot/</path>
    </hot>
    <cold>
      <path>/data2/ClickHouse/cold/</path>
    </cold>
  </disks>
  <policies>
    <ttl>
      <volumes>
        <hot><disk>hot</disk></hot>
        <cold><disk>cold</disk></cold>
      </volumes>
    </ttl>
  </policies>
</storage_configuration>
Key tags explained: <path> (default storage path), <storage_configuration> (wraps the disk and policy definitions), <disks> (declares the available disks), <hot> and <cold> (user-chosen disk names, reused here as volume names), <policies> (declares storage policies), <ttl> (the name of the policy), <volumes> (the volumes of the policy, each backed by one or more disks).
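A configuration like the one above can be sanity-checked before it is deployed. The following is a minimal Python sketch, not part of the original setup: it parses a storage configuration fragment and reports any volume that points at a disk that was never declared. The helper name check_storage_config and the inlined SAMPLE string are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Sample fragment mirroring the hot/cold configuration shown in the article.
SAMPLE = """
<storage_configuration>
  <disks>
    <hot><path>/data1/ClickHouse/hot/</path></hot>
    <cold><path>/data2/ClickHouse/cold/</path></cold>
  </disks>
  <policies>
    <ttl>
      <volumes>
        <hot><disk>hot</disk></hot>
        <cold><disk>cold</disk></cold>
      </volumes>
    </ttl>
  </policies>
</storage_configuration>
"""


def check_storage_config(xml_text):
    """Hypothetical helper: return (disks, policies, problems) for a fragment."""
    root = ET.fromstring(xml_text)
    # Disk names are the tag names of the children of <disks>.
    disks = [disk.tag for disk in root.find("disks")]
    policies = {}
    problems = []
    for policy in root.find("policies"):
        volumes = []
        for volume in policy.find("volumes"):
            names = [d.text.strip() for d in volume.findall("disk")]
            volumes.append((volume.tag, names))
            for name in names:
                if name not in disks:
                    problems.append(
                        f"policy {policy.tag}: volume {volume.tag} "
                        f"uses undeclared disk {name}"
                    )
        policies[policy.tag] = volumes
    return disks, policies, problems


disks, policies, problems = check_storage_config(SAMPLE)
print(disks)     # ['hot', 'cold']
print(policies)  # {'ttl': [('hot', ['hot']), ('cold', ['cold'])]}
print(problems)  # []
```

The volumes are kept in document order because ClickHouse treats the order of volumes inside a policy as significant.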
2.3 Partition Strategy
In the first design, the table used the partition key
PARTITION BY (application, environment, toYYYYMMDD(log_time))
This produced too many partitions per insert block (the default limit is max_partitions_per_insert_block=100), which degraded write performance and raised errors such as "too many partitions for single insert block". Raising the parameter to very high values led instead to "too many parts" errors, because ClickHouse could not merge the resulting parts quickly enough.
Consequently, the design switched to
PARTITION BY (toDate(log_time), log_save_time, oss_save_time)
and moved the application name to the ORDER BY clause. Two additional integer columns, log_save_time and oss_save_time, store the retention period for hot and cold storage respectively. A daily task queries system.parts and moves partitions to the appropriate disk based on these fields.
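The decision the daily task has to make for each partition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's actual job: it assumes both columns are expressed in days, that log_save_time is the time a partition stays on the hot disk, that oss_save_time is the additional time it stays on the cold disk, and that older partitions belong on an archive disk named arch. The function name target_disk is hypothetical.

```python
from datetime import date


def target_disk(partition_day, log_save_time, oss_save_time, today):
    """Hypothetical helper: pick the disk for one partition.

    Assumption: retention values are day counts measured from the
    partition's date, hot first, then cold, then archive.
    """
    age_days = (today - partition_day).days
    if age_days < log_save_time:
        return "hot"
    if age_days < log_save_time + oss_save_time:
        return "cold"
    return "arch"


day = date(2024, 1, 1)
print(target_disk(day, 7, 30, date(2024, 1, 5)))  # hot
print(target_disk(day, 7, 30, date(2024, 1, 8)))  # cold
print(target_disk(day, 7, 30, date(2024, 2, 7)))  # arch
```

Because the partition key already contains the date and both retention values, this decision needs only the partition tuple and never has to read the rows themselves.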
3 Business Requirements
Support day‑level retention policies for each business domain.
Store data on different storage media according to date.
Choose storage media that minimizes cost while meeting multi‑level retention.
The DBA team proposed:
Use table partitioning to achieve day‑level retention.
Adopt a three‑tier storage policy: Hot + Cold + Archive.
Hot: ESSD PL1, Cold: ESSD PL0, Archive: OSS (object storage).
4 Solution Design
4.1 Meeting Day‑Level Retention
Option 1: Partition by (application, environment, toYYYYMMDD(log_time)). This met the requirement but caused severe write‑performance issues due to the massive number of partitions per insert block.
Option 2 (chosen): Partition by (toDate(log_time), log_save_time, oss_save_time). The application column is kept in the ORDER BY clause for query speed. Two extra columns control when a partition is moved from hot to cold and from cold to archive.
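The difference between the two options can be made concrete by counting how many distinct partitions a single insert block touches under each key. The Python sketch below uses synthetic rows (50 hypothetical applications, 3 environments, two retention classes, one day of logs); the row layout and function names are assumptions made for illustration only.

```python
from datetime import datetime

MAX_PARTITIONS_PER_INSERT_BLOCK = 100  # ClickHouse default cited in the article


def option1_key(row):
    # Mirrors PARTITION BY (application, environment, toYYYYMMDD(log_time)).
    return (row["application"], row["environment"],
            row["log_time"].strftime("%Y%m%d"))


def option2_key(row):
    # Mirrors PARTITION BY (toDate(log_time), log_save_time, oss_save_time).
    return (row["log_time"].date(), row["log_save_time"], row["oss_save_time"])


def count_partitions(rows, key):
    """Number of distinct partitions one insert block would touch."""
    return len({key(row) for row in rows})


# Synthetic insert block: every application/environment pair logs on one day.
rows = [
    {
        "application": f"app{i}",
        "environment": env,
        "log_time": datetime(2024, 1, 1, 12, 0, 0),
        "log_save_time": 7 if i % 2 == 0 else 14,
        "oss_save_time": 30,
    }
    for i in range(50)
    for env in ("dev", "test", "prod")
]

n1 = count_partitions(rows, option1_key)
n2 = count_partitions(rows, option2_key)
print(n1, n1 > MAX_PARTITIONS_PER_INSERT_BLOCK)  # 150 True
print(n2, n2 > MAX_PARTITIONS_PER_INSERT_BLOCK)  # 2 False
```

Under Option 1 the partition count grows with the number of application and environment pairs, while under Option 2 it grows only with the number of distinct retention settings per day.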
4.2 Storing Data on Different Media
Two approaches were evaluated:
Table‑level TTL: Directly set TTL in the CREATE statement to move data to a cold disk. This was rejected because modifying TTL triggers a full reload of all parts, saturating I/O.
Scheduled task (chosen): A custom job runs ALTER TABLE … MOVE PARTITION … TO DISK 'cold' based on the log_save_time and oss_save_time fields.
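A scheduled job of this kind mostly comes down to comparing where each partition currently lives with where it should live, then emitting one move statement per mismatch. The Python sketch below shows only that step and is an assumption about how such a job might be structured, not the team's implementation. It expects (partition_id, disk_name) pairs as they could be selected from system.parts for active parts, plus a mapping of partition_id to the desired disk. The function name and the partition ids p_old and p_new are hypothetical placeholders.

```python
def build_move_statements(parts, targets,
                          table="dw_log.tb_logs_local", cluster="default"):
    """Hypothetical helper: build ALTER statements for misplaced partitions.

    parts   -- iterable of (partition_id, disk_name), e.g. from
               SELECT partition_id, disk_name FROM system.parts WHERE active
    targets -- dict mapping partition_id to the disk it should be on
    """
    pending = set()
    for partition_id, disk_name in parts:
        wanted = targets.get(partition_id)
        # Skip partitions with no target or already on the right disk.
        if wanted is not None and wanted != disk_name:
            pending.add((partition_id, wanted))
    # One statement per partition, in a stable order.
    return [
        f"ALTER TABLE {table} ON CLUSTER {cluster} "
        f"MOVE PARTITION ID '{partition_id}' TO DISK '{wanted}'"
        for partition_id, wanted in sorted(pending)
    ]


parts = [("p_old", "hot"), ("p_old", "hot"), ("p_new", "hot")]
targets = {"p_old": "cold", "p_new": "hot"}
for statement in build_move_statements(parts, targets):
    print(statement)
# ALTER TABLE dw_log.tb_logs_local ON CLUSTER default MOVE PARTITION ID 'p_old' TO DISK 'cold'
```

Deduplicating by partition keeps the job from issuing the same move once per part, which matters because a partition usually consists of several parts.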
Example move command:
ALTER TABLE dw_log.tb_logs_local ON CLUSTER default MOVE PARTITION XXX TO DISK 'cold'
4.3 Archival Storage Media
Two options were considered for the archive tier:
ClickHouse + JuiceFS + OSS: JuiceFS provides a POSIX‑compatible layer over object storage. Although performance was acceptable, the architecture introduced third‑party metadata stores (Redis, MySQL, TiKV) that added operational risk and complexity.
ClickHouse + OSS (chosen): ClickHouse’s MergeTree engine can mount an S3‑compatible OSS bucket directly. Configuration example:
<storage_configuration>
  <disks>
    <hot>…</hot>
    <cold>…</cold>
    <arch>
      <type>s3</type>
      <endpoint>http://log-sh.oss-cn-xx-internal.xxx.com/xxxxxxxxxx/</endpoint>
      <access_key_id>xxxxxxxx</access_key_id>
      <secret_access_key>xxxxxx</secret_access_key>
      <metadata_path>/data1/ClickHouse/disks/s3/</metadata_path>
      <cache_enabled>true</cache_enabled>
      <data_cache_enabled>true</data_cache_enabled>
      <cache_path>/data1/ClickHouse/disks/s3/cache/</cache_path>
    </arch>
  </disks>
  <policies>
    <ttl>
      <volumes>
        <hot><disk>hot</disk></hot>
        <cold><disk>cold</disk></cold>
        <arch><disk>arch</disk></arch>
      </volumes>
    </ttl>
  </policies>
</storage_configuration>
Tests showed OSS write speed up to 7 GB/s and fast cluster restarts. Using OSS for the archive tier reduced storage cost by about 66% compared with ESSD PL0.
5 Storage Architecture
The final architecture combines:
ESSD hot and cold disks for high‑performance, single‑replica storage (cloud disks already provide multi‑replica safety).
Vertical ESSD expansion without service interruption.
ECS node scaling for compute capacity.
OSS object storage for low‑frequency archival data, billed on usage.
6 Conclusion
The DBA team contributed field-level index recommendations, TTL policy design, SQL optimization, and cost-saving measures. Migrating the log platform from Elasticsearch to ClickHouse delivered significantly higher write performance and reduced storage expenses by more than 50%.
ITPUB