
Design and Implementation of ClickHouse‑Based Log Storage with Hot/Cold/Archive Tiering

To replace Elasticsearch, the team built a ClickHouse‑based log platform that uses table‑level TTL and custom storage policies to automatically move data among hot (ESSD PL1), cold (ESSD PL0) and archive (OSS) disks, achieving faster writes, comparable query latency, and over 50 % cost reduction.

DeWu Technology

The existing log platform at DeWu relied on Elasticsearch (ES) for storage. Rapid business growth introduced new requirements: an increasing number of applications, higher log volumes for debugging, and longer retention windows for compliance. ClickHouse, a column‑oriented OLAP DBMS, was evaluated as a replacement for its superior write performance and compression.

ClickHouse stores data column‑wise, which can make analytical queries dramatically faster (the article cites up to 100×) than row‑oriented storage, since a query reads only the columns it needs. The original article contrasts row‑oriented and column‑oriented storage with illustrative images, which are not reproduced here.
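The intuition behind that contrast can be sketched with a toy model (illustrative only, not from the article): an aggregate query over one field must touch every field of every row in a row store, while a column store reads just that one column.

```python
# Toy model of row-oriented vs column-oriented scans for an aggregate query.
# We count how many stored values each layout must read to sum one field.

rows = [{"app": f"app{i}", "ip": "10.0.0.1", "msg": "x" * 10, "latency": i}
        for i in range(1000)]

# Row store: every field of every row is read to extract `latency`.
row_values_read = sum(len(r) for r in rows)

# Column store: only the `latency` column is read.
latency_column = [r["latency"] for r in rows]  # laid out contiguously at ingest
col_values_read = len(latency_column)

total = sum(latency_column)
print(row_values_read, col_values_read)  # 4000 vs 1000 values touched
```

With wide log schemas (dozens of columns) and compressed column files, the real-world gap is far larger than this four-column toy suggests.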

ClickHouse supports table‑level TTL policies, which can automatically delete expired data during merges or move it to other disks or volumes. The team initially tried table‑level TTL alone but ran into limitations, so they implemented a custom multi‑level storage strategy on top of a storage policy.
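Such a table-level TTL clause combines both behaviors; a minimal sketch (the log_time column name and the intervals here are hypothetical, not the team's schema):

```
TTL log_time + INTERVAL 1 DAY TO DISK 'cold',  -- move expired parts to the cold disk
    log_time + INTERVAL 7 DAY DELETE           -- drop rows older than 7 days entirely
```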

<path>/data1/ClickHouse/data/</path> <!-- For easy lookup, we recommend adding the storage policy below the default data path -->
<storage_configuration>
    <disks>
        <hot>
            <path>/data1/ClickHouse/hot/</path> <!-- Note: once a storage policy is in use, no data path should be a subdirectory of another -->
        </hot>
        <cold>
            <path>/data2/ClickHouse/cold/</path>
        </cold>
    </disks>
    <policies>
        <ttl>
            <volumes>
                <hot>
                    <disk>hot</disk>
                </hot>
                <cold>
                    <disk>cold</disk>
                </cold>
            </volumes>
        </ttl>
    </policies>
</storage_configuration>

A sample table creation statement demonstrates the schema used for log ingestion, including fields for application, environment, IP, message, timestamps, and two integer columns log_save_time and oss_save_time that drive the hot/cold/archive tiering.

CREATE TABLE db_rdsauditlog_local ON CLUSTER auditlog
(
    `check_rows` Int64,
    `client_ip` String,
    `db` String,
    `fail` Int64,
    `instance_id` String,
    `latency` Int64,
    `origin_time` DateTime('Asia/Shanghai'),
    `return_rows` Int64,
    `sql` String,
    `thread_id` Int64,
    `update_rows` Int64,
    `user` String,
    `tables` Array(String),
    `sqlhash` String,
    `sqlfingerprint` String,
    `sqltype` String,
    INDEX instance_sqltype (instance_id, sqltype) TYPE set(100) GRANULARITY 5,
    INDEX origintime_sqlhash (instance_id, sqlhash) TYPE set(100) GRANULARITY 5,
    INDEX origintime_instance_clientip (instance_id, client_ip) TYPE set(100) GRANULARITY 5
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(origin_time)
ORDER BY origin_time
TTL origin_time + INTERVAL 6 HOUR TO DISK 'cold'
SETTINGS storage_policy = 'ttl', index_granularity = 8192;

To satisfy diverse retention policies (e.g., 7, 30, or 90 days), the team abandoned per‑application partitioning because it caused excessive partition counts and slowed writes. Instead, they adopted a three‑field partition key: (toDate(log_time), log_save_time, oss_save_time). The log_save_time and oss_save_time columns store the number of days a row should stay on the hot and cold disks, respectively. A daily scheduled task queries system.parts and moves partitions to the appropriate disk based on these values.

ALTER TABLE dw_log.tb_logs_local ON CLUSTER default MOVE PARTITION XXX TO DISK 'cold';
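The scheduled task can be sketched as follows (a minimal sketch, not the team's actual code; the exact meaning of the two thresholds and the shape of the system.parts rows are assumptions). Given partition metadata, it decides which partitions have outlived their hot or cold budget and emits the corresponding MOVE PARTITION statements:

```python
from datetime import date

def plan_moves(parts, today):
    """parts: (partition_date, log_save_time, oss_save_time, disk_name) tuples,
    mirroring the (toDate(log_time), log_save_time, oss_save_time) partition key
    plus the disk each part currently lives on. Returns ALTER statements to run."""
    stmts = []
    for part_date, log_days, oss_days, disk in parts:
        age = (today - part_date).days
        part_id = f"('{part_date}',{log_days},{oss_days})"
        if disk == "hot" and age >= log_days:        # hot budget exhausted
            stmts.append(f"ALTER TABLE dw_log.tb_logs_local ON CLUSTER default "
                         f"MOVE PARTITION {part_id} TO DISK 'cold'")
        elif disk == "cold" and age >= oss_days:     # cold budget exhausted
            stmts.append(f"ALTER TABLE dw_log.tb_logs_local ON CLUSTER default "
                         f"MOVE PARTITION {part_id} TO DISK 'arch'")
    return stmts

today = date(2023, 6, 10)
parts = [
    (date(2023, 6, 1), 7, 30, "hot"),   # 9 days old, hot budget 7  -> cold
    (date(2023, 6, 8), 7, 30, "hot"),   # 2 days old               -> stays
    (date(2023, 5, 1), 7, 30, "cold"),  # 40 days old, cold budget 30 -> arch
]
for s in plan_moves(parts, today):
    print(s)
```

In production the input would come from a query against system.parts (grouping by partition and disk_name) and the statements would be executed against the cluster by the job scheduler.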

Several storage‑medium options were evaluated. JuiceFS + OSS was tested for its ability to mount object storage as a POSIX filesystem, but concerns about metadata consistency, performance of Redis/MySQL/TiKV back‑ends, and operational complexity led to its rejection.

The final solution uses native ClickHouse storage policies with three disks: hot (ESSD PL1), cold (ESSD PL0), and arch (OSS). The arch disk is defined as an S3‑compatible object store, enabling cheap archival storage while still supporting ClickHouse’s MOVE PARTITION … TO DISK commands.

<storage_configuration>
    <disks>
        <hot>
            <path>/data1/ClickHouse/hot/data/</path>
            <move_factor>0.1</move_factor>
        </hot>
        <cold>
            <path>/data2/ClickHouse/cold/data/</path>
        </cold>
        <arch>
            <type>s3</type>
            <endpoint>http://log-sh.oss-cn-xx-internal.xxx.com/xxxxxxxxxx/</endpoint>
            <access_key_id>xxxxxxxx</access_key_id>
            <secret_access_key>xxxxxx</secret_access_key>
            <metadata_path>/data1/ClickHouse/disks/s3/</metadata_path>
            <cache_enabled>true</cache_enabled>
            <data_cache_enabled>true</data_cache_enabled>
            <cache_path>/data1/ClickHouse/disks/s3/cache/</cache_path>
        </arch>
    </disks>
    <policies>
        <ttl>
            <volumes>
                <hot><disk>hot</disk></hot>
                <cold><disk>cold</disk></cold>
                <arch><disk>arch</disk></arch>
            </volumes>
        </ttl>
    </policies>
</storage_configuration>

Performance tests showed that OSS write throughput can reach 7 GB/s and that the query latency on OSS‑backed disks is comparable to local ESSD PL0 disks. The architecture leverages cloud‑native features: single‑replica ESSD disks with automatic failover, vertical scaling of disk capacity, ECS node scaling, and low‑cost OSS for archival data, achieving over 50 % cost reduction.

The article concludes that the DBA team’s contributions—field indexing recommendations, TTL strategy design, SQL optimization, and cost‑saving proposals—enabled a successful migration from ES to ClickHouse, delivering higher write performance and significant storage savings.

Performance · Database · ClickHouse · Hot/Cold Tiering · Log Storage · TTL
Written by DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.