
Why ClickHouse Beats Elasticsearch for Log Analytics – Performance, Cost & Deployment

This article compares ClickHouse and Elasticsearch for log analytics, highlighting ClickHouse's higher write throughput, faster queries, and lower server costs. It then walks through a cost-effective deployment (ZooKeeper, Kafka, FileBeat, and ClickHouse) and closes with optimization tips and log visualization using ClickVisual.

Efficient Ops

Dec 27, 2023


Background

SaaS services face data security and compliance challenges. To stay competitive, the company needed to offer private deployment together with a data system for operational analysis, without incurring large server costs. The setup described here is the resulting compromise.

Elasticsearch vs ClickHouse

ClickHouse is a high‑performance columnar distributed DBMS. Tests show:

Write throughput: 50-200 MB/s per server (over 600k records/s), more than 5× that of Elasticsearch.

Query speed: 2-30 GB/s when the data is served from the page cache, 5-30× faster than Elasticsearch.

Lower server cost: better compression (data occupies 1/3 to 1/30 of the disk space) and lower memory and CPU usage, which can potentially halve server costs.
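To make the compression figure concrete, here is a rough sizing sketch. It reads the 1/3 to 1/30 ratio as relative to an existing Elasticsearch footprint, which is an assumption, and the 3000 GB input is a hypothetical number rather than a measurement from the article.

```shell
#!/bin/sh
set -eu

# Hypothetical input: current Elasticsearch on-disk size for the retention window.
ES_DISK_GB="${ES_DISK_GB:-3000}"

# The article quotes 1/3 (least favorable) to 1/30 (most favorable) of the space.
worst_case_gb=$((ES_DISK_GB / 3))
best_case_gb=$((ES_DISK_GB / 30))

echo "ClickHouse disk estimate: ${best_case_gb}-${worst_case_gb} GB (from ${ES_DISK_GB} GB)"
```

The actual ratio depends heavily on the log schema and the column codecs, so it is worth measuring on a sample of real data before sizing disks.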

Cost Analysis

The cost estimate is based on Alibaba Cloud list prices, without discounts.

Environment Deployment

Zookeeper Cluster

# install the JDK that ZooKeeper requires
yum install -y java-1.8.0-openjdk-devel.x86_64
# configure /etc/profile, set timezone, create directories, download and extract ZooKeeper
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg
# sample zoo.cfg (the same file on all three nodes)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
# create myid on each node; the value must match that node's server.N entry
# (1 on zk1, 2 on zk2, 3 on zk3)
echo "1" > /usr/zookeeper/data/myid
# start; zkServer.sh is found through PATH, so it does not need to be in the current directory
zkServer.sh start
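A common mistake at this step is a myid value that disagrees with the server.N lines. The sketch below derives the ID from the server list instead of typing it by hand. The variables ZK_HOST and ZK_WORK_DIR are illustrative, and the script writes under /tmp rather than /usr/zookeeper so it can be tried safely.

```shell
#!/bin/sh
set -eu

# Illustrative variables, not from the article.
ZK_HOST="${ZK_HOST:-zk2}"                    # this node's name as written in zoo.cfg
ZK_WORK_DIR="${ZK_WORK_DIR:-/tmp/zk-sketch}" # stand-in for /usr/zookeeper

mkdir -p "$ZK_WORK_DIR/data"

# Same server list as the sample zoo.cfg.
cat > "$ZK_WORK_DIR/zoo.cfg" <<'EOF'
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF

# Extract N from the "server.N=<host>:..." line that names this host.
node_id=$(sed -n "s/^server\.\([0-9][0-9]*\)=${ZK_HOST}:.*/\1/p" "$ZK_WORK_DIR/zoo.cfg")

if [ -z "$node_id" ]; then
    echo "host ${ZK_HOST} not found in zoo.cfg" >&2
    exit 1
fi

echo "$node_id" > "$ZK_WORK_DIR/data/myid"
echo "wrote myid=${node_id} for ${ZK_HOST}"
```

Once all three nodes are started, running zkServer.sh status on each one should report one leader and two followers.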

Kafka Cluster

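# create the install and log directories (the log directory must exist before the redirect below)
mkdir -p /usr/kafka /usr/kafka/logs
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
# broker configuration in config/server.properties (example)
# broker.id must be unique per broker; replace "ip" with this broker's address
broker.id=1
listeners=PLAINTEXT://ip:9092
# other settings …
# start the broker in the background
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties >/usr/kafka/logs/kafka.log 2>&1 &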
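The article does not list its remaining broker settings. As a sketch only, the script below renders a per-broker server.properties with the settings a ZooKeeper-based Kafka 3.2 cluster typically needs. The host name kafka1, the log.dirs path, and the output location are assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Illustrative variables, not from the article.
KAFKA_BROKER_ID="${KAFKA_BROKER_ID:-1}"          # unique per broker
KAFKA_BROKER_HOST="${KAFKA_BROKER_HOST:-kafka1}" # address clients use to reach this broker
KAFKA_OUT="${KAFKA_OUT:-/tmp/kafka-sketch/server.properties}"

mkdir -p "$(dirname "$KAFKA_OUT")"

# zookeeper.connect points at the three-node ensemble configured earlier.
# log.dirs is an assumed data path; Kafka stores topic data there, not application logs.
cat > "$KAFKA_OUT" <<EOF
broker.id=${KAFKA_BROKER_ID}
listeners=PLAINTEXT://${KAFKA_BROKER_HOST}:9092
log.dirs=/usr/kafka/data
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
EOF

echo "rendered ${KAFKA_OUT}"
```

Running it once per broker with a different KAFKA_BROKER_ID and KAFKA_BROKER_HOST keeps the three files consistent apart from the values that must differ.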

FileBeat Deployment

# import the Elastic signing key
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create elastic.repo in /etc/yum.repos.d/ with the following content
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
# install FileBeat and enable it at boot (start it with systemctl start filebeat once configured)
yum install -y filebeat
systemctl enable filebeat

FileBeat configuration highlights: set keys_under_root: true so that decoded JSON fields are placed at the top level of each event, and define a Kafka output.
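A minimal filebeat.yml along those lines might look like the sketch below. It assumes the application writes one JSON object per line and uses the classic log input, which newer FileBeat releases mark as deprecated. The log path, broker names, and topic name are hypothetical.

```shell
#!/bin/sh
set -eu

# Illustrative output path; the packaged config lives at /etc/filebeat/filebeat.yml.
FILEBEAT_OUT="${FILEBEAT_OUT:-/tmp/filebeat-sketch/filebeat.yml}"
mkdir -p "$(dirname "$FILEBEAT_OUT")"

cat > "$FILEBEAT_OUT" <<'EOF'
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log          # hypothetical application log path
    json.keys_under_root: true      # lift decoded JSON fields to the top level
    json.add_error_key: true        # flag lines that fail to decode

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]   # hypothetical brokers
  topic: "app-logs"                                      # hypothetical topic
  required_acks: 1
  compression: gzip
EOF

echo "rendered ${FILEBEAT_OUT}"
```

After copying the file into place, filebeat test config and filebeat test output can be used to check the syntax and the broker connection before starting the service.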

ClickHouse Deployment

# check SSE4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
mkdir -p /data/clickhouse
# add hosts entries for clickhouse nodes
# note: the three kernel settings below are written at runtime and do not survive a reboot
# set CPU governor to performance
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# keep heuristic memory overcommit (mode 0); this leaves overcommit enabled, it does not disable it
echo 0 | tee /proc/sys/vm/overcommit_memory
# disable transparent huge pages
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
# install from official repo
# (newer ClickHouse releases are published under packages.clickhouse.com; adjust the URLs if this repo is unavailable)
yum install -y yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum -y install clickhouse-server clickhouse-client
# set the logger level in /etc/clickhouse-server/config.xml to information
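Rather than editing config.xml directly, the log level can also be set through an override file, which survives package upgrades more easily. The sketch below renders such a file. The file name logger.xml and the /tmp output path are illustrative, and older ClickHouse releases use yandex as the root element instead of clickhouse.

```shell
#!/bin/sh
set -eu

# Illustrative output path; on a server this would go in /etc/clickhouse-server/config.d/.
CH_LOGGER_OUT="${CH_LOGGER_OUT:-/tmp/clickhouse-sketch/config.d/logger.xml}"
mkdir -p "$(dirname "$CH_LOGGER_OUT")"

# Only the overridden key is listed; ClickHouse merges it into the main config.
cat > "$CH_LOGGER_OUT" <<'EOF'
<clickhouse>
    <logger>
        <level>information</level>
    </logger>
</clickhouse>
EOF

echo "rendered ${CH_LOGGER_OUT}"
```

A restart of clickhouse-server (systemctl restart clickhouse-server) applies the change if the server does not pick it up on its own.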

Visualization with ClickVisual

ClickVisual is an open‑source lightweight log query, analysis, and alerting UI that supports ClickHouse as a backend, offering histogram panels, index management, proxy authentication, and real‑time alerts.

Optimization Methods

Log Query Optimization

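TraceID scenario: add a tokenbf_v1 skipping index and filter with hasToken, so that data blocks that cannot contain the trace ID are skipped.

Unstructured logs: replace LIKE scans with the inverted index support available in newer ClickHouse versions.

Aggregation: use ClickHouse projections to pre-aggregate frequently run queries.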
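The first and third tips can be sketched as the SQL below, written to a file by a small shell script. The table logs_local, its columns ts, level, and message, the index parameters, and the sample trace ID are all hypothetical. The inverted index is left out because its syntax differs between ClickHouse versions.

```shell
#!/bin/sh
set -eu

# Illustrative output path for the SQL file.
CH_OPT_SQL="${CH_OPT_SQL:-/tmp/clickhouse-sketch/log_query_opt.sql}"
mkdir -p "$(dirname "$CH_OPT_SQL")"

cat > "$CH_OPT_SQL" <<'EOF'
-- Token bloom filter on the raw message.
-- Arguments: filter size in bytes, number of hash functions, random seed (illustrative values).
ALTER TABLE logs_local
    ADD INDEX idx_msg_token message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1;
-- Build the index for parts that already exist.
ALTER TABLE logs_local MATERIALIZE INDEX idx_msg_token;

-- Trace lookup: hasToken matches a whole token, so the index can skip granules.
SELECT ts, message
FROM logs_local
WHERE hasToken(message, '4bf92f3577b34da6')
ORDER BY ts
LIMIT 100;

-- Projection that pre-aggregates per-minute counts by level.
ALTER TABLE logs_local
    ADD PROJECTION p_level_per_minute
    (SELECT toStartOfMinute(ts), level, count() GROUP BY toStartOfMinute(ts), level);
ALTER TABLE logs_local MATERIALIZE PROJECTION p_level_per_minute;
EOF

echo "rendered ${CH_OPT_SQL}"
```

The file can then be applied to a test instance, for example with clickhouse-client --multiquery < log_query_opt.sql. Bloom filter sizing is a trade-off between index size and false positives, so the values above are a starting point, not a recommendation.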

Local vs Distributed Tables

For high-frequency log writes, write directly to local tables rather than through a Distributed table. This avoids the extra network hop, an explosion in the number of small parts, and added ZooKeeper pressure.
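One way to keep writes local, sketched below under assumptions, is to let every ClickHouse node consume from Kafka itself and insert into its own MergeTree table through a materialized view. The table names, columns, topic, and consumer group are hypothetical, and the columns must match the JSON fields FileBeat actually emits.

```shell
#!/bin/sh
set -eu

# Illustrative output path for the SQL file.
CH_INGEST_SQL="${CH_INGEST_SQL:-/tmp/clickhouse-sketch/local_ingest.sql}"
mkdir -p "$(dirname "$CH_INGEST_SQL")"

cat > "$CH_INGEST_SQL" <<'EOF'
-- Local storage table, created on every node.
CREATE TABLE IF NOT EXISTS logs_local
(
    ts      DateTime,
    level   LowCardinality(String),
    message String
)
ENGINE = MergeTree
PARTITION BY toDate(ts)
ORDER BY ts;

-- Kafka consumer table; nodes sharing one group name split the topic's partitions.
CREATE TABLE IF NOT EXISTS logs_queue
(
    ts      DateTime,
    level   String,
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
         kafka_topic_list = 'app-logs',
         kafka_group_name = 'clickhouse-logs',
         kafka_format = 'JSONEachRow';

-- The materialized view moves consumed rows into the local table on the same node.
CREATE MATERIALIZED VIEW IF NOT EXISTS logs_mv TO logs_local AS
SELECT ts, level, message
FROM logs_queue;
EOF

echo "rendered ${CH_INGEST_SQL}"
```

With this layout a Distributed table is only needed on the read path, to fan queries out across the nodes.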

ClickHouse Limits

Set user‑level limits in users.xml to prevent runaway queries, e.g., max_memory_usage, max_rows_to_read, max_result_rows, max_bytes_to_read.
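As a sketch, these limits can be placed in the default profile through an override file instead of editing users.xml in place. All numeric values below are placeholders to be tuned per workload. The file name and /tmp path are illustrative, and older ClickHouse releases use yandex as the root element.

```shell
#!/bin/sh
set -eu

# Illustrative output path; on a server this would go in /etc/clickhouse-server/users.d/.
CH_LIMITS_OUT="${CH_LIMITS_OUT:-/tmp/clickhouse-sketch/users.d/query_limits.xml}"
mkdir -p "$(dirname "$CH_LIMITS_OUT")"

cat > "$CH_LIMITS_OUT" <<'EOF'
<clickhouse>
    <profiles>
        <default>
            <!-- Placeholder values: about 10 GB of memory per query -->
            <max_memory_usage>10000000000</max_memory_usage>
            <!-- Stop queries that would scan more than this many rows or bytes -->
            <max_rows_to_read>1000000000</max_rows_to_read>
            <max_bytes_to_read>100000000000</max_bytes_to_read>
            <!-- Cap the size of the result set -->
            <max_result_rows>100000</max_result_rows>
        </default>
    </profiles>
</clickhouse>
EOF

echo "rendered ${CH_LIMITS_OUT}"
```

Queries that exceed a limit fail with an exception by default, which is usually preferable to a single ad hoc search exhausting a shared log cluster.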

Conclusion

The deployment involved many pitfalls, especially FileBeat configuration. Future posts will detail additional ClickHouse tuning experiences.
