
ClickHouse vs Elasticsearch: Faster, Cheaper Log Analytics Explained

This article compares ClickHouse and Elasticsearch for log analytics, highlighting ClickHouse's superior write throughput, query speed, and lower server costs, then provides a detailed, cost‑effective deployment guide covering Zookeeper, Kafka, FileBeat, ClickHouse installation, and visualization with ClickVisual, plus optimization tips.

MaGe Linux Operations

Jan 3, 2024


Background

SaaS products increasingly face data-security and compliance requirements, so our business needs a private deployment option to stay competitive in the industry. To strengthen the platform, we also need a data system that supports operational analysis. Deploying a full big‑data stack would be too costly, so we chose a compromise solution that still improves our data-analysis capability.

1. Elasticsearch vs ClickHouse

ClickHouse is a high‑performance columnar distributed DBMS. Our tests show the following advantages:

High write throughput: a single server can write 50‑200 MB/s, or more than 600k records per second, which is over five times what Elasticsearch achieved. The write rejections and write latency that are common in ES are rare in ClickHouse.

Fast queries: when the data is in the page cache, a single server can scan 2‑30 GB/s. When it is not, query speed depends on disk read speed and the compression ratio. Overall, ClickHouse queries were 5‑30× faster than ES.

Lower server cost: for the same data, ClickHouse's compressed size is 1/3 to 1/30 of what ES needs, which saves both disk space and I/O. It also uses less memory and CPU, so server costs for log processing can potentially be halved.

2. Cost Analysis

Note: Cost analysis based on Alibaba Cloud without any discounts.

3. Environment Deployment

a) Zookeeper Cluster Deployment

yum install java-1.8.0-openjdk-devel.x86_64
# Configure environment variables in /etc/profile
yum install ntpdate
ntpdate asia.pool.ntp.org
mkdir -p /usr/zookeeper/data
mkdir -p /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg
# zoo.cfg content
# tickTime=2000
# initLimit=10
# syncLimit=5
# dataDir=/usr/zookeeper/data
# dataLogDir=/usr/zookeeper/logs
# clientPort=2181
# server.1=zk1:2888:3888
# server.2=zk2:2888:3888
# server.3=zk3:2888:3888
# Create myid on each server
echo "1" > /usr/zookeeper/data/myid
# repeat for servers 2 and 3 with respective IDs
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
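
Because zoo.cfg is identical on every node and only myid differs, the two can be generated from one host list. The sketch below is illustrative: the function names are hypothetical, and the defaults simply mirror the paths and ports shown above.

```python
from typing import Dict, List


def render_zoo_cfg(hosts: List[str],
                   data_dir: str = "/usr/zookeeper/data",
                   log_dir: str = "/usr/zookeeper/logs") -> str:
    """Render the zoo.cfg that is shared by every node in the ensemble."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"dataLogDir={log_dir}",
        "clientPort=2181",
    ]
    # The N in server.N must match the number stored in that host's myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_by_host(hosts: List[str]) -> Dict[str, str]:
    """Map each host to the content of its dataDir/myid file."""
    return {host: str(i) for i, host in enumerate(hosts, start=1)}


if __name__ == "__main__":
    ensemble = ["zk1", "zk2", "zk3"]
    print(render_zoo_cfg(ensemble))
    for host, myid in myid_by_host(ensemble).items():
        print(f"{host}: echo {myid} > /usr/zookeeper/data/myid")
```

Generating both from the same list keeps the server.N entries and the myid values from drifting apart when a node is added or replaced.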

b) Kafka Cluster Deployment

# Create the install directory and the logs directory used by log.dir and kafka.log below
mkdir -p /usr/kafka/logs
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
# Example broker configuration
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
# Start Kafka as a background process
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &
# Stop Kafka
/usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-stop.sh
# Topic management examples (KAFKA_HOME points at the unpacked Kafka directory)
export KAFKA_HOME=/usr/kafka/kafka_2.12-3.2.0
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
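
The replication settings in the broker configuration interact with the number of brokers, so it can help to sanity-check them before starting the cluster. The following is a minimal sketch, not a Kafka tool: the function names and the specific checks are assumptions chosen for illustration, and the property values are copied from the example configuration above.

```python
from typing import Dict, List

# Replication-related lines from the example broker configuration.
ARTICLE_PROPS = """
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def replication_warnings(props: Dict[str, str], broker_count: int) -> List[str]:
    """Return human-readable warnings about replication settings."""
    warnings: List[str] = []
    for key in ("offsets.topic.replication.factor",
                "transaction.state.log.replication.factor"):
        if key in props and int(props[key]) > broker_count:
            warnings.append(
                f"{key}={props[key]} exceeds the {broker_count} available brokers")
    rf = props.get("transaction.state.log.replication.factor")
    min_isr = props.get("transaction.state.log.min.isr")
    if rf is not None and min_isr is not None:
        if int(min_isr) > int(rf):
            warnings.append(
                "transaction.state.log.min.isr is larger than its replication factor")
        elif int(min_isr) == int(rf) and int(rf) > 1:
            warnings.append(
                "transaction.state.log.min.isr equals the replication factor, "
                "so one unavailable replica blocks transactional writes")
    return warnings


if __name__ == "__main__":
    for warning in replication_warnings(parse_properties(ARTICLE_PROPS), broker_count=3):
        print("WARN:", warning)
```

With three brokers the example values only trigger the min.isr warning, which matters mainly if producers use transactions; a plain FileBeat-to-Kafka pipeline may never hit it.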

c) FileBeat Deployment

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# Create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat
# filebeat.yml excerpt
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  compression: gzip
processors:
  - drop_fields:
      fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]
      ignore_missing: false
# -e sends FileBeat's own log output to stderr, so redirect stderr into the file as well
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /user/filebeat/filebeat.log 2>&1 &
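
The json.keys_under_root setting above only helps if the application writes one JSON object per line. As a hedged illustration of what such a producer could look like, the sketch below uses Python's standard logging module; the class name, logger name, and field names (ts, level, logger, message) are assumptions, chosen so that they do not collide with the fields removed by drop_fields.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format every record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # "ts" rather than "timestamp", because the FileBeat config drops "timestamp".
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # json.dumps escapes embedded newlines, so the output stays on one line.
        return json.dumps(payload, ensure_ascii=False)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("demo.app")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    log.info("order %s created", 42)
    print(buffer.getvalue(), end="")
```

In a real service the stream would be a file under the path that FileBeat watches instead of an in-memory buffer.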

d) ClickHouse Deployment

# Check SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
# Create data directory
mkdir -p /data/clickhouse
# Add host entries for clickhouse nodes
# e.g., 10.190.85.92 bigdata-clickhouse-01
# e.g., 10.190.85.93 bigdata-clickhouse-02
# Optimize CPU governor
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Keep memory overcommit in heuristic mode (0); this does not disable overcommit
echo 0 | tee /proc/sys/vm/overcommit_memory
# Disable transparent huge pages
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
# Install ClickHouse repository
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum list | grep clickhouse
yum -y install clickhouse-server clickhouse-client
# Adjust logging level in /etc/clickhouse-server/config.xml
# <level>information</level>
# Check logs at /var/log/clickhouse-server/
clickhouse-server --version
clickhouse-client --password
# Service control
sudo clickhouse stop
sudo clickhouse start
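
Once the server is running, it can also be checked over ClickHouse's HTTP interface, which listens on port 8123 by default. The sketch below only uses the Python standard library; the helper names are hypothetical, and the host name is taken from the host entries above. It assumes the default user and the password chosen during installation.

```python
import urllib.request


def build_query_request(host: str, sql: str, user: str = "default",
                        password: str = "", port: int = 8123) -> urllib.request.Request:
    """Build a POST request that sends one SQL statement to ClickHouse over HTTP."""
    request = urllib.request.Request(
        f"http://{host}:{port}/",
        data=sql.encode("utf-8"),
        method="POST",
    )
    # Credentials are passed in headers rather than in the URL.
    request.add_header("X-ClickHouse-User", user)
    request.add_header("X-ClickHouse-Key", password)
    return request


def run_query(host: str, sql: str, user: str = "default",
              password: str = "", timeout: float = 10.0) -> str:
    """Send the query and return the raw response body as text."""
    request = build_query_request(host, sql, user=user, password=password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, run_query("bigdata-clickhouse-01", "SELECT version()", password="...") should return the same version string that clickhouse-server --version reports, provided the node is reachable from where the script runs.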

e) Visualization with ClickVisual

ClickVisual is a lightweight open‑source platform for log query, analysis, and visualization. It supports ClickHouse and offers a Kibana‑like experience for querying business logs. Its main features are:

Visual query panels with hit count histograms and raw logs.

Log index statistics.

Proxy authentication for easy third‑party integration.

Real‑time alerts based on ClickHouse logs.

ClickVisual also provides raw SQL query capability for quick aggregation analysis.

Optimization Methods

Drawing on our experience, we tried several optimization techniques, ran into a few ClickHouse pitfalls along the way, and adopted some of its newer features.

1. Log Query Optimization

Thanks to ClickHouse's high compression and query performance, time‑partitioned searches work well on small tables, but large data volumes call for additional tactics:

TraceID scenario: for SkyWalking-style trace lookups, add a tokenbf_v1 index on the log column and filter with hasToken, so that ClickHouse can skip the data blocks that cannot contain the ID.

Unstructured logs: traditional LIKE queries are slow. Newer ClickHouse versions support inverted indexes, which greatly improve search speed.

Aggregation scenarios: Use ClickHouse's Projection feature for common aggregations.
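
To show why a token Bloom filter helps in the TraceID scenario, here is a small conceptual model in Python. It is not ClickHouse's implementation: the class, the hash scheme, and the filter size are assumptions, and "granule" here just means a fixed-size group of rows. The idea it illustrates is the same, though: each group of rows keeps a compact filter of its tokens, and a lookup only scans the groups whose filter might contain the token.

```python
import hashlib
import re
from typing import Iterable, List, Set

# Tokens are runs of ASCII letters and digits, separated by anything else.
TOKEN_SPLIT = re.compile(r"[^0-9A-Za-z]+")


def tokenize(text: str) -> Set[str]:
    return {token for token in TOKEN_SPLIT.split(text) if token}


class TokenBloomFilter:
    """A tiny Bloom filter over string tokens, backed by a Python int bit set."""

    def __init__(self, size_bits: int = 2048, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, token: str) -> Iterable[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token: str) -> None:
        for position in self._positions(token):
            self.bits |= 1 << position

    def might_contain(self, token: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> position) & 1 for position in self._positions(token))


def build_granule_filters(rows: List[str], granule_size: int) -> List[TokenBloomFilter]:
    """Build one filter per consecutive group of granule_size rows."""
    filters: List[TokenBloomFilter] = []
    for start in range(0, len(rows), granule_size):
        bloom = TokenBloomFilter()
        for row in rows[start:start + granule_size]:
            for token in tokenize(row):
                bloom.add(token)
        filters.append(bloom)
    return filters


def granules_to_scan(filters: List[TokenBloomFilter], token: str) -> List[int]:
    """Return the indexes of granules that cannot be skipped for this token."""
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(token)]


if __name__ == "__main__":
    logs = [
        "GET /health 200",
        "traceId=ffee01 user login ok",
        "traceId=abc123 payment failed",
        "traceId=abc123 retry scheduled",
    ]
    filters = build_granule_filters(logs, granule_size=2)
    print(granules_to_scan(filters, "abc123"))
```

A Bloom filter never reports a present token as absent, so the granule holding the trace ID is always scanned; it may occasionally keep a granule that does not contain it, which costs time but not correctness.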

2. Local Table vs Distributed Table

Local tables store data on each node, while distributed tables are logical and do not store data. For high‑frequency log writes, local tables are recommended because:

Writing through a distributed table splits each insert by sharding key and forwards the pieces to the shards, which increases network load and produces many small parts, risking “Too many parts” errors.

Higher risk of data consistency issues.

Increased Zookeeper pressure.
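
One way to write to local tables directly is for the writer to group rows into large batches and rotate them across the nodes itself. The sketch below shows only that assignment logic; the function names are hypothetical, the node names reuse the host entries from the deployment section, and the actual insert call is left out.

```python
from itertools import cycle
from typing import Iterable, Iterator, List, Tuple


def batched(rows: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size rows."""
    batch: List[str] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def assign_batches(rows: Iterable[str], nodes: List[str],
                   batch_size: int) -> List[Tuple[str, List[str]]]:
    """Pair every batch with a node, rotating through the nodes in order."""
    targets = cycle(nodes)
    return [(next(targets), batch) for batch in batched(rows, batch_size)]


if __name__ == "__main__":
    nodes = ["bigdata-clickhouse-01", "bigdata-clickhouse-02"]
    rows = [f"log line {i}" for i in range(5)]
    for node, batch in assign_batches(rows, nodes, batch_size=2):
        # A real writer would run one INSERT into the local table on this node.
        print(node, len(batch))
```

Each batch becomes a single insert on a single node, so the number of new parts grows with the number of batches rather than with batches times shards. Queries can still go through the distributed table.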

3. ClickHouse Limitation Policies

As log volume grows, we added limits to prevent runaway queries and OOM situations. Parameters in users.xml include:

max_memory_usage
max_memory_usage_for_user
max_memory_usage_for_all_queries
max_rows_to_read
max_result_rows
max_bytes_to_read
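
As a rough illustration of how such limits sit in users.xml, the sketch below renders a settings profile with Python's standard XML library. The profile name and every numeric value are placeholders rather than recommendations, the clickhouse root element is an assumption (older releases used yandex), and max_memory_usage_for_all_queries is left out because it may be obsolete in newer releases; check the documentation for your version. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Placeholder limits for a log-query profile; tune these for your own hardware.
EXAMPLE_LIMITS: Dict[str, int] = {
    "max_memory_usage": 10_000_000_000,           # bytes per query
    "max_memory_usage_for_user": 20_000_000_000,  # bytes across one user's queries
    "max_rows_to_read": 1_000_000_000,
    "max_bytes_to_read": 100_000_000_000,
    "max_result_rows": 100_000,
}


def render_profile(name: str, limits: Dict[str, int]) -> str:
    """Render a <profiles><name>...</name></profiles> fragment as XML text."""
    root = ET.Element("clickhouse")
    profiles = ET.SubElement(root, "profiles")
    profile = ET.SubElement(profiles, name)
    for setting, value in limits.items():
        ET.SubElement(profile, setting).text = str(value)
    ET.indent(root)  # pretty-print in place
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_profile("log_query", EXAMPLE_LIMITS))
```

The generated fragment still has to be merged into the real users.xml and the profile assigned to the users that run log queries.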

