Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide
This article compares Elasticsearch and ClickHouse in terms of write throughput, query speed, and server cost, then provides a detailed step‑by‑step deployment guide for Zookeeper, Kafka, FileBeat, and ClickHouse, including common issues and their solutions.
To build a privately deployable data-analysis capability for SaaS services, the author evaluated two storage engines, Elasticsearch and ClickHouse, focusing on write throughput, query speed, and server cost.
Elasticsearch vs ClickHouse
In the author's tests, ClickHouse delivered significantly higher write throughput than Elasticsearch (ES): 50 to 200 MB/s per server and over 600k records/s, more than 5× that of ES, with fewer write rejections. Queries ran 5 to 30× faster than on ES, especially when the data was already in the OS page cache. ClickHouse also needed only 1/3 to 1/30 of the storage that ES used for the same data, which lowers disk I/O and CPU usage and, in the author's estimate, can cut server costs roughly in half.
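To put the storage ratio in concrete terms, here is a small sketch that converts an Elasticsearch footprint into the ClickHouse range implied by the 1/3 to 1/30 figures above. The function name estimate_clickhouse_gb is hypothetical, and the result is only as good as those ratios, which come from the author's workload.

```bash
#!/usr/bin/env bash
# Sketch: estimate the ClickHouse disk footprint for data that occupies
# ES_GB gigabytes in Elasticsearch, using the 1/30 (best case) and
# 1/3 (worst case) ratios quoted in the comparison. Estimates only.
set -euo pipefail

# estimate_clickhouse_gb ES_GB -> prints "<best_case_gb> <worst_case_gb>"
estimate_clickhouse_gb() {
  local es_gb="$1"
  # awk does the floating-point division; one decimal place is enough here
  awk -v es="$es_gb" 'BEGIN { printf "%.1f %.1f\n", es / 30, es / 3 }'
}
```

For example, estimate_clickhouse_gb 1200 prints 40.0 400.0, i.e. roughly 40 GB in the best case and 400 GB in the worst case for 1.2 TB of ES data.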
Cost Analysis
A cost comparison based on Alibaba Cloud list prices (without discounts) shows that ClickHouse's lower resource consumption also makes it cheaper to run.
Environment Deployment
1. Zookeeper Cluster Deployment
yum install java-1.8.0-openjdk-devel.x86_64
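# Configure environment variables in /etc/profile
# Synchronize the clock on every node
yum install ntpdate
ntpdate asia.pool.ntp.org
# Create the install, data and log directories used by zoo.cfg below
mkdir -p /usr/zookeeper/data /usr/zookeeper/logs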
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg   # create zoo.cfg with the following content
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
echo "1" > /usr/zookeeper/data/myid   # on zk1
echo "2" > /usr/zookeeper/data/myid   # on zk2
echo "3" > /usr/zookeeper/data/myid   # on zk3
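The zoo.cfg and myid steps above differ between nodes only in the myid value, so they can be scripted. A minimal sketch, assuming the zk1..zk3 hostnames from zoo.cfg; the directories are parameters so the script can be tried in a scratch directory first, and write_zk_node_config is a hypothetical helper, not part of ZooKeeper.

```bash
#!/usr/bin/env bash
# Sketch: render zoo.cfg and myid for one ZooKeeper ensemble member.
# Assumptions: ensemble hostnames (e.g. zk1 zk2 zk3) resolve on every node;
# write_zk_node_config is a hypothetical helper, not a ZooKeeper tool.
set -euo pipefail

# write_zk_node_config NODE_ID CONF_DIR DATA_DIR LOG_DIR HOST...
write_zk_node_config() {
  local node_id="$1" conf_dir="$2" data_dir="$3" log_dir="$4"
  shift 4
  local hosts=("$@")   # ensemble hostnames in server.N order
  if (( node_id < 1 || node_id > ${#hosts[@]} )); then
    echo "node id must be between 1 and ${#hosts[@]}" >&2
    return 1
  fi
  mkdir -p "$conf_dir" "$data_dir" "$log_dir"
  {
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=${data_dir}"
    echo "dataLogDir=${log_dir}"
    echo "clientPort=2181"
    local i
    for i in "${!hosts[@]}"; do
      # 2888: follower-to-leader traffic, 3888: leader election
      echo "server.$((i + 1))=${hosts[$i]}:2888:3888"
    done
  } > "${conf_dir}/zoo.cfg"
  # myid must match the N of this host's server.N line
  echo "$node_id" > "${data_dir}/myid"
}
```

On zk2, for example: write_zk_node_config 2 $ZOOKEEPER_HOME/conf /usr/zookeeper/data /usr/zookeeper/logs zk1 zk2 zk3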
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
2. Kafka Cluster Deployment
mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
# broker.id, listeners, and other configs are set in server.properties
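Editing server.properties by hand on every broker is error-prone, so here is a sketch of doing it with a script. It assumes the stock server.properties shipped with Kafka 3.2.0 (ZooKeeper mode), brokers named kafka1..kafka3 as in the Filebeat output configuration, a hypothetical data directory /usr/kafka/data, and the zk1..zk3 ensemble; set_property and configure_broker are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: set per-broker keys in Kafka's server.properties in place.
# Hypothetical values: port 9092, data in /usr/kafka/data, ZooKeeper zk1..zk3.
set -euo pipefail

# set_property FILE KEY VALUE: replace the first "KEY=..." or "#KEY=..." line,
# or append "KEY=VALUE" if the key is absent. Exact prefix match, no regex.
set_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -v k="$key" -v v="$value" '
    {
      line = $0
      sub(/^#/, "", line)               # treat a commented default as the same key
      if (!done && index(line, k "=") == 1) { print k "=" v; done = 1; next }
      print
    }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp"
  cat "$tmp" > "$file"                  # keep the original file's permissions
  rm -f "$tmp"
}

# configure_broker PROPS_FILE BROKER_ID HOSTNAME
configure_broker() {
  local props="$1" broker_id="$2" host="$3"
  set_property "$props" broker.id "$broker_id"      # must be unique per broker
  set_property "$props" listeners "PLAINTEXT://0.0.0.0:9092"
  set_property "$props" advertised.listeners "PLAINTEXT://${host}:9092"
  set_property "$props" log.dirs /usr/kafka/data
  set_property "$props" zookeeper.connect zk1:2181,zk2:2181,zk3:2181
}
```

On kafka2, for example: configure_broker /usr/kafka/kafka_2.12-3.2.0/config/server.properties 2 kafka2. Replacing the commented-out #listeners line in place keeps the file close to the shipped default and makes the script safe to re-run.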
mkdir -p /usr/kafka/logs
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties >/usr/kafka/logs/kafka.log 2>&1 &
/usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-stop.sh
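export KAFKA_HOME=/usr/kafka/kafka_2.12-3.2.0
# List topics
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
# Consume a topic from the beginning (useful for checking that data arrives)
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
# Create a topic with 3 partitions and 2 replicas
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
3. FileBeat Deployment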
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create /etc/yum.repos.d/elastic.repo with the following content
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat   # only needed on SysV-init systems; on systemd, systemctl enable is enough
# /etc/filebeat/filebeat.yml (important: set json.keys_under_root: true so the decoded JSON keys are placed at the top level of each event)
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  compression: gzip
processors:
  - drop_fields:
      fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]
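To check that Filebeat picks up JSON lines and forwards them, one option is to append a synthetic line to a file matched by the input glob. A minimal sketch; the helper name and the ts/level/msg keys are hypothetical, so substitute your application's real log schema. Because json.keys_under_root is true, these keys should appear at the top level of the event that reaches Kafka.

```bash
#!/usr/bin/env bash
# Sketch: append one JSON log line to a file matched by the Filebeat input
# glob (/root/logs/xxx/inner/*.log) to smoke-test the pipeline.
# The keys ts/level/msg are hypothetical placeholders.
set -euo pipefail

# write_test_log_line LOG_FILE [LEVEL] [MSG]
write_test_log_line() {
  local log_file="$1" level="${2:-INFO}" msg="${3:-filebeat smoke test}"
  mkdir -p "$(dirname "$log_file")"
  # Filebeat only ships complete lines, so the trailing newline matters.
  # No JSON escaping is done: msg must not contain double quotes or backslashes.
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u '+%Y-%m-%d %H:%M:%S')" "$level" "$msg" >> "$log_file"
}
```

For example, run write_test_log_line /root/logs/xxx/inner/smoke.log, then look for the line with kafka-console-consumer.sh pointed at the xxx_data_clickhouse topic.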
# -e writes Filebeat's own log to stderr, so redirect stderr as well
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /user/filebeat/filebeat.log 2>&1 &
4. ClickHouse Deployment
# Verify SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
mkdir -p /data/clickhouse
# Add /etc/hosts entries for the ClickHouse nodes on every server
# Example:
# 10.190.85.92 bigdata-clickhouse-01
# 10.190.85.93 bigdata-clickhouse-02
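The host entries can be added with a small idempotent helper so that re-running the setup does not duplicate lines. A sketch with the hypothetical helper name add_host_entry; it takes the hosts file as an argument so it can be tried on a scratch file before touching /etc/hosts, which requires root.

```bash
#!/usr/bin/env bash
# Sketch: idempotently add an "IP hostname" pair to a hosts file.
# add_host_entry is a hypothetical helper; pass /etc/hosts (root) or a scratch file.
set -euo pipefail

# add_host_entry HOSTS_FILE IP HOSTNAME
add_host_entry() {
  local hosts_file="$1" ip="$2" name="$3"
  touch "$hosts_file"
  # Skip if the hostname is already mapped on a non-comment line
  # (compared as a whole field, not as a substring).
  if awk -v n="$name" '
       /^[[:space:]]*#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }
     ' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

For example, add_host_entry /etc/hosts 10.190.85.92 bigdata-clickhouse-01, repeated for each node on every server.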
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | tee /proc/sys/vm/overcommit_memory
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
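The three echo commands above change runtime values only, which are lost on reboot, so it is worth re-checking them on each node. A read-only sketch; report_os_tuning is a hypothetical helper, and its optional root prefix exists only so it can be tested against a fake /sys and /proc tree.

```bash
#!/usr/bin/env bash
# Sketch: report the kernel settings that the tuning commands change.
# report_os_tuning is a hypothetical, read-only helper.
set -euo pipefail

# Print a settings file's content, or "n/a" if it does not exist
# (many VMs, for example, expose no cpufreq driver).
read_setting() {
  if [[ -r "$1" ]]; then cat "$1"; else echo "n/a"; fi
}

# report_os_tuning [ROOT_PREFIX]  (empty prefix = the real /sys and /proc)
report_os_tuning() {
  local root="${1:-}"
  echo "scaling_governor(cpu0): $(read_setting "$root/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")"
  echo "overcommit_memory: $(read_setting "$root/proc/sys/vm/overcommit_memory")"
  echo "transparent_hugepage: $(read_setting "$root/sys/kernel/mm/transparent_hugepage/enabled")"
}
```

Run report_os_tuning with no argument on a real node. In the transparent_hugepage line, the active mode is the one shown in brackets, e.g. [never].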
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum -y install clickhouse-server clickhouse-client
# In /etc/clickhouse-server/config.xml, set the log level inside the <logger> section, e.g. <level>information</level>
# Logs:
# /var/log/clickhouse-server/clickhouse-server.log
# /var/log/clickhouse-server/clickhouse-server.err.log
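Instead of editing config.xml directly, the log level can also be set in a drop-in file under config.d, which ClickHouse merges into the main configuration; this keeps local changes separate from the packaged config.xml. A sketch with a hypothetical helper name. It uses <clickhouse> as the root element, as recent releases do; older releases (possibly including those from the repo.clickhouse.tech packages) expect <yandex>, so match the root element of your own config.xml.

```bash
#!/usr/bin/env bash
# Sketch: write a config.d drop-in that sets the server log level.
# write_logger_override is hypothetical; older releases need <yandex> as root.
set -euo pipefail

# write_logger_override CONFIG_D_DIR [LEVEL]
write_logger_override() {
  local config_d="$1" level="${2:-information}"
  mkdir -p "$config_d"
  cat > "${config_d}/logger.xml" <<EOF
<clickhouse>
    <logger>
        <level>${level}</level>
    </logger>
</clickhouse>
EOF
}
```

For example, run write_logger_override /etc/clickhouse-server/config.d information, then restart the server with the stop/start commands below.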
clickhouse-server --version
clickhouse-client --password
sudo clickhouse stop
sudo clickhouse start
During deployment, several issues were encountered and solved:
Kafka engine table direct select not allowed: SELECT queries on a Kafka engine table are rejected by default; start the client with --stream_like_engine_allow_direct_select 1 (note that such a SELECT consumes the messages it reads).
Local replicated table macro missing: configure distinct <shard> and <replica> values in the <macros> section of each node's configuration.
Replica already exists: delete the stale replica node left in Zookeeper and recreate the table.
Distributed table authentication failure: set the correct <user> and <password> for each replica in the <remote_servers> configuration.
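The macro and authentication fixes both come down to per-node XML configuration. A sketch that writes them as config.d drop-ins, under hypothetical values: a cluster named my_cluster with one shard per host and one replica each, native port 9000, and credentials passed in as arguments. The helper names are made up, and the root element is <clickhouse> here; older releases expect <yandex>.

```bash
#!/usr/bin/env bash
# Sketch: write per-node <macros> and a <remote_servers> cluster definition.
# Hypothetical: cluster "my_cluster", one shard per host, one replica per shard.
set -euo pipefail

# write_macros CONFIG_D_DIR SHARD REPLICA
write_macros() {
  local config_d="$1" shard="$2" replica="$3"
  mkdir -p "$config_d"
  # {shard}/{replica} in ReplicatedMergeTree paths expand to these values,
  # so each node needs its own shard/replica pair.
  cat > "${config_d}/macros.xml" <<EOF
<clickhouse>
    <macros>
        <shard>${shard}</shard>
        <replica>${replica}</replica>
    </macros>
</clickhouse>
EOF
}

# write_remote_servers CONFIG_D_DIR USER PASSWORD HOST...
write_remote_servers() {
  local config_d="$1" user="$2" password="$3"
  shift 3
  mkdir -p "$config_d"
  {
    echo "<clickhouse>"
    echo "    <remote_servers>"
    echo "        <my_cluster>"
    local host
    for host in "$@"; do
      echo "            <shard>"
      echo "                <replica>"
      echo "                    <host>${host}</host>"
      echo "                    <port>9000</port>"
      # Credentials the Distributed table uses when connecting to this replica
      echo "                    <user>${user}</user>"
      echo "                    <password>${password}</password>"
      echo "                </replica>"
      echo "            </shard>"
    done
    echo "        </my_cluster>"
    echo "    </remote_servers>"
    echo "</clickhouse>"
  } > "${config_d}/remote_servers.xml"
}
```

On bigdata-clickhouse-01, for example: write_macros /etc/clickhouse-server/config.d 01 bigdata-clickhouse-01, plus write_remote_servers with the same host list on every node. The password is stored in plain text and is not XML-escaped, so avoid characters such as < and &.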
Finally, a materialized view was created to sync Kafka‑consumed data into the distributed ClickHouse table, completing the data pipeline.
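The pipeline can be sketched as four DDL statements: a Kafka engine table that consumes the topic, a ReplicatedMergeTree local table, a Distributed table over the local tables, and a materialized view that moves rows from the Kafka table into the Distributed table. Every table, column and cluster name here is hypothetical, and the sketch assumes each Kafka message is a JSON object with exactly the ts, level and msg keys; it prints the DDL rather than executing it.

```bash
#!/usr/bin/env bash
# Sketch: print DDL for a Kafka -> materialized view -> Distributed pipeline.
# Hypothetical names: cluster my_cluster, tables xxx_data_kafka/_local/_all,
# columns ts/level/msg. Assumption: each message is one JSON object with
# exactly these keys; extra keys added by Filebeat are not handled here.
set -euo pipefail

# render_pipeline_ddl BROKER_LIST TOPIC [CLUSTER]
render_pipeline_ddl() {
  local brokers="$1" topic="$2" cluster="${3:-my_cluster}"
  cat <<EOF
CREATE TABLE IF NOT EXISTS default.xxx_data_kafka ON CLUSTER ${cluster}
(ts DateTime, level String, msg String)
ENGINE = Kafka
SETTINGS kafka_broker_list = '${brokers}',
         kafka_topic_list = '${topic}',
         kafka_group_name = 'clickhouse_xxx_data',
         kafka_format = 'JSONEachRow';

CREATE TABLE IF NOT EXISTS default.xxx_data_local ON CLUSTER ${cluster}
(ts DateTime, level String, msg String)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/xxx_data_local', '{replica}')
PARTITION BY toYYYYMMDD(ts)
ORDER BY ts;

CREATE TABLE IF NOT EXISTS default.xxx_data_all ON CLUSTER ${cluster}
AS default.xxx_data_local
ENGINE = Distributed(${cluster}, default, xxx_data_local, rand());

CREATE MATERIALIZED VIEW IF NOT EXISTS default.xxx_data_mv ON CLUSTER ${cluster}
TO default.xxx_data_all
AS SELECT ts, level, msg FROM default.xxx_data_kafka;
EOF
}
```

For example: render_pipeline_ddl 'kafka1:9092,kafka2:9092,kafka3:9092' xxx_data_clickhouse | clickhouse-client --multiquery (add --user/--password as needed). The view is created last because the Kafka engine starts consuming in the background only once a materialized view is attached, so the target tables should already exist.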
Conclusion: by following the official documentation and troubleshooting each issue in turn, the author got the full log pipeline working, from Filebeat through Kafka into ClickHouse.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
