Building a Cost‑Effective Data Analysis Platform: ClickHouse vs Elasticsearch and Deployment Guide for Zookeeper, Kafka, Filebeat, and ClickHouse
This article compares Elasticsearch and ClickHouse for log analytics, presents cost‑benefit calculations, and provides a step‑by‑step deployment guide for Zookeeper, Kafka, Filebeat, and ClickHouse to build a scalable, low‑cost data analysis platform for SaaS services.
Background
SaaS services will face data security and compliance challenges in the future. Our business needs a private‑deployment capability to improve industry competitiveness. To enhance platform capabilities we need a data system for operational analysis, but a full‑blown big‑data stack would impose heavy server costs, so we chose a balanced solution.
Elasticsearch vs ClickHouse
ClickHouse is a high‑performance column‑oriented distributed DBMS. Our tests revealed the following advantages over Elasticsearch:
Write throughput: a single server can ingest 50‑200 MB/s (over 600 k records/s), more than 5× the throughput of Elasticsearch, with far fewer write rejections and latency spikes.
Query speed: ClickHouse can achieve 2‑30 GB/s when data resides in the page cache, and is 5‑30× faster than Elasticsearch when reading from disk, depending on compression.
Server cost: ClickHouse compresses data more tightly (the same data occupies 1/3 to 1/30 of the disk space Elasticsearch needs), which reduces disk usage and I/O, while its lower memory and CPU consumption can cut server costs by roughly 50%.
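To make the compression range concrete, here is a small arithmetic sketch. The 3000 GB starting figure is purely hypothetical and the helper name estimate_gb is invented for illustration; only the 1/3 and 1/30 ratios come from the comparison above.

```shell
#!/usr/bin/env bash
# estimate_gb ES_GB DIVISOR: integer estimate of the ClickHouse footprint
# when the same data occupies 1/DIVISOR of the Elasticsearch size.
estimate_gb() {
  echo $(( $1 / $2 ))
}

es_gb=3000   # hypothetical Elasticsearch disk usage in GB (an assumption)
echo "least compression (1/3):  $(estimate_gb "$es_gb" 3) GB"
echo "most compression  (1/30): $(estimate_gb "$es_gb" 30) GB"
```

Under these assumed numbers the ClickHouse footprint lands between 100 GB and 1000 GB; where a real workload falls in that range depends on the data.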
Cost Analysis
Cost estimates are based on Alibaba Cloud pricing without any discounts.
Environment Deployment
Zookeeper Cluster Deployment
Install Java and configure environment variables.
yum install java-1.8.0-openjdk-devel.x86_64
# /etc/profile configure environment variables
Synchronize system time:
yum install ntpdate
ntpdate asia.pool.ntp.org
Create the Zookeeper directories, then download and extract the release:
mkdir -p /usr/zookeeper/data
mkdir -p /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper
Add the Zookeeper environment variables to /etc/profile:
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Enter the configuration directory and create zoo.cfg:
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
Create a myid file on each node; the number must match that node's server.N entry in zoo.cfg:
# on the first node
echo "1" > /usr/zookeeper/data/myid
# on the second node
echo "2" > /usr/zookeeper/data/myid
# on the third node
echo "3" > /usr/zookeeper/data/myid
Start Zookeeper:
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
Kafka Cluster Deployment
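The Kafka brokers register with the Zookeeper ensemble configured earlier, so it can help to confirm that each Zookeeper node has joined the quorum before continuing. The following is a minimal sketch, assuming zkServer.sh status prints a "Mode:" line as Zookeeper 3.7 does; the helper name zk_mode is hypothetical.

```shell
#!/usr/bin/env bash
# zk_mode: read the text printed by `zkServer.sh status` on stdin and
# print the value after "Mode:" (leader, follower, or standalone).
# Exits non-zero when no "Mode:" line is present, e.g. the server is down.
zk_mode() {
  awk -F': *' '/^Mode:/ { print $2; found = 1 } END { exit found ? 0 : 1 }'
}

# Typical use on a node (not executed here):
#   sh "$ZOOKEEPER_HOME/bin/zkServer.sh" status 2>&1 | zk_mode
```

In a healthy three-node ensemble one node would be expected to report leader and the other two follower.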
mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
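tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
Configure each broker in config/server.properties (example for broker.id=1; every broker needs a unique broker.id and its own address in listeners):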
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
Run Kafka as a background daemon:
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &
Stop the broker when needed:
/usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-stop.sh
Common topic operations (KAFKA_HOME is assumed to be /usr/kafka/kafka_2.12-3.2.0; replace ip with the broker address):
# list topics
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
# consume a topic from the beginning
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
# create a topic with 3 partitions and 2 replicas
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
FileBeat Deployment
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# Create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat
Key FileBeat configuration (set json.keys_under_root: true so that the decoded JSON fields sit at the top level of each event sent to Kafka instead of being nested under a json key):
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
processors:
  - drop_fields:
      fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]
      ignore_missing: false
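Before starting FileBeat, a quick check that the keys_under_root setting actually made it into the file can save a debugging round trip. This is a minimal sketch: the helper name has_keys_under_root is hypothetical, and it performs a plain text match rather than a real YAML parse.

```shell
#!/usr/bin/env bash
# has_keys_under_root FILE: succeed if FILE contains an uncommented
# "keys_under_root: true" line (also accepts the dotted json.keys_under_root form).
# This is a text check only; it does not validate the YAML structure.
has_keys_under_root() {
  grep -Eq '^[[:space:]]*(json\.)?keys_under_root:[[:space:]]*true[[:space:]]*$' "$1"
}

# Typical use (not executed here):
#   has_keys_under_root /etc/filebeat/filebeat.yml || echo "keys_under_root is not enabled"
```

FileBeat also provides a `filebeat test config -c /etc/filebeat/filebeat.yml` subcommand, which parses the whole file and is the better tool for catching indentation mistakes.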
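Start FileBeat in the background (with -e FileBeat logs to stderr, so stderr is redirected into the log file as well):
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /user/filebeat/filebeat.log 2>&1 &
ClickHouse Deployment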
Check CPU for SSE 4.2 support:
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
Create a data directory on a high‑capacity disk:
mkdir -p /data/clickhouse
Add ClickHouse host entries to /etc/hosts:
10.190.85.92 bigdata-clickhouse-01
10.190.85.93 bigdata-clickhouse-02
Optimize server performance:
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | tee /proc/sys/vm/overcommit_memory
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
Install ClickHouse from the official repository:
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum list | grep clickhouse
yum -y install clickhouse-server clickhouse-client
Set log level to information in /etc/clickhouse-server/config.xml:
<level>information</level>
Log locations:
Normal log: /var/log/clickhouse-server/clickhouse-server.log
Error log: /var/log/clickhouse-server/clickhouse-server.err.log
Verify ClickHouse version and manage the service:
clickhouse-server --version
clickhouse-client --password
sudo clickhouse stop
sudo clickhouse start
Conclusion
The deployment process involved many pitfalls, especially the FileBeat yml parameters. I will publish a follow‑up article detailing ClickHouse configuration issues. Beyond the technical work, continuous learning and output remain essential for building a personal moat, whether as a technical expert, architect, or manager.
If your company lacks strong industry influence, staying on the front line and later seeking new opportunities can be a pragmatic path; consider industry impact, commercial sense, and architectural skills when planning your career.
