Deploying a Cost‑Effective ClickHouse‑Based Backend Data Platform: Comparison with Elasticsearch and Step‑by‑Step Setup Guide
This article compares Elasticsearch and ClickHouse for log analytics, presents cost analysis, and provides detailed deployment instructions for Zookeeper, Kafka, Filebeat, and ClickHouse to build a private, high‑performance backend data platform for SaaS services.
Background
SaaS services will increasingly face data-security and compliance requirements, so the company needs the ability to offer private (on-premises) deployments to stay competitive. The platform also needs a data system for operational analysis and for measuring activity effectiveness. A full big-data stack would carry heavy server costs, so a more balanced solution was needed.
Elasticsearch vs ClickHouse
ClickHouse is a high-performance, column-oriented, distributed DBMS. In tests, it showed the following advantages over Elasticsearch:
Write throughput is 5 to 6 times higher (50 to 200 MB/s, over 600,000 records/s per server), without write rejections or write latency spikes.
Queries are 5 to 30 times faster; when the data is already in the OS page cache, query throughput can reach 2 to 30 GB/s per server.
Compressed data takes 1/3 to 1/30 of the space Elasticsearch needs, which lowers disk I/O as well as memory and CPU usage and can cut server costs roughly in half.
Cost Analysis
A cost estimate based on Alibaba Cloud list prices (no discounts applied) shows significant savings when using ClickHouse instead of Elasticsearch.
Environment Deployment
Zookeeper Cluster Deployment
yum install java-1.8.0-openjdk-devel.x86_64
# Configure environment variables in /etc/profile
# Update the system time
yum install ntpdate
ntpdate asia.pool.ntp.org
# create the directories referenced by dataDir and dataLogDir in zoo.cfg
mkdir -p /usr/zookeeper/data /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
# create myid on each node
echo "1" > /usr/zookeeper/data/myid
# repeat with 2 and 3 on other nodes
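Because zoo.cfg is identical on every node while myid differs, the two steps above can be scripted per node. The following is a minimal Bash sketch under stated assumptions: the function name `write_zk_node_config` is hypothetical, and it assumes the ensemble hosts are passed in the same order on every node, so that the N-th host becomes `server.N` and that node's myid must be N.

```shell
#!/usr/bin/env bash
# Sketch: write zoo.cfg and myid for one ZooKeeper node.
# Hypothetical helper (not from the article): write_zk_node_config.
# Assumes hosts are listed in the same order on every node, so the
# N-th host becomes server.N and that node's myid must be N.
set -euo pipefail

# write_zk_node_config <conf_dir> <data_dir> <log_dir> <my_id> <host>...
write_zk_node_config() {
  local conf_dir=$1 data_dir=$2 log_dir=$3 my_id=$4
  shift 4
  # myid must refer to one of the server.N entries written below
  if (( my_id < 1 || my_id > $# )); then
    echo "myid $my_id is outside 1..$#" >&2
    return 1
  fi
  mkdir -p "$conf_dir" "$data_dir" "$log_dir"
  {
    echo "tickTime=2000"
    echo "initLimit=10"
    echo "syncLimit=5"
    echo "dataDir=$data_dir"
    echo "dataLogDir=$log_dir"
    echo "clientPort=2181"
    local i=1 host
    for host in "$@"; do
      # server.N=host:quorumPort:leaderElectionPort
      echo "server.$i=$host:2888:3888"
      i=$((i + 1))
    done
  } > "$conf_dir/zoo.cfg"
  echo "$my_id" > "$data_dir/myid"
}
```

For example, on zk2 one might run `write_zk_node_config $ZOOKEEPER_HOME/conf /usr/zookeeper/data /usr/zookeeper/logs 2 zk1 zk2 zk3`. Generating both files from one host list keeps the server.N lines and each node's myid from drifting apart.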
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
Kafka Cluster Deployment
mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
# config/server.properties on each broker (broker.id must be unique per node; replace ip in listeners with the node's own address)
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
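Since every broker shares this file except for broker.id and the listener address, a small generator reduces copy-and-edit mistakes such as two brokers ending up with the same ID. A minimal Bash sketch, assuming a hypothetical helper `write_kafka_broker_config` and the values shown above:

```shell
#!/usr/bin/env bash
# Sketch: render server.properties for one Kafka broker.
# Hypothetical helper: write_kafka_broker_config. Only broker.id and the
# listener address change per node; every other value is copied from the
# configuration above.
set -euo pipefail

# write_kafka_broker_config <out_file> <broker_id> <broker_ip>
write_kafka_broker_config() {
  local out=$1 broker_id=$2 broker_ip=$3
  mkdir -p "$(dirname "$out")"
  # Unquoted EOF: ${broker_id} and ${broker_ip} are expanded
  cat > "$out" <<EOF
broker.id=${broker_id}
listeners=PLAINTEXT://${broker_ip}:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
EOF
}
```

For example, `write_kafka_broker_config /usr/kafka/kafka_2.12-3.2.0/config/server.properties 1 <node-ip>` on the first broker, with IDs 2 and 3 on the others.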
export KAFKA_HOME=/usr/kafka/kafka_2.12-3.2.0
mkdir -p /usr/kafka/logs
# start the broker in the background
nohup $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &
# stop the broker
$KAFKA_HOME/bin/kafka-server-stop.sh
# list topics
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
# read a topic from the beginning
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
# create a topic with 3 partitions and 2 replicas
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
Filebeat Deployment
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat
Filebeat must be configured with json.keys_under_root: true; otherwise the decoded JSON fields are not placed at the top level of each Kafka message (by default Filebeat nests them under a json key). Example /etc/filebeat/filebeat.yml:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /root/logs/xxx/inner/*.log
    json:
      keys_under_root: true

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip

processors:
  - drop_fields:
      fields: ["input","agent","ecs","log","metadata","timestamp"]
      ignore_missing: false
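Filebeat's JSON decoding works line by line, so the application should write exactly one JSON object per line; with keys_under_root: true, each key of that object becomes a top-level field of the Kafka message. The Bash sketch below writes such lines for testing the pipeline. The helper names and the fields ts, level, and event are hypothetical; a real application defines its own.

```shell
#!/usr/bin/env bash
# Sketch: append one JSON object per line, the layout Filebeat's json
# decoding expects. Hypothetical helpers and field names (ts, level,
# event); the real application defines its own fields.
set -euo pipefail

# Escape backslashes, then double quotes, so the value stays valid JSON.
# (Control characters such as newlines are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# append_json_log_line <file> <level> <event>
append_json_log_line() {
  local file=$1 level=$2 event=$3 ts
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  mkdir -p "$(dirname "$file")"
  printf '{"ts":"%s","level":"%s","event":"%s"}\n' \
    "$ts" "$(json_escape "$level")" "$(json_escape "$event")" >> "$file"
}
```

For example, `append_json_log_line /root/logs/xxx/inner/test.log info "user login"` appends a line under the path Filebeat watches (the file name test.log is illustrative); the record should then appear in `kafka-console-consumer.sh` output with level and event as top-level keys.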
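mkdir -p /usr/filebeat
# -e sends Filebeat's own logs to stderr, so stderr must be redirected too
# (alternatively, since the service is enabled: systemctl start filebeat)
nohup filebeat -e -c /etc/filebeat/filebeat.yml > /usr/filebeat/filebeat.log 2>&1 &
ClickHouse Deployment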
# Check SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
mkdir -p /data/clickhouse
# add host entries for clickhouse nodes in /etc/hosts
# e.g., 10.190.85.92 bigdata-clickhouse-01
# 10.190.85.93 bigdata-clickhouse-02
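The /etc/hosts step is easy to repeat by accident when a setup script is re-run. A minimal Bash sketch with a hypothetical `add_host_entry` helper that appends an entry only if that IP-to-hostname mapping is not already present:

```shell
#!/usr/bin/env bash
# Sketch: add "ip hostname" lines to a hosts file without duplicating them
# when the setup is re-run. Hypothetical helper: add_host_entry.
set -euo pipefail

# add_host_entry <hosts_file> <ip> <hostname>
add_host_entry() {
  local hosts_file=$1 ip=$2 name=$3
  touch "$hosts_file"
  # awk exits 0 if some line already maps this ip to this hostname
  if awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

Run as root, e.g. `add_host_entry /etc/hosts 10.190.85.92 bigdata-clickhouse-01`, once for each node listed above.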
# Set CPU governor to performance
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Allow memory overcommit (0 = heuristic mode; ClickHouse should not run with strict mode 2)
echo 0 | tee /proc/sys/vm/overcommit_memory
# Disable transparent huge pages
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
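The kernel settings above are written to /proc and /sys at runtime, so they do not survive a reboot and are easy to lose. The Bash sketch below, with a hypothetical `check_clickhouse_host` helper, reads the current values back and prints OK or WARN for each; the optional root argument exists only so the check can be tried against a fake directory tree.

```shell
#!/usr/bin/env bash
# Sketch: report whether a host matches the settings applied above.
# Hypothetical helper: check_clickhouse_host. The optional root argument
# (default: the real filesystem) exists only for testing.
set -euo pipefail

# check_clickhouse_host [root]; returns 1 if any check warns
check_clickhouse_host() {
  local root=${1:-} warn=0 f gov oc thp
  if grep -q sse4_2 "$root/proc/cpuinfo" 2>/dev/null; then
    echo "OK   sse4_2 supported"
  else
    echo "WARN sse4_2 not found"; warn=1
  fi

  # The article sets 0; ClickHouse's docs also accept 1, but not strict mode 2
  oc=$(cat "$root/proc/sys/vm/overcommit_memory" 2>/dev/null || echo missing)
  case "$oc" in
    0|1) echo "OK   overcommit_memory=$oc" ;;
    *)   echo "WARN overcommit_memory=$oc"; warn=1 ;;
  esac

  # The active THP mode is the bracketed word, e.g. "always madvise [never]"
  thp=$(cat "$root/sys/kernel/mm/transparent_hugepage/enabled" 2>/dev/null || echo missing)
  case "$thp" in
    *"[never]"*) echo "OK   transparent_hugepage=never" ;;
    *)           echo "WARN transparent_hugepage: $thp"; warn=1 ;;
  esac

  # VMs often expose no cpufreq files; the glob then stays unexpanded
  for f in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue
    gov=$(cat "$f")
    if [ "$gov" != performance ]; then
      echo "WARN $f: governor=$gov"; warn=1
    fi
  done
  return "$warn"
}
```

Run `check_clickhouse_host` with no argument on each ClickHouse node before installing; a non-zero return means at least one WARN line needs attention.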
# Install ClickHouse repository
yum install yum-utils
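rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
# repo.clickhouse.tech is deprecated; current ClickHouse docs add the repository with:
# yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo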
yum list | grep clickhouse
yum -y install clickhouse-server clickhouse-client
# Adjust logging level in /etc/clickhouse-server/config.xml
# <level>information</level>
# Log locations
# /var/log/clickhouse-server/clickhouse-server.log (normal)
# /var/log/clickhouse-server/clickhouse-server.err.log (errors)
clickhouse-server --version
clickhouse-client --password
sudo clickhouse stop
sudo clickhouse start
Summary
The deployment involved many pitfalls, especially around the Filebeat YAML parameters. ClickHouse configuration details will be covered in a follow-up post. The author closes by reflecting on continuous learning and on building a personal “moat” through coding skill, expertise, and architecture experience, suggesting that engineers either stay on the front line or move on to new opportunities as their careers progress.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Source channel: Selected Java Interview Questions, a Java tech channel.
