
Deploying a Cost‑Effective ClickHouse‑Based Backend Data Platform: Comparison with Elasticsearch and Step‑by‑Step Setup Guide

This article compares Elasticsearch and ClickHouse for log analytics, presents a cost analysis, and gives detailed deployment steps for ZooKeeper, Kafka, Filebeat, and ClickHouse to build a private, high-performance backend data platform for SaaS services.

Selected Java Interview Questions

Aug 27, 2022


Background

SaaS services increasingly face data-security and compliance requirements, so the company needs the ability to offer private (on-premises) deployments to stay competitive. The platform also needs a data system for operational analysis and for measuring the effectiveness of marketing activities. A full big-data stack would bring heavy server costs, however, so a lighter solution that balances capability against cost was needed.

Elasticsearch vs ClickHouse

ClickHouse is a high-performance, column-oriented, distributed DBMS. Tests show the following advantages over Elasticsearch:

Write throughput is 5-6 times higher (50-200 MB/s, more than 600 k records/s per server), without the write rejections and write delays Elasticsearch can show under heavy load.

Queries run 5-30 times faster; when the data is already in the OS page cache, scan speed can reach 2-30 GB/s per server.

Compressed data occupies 1/3 to 1/30 of the disk space Elasticsearch needs, which lowers disk I/O as well as memory and CPU usage and can potentially cut server costs in half.
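To make the storage ratio concrete, the sketch below converts an Elasticsearch footprint into the ClickHouse range implied by the 1/3 to 1/30 figures. The daily volume (100 GB) and retention (30 days) in the example are hypothetical inputs, not measurements from the author's tests.

```shell
#!/usr/bin/env bash
# Estimate ClickHouse disk usage from an Elasticsearch footprint, using the
# 1/3 (worst case) to 1/30 (best case) ratios quoted in the article.
# Inputs are hypothetical; results are whole GB, rounded down.
clickhouse_storage_range() {
  local es_gb_per_day="$1" retention_days="$2"
  local es_total=$(( es_gb_per_day * retention_days ))
  local best=$(( es_total / 30 ))   # ClickHouse at 1/30 of Elasticsearch
  local worst=$(( es_total / 3 ))   # ClickHouse at 1/3 of Elasticsearch
  echo "$es_total $best $worst"
}

# Example: 100 GB/day kept for 30 days in Elasticsearch
clickhouse_storage_range 100 30   # prints: 3000 100 1000
```

Even the worst case leaves ClickHouse needing a third of the disk, which is where most of the hardware saving comes from.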

Cost Analysis

A cost estimate based on Alibaba Cloud list prices (without discounts) shows significant savings when ClickHouse is used instead of Elasticsearch.
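The original cost table is not reproduced here. As a hedged sketch of how such an estimate is typically structured, the helpers below price a cluster as nodes times a per-node price plus disk times a per-GB price. Every number in the example is a placeholder, not an Alibaba Cloud quote.

```shell
#!/usr/bin/env bash
# Monthly cluster cost = nodes * price per node + disk GB * price per GB.
# All prices are hypothetical placeholders in whole currency units.
cluster_cost() {
  local nodes="$1" node_price="$2" disk_gb="$3" gb_price="$4"
  echo $(( nodes * node_price + disk_gb * gb_price ))
}

# Percentage saved when moving from cost "before" to cost "after" (rounded down).
saving_percent() {
  local before="$1" after="$2"
  echo $(( (before - after) * 100 / before ))
}

es=$(cluster_cost 6 1000 3000 1)   # hypothetical Elasticsearch cluster: 9000
ch=$(cluster_cost 3 1000 1000 1)   # hypothetical ClickHouse cluster: 4000
saving_percent "$es" "$ch"         # prints: 55
```

Plugging in real instance and disk prices for both candidate cluster sizes gives a like-for-like comparison.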

Environment Deployment

Zookeeper Cluster Deployment

yum install java-1.8.0-openjdk-devel.x86_64
# set environment variables (e.g. JAVA_HOME) in /etc/profile
# synchronize the system clock
yum install ntpdate
ntpdate asia.pool.ntp.org

# create the directories referenced by dataDir and dataLogDir in zoo.cfg
mkdir -p /usr/zookeeper/data /usr/zookeeper/logs

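# if 3.7.1 is no longer on this mirror, it is also available under https://archive.apache.org/dist/zookeeper/
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper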

export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

cd $ZOOKEEPER_HOME/conf
vi zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

# create myid on each node
echo "1" > /usr/zookeeper/data/myid
# repeat with 2 and 3 on other nodes
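Writing myid by hand on each node is easy to get wrong. A minimal sketch, assuming each node's short hostname matches its server.N entry in zoo.cfg (zk1, zk2, zk3 as above): the hypothetical helpers below look up N for the current host and write it into the data directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive this node's myid from the server.N lines in
# zoo.cfg instead of typing it by hand. Assumes plain hostnames without dots
# (zk1, zk2, ...), because '.', '=' and ':' are used as field separators.

# Print N for the line "server.N=<host>:2888:3888"; fail if host is not listed.
zk_myid_for_host() {
  local cfg="$1" host="$2"
  awk -F'[.=:]' -v h="$host" '
    $1 == "server" && $3 == h { print $2; found = 1 }
    END { exit !found }
  ' "$cfg"
}

# Write the id into <data_dir>/myid (the dataDir from zoo.cfg).
write_myid() {
  local cfg="$1" host="$2" data_dir="$3" id
  id="$(zk_myid_for_host "$cfg" "$host")" || return 1
  mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}

# Usage on each ZooKeeper node:
#   write_myid "$ZOOKEEPER_HOME/conf/zoo.cfg" "$(hostname -s)" /usr/zookeeper/data
```

Running the same command on every node keeps myid consistent with zoo.cfg, and the helper fails instead of writing a wrong id when the hostname is not listed.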

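cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
# once all three nodes are up, this should report "Mode: leader" or "Mode: follower"
sh zkServer.sh status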

Kafka Cluster Deployment

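# also create /usr/kafka/logs: the start command below redirects its output there
mkdir -p /usr/kafka/logs
chmod 777 -R /usr/kafka
# if 3.2.0 is no longer on this mirror, it is also available under https://archive.apache.org/dist/kafka/
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka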

# config/server.properties on each broker (broker.id must be unique per node; replace ip with the node's address)
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
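Because only broker.id and the listener address differ between nodes, one option is to keep a single template and render each broker's file from it. The sketch below does that with sed; the helper name, the template file name, and the example address are hypothetical, and it assumes the template contains the broker.id= and listeners= lines shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render one broker's server.properties from a shared
# template by rewriting only broker.id and the PLAINTEXT listener address.
render_broker_config() {
  local template="$1" broker_id="$2" ip="$3" out="$4"
  # broker.id must be a non-negative integer, unique per broker
  [[ "$broker_id" =~ ^[0-9]+$ ]] || { echo "invalid broker.id: $broker_id" >&2; return 1; }
  sed -e "s/^broker\.id=.*/broker.id=${broker_id}/" \
      -e "s#^listeners=.*#listeners=PLAINTEXT://${ip}:9092#" \
      "$template" > "$out"
}

# Usage (on each node, with its own id and address), e.g.:
#   render_broker_config server.properties.tpl 2 10.0.0.12 \
#     /usr/kafka/kafka_2.12-3.2.0/config/server.properties
```

All other settings (retention, replication, zookeeper.connect) then stay identical across brokers by construction.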

export KAFKA_HOME=/usr/kafka/kafka_2.12-3.2.0

# start the broker in the background (on every node)
nohup $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &

# stop the broker
$KAFKA_HOME/bin/kafka-server-stop.sh

# list topics, read a topic from the beginning, and create a topic
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
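Kafka brokers need the ZooKeeper quorum to be reachable when they start, and the topic commands above need the brokers to be up. A minimal readiness check, sketched with bash's built-in /dev/tcp redirection and coreutils timeout; the helper name, retry count, and delay are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections.
# Relies on bash's /dev/tcp pseudo-device, so it must run under bash.
wait_for_port() {
  local host="$1" port="$2" retries="${3:-30}" delay="${4:-2}" i
  for (( i = 1; i <= retries; i++ )); do
    # Open (and immediately close) a TCP connection; give up on this try after 2s
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    (( i < retries )) && sleep "$delay"
  done
  echo "${host}:${port} not reachable after ${retries} attempts" >&2
  return 1
}

# Usage: check the ZooKeeper quorum before starting Kafka, then each broker
#   for h in zk1 zk2 zk3; do wait_for_port "$h" 2181; done
#   for h in kafka1 kafka2 kafka3; do wait_for_port "$h" 9092; done
```

A successful connection only shows that the port is open; the topic listing remains the real functional test.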

Filebeat Deployment

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

# create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

yum install filebeat
# start Filebeat on boot; chkconfig --add filebeat is only needed on older SysV-init systems
systemctl enable filebeat

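Filebeat must decode each log line as JSON with keys_under_root: true. Without JSON decoding, the whole line ends up in the message field of each Kafka message; with decoding but without keys_under_root, the fields are nested under a json key instead of appearing at the top level. Example /etc/filebeat/filebeat.yml: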

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
processors:
- drop_fields:
    fields: ["input","agent","ecs","log","metadata","timestamp"]
    ignore_missing: false
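The json options above only work if the application writes one complete JSON object per line. A minimal sketch of an application-side writer; the helper name and the fields ts, event, and user_id are hypothetical, and event values are assumed to need no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one event as a single-line JSON object,
# the one-object-per-line shape Filebeat's json decoding expects.
# Assumes "event" contains no characters that need JSON escaping.
log_event() {
  local file="$1" event="$2" user_id="$3"
  [[ "$user_id" =~ ^[0-9]+$ ]] || return 1
  printf '{"ts":"%s","event":"%s","user_id":%s}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$event" "$user_id" >> "$file"
}

# Example (the file name is hypothetical; the directory matches the paths glob):
#   log_event /root/logs/xxx/inner/app.log login 42
```

With keys_under_root: true, each Kafka message then carries ts, event, and user_id as top-level fields next to the fields Filebeat adds itself, such as @timestamp. Before starting Filebeat, `filebeat test config -c /etc/filebeat/filebeat.yml` checks that the YAML parses.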

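# alternatively, run Filebeat by hand instead of via systemd (not both at once); -e logs to stderr, so capture it with 2>&1
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /user/filebeat/filebeat.log 2>&1 &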

ClickHouse Deployment

# Check SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

mkdir -p /data/clickhouse
# add host entries for clickhouse nodes in /etc/hosts
# e.g., 10.190.85.92 bigdata-clickhouse-01
#       10.190.85.93 bigdata-clickhouse-02

# Set CPU governor to performance
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Use heuristic memory overcommit (mode 0, the kernel default); do not disable overcommit
echo 0 | tee /proc/sys/vm/overcommit_memory
# Disable transparent huge pages
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
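These echo commands take effect immediately but are lost on reboot, so it helps to verify them on each node. A hedged sketch of a checker; the helper names are hypothetical and the file paths default to the ones used above but can be overridden. For multi-choice files such as transparent_hugepage/enabled, the kernel marks the active choice with brackets, e.g. `always madvise [never]`.

```shell
#!/usr/bin/env bash
# Print the active value of a kernel setting file. Multi-choice files mark
# the active choice in brackets ("always madvise [never]"); single-value
# files (e.g. overcommit_memory) are printed as-is.
active_value() {
  local content re='\[([a-z]+)\]'
  content="$(cat "$1" 2>/dev/null)" || return 1
  if [[ $content =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "$content"
  fi
}

# Hypothetical checker for the three settings above; prints each mismatch
# and returns non-zero if any setting differs from the expected value.
check_clickhouse_tuning() {
  local governor="${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"
  local overcommit="${2:-/proc/sys/vm/overcommit_memory}"
  local thp="${3:-/sys/kernel/mm/transparent_hugepage/enabled}"
  local rc=0
  [ "$(active_value "$governor")" = "performance" ] || { echo "CPU governor is not 'performance'"; rc=1; }
  [ "$(active_value "$overcommit")" = "0" ] || { echo "vm.overcommit_memory is not 0"; rc=1; }
  [ "$(active_value "$thp")" = "never" ] || { echo "transparent huge pages are not 'never'"; rc=1; }
  return "$rc"
}
```

On virtual machines without cpufreq support the governor file does not exist, so that check reports a mismatch there even though nothing can be tuned.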

# Install ClickHouse repository
# (current ClickHouse documentation uses https://packages.clickhouse.com/rpm/clickhouse.repo instead)
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

yum list | grep clickhouse
yum -y install clickhouse-server clickhouse-client

# Adjust logging level in /etc/clickhouse-server/config.xml
# <level>information</level>

# Log locations
# /var/log/clickhouse-server/clickhouse-server.log (normal)
# /var/log/clickhouse-server/clickhouse-server.err.log (errors)

clickhouse-server --version
clickhouse-client --password

sudo clickhouse stop
sudo clickhouse start
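ClickHouse-side table configuration is not covered here. For orientation only, a common way to consume a Kafka topic in ClickHouse is a Kafka engine table plus a materialized view that copies rows into a MergeTree table. The sketch below prints such DDL; the table names, the columns ts, event, and user_id, the consumer group, and the CH_PASSWORD variable are hypothetical, and it assumes each message is a flat JSON object. Fields Filebeat adds, such as @timestamp, are not declared, so depending on the ClickHouse version the input_format_skip_unknown_fields setting may need to be enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ClickHouse DDL that reads JSON messages from a
# Kafka topic (Kafka engine) and stores them in a MergeTree table through a
# materialized view. Names and columns are illustrative placeholders.
render_kafka_ingest_sql() {
  local brokers="$1" topic="$2" group="$3"
  # Refuse single quotes so the values cannot break out of the SQL literals
  case "${brokers}${topic}${group}" in *"'"*) return 1 ;; esac
  cat <<SQL
CREATE TABLE IF NOT EXISTS default.xxx_data_queue
(
    ts String,
    event String,
    user_id UInt64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = '${brokers}',
         kafka_topic_list = '${topic}',
         kafka_group_name = '${group}',
         kafka_format = 'JSONEachRow';

CREATE TABLE IF NOT EXISTS default.xxx_data
(
    event_time DateTime,
    event String,
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY (event, event_time);

CREATE MATERIALIZED VIEW IF NOT EXISTS default.xxx_data_mv TO default.xxx_data AS
SELECT parseDateTimeBestEffort(ts) AS event_time, event, user_id
FROM default.xxx_data_queue;
SQL
}

# Usage (CH_PASSWORD is a placeholder for the default user's password):
#   render_kafka_ingest_sql kafka1:9092,kafka2:9092,kafka3:9092 xxx_data_clickhouse ch_xxx_consumer \
#     | clickhouse-client --password "$CH_PASSWORD" --multiquery
```

The Kafka engine table only streams messages; the materialized view is what persists them, so queries should target the MergeTree table.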

Summary

The deployment hit many pitfalls, especially around Filebeat's YAML parameters; ClickHouse configuration details will follow in a later post. The author closes with reflections on continuous learning and on building a personal "moat" through coding skill, domain expertise, and architecture experience, suggesting that engineers either stay on the technical front line or move on to new opportunities as their careers progress.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Data Engineering Elasticsearch Zookeeper Kafka ClickHouse filebeat
