
Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide

This article compares Elasticsearch and ClickHouse on write throughput, query speed, and server cost. It then gives a step-by-step deployment guide for ZooKeeper, Kafka, Filebeat, and ClickHouse, including common issues and their solutions.

Code Ape Tech Column

Sep 21, 2023

To build a privately deployable data-analysis capability for a SaaS product, the author evaluated two storage engines, Elasticsearch (ES) and ClickHouse, on write throughput, query speed, and server cost.

Elasticsearch vs ClickHouse

ClickHouse offers much higher write throughput than ES: 50–200 MB/s per server, or more than 600,000 records/s, which is over 5× ES, with fewer write rejections. Queries run 5–30× faster than on ES, especially when the data is already in the OS page cache. ClickHouse also needs only 1/3 to 1/30 of the storage ES uses, which lowers disk I/O and CPU usage and can roughly halve server costs.

Cost Analysis

A cost comparison based on Alibaba Cloud list prices (without discounts) shows that ClickHouse consumes fewer resources.

Environment Deployment

1. ZooKeeper Cluster Deployment

# on every ZooKeeper node: install JDK 8 and synchronize the clock
yum install java-1.8.0-openjdk-devel.x86_64
# configure environment variables in /etc/profile
yum install ntpdate
ntpdate asia.pool.ntp.org

mkdir -p /usr/zookeeper/data /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper

# add these to /etc/profile, then run: source /etc/profile
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

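cd $ZOOKEEPER_HOME/conf
# create zoo.cfg with the content below; zk1, zk2 and zk3 must resolve to the three nodes (e.g. via /etc/hosts)
vi zoo.cfg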

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

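# myid must match the node's server.N line: run only the matching command on each node
# on zk1:
echo "1" > /usr/zookeeper/data/myid
# on zk2:
echo "2" > /usr/zookeeper/data/myid
# on zk3:
echo "3" > /usr/zookeeper/data/myid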
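
Since zoo.cfg is identical on all three nodes and only myid differs, the per-node step can be scripted. The following is a minimal sketch, assuming the zk1/zk2/zk3 hostnames and the /usr/zookeeper layout used above; the function name write_zk_node_config is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write zoo.cfg and myid for one ZooKeeper node.
# Usage: write_zk_node_config <myid> <base_dir> <conf_dir>
write_zk_node_config() {
  local myid="$1" base="$2" conf_dir="$3"
  mkdir -p "$base/data" "$base/logs" "$conf_dir"
  # The ensemble definition is the same on every node.
  cat > "$conf_dir/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$base/data
dataLogDir=$base/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF
  # The id must match this host's server.N entry above.
  echo "$myid" > "$base/data/myid"
}
```

On zk2, for example, this would be run as write_zk_node_config 2 /usr/zookeeper "$ZOOKEEPER_HOME/conf".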

# start ZooKeeper on every node, then check each node's role
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
sh zkServer.sh status
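
zkServer.sh status prints the node's role on a line of the form Mode: leader, Mode: follower, or Mode: standalone; in a healthy three-node ensemble exactly one node reports leader. A small sketch for extracting that role in a script, assuming that output format; zk_mode is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ZooKeeper status text on stdin and print the role
# (leader, follower, or standalone) taken from its "Mode:" line.
zk_mode() {
  awk -F': *' '/^Mode:/ { print $2; exit }'
}

# Example, on a ZooKeeper node:
#   sh "$ZOOKEEPER_HOME/bin/zkServer.sh" status 2>/dev/null | zk_mode
```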

2. Kafka Cluster Deployment

mkdir -p /usr/kafka/logs
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
export KAFKA_HOME=/usr/kafka/kafka_2.12-3.2.0

# broker.id (unique per broker), listeners, and other configs are set in $KAFKA_HOME/config/server.properties
# start and stop a broker
nohup $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties >/usr/kafka/logs/kafka.log 2>&1 &
$KAFKA_HOME/bin/kafka-server-stop.sh
# list topics, consume a topic, create a topic (replace ip with a broker address)
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
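
The per-broker settings in server.properties can also be applied with a script. This is a sketch under assumptions: the helper names set_prop and configure_broker are hypothetical, broker hostnames follow the kafka1 to kafka3 names used in filebeat.yml, and the log.dirs path is illustrative. broker.id, listeners, log.dirs and zookeeper.connect are standard Kafka broker settings; zookeeper.connect points the brokers at the ZooKeeper ensemble from step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a .properties file, replacing any
# existing or commented-out "#key=" line. Values must not contain backslashes.
set_prop() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # index() compares the key literally, so the dots in "broker.id" are not regex wildcards.
  awk -v k="$key" '{ line = $0; sub(/^#/, "", line) } index(line, k "=") != 1' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Copy back in place so the original file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Hypothetical wrapper: per-broker settings; log.dirs is an illustrative path.
# Usage: configure_broker <server.properties> <broker_id> <hostname>
configure_broker() {
  local props="$1" id="$2" host="$3"
  set_prop "$props" broker.id "$id"
  set_prop "$props" listeners "PLAINTEXT://${host}:9092"
  set_prop "$props" log.dirs /usr/kafka/kafka-logs
  set_prop "$props" zookeeper.connect zk1:2181,zk2:2181,zk3:2181
}
```

On the second broker this would be configure_broker "$KAFKA_HOME/config/server.properties" 2 kafka2, run before starting the broker.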

3. Filebeat Deployment

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create /etc/yum.repos.d/elastic.repo with the following content
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

yum install filebeat
systemctl enable filebeat
# only needed on SysV-init systems without systemd
chkconfig --add filebeat

# /etc/filebeat/filebeat.yml (important: set keys_under_root: true so the decoded JSON keys sit at the top level of each event instead of under "json")
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  compression: gzip
processors:
- drop_fields:
    fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]

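# start Filebeat as a service
systemctl start filebeat
# or run it manually in the background (-e logs to stderr, so redirect 2>&1; the log directory must exist)
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /usr/filebeat/filebeat.log 2>&1 &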
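
Because the input uses Filebeat's json decoding, each log line must be a single JSON object; with keys_under_root: true its keys are copied to the top level of the event instead of being nested under a json key. A quick way to test the Filebeat-to-Kafka hop is to append one such line to the watched directory and look for it on the topic. A minimal sketch; the helper name emit_test_event and the field names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: append one JSON event to a file Filebeat watches.
# Field names (event_time, level, message) are illustrative only.
emit_test_event() {
  local dir="$1"
  mkdir -p "$dir"
  # One JSON object per line, as Filebeat's json decoding expects.
  printf '{"event_time":"%s","level":"INFO","message":"filebeat smoke test"}\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" >> "$dir/smoke-test.log"
}

# On the Filebeat host (the file matches the *.log pattern in filebeat.yml):
#   emit_test_event /root/logs/xxx/inner
# Then, on a Kafka node, the event should appear on the Filebeat topic:
#   $KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic xxx_data_clickhouse --from-beginning
```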

4. ClickHouse Deployment

# Verify SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

mkdir -p /data/clickhouse
# on every node, add /etc/hosts entries for all ClickHouse nodes, for example:
# 10.190.85.92 bigdata-clickhouse-01
# 10.190.85.93 bigdata-clickhouse-02
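
The same kind of host entries are needed for zk1 to zk3 and kafka1 to kafka3, so an idempotent helper avoids duplicate lines when a setup script is re-run. A minimal sketch; add_host_entry is a hypothetical name, and it takes the hosts file path as a parameter so it can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "<ip> <hostname>" to a hosts file unless the
# hostname is already mapped on a non-comment line.
# Usage: add_host_entry <hosts_file> <ip> <hostname>
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Compare against every hostname/alias column (fields 2..NF).
  if [ -f "$file" ] && awk -v n="$name" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example:
#   add_host_entry /etc/hosts 10.190.85.92 bigdata-clickhouse-01
#   add_host_entry /etc/hosts 10.190.85.93 bigdata-clickhouse-02
```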

# OS tuning recommended for ClickHouse (run as root; these values reset on reboot)
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | tee /proc/sys/vm/overcommit_memory
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
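
These writes take effect immediately but are lost on reboot (vm.overcommit_memory can be persisted in /etc/sysctl.conf; the governor and transparent huge page settings are usually reapplied at boot, e.g. from a startup script). To confirm the current values, a small check like the following can help. It is a sketch: the function names are hypothetical, and it relies on the sysfs convention of marking the active option in brackets, e.g. always madvise [never].

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active option of a sysfs "choice" file,
# i.e. the bracketed word in a line such as "always madvise [never]".
active_choice() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Hypothetical check of the three settings above.
# Pass a root prefix for testing; omit it on a real host.
check_clickhouse_kernel() {
  local root="${1:-}"
  echo "thp=$(active_choice "$root/sys/kernel/mm/transparent_hugepage/enabled")"
  echo "overcommit=$(cat "$root/proc/sys/vm/overcommit_memory")"
  # cpu0 is taken as representative; many VMs expose no cpufreq at all.
  echo "governor=$(cat "$root/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor" 2>/dev/null || echo n/a)"
}
```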

yum install yum-utils
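rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum -y install clickhouse-server clickhouse-client
# note: newer releases are published through https://packages.clickhouse.com/rpm/clickhouse.repo instead of repo.clickhouse.tech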

# in /etc/clickhouse-server/config.xml, lower the log level from the default trace: <logger><level>information</level></logger>
# log files:
# /var/log/clickhouse-server/clickhouse-server.log
# /var/log/clickhouse-server/clickhouse-server.err.log

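# check the installed version and connect (--password prompts for the password)
clickhouse-server --version
clickhouse-client --password

# stop / start the server
sudo clickhouse stop
sudo clickhouse start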
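
Scripts that create tables right after starting the server may race with its startup. A generic retry helper is sketched below, under the assumption that a successful clickhouse-client --query "SELECT 1" means the server is ready; wait_until is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds or
# the timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  local timeout="$1" i=0
  shift
  while [ "$i" -lt "$timeout" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# After "sudo clickhouse start" (add --password '...' if the user has one):
#   wait_until 30 clickhouse-client --query "SELECT 1" && echo "ClickHouse is up"
```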

During deployment, several issues were encountered and solved:

Direct SELECT from a Kafka engine table is not allowed: start the client with --stream_like_engine_allow_direct_select 1 (or run SET stream_like_engine_allow_direct_select = 1 in the session).

Macro missing when creating the local replicated table: define the <macros> section on each node, with that node's own <shard> and <replica> values.

Replica already exists: delete the stale replica's node in ZooKeeper, then recreate the table.

Authentication failure on the distributed table: set the correct <user> and <password> for each replica in the <remote_servers> configuration.
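
The macro and authentication fixes both live in server configuration, and files in /etc/clickhouse-server/config.d/ are merged into config.xml. The sketch below prints such an override under assumptions: a hypothetical cluster named log_cluster with one shard replicated on the two hosts listed earlier, the zk1 to zk3 ensemble from step 1, and placeholder credentials. Each node gets its own replica value (and its own shard value when nodes hold different shards). Recent ClickHouse versions use <clickhouse> as the root element; older ones use <yandex>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a config.d override with this node's macros,
# the ZooKeeper ensemble, and a remote_servers entry carrying credentials.
# Usage: print_cluster_config <shard> <replica> <user> <password>
print_cluster_config() {
  local shard="$1" replica="$2" user="$3" password="$4"
  cat <<EOF
<clickhouse>
    <macros>
        <shard>${shard}</shard>
        <replica>${replica}</replica>
    </macros>
    <zookeeper>
        <node><host>zk1</host><port>2181</port></node>
        <node><host>zk2</host><port>2181</port></node>
        <node><host>zk3</host><port>2181</port></node>
    </zookeeper>
    <remote_servers>
        <log_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>bigdata-clickhouse-01</host>
                    <port>9000</port>
                    <user>${user}</user>
                    <password>${password}</password>
                </replica>
                <replica>
                    <host>bigdata-clickhouse-02</host>
                    <port>9000</port>
                    <user>${user}</user>
                    <password>${password}</password>
                </replica>
            </shard>
        </log_cluster>
    </remote_servers>
</clickhouse>
EOF
}

# Example, on bigdata-clickhouse-01:
#   print_cluster_config 01 bigdata-clickhouse-01 default 'your-password' > /etc/clickhouse-server/config.d/cluster.xml
```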

Finally, a materialized view was created to sync Kafka‑consumed data into the distributed ClickHouse table, completing the data pipeline.
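
The pipeline just described (Kafka engine table, replicated local table, distributed table, materialized view) could look like the DDL below. It is a sketch, not the author's schema: the database, table and column names, the consumer group, and the cluster name log_cluster (assumed to be defined in <remote_servers>) are assumptions; the topic and brokers match filebeat.yml. Filebeat also publishes fields such as @timestamp, and JSONEachRow rejects unknown keys unless input_format_skip_unknown_fields is enabled, so check how your version applies that setting to Kafka tables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print DDL for a Kafka -> ClickHouse pipeline.
# All names are illustrative; align the columns with the JSON keys Filebeat sends.
# Usage: print_pipeline_ddl | clickhouse-client --password --multiquery
print_pipeline_ddl() {
  cat <<'EOF'
CREATE DATABASE IF NOT EXISTS logs ON CLUSTER log_cluster;

-- 1. Kafka engine table: consumes the Filebeat topic.
CREATE TABLE logs.events_kafka ON CLUSTER log_cluster (
    event_time DateTime,
    level      String,
    message    String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
         kafka_topic_list = 'xxx_data_clickhouse',
         kafka_group_name = 'clickhouse_xxx',
         kafka_format = 'JSONEachRow';

-- 2. Local replicated table; {shard} and {replica} come from each node's macros.
CREATE TABLE logs.events_local ON CLUSTER log_cluster (
    event_time DateTime,
    level      String,
    message    String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/logs/events_local', '{replica}')
PARTITION BY toYYYYMMDD(event_time)
ORDER BY event_time;

-- 3. Distributed table over the local tables.
CREATE TABLE logs.events_all ON CLUSTER log_cluster AS logs.events_local
ENGINE = Distributed(log_cluster, logs, events_local, rand());

-- 4. Materialized view: moves each consumed Kafka batch into the distributed table.
CREATE MATERIALIZED VIEW logs.events_mv ON CLUSTER log_cluster TO logs.events_all AS
SELECT event_time, level, message FROM logs.events_kafka;
EOF
}
```

Routing the view into the distributed table rather than the local one lets every node's consumer spread rows across the cluster; writing to events_local instead would keep each node's consumed rows local.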

Conclusion: by following the official documentation and troubleshooting each problem as it came up, the author got the complete log pipeline working, from Filebeat through Kafka into ClickHouse.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

