
Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide for Enterprise Data Analytics

This article compares Elasticsearch and ClickHouse in terms of write throughput, query speed, and server cost, then provides a detailed deployment guide for Zookeeper, Kafka, Filebeat, and ClickHouse clusters, including troubleshooting steps and cost analysis for enterprise data analytics.


Elasticsearch vs ClickHouse

ClickHouse is a high‑performance column‑oriented distributed DBMS. Tests show it has several advantages over Elasticsearch:

Write throughput: a single server can ingest 50–200 MB/s (over 600k records/s), more than five times Elasticsearch's rate, with fewer write rejections and lower latency.

Query speed: when data is resident in the OS page cache, scan rates reach 2–30 GB/s; even without the cache, ClickHouse outperforms Elasticsearch by 5–30×.

Server cost: ClickHouse compresses data to 1/3–1/30 of the equivalent Elasticsearch storage footprint, reducing disk usage, I/O, and memory/CPU consumption, and can roughly halve server costs.

Cost Analysis

Cost estimation based on Alibaba Cloud list prices (no discounts) shows significant savings when using ClickHouse instead of Elasticsearch.

Environment Deployment

1. Zookeeper cluster deployment

yum install java-1.8.0-openjdk-devel.x86_64
# configure JAVA_HOME and PATH in /etc/profile, then: source /etc/profile
yum install ntpdate
ntpdate asia.pool.ntp.org

mkdir -p /usr/zookeeper/data
mkdir -p /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper

export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH

cd $ZOOKEEPER_HOME/conf
vi zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

# run exactly ONE of these per node, matching its server.N entry in zoo.cfg:
echo "1" > /usr/zookeeper/data/myid   # on zk1
echo "2" > /usr/zookeeper/data/myid   # on zk2
echo "3" > /usr/zookeeper/data/myid   # on zk3

cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
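The `myid` value on each node must match its `server.N` index in zoo.cfg, or the ensemble will not form. A hedged shell sketch (the `zk_server_lines` helper is my own, not from the article) that derives the `server.N` lines from a single host list, so index and host cannot drift apart:

```shell
# Hypothetical helper: emit zoo.cfg server entries for an ordered host list.
# The position of each host in the argument list is its server index,
# which is also the value that host must write into its myid file.
zk_server_lines() {
  i=1
  for host in "$@"; do
    echo "server.$i=$host:2888:3888"
    i=$((i + 1))
  done
}

zk_server_lines zk1 zk2 zk3
# server.1=zk1:2888:3888
# server.2=zk2:2888:3888
# server.3=zk3:2888:3888
```

After starting all three nodes, `zkServer.sh status` on each host should report one leader and two followers.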

2. Kafka cluster deployment

mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka

# broker configuration (example for broker.id=1)
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0

nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &

/usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-stop.sh
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
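Topic creation is worth wrapping so the layout (partitions, replication factor) is stated once. A hedged sketch, assuming `$KAFKA_HOME` points at the extracted distribution; the `create_topic` function is illustrative, not from the article, and the guard makes it a harmless dry run on hosts where Kafka is not installed:

```shell
# Illustrative wrapper around the stock kafka-topics.sh CLI.
create_topic() {
  topic=$1; partitions=$2; rf=$3
  echo "create topic=$topic partitions=$partitions rf=$rf"
  # Only call the real CLI when it is present (dry run otherwise).
  if [ -x "${KAFKA_HOME:-}/bin/kafka-topics.sh" ]; then
    "$KAFKA_HOME/bin/kafka-topics.sh" --create --bootstrap-server ip:9092 \
      --replication-factor "$rf" --partitions "$partitions" --topic "$topic"
  fi
}

create_topic xxx_data_clickhouse 3 2
```

A quick smoke test afterwards is to produce a line with `kafka-console-producer.sh` and read it back with the console consumer shown above.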

3. Filebeat deployment

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat

# filebeat.yml (key points)
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
processors:
  - drop_fields:
      fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]
      ignore_missing: false

nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /usr/filebeat/filebeat.log 2>&1 &
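Before backgrounding the daemon it is worth validating both the YAML and the Kafka output; `filebeat test config` and `filebeat test output` are stock Filebeat subcommands. The `check_filebeat` wrapper and its guard are my own additions so the snippet stays harmless on hosts without Filebeat:

```shell
# Illustrative pre-flight check: validate the config file and the
# configured output before starting Filebeat in the background.
check_filebeat() {
  cfg=$1
  echo "validating $cfg"
  if command -v filebeat >/dev/null 2>&1; then
    filebeat test config -c "$cfg" && filebeat test output -c "$cfg"
  fi
}

check_filebeat /etc/filebeat/filebeat.yml
```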

4. ClickHouse deployment

# Check SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

mkdir -p /data/clickhouse
# add host entries for clickhouse nodes
10.190.85.92 bigdata-clickhouse-01
10.190.85.93 bigdata-clickhouse-02

# Performance tuning
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | tee /proc/sys/vm/overcommit_memory
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled

# Install ClickHouse repository
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum -y install clickhouse-server clickhouse-client

# Adjust config.xml: set the logger level to information, e.g.
#   <logger> <level>information</level> ... </logger>
# Start/stop commands
sudo clickhouse stop
sudo clickhouse start
sudo clickhouse restart
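After a restart, a quick connectivity check confirms the server is accepting queries; `clickhouse-client --query` is the standard non-interactive invocation. The `ch_ping` wrapper is illustrative, not from the article:

```shell
# Illustrative smoke test for a freshly (re)started ClickHouse server.
ch_ping() {
  echo "pinging clickhouse on $1"
  # Only attempt the query when the client binary is installed.
  if command -v clickhouse-client >/dev/null 2>&1; then
    clickhouse-client --host "$1" --query "SELECT version()"
  fi
}

ch_ping 127.0.0.1
```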

Common ClickHouse Issues and Solutions

1) Kafka engine table cannot be queried directly

Direct select is not allowed. To enable use setting stream_like_engine_allow_direct_select. (QUERY_NOT_ALLOWED)

Solution: launch the client with the setting enabled:

clickhouse-client --stream_like_engine_allow_direct_select 1 --password xxxx

2) Local replicated table macro error

DB::Exception: No macro 'shard' in config while processing substitutions ...

Solution: define distinct shard values for each node in the <macros> section of the ClickHouse config.

<macros>
  <shard>01</shard>
  <replica>example01-01-1</replica>
</macros>
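With the macros defined, the local replicated table references them in its ZooKeeper path, so one DDL statement works on every node. A hedged sketch, with an illustrative table name and column list (the `{shard}`/`{replica}` substitution syntax is standard ClickHouse):

```sql
-- Illustrative DDL: {shard} and {replica} are filled in from each node's <macros>.
CREATE TABLE default.bi_inner_log_local ON CLUSTER clickhouse_cluster
(
    log_uuid String,
    date_partition Date,
    event_name String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/bi_inner_log_local', '{replica}')
PARTITION BY date_partition
ORDER BY log_uuid;
```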

3) Replica already exists error

DB::Exception: Replica ... already exists. (REPLICA_IS_ALREADY_EXIST)

Solution: delete the stale Zookeeper node for that replica and recreate the table.
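On recent ClickHouse versions the stale ZooKeeper metadata can also be removed from a healthy replica with `SYSTEM DROP REPLICA`, which is safer than hand-deleting znodes with zkCli.sh. A hedged sketch (the table name is illustrative):

```sql
-- Run on a surviving replica; 'example01-01-1' is the name of the stale replica.
SYSTEM DROP REPLICA 'example01-01-1' FROM TABLE default.bi_inner_log_local;
```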

4) Distributed table authentication failure

Authentication failed: password is incorrect or there is no user with such name. (AUTHENTICATION_FAILED)

Solution: correct the user and password entries in the <remote_servers> configuration.

<remote_servers>
  <clickhouse_cluster>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>ip1</host>
        <port>9000</port>
        <user>default</user>
        <password>xxxx</password>
      </replica>
    </shard>
    ...
  </clickhouse_cluster>
</remote_servers>

Materialized View for Kafka → ClickHouse

CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS
SELECT
    log_uuid,
    date_partition,
    event_name,
    activity_name,
    credits_bring,
    activity_type,
    activity_id
FROM default.kafka_clickhouse_inner_log;
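The view above reads from a Kafka engine table. A hedged sketch of what `default.kafka_clickhouse_inner_log` might look like; the column types and consumer group name are assumptions, while the `kafka_*` settings are standard Kafka engine parameters:

```sql
-- Illustrative Kafka engine source table for the materialized view above.
-- ClickHouse consumes the Filebeat JSON from the topic and the view pushes
-- each batch into the target table.
CREATE TABLE default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster
(
    log_uuid String,
    date_partition Date,
    event_name String,
    activity_name String,
    credits_bring Int32,
    activity_type String,
    activity_id String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
         kafka_topic_list = 'xxx_data_clickhouse',
         kafka_group_name = 'clickhouse_inner_log_group',
         kafka_format = 'JSONEachRow';
```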

After resolving the above issues, the data pipeline from Kafka through Filebeat into ClickHouse works end‑to‑end.

In summary, when problems arise, consult the official documentation or each tool's --help output; systematic troubleshooting of issues like those above leads to a robust enterprise data platform.

Written by Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
