
Elasticsearch vs ClickHouse: Performance Comparison, Cost Analysis, and Deployment Guide

This article compares Elasticsearch and ClickHouse on write throughput, query speed, and server cost, including a cost analysis. It then gives step-by-step deployment instructions for ZooKeeper, Kafka, Filebeat, and ClickHouse, with troubleshooting tips and configuration examples.

Code Ape Tech Column

Oct 24, 2024

The author needed a private-cloud data analysis solution that would cut the server overhead of their SaaS services, and chose ClickHouse as a cost-effective alternative to Elasticsearch.

Elasticsearch vs ClickHouse

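According to the author's figures, ClickHouse delivers much higher write throughput than Elasticsearch (ES): 50-200 MB/s per server, or more than 600,000 records/s, which is over 5× ES. It also rejects fewer writes. Queries are faster as well, scanning 2-30 GB/s when the data is in the page cache, 5-30× faster than ES. Stronger compression shrinks on-disk data to between 1/3 and 1/30 of what ES needs. That reduces disk usage and I/O, which in turn lowers CPU, memory, and overall server costs.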
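To make the compression claim concrete, the sketch below converts an Elasticsearch on-disk footprint into the range implied by the quoted 1/3 to 1/30 ratio. The 900 GB input is a hypothetical figure, not a measurement from the article. Real ratios depend heavily on the data, the schema, and the codecs used.

```python
def clickhouse_disk_range_gb(es_disk_gb: float,
                             worst_divisor: float = 3.0,
                             best_divisor: float = 30.0) -> tuple:
    """Estimate ClickHouse on-disk size from an Elasticsearch footprint.

    The divisors encode the claim that ClickHouse stores the same data in
    1/3 to 1/30 of the space Elasticsearch needs.
    Returns (best_case_gb, worst_case_gb).
    """
    if es_disk_gb < 0:
        raise ValueError("disk size must be non-negative")
    return es_disk_gb / best_divisor, es_disk_gb / worst_divisor


if __name__ == "__main__":
    # Hypothetical 900 GB Elasticsearch footprint, for illustration only.
    low, high = clickhouse_disk_range_gb(900)
    print(f"ClickHouse estimate: {low:.0f} GB to {high:.0f} GB")
```

For 900 GB this prints `ClickHouse estimate: 30 GB to 300 GB`. Disk I/O shrinks by the same factor, which is where the CPU and memory savings come from.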

Cost analysis based on Alibaba Cloud pricing shows ClickHouse can halve server expenses compared to Elasticsearch.

Environment Deployment

1. ZooKeeper Cluster

Install Java, set up the directories, and extract the ZooKeeper binaries. Next, write zoo.cfg with tickTime, initLimit, syncLimit, dataDir, clientPort, and one server.N entry per node. On each node, create a myid file in dataDir that contains that node's N. Finally, start the service.

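# Install OpenJDK 8 (ZooKeeper runs on the JVM)
yum install java-1.8.0-openjdk-devel.x86_64
# Configure environment variables in /etc/profile
...
# Start ZooKeeper on each node
sh zkServer.sh start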
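The article elides its zoo.cfg, so here is a minimal sketch that renders one, together with each node's myid. It shows how the server.N entries and myid values have to line up. The tick and limit values are the ones in ZooKeeper's shipped zoo_sample.cfg. The hostnames zk1 to zk3 and the data directory are hypothetical.

```python
"""Render zoo.cfg and per-node myid contents for a ZooKeeper ensemble (sketch)."""

PEER_PORT = 2888      # followers connect to the leader on this port
ELECTION_PORT = 3888  # used for leader election


def render_zoo_cfg(hosts: list, data_dir: str, client_port: int = 2181,
                   tick_time: int = 2000, init_limit: int = 10,
                   sync_limit: int = 5) -> str:
    lines = [
        f"tickTime={tick_time}",    # base time unit in milliseconds
        f"initLimit={init_limit}",  # ticks a follower may take to connect and sync
        f"syncLimit={sync_limit}",  # ticks a follower may lag before being dropped
        f"dataDir={data_dir}",      # snapshots live here, and so does myid
        f"clientPort={client_port}",
    ]
    # server.N must use the same N that node N writes into its myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{PEER_PORT}:{ELECTION_PORT}")
    return "\n".join(lines) + "\n"


def render_myid(hosts: list, host: str) -> str:
    # Contents of <dataDir>/myid on the given host.
    return f"{hosts.index(host) + 1}\n"


if __name__ == "__main__":
    ensemble = ["zk1", "zk2", "zk3"]  # hypothetical hostnames
    print(render_zoo_cfg(ensemble, data_dir="/usr/zookeeper/data"))
    for h in ensemble:
        print(h, "myid =", render_myid(ensemble, h).strip())
```

Every node gets the identical zoo.cfg. Only the myid file differs.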

2. Kafka Cluster

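# Create the Kafka install directory
mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
# Download Kafka 3.2.0 (Scala 2.12 build)
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
...
# Start the broker in the background
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh ... &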
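The broker configuration is elided above. The sketch below renders the handful of server.properties keys a small cluster needs. It assumes ZooKeeper mode, which Kafka 3.2.0 still supports. The hostnames, the log directory, and the use of port 9092 are hypothetical choices.

```python
"""Render a minimal server.properties for each broker (sketch; values hypothetical)."""


def render_server_properties(broker_id: int, host: str, zk_hosts: list,
                             log_dirs: str = "/usr/kafka/logs") -> str:
    props = {
        "broker.id": str(broker_id),               # must be unique per broker
        "listeners": f"PLAINTEXT://{host}:9092",   # address the broker listens on
        "log.dirs": log_dirs,                      # where partition data is stored
        # The ZooKeeper ensemble from step 1, on its client port 2181.
        "zookeeper.connect": ",".join(f"{h}:2181" for h in zk_hosts),
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    brokers = ["kafka1", "kafka2", "kafka3"]   # hypothetical hostnames
    zookeepers = ["zk1", "zk2", "zk3"]
    for broker_id, host in enumerate(brokers, start=1):
        print(f"# {host}")
        print(render_server_properties(broker_id, host, zookeepers))
```

Each broker's file is then passed to kafka-server-start.sh, which takes the properties file path as its argument.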

3. Filebeat

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
cat > /etc/yum.repos.d/elastic.repo <<EOF
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
yum install filebeat
systemctl enable filebeat
...

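The key parts of the Filebeat configuration are the Kafka output settings and keys_under_root: true. With that setting, Filebeat places decoded JSON fields at the top level of each event instead of nesting them under a json key.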
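The effect of keys_under_root is easiest to see on a single event. The sketch below is a simplified model of where decoded fields end up. It is not Filebeat's code, and the metadata and log line are hypothetical. Flat top-level fields are presumably what the author wants so that they map directly onto ClickHouse columns.

```python
import json


def place_decoded_fields(metadata: dict, raw_line: str, keys_under_root: bool) -> dict:
    """Simplified model of where Filebeat puts fields decoded from a JSON log line."""
    decoded = json.loads(raw_line)
    event = dict(metadata)
    if keys_under_root:
        # Decoded keys land at the top level. On a name clash this sketch lets the
        # decoded value win, which in Filebeat also requires json.overwrite_keys: true.
        event.update(decoded)
    else:
        # Default behaviour: all decoded fields are nested under a "json" key.
        event["json"] = decoded
    return event


if __name__ == "__main__":
    meta = {"host": {"name": "web01"}}         # hypothetical Filebeat metadata
    line = '{"level": "info", "uid": 42}'      # hypothetical application log line
    print(place_decoded_fields(meta, line, keys_under_root=False))
    print(place_decoded_fields(meta, line, keys_under_root=True))
```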

4. ClickHouse

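Before installing, complete these steps:

- Verify that the CPU supports SSE 4.2.
- Create the data directories.
- Set the CPU scaling governor (ClickHouse recommends performance).
- Make sure memory overcommit is not disabled; ClickHouse's usage recommendations call for vm.overcommit_memory to be 0 or 1.
- Disable transparent huge pages.
- Add the official repository, then install clickhouse-server and clickhouse-client.

In the server configuration, set the log level to information and check the log file paths.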

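# Check that the CPU supports SSE 4.2 (required by the prebuilt ClickHouse binaries)
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
# Create the data directory
mkdir -p /data/clickhouse
...
# Install the server and client from the ClickHouse repository
yum -y install clickhouse-server clickhouse-client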
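The host checks above can be bundled into one pre-flight script. This is a sketch that reads the standard Linux /proc and /sys files. The overcommit check follows ClickHouse's usage recommendations (vm.overcommit_memory of 0 or 1). The transparent huge pages check only flags the "always" mode. The governor check is skipped when cpufreq is absent, as on many VMs.

```python
"""Pre-install checks for a ClickHouse host (sketch)."""
from pathlib import Path
from typing import Optional

CPUINFO = "/proc/cpuinfo"
OVERCOMMIT = "/proc/sys/vm/overcommit_memory"
THP = "/sys/kernel/mm/transparent_hugepage/enabled"
GOVERNOR = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"


def has_sse42(cpuinfo: str) -> bool:
    # Each "flags" line in /proc/cpuinfo lists CPU features separated by whitespace.
    return any(line.startswith("flags") and "sse4_2" in line.split()
               for line in cpuinfo.splitlines())


def selected(sysfs_text: str) -> str:
    # sysfs marks the active option with brackets, e.g. "always madvise [never]".
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return sysfs_text.strip()


def preflight(files: dict) -> list:
    """Return warnings for {path: contents}; an empty list means all checks passed."""
    warnings = []
    if not has_sse42(files.get(CPUINFO, "")):
        warnings.append("CPU does not report sse4_2")
    if files.get(OVERCOMMIT, "").strip() not in {"0", "1"}:
        warnings.append("vm.overcommit_memory should be 0 or 1")
    if selected(files.get(THP, "")) == "always":
        warnings.append("transparent huge pages are set to 'always'")
    governor = files.get(GOVERNOR)  # absent without cpufreq: skip the check
    if governor is not None and governor.strip() != "performance":
        warnings.append(f"CPU governor is '{governor.strip()}', not 'performance'")
    return warnings


def read_optional(path: str) -> Optional[str]:
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    found = {}
    for path in (CPUINFO, OVERCOMMIT, THP, GOVERNOR):
        text = read_optional(path)
        if text is not None:
            found[path] = text
    for message in preflight(found) or ["all checks passed"]:
        print(message)
```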

Troubleshooting and Solutions

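Kafka Engine Table – Direct SELECT queries on a Kafka engine table are disabled by default. For debugging, enable them by passing --stream_like_engine_allow_direct_select 1 when launching the client. Each Kafka message can be read only once, so a direct SELECT consumes messages that the materialized view would otherwise receive.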

clickhouse-client --stream_like_engine_allow_direct_select 1 --password xxxxx
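The article never shows the DDL for the Kafka engine table default.kafka_clickhouse_inner_log. The sketch below builds one. The setting names (kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format) are real Kafka engine settings. The columns, topic, and consumer group are hypothetical, and no SQL escaping is done.

```python
"""Build a Kafka engine table DDL (sketch; columns, topic and group are hypothetical)."""


def kafka_engine_ddl(table: str, columns: dict, brokers: list, topic: str,
                     group: str, fmt: str = "JSONEachRow",
                     cluster: str = "clickhouse_cluster") -> str:
    cols = ",\n".join(f"    {name} {ctype}" for name, ctype in columns.items())
    settings = {
        "kafka_broker_list": ",".join(brokers),
        "kafka_topic_list": topic,
        "kafka_group_name": group,  # consumer group whose offsets ClickHouse commits
        "kafka_format": fmt,        # JSONEachRow: each row is a JSON object
    }
    # No quoting or escaping is done here: only pass trusted, simple values.
    settings_sql = ",\n".join(f"    {key} = '{value}'" for key, value in settings.items())
    return (f"CREATE TABLE {table} ON CLUSTER {cluster}\n(\n{cols}\n)\n"
            f"ENGINE = Kafka\nSETTINGS\n{settings_sql};")


if __name__ == "__main__":
    print(kafka_engine_ddl(
        "default.kafka_clickhouse_inner_log",
        {"log_time": "DateTime", "level": "String", "message": "String"},
        brokers=["kafka1:9092", "kafka2:9092", "kafka3:9092"],
        topic="bi_inner_log",
        group="clickhouse_inner_log",
    ))
```

With the same consumer group on every node, Kafka splits the topic's partitions among the ClickHouse servers instead of delivering each message to all of them.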

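Local Table Macro – Creating the local replicated table failed because the shard macro was not defined. Add a <macros> section to each node's configuration. The shard value is shared by all replicas of the same shard, and the replica value must be unique to each node.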

<macros>
  <shard>01</shard>
  <replica>example01-01-1</replica>
</macros>
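Because every node needs its own macros, generating them from a single layout avoids copy-paste mistakes. The sketch below does this for a hypothetical two-shard, two-replica cluster. It reuses each hostname as the replica name, which is an assumption; the article uses names like example01-01-1 instead.

```python
import xml.etree.ElementTree as ET


def render_macros(shard: int, replica: str) -> str:
    """Return the <macros> block for one node, to paste into its server config."""
    macros = ET.Element("macros")
    ET.SubElement(macros, "shard").text = f"{shard:02d}"  # shared by all replicas of a shard
    ET.SubElement(macros, "replica").text = replica       # must be unique per node
    ET.indent(macros)                                     # pretty-print (Python 3.9+)
    return ET.tostring(macros, encoding="unicode")


def plan_macros(layout: dict) -> dict:
    """Map each host to its macros block, given {shard_number: [host, ...]}."""
    plan = {}
    for shard, hosts in layout.items():
        for host in hosts:
            plan[host] = render_macros(shard, replica=host)  # hostname doubles as replica name
    return plan


if __name__ == "__main__":
    # Hypothetical 2-shard x 2-replica layout.
    for host, block in plan_macros({1: ["ch1", "ch2"], 2: ["ch3", "ch4"]}).items():
        print(f"<!-- {host} -->\n{block}\n")
```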

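Replica Already Exists – When a replicated table is dropped and recreated, its old replica metadata can remain in ZooKeeper and block the new table. Remove the stale ZooKeeper nodes before recreating the replicated table.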
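To find the stale node, expand the table's ZooKeeper path the way ClickHouse does. ReplicatedMergeTree registers each replica under `<zookeeper_path>/replicas/<replica_name>`. The engine arguments below are hypothetical, because the article does not show them. `{database}` and `{table}` are filled in by ClickHouse itself rather than from `<macros>`; the sketch simply passes them in the same dict.

```python
import re


def expand_macros(template: str, values: dict) -> str:
    """Replace {name} placeholders, roughly as ClickHouse does with macros."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for macro {name!r}")
        return values[name]
    return re.sub(r"\{(\w+)\}", substitute, template)


def replica_znode(zk_path_template: str, replica_template: str, values: dict) -> str:
    # ReplicatedMergeTree keeps per-replica state under <zookeeper_path>/replicas/<replica>.
    zk_path = expand_macros(zk_path_template, values)
    return f"{zk_path}/replicas/{expand_macros(replica_template, values)}"


if __name__ == "__main__":
    # Hypothetical engine arguments:
    # ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
    values = {"shard": "01", "replica": "example01-01-1",
              "database": "default", "table": "bi_inner_log_local"}
    print(replica_znode("/clickhouse/tables/{shard}/{database}/{table}", "{replica}", values))
```

Delete only that replica's node, for example with deleteall in zkCli.sh (ZooKeeper 3.5 and later). Removing the whole table path would also wipe metadata for replicas that are still live.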

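Distributed Table Authentication – Queries through the distributed table hit authentication errors on the remote nodes. The fix is to give every <replica> entry in <remote_servers> the correct user and password.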

<remote_servers>
  <clickhouse_cluster>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>ip1</host>
        <port>9000</port>
        <user>default</user>
        <password>xxxx</password>
      </replica>
    </shard>
    ...
  </clickhouse_cluster>
</remote_servers>
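With many shards, it is easy to miss credentials on one replica entry. The sketch below parses a complete `<remote_servers>` fragment (without the `...` placeholder) and lists every replica that lacks `<user>` or `<password>`. An empty `<password/>` counts as present, since a user may legitimately have no password. The sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET


def replicas_missing_credentials(remote_servers_xml: str) -> list:
    """List 'cluster/host:port (missing ...)' for replicas without user or password."""
    root = ET.fromstring(remote_servers_xml)   # the <remote_servers> element
    problems = []
    for cluster in root:                       # e.g. <clickhouse_cluster>
        for replica in cluster.iter("replica"):
            missing = [tag for tag in ("user", "password") if replica.find(tag) is None]
            if missing:
                host = replica.findtext("host", default="?")
                port = replica.findtext("port", default="9000")
                problems.append(f"{cluster.tag}/{host}:{port} (missing {', '.join(missing)})")
    return problems


if __name__ == "__main__":
    sample = """<remote_servers><clickhouse_cluster>
      <shard><replica><host>ip1</host><port>9000</port>
        <user>default</user><password>xxxx</password></replica></shard>
      <shard><replica><host>ip2</host><port>9000</port></replica></shard>
    </clickhouse_cluster></remote_servers>"""
    print(replicas_missing_credentials(sample))
```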

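After resolving these issues, the author creates a distributed table on top of the local tables. A materialized view then reads from the Kafka engine table and inserts into the distributed table, so Kafka data flows into ClickHouse continuously.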

CREATE TABLE default.bi_inner_log_all ON CLUSTER clickhouse_cluster AS default.bi_inner_log_local ENGINE = Distributed(...);
CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS SELECT ... FROM default.kafka_clickhouse_inner_log;
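The Distributed(...) arguments are elided above. The engine takes the cluster name, the database, the local table, and an optional sharding key. The sketch below fills those in. Using rand() as the sharding key is a hypothetical choice that spreads rows evenly across shards.

```python
def distributed_ddl(dist_table: str, local_table: str, cluster: str,
                    sharding_key: str = "rand()") -> str:
    """CREATE TABLE ... ENGINE = Distributed(cluster, database, table, sharding_key)."""
    database, dot, table = local_table.partition(".")
    if not dot:
        raise ValueError("local_table must be written as database.table")
    return (f"CREATE TABLE {dist_table} ON CLUSTER {cluster} AS {local_table}\n"
            f"ENGINE = Distributed({cluster}, {database}, {table}, {sharding_key});")


if __name__ == "__main__":
    print(distributed_ddl("default.bi_inner_log_all",
                          "default.bi_inner_log_local", "clickhouse_cluster"))
```

With internal_replication set to true, as in the remote_servers config, each row is written to only one replica per shard. ReplicatedMergeTree then copies it to the other replicas.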

Conclusion: By following the official documentation and working through the issues above, the author built the full pipeline. Logs are collected with Filebeat, passed through Kafka, and stored in ClickHouse, with high performance at lower cost.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

