Elasticsearch vs ClickHouse: Architecture, Queries, and Performance
This article compares Elasticsearch and ClickHouse in terms of architecture, node roles, query languages, and performance. Benchmarks run with Docker Compose, a Vector data pipeline, and the Python SDKs show ClickHouse to be faster in most query scenarios, although it lacks Elasticsearch's advanced search features.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack). ClickHouse is a column‑oriented relational database developed by Yandex, open‑sourced in 2016.
Elasticsearch has long been one of the most popular solutions for big-data log storage and search, yet many companies have begun migrating their logging pipelines to ClickHouse.
Architecture and Design Comparison
Elasticsearch builds on Lucene, which handles search through inverted indexes and Bloom filters; Elasticsearch adds distributed capabilities through its shard and replica mechanisms.
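To make the inverted-index idea concrete, here is a minimal Python sketch (not Lucene's implementation): each term maps to the sorted list of document IDs that contain it, and a multi-term AND query intersects those postings lists. The lowercase/whitespace tokenizer is a simplifying assumption, and Bloom filters are left out.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified analyzer (assumption): lowercase and split on whitespace.
    # Real Lucene analyzers also handle punctuation, stemming, stop words, etc.
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all_terms(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect postings lists
    return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "disk full on host alpha",
        2: "user login failed on host beta",
        3: "disk error on host beta",
    }
    index = build_inverted_index(docs)
    print(index["disk"])                         # [1, 3]
    print(search_all_terms(index, "disk beta"))  # [3]
```

The key property is that a query never scans documents: it only looks up the postings lists of its terms, which is why full-text lookups stay fast as the corpus grows.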
ClickHouse is a distributed MPP ROLAP engine with true columnar storage. It uses a log-structured merge tree (the MergeTree engine family), sparse indexes, SIMD optimizations, and ZooKeeper for node coordination.
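The sparse-index idea can be sketched in a few lines of Python. This is a simplified model, not ClickHouse code: it assumes rows are already sorted by the ORDER BY key and split into fixed-size granules (ClickHouse's default `index_granularity` is 8192 rows; a tiny value is used here), and it keeps only the first key of each granule as a "mark". A range query then reads only the granules whose key range can overlap the condition.

```python
import bisect


def build_sparse_index(sorted_keys: list[int], granularity: int) -> list[int]:
    """Keep one mark per granule: the key of the granule's first row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def candidate_granules(marks: list[int], lo: int, hi: int) -> range:
    """Indices of granules that may hold keys in [lo, hi].

    Granule g can only hold keys between marks[g] and marks[g + 1]
    (inclusive, since equal keys may straddle a boundary), so the result
    is conservative: it may include a granule without matches, but never
    skips one that has matches.
    """
    start = max(bisect.bisect_left(marks, lo) - 1, 0)
    end = bisect.bisect_right(marks, hi)  # exclusive upper bound
    return range(start, end)


def range_query(sorted_keys: list[int], granularity: int, lo: int, hi: int) -> list[int]:
    """Read only the candidate granules, then filter their rows exactly."""
    marks = build_sparse_index(sorted_keys, granularity)
    hits: list[int] = []
    for g in candidate_granules(marks, lo, hi):
        granule = sorted_keys[g * granularity:(g + 1) * granularity]
        hits.extend(k for k in granule if lo <= k <= hi)
    return hits


if __name__ == "__main__":
    keys = list(range(0, 100, 2))                   # 50 sorted keys: 0, 2, ..., 98
    marks = build_sparse_index(keys, 8)             # [0, 16, 32, 48, 64, 80, 96]
    print(list(candidate_granules(marks, 20, 40)))  # [1, 2]
    print(range_query(keys, 8, 20, 40))             # 20, 22, ..., 40
```

Because the index stores one entry per granule rather than per row, it stays small enough to keep in memory, at the cost of reading some non-matching rows inside each candidate granule.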
Node Roles
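An Elasticsearch cluster distinguishes three main node roles:
Client Node (a coordinating-only node in Elasticsearch 7.x terms) – receives API requests, routes them to data nodes, and merges the results; stores no data.
Data Node – stores and indexes data.
Master Node – coordinates the cluster; stores no data.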
ClickHouse, by contrast, has no dedicated roles: all nodes are peers, and each processes its own portion of the data without sharing it with the others (a shared-nothing design).
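What "each node processes its own portion" means for an aggregation can be illustrated with a conceptual Python sketch: every node computes a partial result over its local rows only, and the node that received the query merges the partials. In ClickHouse this pattern is typically reached by querying a table that uses the Distributed engine; the function names below are hypothetical, and the code merely simulates the data flow in a single process.

```python
from collections import Counter


def partial_count(shard_rows: list[dict], column: str) -> Counter:
    """Runs on each node, over the rows that node stores and nothing else."""
    return Counter(row[column] for row in shard_rows)


def merge_partials(partials: list[Counter]) -> Counter:
    """Runs on the node that received the query: add up the per-node counts."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts instead of replacing them
    return total


if __name__ == "__main__":
    # Hypothetical rows split across two peer nodes with no data in common.
    node_a = [{"application": "sshd"}, {"application": "cron"}]
    node_b = [{"application": "sshd"}, {"application": "sshd"}]
    partials = [partial_count(rows, "application") for rows in (node_a, node_b)]
    print(merge_partials(partials).most_common())  # [('sshd', 3), ('cron', 1)]
```

Only the small partial results cross the network, not the raw rows, which is what lets such aggregations scale with the number of nodes.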
Query Comparison in Practice
Code repository: https://github.com/gangtao/esvsch
The test architecture uses Docker Compose to launch two stacks.
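Once both stacks described below are running, a quick way to confirm that their HTTP endpoints respond is a small Python check against the ports the Compose files publish: 9200 for Elasticsearch and 8123 for ClickHouse, whose `/ping` endpoint is also what the ClickHouse healthcheck calls. This is a minimal sketch using only the standard library; the `is_up` helper and the localhost URLs are assumptions, not part of the repository.

```python
import http.client
import urllib.error
import urllib.request

# Assumed endpoints, based on the ports published by the two Compose files.
ENDPOINTS = {
    "elasticsearch": "http://localhost:9200",
    "clickhouse": "http://localhost:8123/ping",  # answers "Ok." when healthy
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error status
        # (HTTPError is a subclass of URLError).
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name}: {'up' if is_up(url) else 'down'}")
```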
The Elasticsearch stack consists of a single-node Elasticsearch container and a Kibana container.
Docker Compose file for the Elasticsearch stack:
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

The ClickHouse stack includes a ClickHouse container and TabixUI as a client.
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

Data ingestion uses Vector to generate syslog records and write them to both stacks. The ClickHouse table is created with:
CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

Vector pipeline configuration (excerpt):
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true

Run the pipeline with Docker:
# Run from the directory that contains vector.toml.
# On Linux hosts, also pass --add-host=host.docker.internal:host-gateway so the sinks can reach the host.
docker run \
  -v "$(pwd)/vector.toml:/etc/vector/vector.toml:ro" \
  -p 18383:8383 \
  timberio/vector:nightly-alpine

Benchmark queries were then executed against both stacks through their Python SDKs, covering match_all, match, multi_match, term, range, exists, regex, and aggregation queries, with each Elasticsearch DSL query paired with an equivalent ClickHouse SQL statement.
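The Python sketch below shows what such DSL/SQL pairs can look like against the syslog schema above. These are illustrative translations, not necessarily the repository's exact queries: the sample values (`WORD`, `HOST`, `HOST_PREFIX`) are hypothetical, the `.keyword` subfields assume Elasticsearch's default dynamic mapping, and several SQL equivalents are only approximate, as the comments note.

```python
import json

# Hypothetical sample values (assumptions); use values present in your own data.
WORD = "error"
HOST = "example.com"
HOST_PREFIX = "exam"
ES_INDEX = "syslog-*"  # the Vector sink writes one index per day (index = "syslog-%F")

# query type -> (Elasticsearch request body, roughly equivalent ClickHouse SQL).
# Elasticsearch returns 10 hits by default, hence LIMIT 10 on the SQL side.
# Values are interpolated without escaping: acceptable for a sketch only.
QUERY_PAIRS = {
    "match_all": (
        {"query": {"match_all": {}}},
        "SELECT * FROM syslog LIMIT 10",
    ),
    "match": (
        {"query": {"match": {"message": WORD}}},
        # hasToken matches a whole token but, unlike the default analyzer, is case-sensitive
        f"SELECT * FROM syslog WHERE hasToken(message, '{WORD}') LIMIT 10",
    ),
    "multi_match": (
        {"query": {"multi_match": {"query": WORD, "fields": ["message", "application"]}}},
        f"SELECT * FROM syslog WHERE hasToken(message, '{WORD}') "
        f"OR hasToken(application, '{WORD}') LIMIT 10",
    ),
    "term": (
        # Assumes dynamic mapping, which adds a keyword subfield to string fields
        {"query": {"term": {"hostname.keyword": HOST}}},
        f"SELECT * FROM syslog WHERE hostname = '{HOST}' LIMIT 10",
    ),
    "range": (
        {"query": {"range": {"priority": {"gte": 10, "lte": 100}}}},
        "SELECT * FROM syslog WHERE priority BETWEEN 10 AND 100 LIMIT 10",
    ),
    "exists": (
        {"query": {"exists": {"field": "application"}}},
        # Columns are non-Nullable String, so an empty string stands in for "missing"
        "SELECT * FROM syslog WHERE application != '' LIMIT 10",
    ),
    "regex": (
        # ES regexp must match the whole term; ClickHouse match() is unanchored
        {"query": {"regexp": {"hostname.keyword": f"{HOST_PREFIX}.*"}}},
        f"SELECT * FROM syslog WHERE match(hostname, '^{HOST_PREFIX}.*$') LIMIT 10",
    ),
    "aggregation": (
        {"size": 0, "aggs": {"by_application": {"terms": {"field": "application.keyword", "size": 10}}}},
        "SELECT application, count() AS cnt FROM syslog "
        "GROUP BY application ORDER BY cnt DESC LIMIT 10",
    ),
}

if __name__ == "__main__":
    for name, (dsl, sql) in QUERY_PAIRS.items():
        print(f"[{name}] GET {ES_INDEX}/_search {json.dumps(dsl)}")
        print(f"[{name}] {sql}")
```

To execute them, each body can be sent to `GET syslog-*/_search` (for example via the elasticsearch-py 7.x client's `search(index=..., body=...)`), and each SQL string to ClickHouse's HTTP interface on port 8123 or through a native client.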
Performance results (charts):
In these tests, ClickHouse outperformed Elasticsearch in most query types, especially aggregations, even without Bloom filter indexes or other optimizations enabled.
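One minimal way to collect latencies for a comparison like this is a client-side timing loop such as the hypothetical `time_query` below. It discards warm-up runs so one-off costs (connection setup, cold caches) do not dominate, then reports the median, minimum, and maximum wall-clock time. Note that it measures end-to-end latency, including network and result decoding; for server-side figures, Elasticsearch reports `took` (in milliseconds) in each search response and ClickHouse records `query_duration_ms` in `system.query_log`.

```python
import statistics
import time
from typing import Callable


def time_query(run: Callable[[], object], warmup: int = 2, repeats: int = 10) -> dict:
    """Time a zero-argument callable that executes one query.

    Warm-up calls are executed but not measured; the remaining calls are
    timed individually with a monotonic high-resolution clock.
    """
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; with real clients this could be, for example,
    # lambda: es.search(index="syslog-*", body=dsl) or lambda: ch.execute(sql),
    # where es and ch are hypothetical client objects.
    print(time_query(lambda: sum(range(100_000))))
```

Reporting the median rather than the mean keeps a single slow outlier (a GC pause, a background merge) from skewing the comparison between engines.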
Conclusion
The comparison shows that ClickHouse performs very well in basic query scenarios, which helps explain why many companies are moving from Elasticsearch to ClickHouse, although Elasticsearch still offers richer search capabilities.
Reference: https://zhuanlan.zhihu.com/p/353296392
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
