Comparative Analysis of Elasticsearch and ClickHouse: Architecture, Query Performance, and Practical Benchmarks
This article compares Elasticsearch and ClickHouse by outlining their architectures, detailing deployment configurations, and presenting benchmark queries and performance results. It concludes that ClickHouse outperforms Elasticsearch in most of the basic search and aggregation scenarios tested, while also noting each system's strengths and limitations.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack). ClickHouse, developed by Yandex, is a column‑oriented relational database for OLAP workloads that has become very popular in recent years.
Many companies, such as Ctrip and Kuaishou, are migrating their log‑analysis pipelines from Elasticsearch to ClickHouse due to performance and cost considerations.
Architecture and Design Comparison
Elasticsearch relies on Lucene’s inverted index and Bloom filters to solve search problems at scale. It is distributed through shards and replicas, and nodes can take on different roles: client (coordinating) nodes handle API access, data nodes store and index data, and master nodes coordinate the cluster.
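To make the inverted-index idea concrete, here is a toy Python sketch. Each term maps to the set of document IDs that contain it (a postings list), so answering a query means looking up and intersecting a few small sets instead of scanning every document. The tokenizer and the InvertedIndex class are hypothetical simplifications. Lucene's analyzers, term dictionary, and compressed postings are far more elaborate. Also, an Elasticsearch match query combines terms with OR by default, whereas this sketch intersects them (AND).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical analyzer: lower-case, then split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class InvertedIndex:
    """Toy inverted index: term -> set of document IDs (a postings list)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def match_all_terms(self, query: str) -> set[int]:
        """IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Intersect starting from the shortest postings list.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for postings in lists[1:]:
            result &= postings
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "disk full on for.org")
    idx.add(2, "user login failed")
    idx.add(3, "disk quota exceeded for user")
    print(sorted(idx.match_all_terms("disk user")))  # [3]
```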
ClickHouse follows an MPP (Massively Parallel Processing) architecture for distributed ROLAP: every node has equal responsibility and processes a portion of the data. It stores data column‑wise and relies on vectorized execution, log‑structured merge trees, sparse indexes, and SIMD optimizations, with ZooKeeper used for coordination. ClickHouse also supports Bloom‑filter data‑skipping indexes that can speed up search.
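The sparse-index idea can be illustrated with a small Python sketch. Rows are kept sorted by the ORDER BY key and grouped into fixed-size granules. The index stores only the first key of each granule, so a range filter on the key reads a few granules instead of the whole column. The function names and the tiny granule size are hypothetical; ClickHouse's default index_granularity is 8192 rows.

```python
from bisect import bisect_left, bisect_right


def build_sparse_index(sorted_keys: list[int], granularity: int) -> list[int]:
    """Keep only the first key of each granule: one 'mark' per granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def candidate_granules(marks: list[int], lo: int, hi: int) -> range:
    """Indices of granules that may hold keys in [lo, hi].

    Granule g spans keys from marks[g] up to at most marks[g + 1], so the
    first candidate is the granule just before the first mark >= lo.
    """
    first = max(bisect_left(marks, lo) - 1, 0)
    end = bisect_right(marks, hi)  # exclusive: granules whose mark is <= hi
    return range(first, end)


def range_query(sorted_keys, marks, granularity, lo, hi):
    """Read only candidate granules; return (matching keys, rows read)."""
    hits, rows_read = [], 0
    for g in candidate_granules(marks, lo, hi):
        chunk = sorted_keys[g * granularity:(g + 1) * granularity]
        rows_read += len(chunk)
        hits.extend(k for k in chunk if lo <= k <= hi)
    return hits, rows_read


if __name__ == "__main__":
    keys = list(range(100))  # column already sorted by the ORDER BY key
    marks = build_sparse_index(keys, granularity=10)
    hits, rows = range_query(keys, marks, 10, 25, 34)
    print(len(hits), rows)  # 10 20
```

Only 20 of the 100 rows are read for this range. A filter on a column that is not part of the sorting key cannot use this index.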
Query Comparison – Practical Test
To compare basic query capabilities, a Docker‑Compose test environment was built. The Elasticsearch stack consists of a single Elasticsearch container and a Kibana container:
```yaml
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local
```

The ClickHouse stack includes a single ClickHouse container and TabixUI as a client:
```yaml
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M
```

Data ingestion uses Vector.dev (similar to Fluentd) to generate synthetic syslog data and feed both stacks. The ClickHouse table is created with:
```sql
CREATE TABLE default.syslog
(
    application String,
    hostname    String,
    message     String,
    mid         String,
    pid         String,
    priority    Int16,
    raw         String,
    timestamp   DateTime('UTC'),
    version     Int16
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);
```

The Vector pipeline (vector.toml) defines sources, transforms, and sinks for both Elasticsearch and ClickHouse:
```toml
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000

[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"

[transforms.parser]
# General
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']

[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"

[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"

[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application", "hostname", "message", "mid", "pid", "priority", "raw", "timestamp", "version"]
encoding.timestamp_format = "unix"

[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true
```

Benchmark queries were executed on both stacks (10 runs each) covering match_all, single‑field match, multi‑field match, term, range, exists, regex, and aggregation scenarios. Example queries:
Match all: ES {"query":{"match_all":{}}} vs ClickHouse SELECT * FROM syslog
Single‑field match: ES {"query":{"match":{"hostname":"for.org"}}} vs ClickHouse SELECT * FROM syslog WHERE hostname='for.org'
Range query: ES {"query":{"range":{"version":{"gte":2}}}} vs ClickHouse SELECT * FROM syslog WHERE version >= 2
Aggregation count: ES {"aggs":{"version_count":{"value_count":{"field":"version"}}}} vs ClickHouse SELECT count(version) FROM syslog
Distinct count: ES {"aggs":{"my-agg-name":{"cardinality":{"field":"priority"}}}} vs ClickHouse SELECT count(distinct(priority)) FROM syslog
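The fields these queries filter on (hostname, version, priority) come from the Vector regex_parser and coercer transforms. A rough Python equivalent of those two steps helps when checking what a raw syslog line turns into. The named groups follow the field names used by the coercer and the ClickHouse table. The function name and the sample line are illustrative assumptions, not Vector's own code or output.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Named groups match the field names used by the Vector pipeline
# (Python's re module uses the same (?P<name>...) syntax).
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)


def parse_syslog(line: str) -> Optional[dict]:
    """Parse one syslog line into typed fields; None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    event["raw"] = line  # the clone_message step keeps the original line
    # Coercion step: int and timestamp types, as in the coercer transform.
    event["priority"] = int(event["priority"]) if event["priority"] else None
    event["version"] = int(event["version"])
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return event


if __name__ == "__main__":
    # Hypothetical sample line in the shape the pattern expects.
    sample = "<12>3 2020-12-19T21:48:09.004Z for.org dolorem 6179 ID200 - disk is full"
    print(parse_syslog(sample)["hostname"])  # for.org
```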
The performance results (shown in the included images) indicate that ClickHouse delivers lower latency than Elasticsearch for most queries, especially aggregation‑heavy workloads, and that it remains competitive on regex and term queries.
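The article reports 10 runs per query but not the exact timing procedure. One reasonable way to summarize repeated runs is sketched below. The warm-up call, the millisecond units, and the choice of summary statistics are assumptions. The run_query parameter is a hypothetical stand-in for whatever client call sends a request to Elasticsearch or ClickHouse.

```python
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[], object], runs: int = 10, warmup: int = 1) -> dict:
    """Call run_query `warmup` times untimed, then time `runs` calls in ms."""
    for _ in range(warmup):
        run_query()  # warm caches and connections; not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.fmean(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Placeholder workload; in practice this would wrap an HTTP request
    # to port 9200 (Elasticsearch) or 8123 (ClickHouse).
    print(benchmark(lambda: sum(range(100_000))))
```

Reporting the median alongside min and max keeps a single slow run from dominating the comparison.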
The author notes that the tests were run without any tuning and without enabling ClickHouse's Bloom‑filter indexes, yet ClickHouse still outperformed Elasticsearch, suggesting that it is a good fit for many search‑oriented use cases.
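Because the tests left ClickHouse's Bloom‑filter indexes disabled, it helps to see what such an index does. The toy Python version below keeps one small Bloom filter of tokens per granule, so a token search can skip any granule whose filter answers "definitely absent". It is loosely modeled on the idea behind ClickHouse's token Bloom‑filter skip indexes (such as tokenbf_v1). The class, hash scheme, and sizes are illustrative assumptions, not ClickHouse's implementation.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter: k bit positions in an m-bit integer bitmap."""

    def __init__(self, m_bits: int = 256, k: int = 3) -> None:
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, item: str) -> list[int]:
        # Double hashing from one SHA-256 digest; an odd step keeps the
        # k positions distinct when m is a power of two.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> p) & 1 for p in self._positions(item))


def tokens(text: str) -> set[str]:
    return {t for t in re.split(r"\W+", text.lower()) if t}


def build_granule_filters(messages: list[str], granularity: int) -> list[BloomFilter]:
    """One Bloom filter per granule of `granularity` consecutive rows."""
    filters = []
    for start in range(0, len(messages), granularity):
        bf = BloomFilter()
        for msg in messages[start:start + granularity]:
            for tok in tokens(msg):
                bf.add(tok)
        filters.append(bf)
    return filters


def search_token(messages, filters, granularity, token):
    """Return (matching messages, number of granules actually scanned)."""
    token = token.lower()
    hits, scanned = [], 0
    for g, bf in enumerate(filters):
        if not bf.might_contain(token):
            continue  # skip the granule without reading its rows
        scanned += 1
        chunk = messages[g * granularity:(g + 1) * granularity]
        hits.extend(m for m in chunk if token in tokens(m))
    return hits, scanned
```

False positives only cost an unnecessary granule scan. The exact check inside search_token keeps the results correct.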
Conclusion
The comparative tests show that ClickHouse outperforms Elasticsearch in basic query and aggregation performance, which helps explain why many organizations are migrating their log‑analysis pipelines to ClickHouse. Elasticsearch still offers richer query features, but for the scenarios covered here, ClickHouse is faster.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn