Comparative Performance and Feature Analysis of Elasticsearch vs ClickHouse
This article presents a practical comparison between Elasticsearch and ClickHouse, detailing their architectures, Docker‑Compose deployment, data ingestion pipelines, a series of representative queries, and benchmark results that show ClickHouse generally outperforms Elasticsearch in basic search and aggregation scenarios.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log processing. ClickHouse, developed by Yandex, is a column‑oriented relational database optimized for OLAP workloads and has become very popular in recent years.
The author sets up two Docker‑Compose stacks: an Elasticsearch stack (single‑node Elasticsearch container plus Kibana) and a ClickHouse stack (single‑node ClickHouse container plus Tabix UI). The Docker‑Compose files are provided below.
# docker-compose.yml for the Elasticsearch stack
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

# docker-compose.yml for the ClickHouse stack
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

Data ingestion uses Vector.dev as a flexible pipeline. A ClickHouse table is created to store syslog‑style logs:
CREATE TABLE default.syslog(
application String,
hostname String,
message String,
mid String,
pid String,
priority Int16,
raw String,
timestamp DateTime('UTC'),
version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

The Vector configuration generates 100,000 syslog records, parses them with a regex, coerces field types, and then sinks the data to both Elasticsearch and ClickHouse:
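Outside of Vector, rows can also be pushed into this table directly over ClickHouse's HTTP interface using the JSONEachRow input format. A minimal sketch, assuming the compose stack above is listening on localhost:8123; the sample field values are invented for illustration:

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

def syslog_row_payload(row: dict) -> bytes:
    """Serialize one record as a JSONEachRow line for ClickHouse."""
    return (json.dumps(row, separators=(",", ":")) + "\n").encode("utf-8")

# Hypothetical sample record matching the default.syslog schema above.
# The DateTime column accepts a unix timestamp (seconds) in JSONEachRow.
row = {
    "application": "vector",
    "hostname": "for.org",
    "message": "example log line",
    "mid": "ID1",
    "pid": "42",
    "priority": 34,
    "raw": "<34>1 example raw line",
    "timestamp": int(datetime.now(timezone.utc).timestamp()),
    "version": 1,
}

payload = syslog_row_payload(row)

def insert_row(payload: bytes, host: str = "http://localhost:8123") -> None:
    """POST the row; the query string selects the table and input format."""
    query = urllib.parse.quote("INSERT INTO default.syslog FORMAT JSONEachRow")
    req = urllib.request.Request(host + "/?query=" + query,
                                 data=payload, method="POST")
    urllib.request.urlopen(req)  # raises on non-2xx responses

# With the ClickHouse container from the compose file running:
#   insert_row(payload)
```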
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application", "hostname", "message", "mid", "pid", "priority", "raw", "timestamp", "version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true

After launching the stacks and feeding data, a series of representative queries are executed on both systems. Example queries include match_all, field match, multi‑field match, term query, range query, existence check, regex query, and aggregations. The queries are shown in JSON for Elasticsearch and in SQL for ClickHouse.
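The regex_parser transform can be reproduced in plain Python to sanity-check the pattern before wiring it into Vector; the sample line below is invented in the shape Vector's syslog generator emits:

```python
import re

# Same named-group pattern as the Vector regex_parser transform.
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)

# Invented sample line for illustration.
line = "<34>1 2020-01-01T12:00:00.000Z for.org vector 6042 ID960 - hello world"

match = SYSLOG_RE.match(line)
fields = match.groupdict() if match else None
print(fields)
```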
# ES match_all
{ "query": { "match_all": {} } }
# ClickHouse
SELECT * FROM syslog;

# ES match hostname
{ "query": { "match": { "hostname": "for.org" } } }
# ClickHouse
SELECT * FROM syslog WHERE hostname='for.org';

# ES term query on message
{ "query": { "term": { "message": "pretty" } } }
# ClickHouse
SELECT * FROM syslog WHERE lowerUTF8(raw) LIKE '%pretty%';

# ES aggregation count version
{ "aggs": { "version_count": { "value_count": { "field": "version" } } } }
# ClickHouse
SELECT count(version) FROM syslog;

Each query is run ten times using the Python SDK for both stacks, and response-time distributions are collected. The results show that ClickHouse consistently outperforms Elasticsearch in most basic query scenarios, including term and regex searches, and excels dramatically in aggregation workloads due to its columnar storage engine.
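The per-query timing loop can be sketched as follows. run_benchmark is a hypothetical helper, not the author's harness: it times any callable, so the same code works for an elasticsearch-py client and a clickhouse-driver client alike.

```python
import statistics
import time

def run_benchmark(query_fn, runs: int = 10) -> dict:
    """Time a query callable `runs` times and summarize the distribution."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query_fn()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return {
        "min_ms": min(samples),
        "max_ms": max(samples),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
    }

# Usage with real clients (assumed set up elsewhere, not run here):
#   es_stats = run_benchmark(lambda: es.search(index="syslog-*", body=query))
#   ch_stats = run_benchmark(lambda: ch.execute("SELECT count(version) FROM syslog"))

# Self-contained demo against a trivial in-memory "query".
stats = run_benchmark(lambda: sum(range(10_000)))
print(stats)
```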
In conclusion, the benchmark demonstrates that ClickHouse is a highly efficient alternative to Elasticsearch for many log‑search and analytical use cases, especially when aggregation performance is critical. However, Elasticsearch still offers richer query capabilities and a mature ecosystem for full‑text search, which may be required for more complex scenarios.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.