Performance Comparison of Elasticsearch and ClickHouse for Log Analytics
This article compares Elasticsearch and ClickHouse for log analytics: it describes their architectures, builds a Docker Compose test environment for each, runs equivalent queries against both systems, and presents performance results showing that ClickHouse generally outperforms Elasticsearch in basic search and aggregation scenarios over log data.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log processing, while ClickHouse is a column‑oriented MPP database developed by Yandex and popular in OLAP workloads.
The article first outlines the architectural differences: Elasticsearch nodes can take on coordinating (client), data, or master roles, whereas ClickHouse uses a uniform MPP architecture in which each node handles a portion of the data. Both systems can use Bloom filters to speed up lookups.
Architecture and Design Comparison
Elasticsearch relies on inverted indexes and shard/replica mechanisms for scalability and high availability. ClickHouse stores data column-wise, uses merge trees, sparse indexes, and SIMD optimizations, and coordinates replication across nodes via ZooKeeper.
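The practical consequence of these two designs can be illustrated in a few lines of Python: an inverted index answers "which records contain this term?" directly, while a columnar layout lets an aggregation scan one tightly packed array instead of whole records. This is only a conceptual sketch, not how either engine is actually implemented:

```python
# Conceptual sketch only: neither engine works exactly like this.
logs = [
    {"app": "nginx", "priority": 3},
    {"app": "sshd",  "priority": 5},
    {"app": "nginx", "priority": 4},
]

# Inverted index (Elasticsearch/Lucene style): term -> matching row ids,
# so a term search never scans rows that lack the term.
inverted = {}
for i, row in enumerate(logs):
    inverted.setdefault(row["app"], []).append(i)
print(inverted["nginx"])  # → [0, 2]

# Columnar layout (ClickHouse style): one contiguous array per column,
# so an aggregation reads only the bytes of the column it needs.
priority_column = [row["priority"] for row in logs]
print(sum(priority_column) / len(priority_column))  # → 4.0
```

The same trade-off shows up in the benchmark later: term lookups favor the inverted index, while scans and aggregations favor the columnar layout.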
Practical Query Comparison
A Docker Compose stack was created for each system. The Elasticsearch stack includes a single-node Elasticsearch container and a Kibana container; the ClickHouse stack includes a ClickHouse container and a TabixUI client. The data ingestion pipeline uses Vector to generate synthetic syslog records and write them to both stacks.
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

The ClickHouse stack is defined in a second compose file:

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

The ClickHouse table for syslog data is created with:
CREATE TABLE default.syslog (
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

Vector's configuration defines sources, transforms, and sinks to generate syslog entries, parse the fields, coerce types, and write the records to both Elasticsearch and ClickHouse:
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application", "hostname", "message", "mid", "pid", "priority", "raw", "timestamp", "version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true

Equivalent queries were then executed on both systems, covering match-all, field-match, multi-match, term, range, existence, and regex queries as well as aggregations. The results consistently showed ClickHouse achieving lower latency, especially in aggregation scenarios, while still performing competitively in term and regex queries.
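The kind of parsing the regex_parser transform performs can be reproduced outside Vector for debugging. Below is a minimal Python sketch; the sample log line and the pattern are illustrative (field names mirror the columns of the syslog table), not copied from Vector's internals:

```python
import re

# Illustrative pattern for RFC 5424-style lines; the named groups mirror
# the syslog table columns (priority, version, timestamp, hostname,
# application, pid, mid, message).
PATTERN = re.compile(
    r'^<(?P<priority>\d*)>(?P<version>\d) '
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) '
    r'(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) '
    r'(?P<mid>ID\d+) - (?P<message>.*)$'
)

# A synthetic sample line in the same shape the syslog generator produces.
line = ('<13>1 2023-01-01T12:00:00.000Z host.example app 1234 ID5678 - '
        'user logged in')

fields = PATTERN.match(line).groupdict()
print(fields["priority"], fields["application"], fields["message"])
# → 13 app user logged in
```

Checking the pattern against a captured line like this before wiring it into the transform avoids silently dropped events when the regex fails to match.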
In conclusion, ClickHouse demonstrates superior performance for most basic query patterns in log analytics, leveraging its columnar storage and efficient execution engine, whereas Elasticsearch offers richer query capabilities but at a performance cost for the tested workloads.