Performance Comparison of Elasticsearch and ClickHouse for Log Search and Analytics
This article compares Elasticsearch and ClickHouse for log search and analytics. It describes their architectures, presents Docker-based test stacks with code for deployment, data ingestion, and querying, and reports benchmark results showing that ClickHouse generally outperforms Elasticsearch in log-analytics scenarios.
Elasticsearch is a real‑time distributed search engine built on Lucene, often used with Logstash and Kibana (ELK) for log analytics, while ClickHouse is a column‑oriented MPP database from Yandex optimized for OLAP workloads.
The article compares their architectures, node roles, and underlying technologies such as inverted indexes, Bloom filters, and columnar storage, and explains why many companies are migrating log solutions from ES to ClickHouse.
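To make the architectural contrast concrete, here is a minimal Python sketch. It is illustrative only, not how either engine is implemented internally: an inverted index (Elasticsearch/Lucene-style) maps each term to the documents containing it, so a term search is a dictionary lookup, while a columnar store (ClickHouse-style) keeps each field as a contiguous array, so an aggregation scans only the column it needs.

```python
from collections import defaultdict

docs = [
    {"app": "nginx", "priority": 3, "message": "connection refused"},
    {"app": "nginx", "priority": 6, "message": "request completed"},
    {"app": "sshd",  "priority": 3, "message": "connection refused"},
]

# Inverted index: term -> set of doc ids containing that term.
index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for term in doc["message"].split():
        index[term].add(doc_id)

# A term search is a direct lookup, independent of total document count.
matching = index["refused"]  # -> {0, 2}

# Columnar layout: one array per field instead of one record per document.
columns = {
    "app": [d["app"] for d in docs],
    "priority": [d["priority"] for d in docs],
}

# An aggregation touches only the column it needs, never whole documents.
avg_priority = sum(columns["priority"]) / len(columns["priority"])  # -> 4.0
```

This is why the two systems favor different workloads: the lookup structure answers "which documents contain X" quickly, while the columnar layout makes "aggregate field Y over everything" cheap.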
A test environment is built using Docker Compose: an ES stack (Elasticsearch + Kibana) and a ClickHouse stack (ClickHouse + TabixUI). The compose files are shown below.
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

The ClickHouse stack is defined as:
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

A ClickHouse table for syslog data is created with the following SQL:
CREATE TABLE default.syslog (
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

Data is generated with Vector.dev, parsed, transformed, and sent to both ES and ClickHouse using the configuration shown below.
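Before wiring up the ingestion pipeline, the table can be smoke-tested through ClickHouse's HTTP interface on port 8123, which the compose file publishes. The sketch below is a minimal illustration, not part of the original article: it assumes the stack above is running on localhost, and `to_jsoneachrow` is a helper name chosen here for serializing rows into the JSONEachRow format that the INSERT expects.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def to_jsoneachrow(rows):
    """Serialize rows as newline-delimited JSON (ClickHouse JSONEachRow format)."""
    return "\n".join(json.dumps(row) for row in rows)

rows = [{
    "application": "test", "hostname": "host1.example", "message": "hello",
    "mid": "ID1", "pid": "42", "priority": 3, "raw": "<3>1 ...",
    "timestamp": "2020-01-01 00:00:00", "version": 1,
}]

body = to_jsoneachrow(rows)

if __name__ == "__main__":
    # POST the payload; ClickHouse answers HTTP 200 on a successful insert.
    query = quote("INSERT INTO default.syslog FORMAT JSONEachRow")
    req = Request("http://localhost:8123/?query=" + query,
                  data=body.encode(), method="POST")
    with urlopen(req) as resp:
        print("insert status:", resp.status)
```

Using the HTTP interface keeps the check dependency-free; the native protocol on port 9000 would need a client library such as clickhouse-driver.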
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true

Various queries are executed on both systems, including match_all, term, multi_match, regex, range, exists, and aggregation queries. Example ES DSL and ClickHouse SQL snippets are provided for each case.
# ES match_all
{
"query": {
"match_all": {}
}
}
# ClickHouse
SELECT * FROM syslog;

Each query is run ten times through the Python SDKs of both systems. The results show that ClickHouse outperforms Elasticsearch in most scenarios, especially aggregations, while remaining competitive in term and regex searches.
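The ten-run timing loop can be sketched as follows. This is a hedged reconstruction, not the article's actual harness: the client setup assumes the `elasticsearch` and `clickhouse-driver` Python packages and the endpoints published by the compose files above, while the `bench` helper itself is plain stdlib and works for any callable.

```python
import statistics
import time

def bench(fn, runs=10):
    """Run fn `runs` times; return (min, median) wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)

if __name__ == "__main__":
    # Package names and endpoints here are assumptions based on the stacks above.
    from elasticsearch import Elasticsearch   # pip install elasticsearch
    from clickhouse_driver import Client      # pip install clickhouse-driver

    es = Elasticsearch("http://localhost:9200")
    ch = Client(host="localhost", port=9000)

    es_min, es_med = bench(
        lambda: es.search(index="syslog-*", body={"query": {"match_all": {}}}))
    ch_min, ch_med = bench(
        lambda: ch.execute("SELECT * FROM syslog LIMIT 10000"))

    print(f"ES match_all: min={es_min:.4f}s median={es_med:.4f}s")
    print(f"CH select:    min={ch_min:.4f}s median={ch_med:.4f}s")
```

Reporting the minimum and median rather than the mean reduces the influence of cold caches and one-off scheduling noise across the ten runs.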
The article concludes that ClickHouse is a highly efficient alternative for log search and analytics workloads, offering superior performance due to its columnar storage and MPP architecture, though Elasticsearch still provides richer query capabilities.