Elasticsearch vs ClickHouse: Performance Comparison for Log Analytics
This article compares Elasticsearch and ClickHouse as log‑analytics solutions, detailing their architectures, node roles, data ingestion pipelines, query capabilities, and benchmark results, ultimately showing ClickHouse’s superior performance in most tested scenarios.
Introduction
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for end‑to‑end log analysis. ClickHouse, developed by Yandex, is a column‑oriented relational database designed for OLAP workloads and has become very popular in the past two years.
While Elasticsearch remains the most widely adopted solution for large‑scale log search, many companies (e.g., Ctrip, Kuaishou) have begun migrating to ClickHouse for their logging pipelines.
Architecture and Design Comparison
Elasticsearch relies on Lucene’s inverted index and Bloom filters to solve search problems on massive data sets. It uses sharding and replication to achieve high performance and availability in a distributed cluster.
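To make the inverted-index idea concrete, here is a minimal Python sketch. It is not how Lucene is implemented: the whitespace tokenizer, the function names, and the sample log lines are all assumptions made for illustration. It only shows why term lookups avoid scanning every document.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return IDs of documents containing every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical log lines, keyed by document ID.
docs = {
    1: "connection refused by upstream",
    2: "upstream timed out",
    3: "connection reset by peer",
}
index = build_inverted_index(docs)
print(search_all(index, "connection by"))
```

A query touches only the posting lists of its terms, so the cost depends on how many documents contain those terms rather than on the total size of the data set.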
Elasticsearch nodes can assume different roles:
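Client Node (also called a coordinating node) – receives API requests and routes them to the data nodes; it does not store data.
Data Node – stores data, indexes it, and executes searches against its shards.
Master Node – coordinates the cluster; a dedicated master node does not store data.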
ClickHouse follows an MPP (massively parallel processing) architecture for distributed ROLAP. Every node plays the same role and processes its own portion of the data, with nothing shared between nodes. Data is stored column‑wise, so a query reads only the columns it needs and benefits from strong compression. ClickHouse also relies on the MergeTree engine family, sparse primary indexes, SIMD instructions, and ZooKeeper for coordination between nodes.
ClickHouse also supports Bloom filter based data‑skipping indexes for search.
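The combination of column-wise storage and a sparse index can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ClickHouse's actual on-disk format: the granule size of 4, the function names, and the sample data are hypothetical. The idea is that only the first key of each block of rows ("granule") is indexed, and a range query reads only the granules that could contain matching rows.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; far smaller than a real granule


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of each granule (block of consecutive rows)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_range(marks, low, high):
    """Return indexes of granules that may hold keys in [low, high]."""
    # The last granule starting strictly before `low` may still contain it.
    first = max(bisect_left(marks, low) - 1, 0)
    # Granules whose first key is greater than `high` cannot match.
    last = bisect_right(marks, high)
    return list(range(first, last))


def scan_range(sorted_keys, values, marks, low, high, granule_size=GRANULE_SIZE):
    """Read only candidate granules, then filter the rows inside them."""
    rows_read = 0
    matches = []
    for g in granules_for_range(marks, low, high):
        start = g * granule_size
        end = min(start + granule_size, len(sorted_keys))
        for i in range(start, end):
            rows_read += 1
            if low <= sorted_keys[i] <= high:
                matches.append(values[i])
    return matches, rows_read


timestamps = list(range(100, 120))        # sorted key column
priorities = [i % 8 for i in range(20)]   # a second column, stored separately
marks = build_sparse_index(timestamps)
matches, rows_read = scan_range(timestamps, priorities, marks, 105, 109)
print(matches, rows_read)
```

The index holds one entry per granule instead of one per row, which keeps it small enough to stay in memory. The price is that a query may read a few extra rows at the edges of the range.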
Practical Query Comparison
To compare basic query capabilities, a test suite (https://github.com/gangtao/esvsch) was created. The test environment consists of two Docker‑Compose stacks: one for Elasticsearch (single‑node Elasticsearch container + Kibana) and one for ClickHouse (single‑node ClickHouse container + TabixUI client).
<code>version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local</code>
<code>version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M</code>
Data ingestion uses Vector (similar to Fluentd) to generate syslog data and feed both stacks. The Vector pipeline creates a syslog generator, clones messages, parses fields with a regex, coerces types, and then sends data to ClickHouse and Elasticsearch.
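The clone, parse, and coerce steps of that pipeline can be approximated in plain Python, which is handy for checking the regex against a sample line before running Vector. This is a sketch, not Vector's implementation: the pattern is copied from the pipeline configuration in this article, while the `parse_syslog` helper and the sample line are hypothetical.

```python
import re
from datetime import datetime, timezone

# Same pattern as the regex_parser transform, split across lines for readability.
SYSLOG_PATTERN = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)


def parse_syslog(line):
    """Mimic the clone, parse, and coerce steps for a single line."""
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        return None  # line does not follow the expected layout
    event = match.groupdict()
    event["raw"] = line  # keep the original text, like the cloned field
    # Coerce types; this assumes the priority field is not empty.
    event["priority"] = int(event["priority"])
    event["version"] = int(event["version"])
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    event["timestamp"] = ts.replace(tzinfo=timezone.utc)
    return event


# Hypothetical sample line in the shape the pattern expects.
sample = (
    "<34>1 2020-03-13T20:45:38.119Z dynamicwireless.name non 2426 ID931 - "
    "Try to override the THX port, maybe it will reboot the neural interface!"
)
event = parse_syslog(sample)
print(event["hostname"], event["priority"], event["timestamp"].isoformat())
```

Note that `pid` is left as a string here, which matches the `String` column used for it in the ClickHouse table.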
<code>[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true</code>
After creating the ClickHouse table:
<code>CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);</code>
the same data is ingested into both systems, and a series of queries are executed on each stack. Example queries include match‑all, field match, multi‑field match, term search, range queries, existence checks, regex searches, and aggregations. The queries are expressed in Elasticsearch DSL and ClickHouse SQL.
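To give a feel for how such query pairs line up, the sketch below puts a few Elasticsearch request bodies next to rough ClickHouse counterparts. These are illustrative examples written for this comparison, not the exact queries from the test suite. The field values (`for.org`, version `2`) are made up, and the SQL is only an approximate equivalent, since a `match` on an analyzed text field is not the same as string equality.

```python
import json

# Each entry pairs an Elasticsearch request body with a rough ClickHouse counterpart.
QUERY_PAIRS = {
    "match_all": (
        {"query": {"match_all": {}}},
        "SELECT * FROM syslog LIMIT 10",
    ),
    "field_match": (
        {"query": {"match": {"hostname": "for.org"}}},
        "SELECT * FROM syslog WHERE hostname = 'for.org'",
    ),
    "range": (
        {"query": {"range": {"version": {"gte": 2}}}},
        "SELECT * FROM syslog WHERE version >= 2",
    ),
    "terms_aggregation": (
        {"size": 0, "aggs": {"version_count": {"terms": {"field": "version"}}}},
        "SELECT version, count() AS doc_count FROM syslog "
        "GROUP BY version ORDER BY doc_count DESC LIMIT 10",
    ),
}


def describe(name):
    """Render one query pair as text for side-by-side reading."""
    es_body, ch_sql = QUERY_PAIRS[name]
    return f"{name}\n  ES: {json.dumps(es_body, sort_keys=True)}\n  CH: {ch_sql}"


for name in QUERY_PAIRS:
    print(describe(name))
```

The `LIMIT 10` clauses mirror Elasticsearch's default of returning ten hits (and ten buckets for a `terms` aggregation), so both systems return a comparable amount of data.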
Each query was run ten times through each system's Python SDK. The results show that ClickHouse outperforms Elasticsearch in most query types, including regex and term queries. The gap is widest in the aggregation scenarios, which play to the strengths of ClickHouse’s columnar engine.
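A repeat-and-time harness of this kind can be sketched as follows. It is a minimal example, not the test suite's actual code: `benchmark` and `fake_query` are hypothetical names, and the stand-in workload would be replaced by a call into each system's client library in a real run.

```python
import statistics
import time


def benchmark(run_query, repeats=10):
    """Call run_query `repeats` times and return timing statistics in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "runs": len(timings),
        "min": min(timings),
        "mean": statistics.mean(timings),
        "max": max(timings),
    }


def fake_query():
    """Stand-in workload so the sketch runs without a database."""
    return sum(range(10_000))


stats = benchmark(fake_query)
print(stats)
```

Reporting the minimum and maximum alongside the mean helps separate steady-state latency from one-off effects such as a cold cache on the first run.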
Note that the tests were run without any specific tuning and without enabling ClickHouse’s Bloom filter indexes, which suggests that ClickHouse already performs well out of the box for many search‑oriented workloads. Elasticsearch, however, still offers a richer query language for cases that cannot be expressed in SQL.
Conclusion
In this single‑node, untuned setup, ClickHouse delivers better performance than Elasticsearch for basic log‑analysis queries, which helps explain why many organizations are migrating their logging pipelines to ClickHouse.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.