Why ClickHouse Outperforms Elasticsearch in Log Analytics: A Practical Comparison
This article compares Elasticsearch and ClickHouse for log analytics by detailing their architectures, setting up Docker‑Compose stacks, ingesting synthetic syslog data with Vector, running equivalent queries, and measuring performance, revealing ClickHouse’s superior speed in most scenarios.
Introduction
Elasticsearch (ES) is a real‑time distributed search and analytics engine built on Lucene. ClickHouse, developed by Yandex, is a column‑oriented relational OLAP database that has become popular for large‑scale analytical workloads.
Architecture Comparison
ES uses inverted indexes and Bloom filters to provide fast full‑text search, with shard and replica mechanisms for scalability and high availability. ClickHouse follows an MPP architecture where each node processes a partition of the data independently, stores data column‑wise, and leverages vectorized execution, log‑structured merge trees, sparse indexes and SIMD instructions. Both systems support Bloom filters.
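The two index styles named above can be illustrated with a toy sketch in Python. It is neither engine's implementation: the function names are made up for this example, the inverted index uses naive whitespace tokenization, and the sparse index simply keeps the first key of every fixed-size block of sorted rows, which is roughly how a MergeTree primary index narrows a range scan.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def build_sparse_index(sorted_keys, granularity):
    """Keep only the first key of every `granularity` rows (one mark per granule)."""
    return sorted_keys[::granularity]


def granules_for_range(marks, low, high):
    """Return the half-open range of granule numbers that may hold keys in [low, high].

    The lower bound is conservative: the granule before the first mark >= low
    is included, because its tail may still contain keys equal to `low`.
    """
    first = max(bisect_left(marks, low) - 1, 0)
    last = bisect_right(marks, high)
    return first, last


if __name__ == "__main__":
    inverted = build_inverted_index(["Pretty log line", "another line"])
    print(inverted["line"])  # ids of documents containing the token "line"

    marks = build_sparse_index(list(range(100)), granularity=10)
    print(granules_for_range(marks, 25, 47))  # only these granules need scanning
```

The inverted index answers "which documents contain this word" with a single lookup, while the sparse index only narrows down which blocks of a sorted column have to be read. That difference is why the two systems favor different query shapes.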
Test Environment
Two Docker‑Compose stacks were used:
ES stack: a single‑node Elasticsearch container and a Kibana container.
ClickHouse stack: a ClickHouse server container and a TabixUI client container.
Docker‑Compose files (simplified):
# Elasticsearch stack
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    image: docker.elastic.co/kibana/kibana:7.4.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local
# ClickHouse stack
version: "3.7"
services:
  clickhouse:
    image: yandex/clickhouse-server
    container_name: clickhouse
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    image: spoonest/clickhouse-tabix-web-client
    container_name: tabixui
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M
Data Ingestion Pipeline
Data was generated with Vector.dev (similar to Fluentd). The Vector configuration defines a syslog generator, parsing, type coercion, and sinks for console, ClickHouse, and Elasticsearch.
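As a rough illustration of what the parsing and type-coercion steps do to each event, the Python sketch below applies the same regular expression as the parser transform to a made-up syslog line and converts the numeric and timestamp fields. The sample line and the parse_syslog helper are assumptions for this example only; Vector performs these steps internally.

```python
import re
from datetime import datetime, timezone

# Same pattern as the regex_parser transform; Python accepts the (?P<name>...) syntax too.
SYSLOG_PATTERN = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)

# Hypothetical input line, shaped like the generator's syslog output.
SAMPLE_LINE = "<34>1 2021-03-02T12:00:00.123Z up.com ahmadajmi 1234 ID567 - pretty good log line"


def parse_syslog(line):
    """Return the parsed and coerced event as a dict, or None if the line does not match."""
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["raw"] = line  # mirrors the clone_message step that keeps the original text
    # The pattern allows an empty priority, so guard the int conversion.
    event["priority"] = int(event["priority"]) if event["priority"] else None
    event["version"] = int(event["version"])
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return event


if __name__ == "__main__":
    print(parse_syslog(SAMPLE_LINE))
```

The pid field stays a string here, matching the String column used for it in the ClickHouse table. The actual Vector configuration used in the test follows.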
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
type = "clickhouse"
inputs = ["coercer"]
host = "http://host.docker.internal:8123"
table = "syslog"
encoding.only_fields = ["application", "hostname", "message", "mid", "pid", "priority", "raw", "timestamp", "version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true
The pipeline is started with:
docker run \
  -v $(mkfile_path)/vector.toml:/etc/vector/vector.toml:ro \
  -p 18383:8383 \
  timberio/vector:nightly-alpine
ClickHouse Table Definition
CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);
Query Equivalence
After data ingestion, equivalent queries were executed on both stacks.
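The article ran its queries through the Python SDKs. As an alternative sketch that needs only the standard library, the helpers below build the equivalent HTTP requests without sending them. The function names, the localhost base URLs and the syslog-* index pattern are assumptions based on the ports and index name in the configurations above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def es_search_request(body, index="syslog-*", base_url="http://localhost:9200"):
    """Build (but do not send) a POST to the Elasticsearch _search endpoint."""
    return Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clickhouse_request(sql, base_url="http://localhost:8123"):
    """Build (but do not send) a POST to the ClickHouse HTTP interface.

    The SQL text travels in the request body; default_format asks for JSON output.
    """
    query_string = urlencode({"default_format": "JSON"})
    return Request(
        f"{base_url}/?{query_string}",
        data=sql.strip().rstrip(";").encode("utf-8"),
        method="POST",
    )


if __name__ == "__main__":
    es_req = es_search_request({"query": {"match": {"hostname": "for.org"}}})
    ch_req = clickhouse_request("SELECT * FROM syslog WHERE hostname='for.org';")
    print(es_req.get_method(), es_req.full_url)
    print(ch_req.get_method(), ch_req.full_url)
```

Passing either request object to urllib.request.urlopen sends it to a running stack. Each query pair listed below can be plugged into these two helpers unchanged.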
Return all records
# Elasticsearch
{ "query": { "match_all": {} } }
# ClickHouse
SELECT * FROM syslog;
Match a single field
# Elasticsearch
{ "query": { "match": { "hostname": "for.org" } } }
# ClickHouse
SELECT * FROM syslog WHERE hostname='for.org';
Multi‑field match
# Elasticsearch
{ "query": { "multi_match": { "query": "up.com ahmadajmi", "fields": ["hostname", "application"] } } }
# ClickHouse
SELECT * FROM syslog WHERE hostname='up.com' OR application='ahmadajmi';
Term (word) search
# Elasticsearch
{ "query": { "term": { "message": "pretty" } } }
# ClickHouse
SELECT * FROM syslog WHERE lowerUTF8(raw) LIKE '%pretty%';
Range query (version >= 2)
# Elasticsearch
{ "query": { "range": { "version": { "gte": 2 } } } }
# ClickHouse
SELECT * FROM syslog WHERE version >= 2;
Exists query
# Elasticsearch
{ "query": { "exists": { "field": "application" } } }
# ClickHouse
SELECT * FROM syslog WHERE application IS NOT NULL;
Regex query
# Elasticsearch
{ "query": { "regexp": { "hostname": { "value": "up.*", "flags": "ALL", "max_determinized_states": 10000, "rewrite": "constant_score" } } } }
# ClickHouse
SELECT * FROM syslog WHERE match(hostname, 'up.*');
Aggregation – count of a field
# Elasticsearch
{ "aggs": { "version_count": { "value_count": { "field": "version" } } } }
# ClickHouse
SELECT count(version) FROM syslog;
Distinct count
# Elasticsearch
{ "aggs": { "my-agg-name": { "cardinality": { "field": "priority" } } } }
# ClickHouse
SELECT count(DISTINCT priority) FROM syslog;
Performance Results
Each query was executed ten times using the Python SDK for both stacks. ClickHouse consistently showed lower latency, especially for aggregation queries where its columnar engine excels.
The overall query‑time comparison confirms ClickHouse’s advantage.
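The article does not show its measurement script, so the following is only a minimal sketch of how a ten-run timing loop could look. The time_query helper and its summary fields are hypothetical; run_query stands for any zero-argument callable that issues one query against either stack.

```python
import statistics
import time


def time_query(run_query, repeats=10):
    """Call `run_query` `repeats` times and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": len(samples),
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.mean(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a function that sends a real query.
    print(time_query(lambda: sum(range(100_000))))
```

Reporting the median alongside the mean helps when the first run is slowed by cold caches, which can affect both engines.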
Conclusion
The benchmark demonstrates that ClickHouse outperforms Elasticsearch in most basic query scenarios, with particularly strong performance for aggregations due to its columnar storage and vectorized execution. Elasticsearch offers a richer DSL and flexible schema, but for log‑analysis workloads that fit the tested patterns, ClickHouse provides faster execution.
ITPUB
