Why ClickHouse Outperforms Elasticsearch in Real‑World Queries
This article compares Elasticsearch and ClickHouse across architecture, query capabilities, and performance, using docker-compose stacks and Python SDK tests. The results show that ClickHouse often delivers superior speed, especially for aggregation and regex queries, while also highlighting each system's design trade-offs.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for end‑to‑end log and search analysis. ClickHouse, developed by Yandex, is a column‑oriented relational database for OLAP workloads, open‑sourced in 2016.
While Elasticsearch remains the most popular big‑data log and search solution, many companies (e.g., Ctrip, Kuaishou) have begun migrating to ClickHouse.
Architecture and Design Comparison
Elasticsearch relies on Lucene and focuses on search, using inverted indexes and Bloom filters. It achieves distributed search performance and high availability through sharding and replica mechanisms.
Client Node – handles API and data access, does not store or process data.
Data Node – stores data and builds indexes.
Master Node – coordinates the cluster, does not store data.
ClickHouse follows an MPP architecture for distributed ROLAP: every node is a peer and processes a portion of the data. It stores data column-wise with compression, uses sparse indexes and log-structured merge trees, and leverages CPU features such as SIMD. Cluster coordination is handled by ZooKeeper.
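The sparse-index idea can be sketched in a few lines of Python. This is a simplification: real ClickHouse marks point at compressed column offsets rather than row numbers, but the principle is the same — record only every Nth key of the sorted primary column, then binary-search the marks to narrow a lookup to one granule.

```python
import bisect

GRANULARITY = 8192  # rows per granule; ClickHouse's default index_granularity

def build_sparse_index(sorted_keys):
    """Keep only every GRANULARITY-th key, as a sparse primary index does."""
    return sorted_keys[::GRANULARITY]

def granule_range(marks, key):
    """Return the (start_row, end_row) slice of rows that may contain `key`."""
    i = bisect.bisect_right(marks, key)
    start = max(i - 1, 0) * GRANULARITY
    return start, start + GRANULARITY
```

Instead of one index entry per row (as in a B-tree), the index holds one entry per 8192 rows, so it stays small enough to live in memory even for billions of rows; the cost is reading one full granule per lookup.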
Both systems support Bloom filters for search acceleration.
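To make that concrete, here is a minimal Bloom filter sketch in Python — a toy with three hash positions, not the implementation either engine actually uses. The useful property is one-sided: a lookup that returns False guarantees the item is absent, so whole blocks of data can be skipped without reading them.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False = definitely absent; True = possibly present (false positives allowed).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

False positives are possible (all k bits set by other items), so a True answer still requires checking the underlying data; only the False answer is authoritative.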
Query Comparison: A Practical Test
The test setup includes four parts: an Elasticsearch stack, a ClickHouse stack, a data‑import stack using Vector.dev, and a test‑control stack using Jupyter notebooks.
1. Elasticsearch stack
<code>version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    image: docker.elastic.co/kibana/kibana:7.4.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local</code>
2. ClickHouse stack
<code>version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M</code>
3. Data import stack (Vector.dev)
First, the destination table in ClickHouse:
<code>CREATE TABLE default.syslog (
  application String,
  hostname String,
  message String,
  mid String,
  pid String,
  priority Int16,
  raw String,
  timestamp DateTime('UTC'),
  version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);
</code>
Then the Vector configuration:
<code>[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application", "hostname", "message", "mid", "pid", "priority", "raw", "timestamp", "version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true
</code>
Running the Docker Compose files brings up both stacks, after which Vector generates synthetic syslog data (100,000 records) and writes it simultaneously to Elasticsearch and ClickHouse.
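Vector's regex_parser applies the pattern from the config above to every generated line. A quick way to sanity-check such a pattern is to run it in Python against a hand-written sample (the sample line below is illustrative, not actual generator output):

```python
import re

# Same pattern as in the Vector regex_parser transform above.
SYSLOG_RE = re.compile(
    r'^<(?P<priority>\d*)>(?P<version>\d) '
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) '
    r'(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) '
    r'(?P<mid>ID\d+) - (?P<message>.*)$'
)

def parse_syslog(line):
    """Return the named capture groups as a dict, or None if the line doesn't match."""
    m = SYSLOG_RE.match(line)
    return m.groupdict() if m else None

sample = "<13>1 2021-01-01T00:00:00.000Z for.org ahmadajmi 1234 ID567 - hello world"
print(parse_syslog(sample))
```

Every named group in the pattern becomes a field in the parsed event, which is exactly the field list the coercer and the two sinks operate on.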
Query examples
All queries were executed ten times through the respective Python SDKs, and the response-time distributions were recorded.
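The harness itself is short. Here is a sketch of the timing loop, assuming the `elasticsearch` and `clickhouse-driver` Python packages; the function names and structure are my reconstruction, not the article's original test code:

```python
import statistics
import time

def time_query(run_query, runs=10):
    """Execute run_query `runs` times; return per-run wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Reduce a list of durations to the stats worth comparing."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }

# With real clients this would look roughly like:
#   from elasticsearch import Elasticsearch
#   from clickhouse_driver import Client
#   es, ch = Elasticsearch("http://localhost:9200"), Client("localhost")
#   es_times = time_query(lambda: es.search(index="syslog-*",
#                                           body={"query": {"match_all": {}}}))
#   ch_times = time_query(lambda: ch.execute("SELECT * FROM syslog"))
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and has the highest available resolution for interval timing.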
Return all records
<code># ES
{ "query": { "match_all": {} } }
# ClickHouse
SELECT * FROM syslog</code>
Match a single field
<code># ES
{ "query": { "match": { "hostname": "for.org" } } }
# ClickHouse
SELECT * FROM syslog WHERE hostname='for.org'</code>
Match multiple fields
<code># ES
{ "query": { "multi_match": { "query": "up.com ahmadajmi", "fields": ["hostname", "application"] } } }
# ClickHouse
SELECT * FROM syslog WHERE hostname='up.com' OR application='ahmadajmi'</code>
Term (word) query
<code># ES
{ "query": { "term": { "message": "pretty" } } }
# ClickHouse
SELECT * FROM syslog WHERE lowerUTF8(raw) LIKE '%pretty%'</code>
Range query (version >= 2)
<code># ES
{ "query": { "range": { "version": { "gte": 2 } } } }
# ClickHouse
SELECT * FROM syslog WHERE version >= 2</code>
Exists query
<code># ES
{ "query": { "exists": { "field": "application" } } }
# ClickHouse
SELECT * FROM syslog WHERE application IS NOT NULL</code>
Regex query
<code># ES
{ "query": { "regexp": { "hostname": { "value": "up.*", "flags": "ALL" } } } }
# ClickHouse
SELECT * FROM syslog WHERE match(hostname, 'up.*')</code>
Aggregation count
<code># ES
{ "aggs": { "version_count": { "value_count": { "field": "version" } } } }
# ClickHouse
SELECT count(version) FROM syslog</code>
Cardinality (distinct count)
<code># ES
{ "aggs": { "my-agg-name": { "cardinality": { "field": "priority" } } } }
# ClickHouse
SELECT count(DISTINCT priority) FROM syslog</code>
Performance results show that ClickHouse outperforms Elasticsearch in most query types, including regex and term queries, and excels dramatically in aggregation scenarios thanks to its columnar engine.
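The two aggregation semantics are easy to mirror in plain Python, which also hints at why a columnar engine wins here: each aggregate touches a single contiguous column and never reads the rest of the row. This is a toy model of columnar storage, not ClickHouse's actual execution:

```python
# Toy columnar store: each field is its own contiguous array.
columns = {
    "version":  [1, 1, 2, 2, 2],
    "priority": [13, 13, 34, 13, 34],
}

# value_count / count(version): number of non-null values in one column.
value_count = sum(v is not None for v in columns["version"])

# cardinality / count(DISTINCT priority): distinct values in one column.
cardinality = len(set(columns["priority"]))

print(value_count, cardinality)  # 5 2
```

A row store would have to deserialize all nine fields of every record to answer either query; the columnar layout reads one field's array, which compresses well and streams through the CPU cache.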
The total query time comparison further confirms ClickHouse’s speed advantage.
These tests were run without any specific optimizations or Bloom filter activation for ClickHouse, indicating its strong out‑of‑the‑box performance for many search‑related workloads. While Elasticsearch offers a richer query DSL and features not covered here, ClickHouse proves to be a highly capable alternative for log and analytics scenarios.
Conclusion
The comparative tests demonstrate that ClickHouse delivers superior performance for basic queries and aggregations, explaining why many enterprises are migrating from Elasticsearch to ClickHouse for large‑scale log and analytics use cases.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on the transformation of operations work and hope to accompany you through your operations career as we grow together.