
Elasticsearch vs ClickHouse: Architecture, Queries, and Performance

This article compares Elasticsearch and ClickHouse by examining their architectures, node roles, and query languages, then benchmarks both using Docker Compose, a Vector data pipeline, and Python SDKs. ClickHouse proved faster in most query scenarios, although it lacks Elasticsearch's advanced search features.

MaGe Linux Operations

Aug 5, 2023

Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack). ClickHouse is a column‑oriented relational database developed by Yandex, open‑sourced in 2016.

Although Elasticsearch has long been one of the most popular solutions for big-data logging and search, many companies have begun migrating their logging pipelines to ClickHouse.

Architecture and Design Comparison

Elasticsearch relies on Lucene to solve search problems using inverted indexes and Bloom filters, and provides distributed capabilities through shard and replica mechanisms.
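Lucene's real data structures (term dictionaries, compressed posting lists) are far more sophisticated, but the two ideas named above can be sketched in a few lines of Python: an inverted index that maps each term to the documents containing it, and a Bloom filter that can rule a key out without a full lookup. The class names and the whitespace tokenizer are hypothetical illustrations, not Lucene or Elasticsearch APIs.

```python
import hashlib
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Very rough analyzer: lowercase and split on whitespace.
    # Real Lucene analyzers are far more elaborate.
    return text.lower().split()


class InvertedIndex:
    """Maps each term to the list of document ids that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, list[int]] = defaultdict(list)

    def add(self, doc_id: int, text: str) -> None:
        for term in set(tokenize(text)):
            self.postings[term].append(doc_id)

    def search_all(self, query: str) -> list[int]:
        # AND semantics: intersect the posting lists of every query term.
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)


class BloomFilter:
    """Answers 'definitely absent' (False) or 'possibly present' (True)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive num_hashes independent positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))
```

A Bloom filter can return false positives but never false negatives, which is why it is useful for skipping work rather than for answering queries on its own.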

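ClickHouse is a distributed MPP (massively parallel processing) ROLAP engine with true columnar storage. It uses a log-structured merge tree (the MergeTree engine family), sparse indexes, and SIMD optimizations, and relies on ZooKeeper to coordinate nodes, for example for replication.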
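The sparse index is what lets ClickHouse skip most of a table: MergeTree keeps rows sorted by the ORDER BY key and stores one primary-index entry (a mark) per granule of rows, 8192 rows by default (index_granularity), rather than one entry per row. The Python sketch below shows how such an index narrows a range query to a few granules. It is a simplification under stated assumptions (a single integer key, one data part, fixed granule size), and the function names are hypothetical.

```python
import bisect


def build_sparse_index(sorted_keys: list[int], granularity: int) -> list[int]:
    """Keep only the first key of each granule of `granularity` rows (the marks)."""
    return sorted_keys[::granularity]


def granules_for_range(marks: list[int], lo: int, hi: int) -> range:
    """Indices of granules that may hold keys in [lo, hi].

    Granule g holds keys between marks[g] and marks[g + 1], inclusive at both
    ends because equal keys can straddle a boundary, so the search is
    conservative: it may return a granule with no matches, but never misses one.
    """
    if not marks or hi < lo:
        return range(0)
    first = max(bisect.bisect_left(marks, lo) - 1, 0)  # last mark < lo, else granule 0
    last = bisect.bisect_right(marks, hi) - 1           # last mark <= hi
    if last < first:
        return range(0)
    return range(first, last + 1)
```

For example, with keys 0 to 99 and a granularity of 10, a query for keys 25 to 34 reads only granules 2 and 3 instead of all ten.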

Node Roles

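Elasticsearch distinguishes three node roles:

Client Node – handles API requests and data access: it routes requests and merges results from other nodes, but does not store data.

Data Node – stores and indexes data.

Master Node – manages and coordinates the cluster; it does not store data.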

ClickHouse, by contrast, uses a shared-nothing design: all nodes have the same responsibilities, and each one processes its own portion of the data.
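Both systems answer distributed queries by scatter-gather: the node that receives the query (an Elasticsearch client node, or the ClickHouse node that receives a query on a Distributed table) fans it out to the shards, each shard computes a partial result over its own data, and the receiving node merges the partials. The Python sketch below illustrates this with a count per application. The function names are hypothetical, threads stand in for nodes, and round-robin placement stands in for a real sharding key.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_rows(rows: list[dict], num_shards: int) -> list[list[dict]]:
    """Spread rows over shards round-robin (a stand-in for a real sharding key)."""
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for i, row in enumerate(rows):
        shards[i % num_shards].append(row)
    return shards


def local_count_by(shard: list[dict], field: str) -> Counter:
    """Work each node does on its own data only: a partial GROUP BY ... count()."""
    return Counter(row[field] for row in shard)


def distributed_count_by(shards: list[list[dict]], field: str) -> Counter:
    """Coordinator: scatter the query to every shard, then gather and merge."""
    total: Counter = Counter()
    if not shards:
        return total
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(local_count_by, shards, [field] * len(shards)):
            total.update(partial)  # merging partial counts is plain addition
    return total
```

Counts merge trivially; other aggregates (averages, distinct counts) need mergeable intermediate states, which is why the merge step is where distributed engines differ most.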

Query Comparison in Practice

Code repository: https://github.com/gangtao/esvsch

The test architecture uses Docker Compose to launch two stacks.

The Elasticsearch stack consists of a single-node Elasticsearch container and a Kibana container, defined in the following Docker Compose file:

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local
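Elasticsearch takes a while to start inside the container, so it helps to wait for it before ingesting data. A minimal standard-library Python sketch polls the Elasticsearch GET /_cluster/health endpoint on the published port 9200; wait_for_es and es_is_ready are hypothetical helper names. Treating yellow as ready is a deliberate choice: indices default to one replica, and a single node cannot host a replica of its own shards.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # HTTP port published by the compose file


def es_is_ready(health: dict) -> bool:
    """Treat green or yellow as ready.

    On a single node, replica shards can never be assigned, so indices created
    with the default of one replica leave the cluster yellow indefinitely.
    """
    return health.get("status") in ("green", "yellow")


def wait_for_es(url: str = ES_URL, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll GET /_cluster/health until the cluster is ready or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{url}/_cluster/health", timeout=5) as resp:
                if es_is_ready(json.load(resp)):
                    return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts, and HTTP errors
            # (urllib's URLError/HTTPError); ValueError covers non-JSON bodies.
            pass
        time.sleep(interval_s)
    return False
```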

The ClickHouse stack includes a ClickHouse container and the TabixUI web client:

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M
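The healthcheck above calls ClickHouse's /ping HTTP endpoint, which replies with "Ok." when the server is up. A comparable standard-library Python sketch pings port 8123 and sends SQL over the HTTP interface, whose default output format is TabSeparated. The names clickhouse_ping and clickhouse_query are hypothetical, and the parser ignores TSV escaping, so it only suits simple results.

```python
import urllib.request

CH_URL = "http://localhost:8123"  # HTTP port published by the compose file


def ping_ok(body: bytes) -> bool:
    """ClickHouse's /ping endpoint replies with the body b"Ok.\\n" when healthy."""
    return body.strip() == b"Ok."


def clickhouse_ping(url: str = CH_URL, timeout_s: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(f"{url}/ping", timeout=timeout_s) as resp:
            return ping_ok(resp.read())
    except OSError:  # refused, timed out, or an HTTP error status
        return False


def clickhouse_query(sql: str, url: str = CH_URL, timeout_s: float = 30.0) -> list[list[str]]:
    """POST a query to the HTTP interface and split the TabSeparated reply.

    Values containing escaped tabs or newlines are not unescaped here.
    Server-side errors surface as urllib.error.HTTPError.
    """
    request = urllib.request.Request(url + "/", data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        text = resp.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines()]
```

Once ingestion has run, clickhouse_query("SELECT count() FROM syslog") should return a single row holding the row count as a string.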

Vector generates synthetic syslog records and writes them to both stacks. The ClickHouse table is created as follows:

CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
    PARTITION BY toYYYYMMDD(timestamp)
    ORDER BY timestamp
    TTL timestamp + toIntervalMonth(1);
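The PARTITION BY and TTL clauses decide how rows are grouped on disk and when they expire. A small Python sketch under stated assumptions: to_yyyymmdd mirrors toYYYYMMDD, giving one partition per UTC day (for example 20230805), and add_one_month assumes ClickHouse clamps month arithmetic to the last day of the target month. ClickHouse removes expired rows asynchronously during background merges, so is_expired only says when a row becomes eligible. All three function names are hypothetical.

```python
import calendar
from datetime import datetime, timezone


def to_yyyymmdd(ts: datetime) -> int:
    """Analogue of toYYYYMMDD(): the partition id for a timezone-aware timestamp."""
    ts = ts.astimezone(timezone.utc)  # the column is DateTime('UTC')
    return ts.year * 10000 + ts.month * 100 + ts.day


def add_one_month(ts: datetime) -> datetime:
    """Analogue of ts + toIntervalMonth(1).

    Assumption: the day is clamped to the target month's last day,
    so 2023-01-31 + 1 month -> 2023-02-28.
    """
    year, month = (ts.year + 1, 1) if ts.month == 12 else (ts.year, ts.month + 1)
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def is_expired(ts: datetime, now: datetime) -> bool:
    """True once a row's TTL (timestamp + 1 month) has been reached.

    ClickHouse deletes such rows later, during background merges.
    """
    return now >= add_one_month(ts)
```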

Vector pipeline configuration (excerpt):

[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000

[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"

[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']

[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"

[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"

[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
encoding.timestamp_format = "unix"

[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true
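To make the transform chain concrete, here is a hypothetical Python re-implementation (not Vector code) of what the clone_message, parser, and coercer stages do to one generated line, followed by the field selection and Unix-timestamp encoding of the ClickHouse sink. The regex is the one from the configuration, which already uses Python-compatible named groups; parse_syslog and CLICKHOUSE_FIELDS are illustrative names.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# The pattern from [transforms.parser].
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)

# Field order of encoding.only_fields in [sinks.out_clickhouse].
CLICKHOUSE_FIELDS = ["application", "hostname", "message", "mid", "pid",
                     "priority", "raw", "timestamp", "version"]


def parse_syslog(line: str) -> Optional[dict]:
    """Hypothetical stand-in for clone_message -> parser -> coercer -> sink."""
    match = SYSLOG_RE.match(line)
    if match is None or not match["priority"]:
        return None  # this sketch simply skips lines it cannot parse or coerce
    event = match.groupdict()
    event["raw"] = line                         # clone_message: keep the full line
    event["priority"] = int(event["priority"])  # coercer: types.priority = "int"
    event["version"] = int(event["version"])    # coercer: types.version = "int"
    # coercer: types.timestamp = "timestamp"; the sink's timestamp_format = "unix"
    # then encodes it as whole seconds since the epoch.
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    event["timestamp"] = int(ts.replace(tzinfo=timezone.utc).timestamp())
    return {field: event[field] for field in CLICKHOUSE_FIELDS}
```

For an invented line such as "<13>1 2020-03-03T10:00:00.123Z some.host app 1234 ID42 - hello world", this yields priority 13, version 1, hostname "some.host", message "hello world", and timestamp 1583229600.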

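Run the pipeline with Docker. The original command uses $(mkfile_path), which is Makefile syntax; from a plain shell, substitute the directory containing vector.toml, for example:

docker run \
  -v "$(pwd)"/vector.toml:/etc/vector/vector.toml:ro \
  -p 18383:8383 \
  timberio/vector:nightly-alpine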

Benchmark queries were run against both stacks through Python SDKs, covering match_all, match, multi_match, term, range, exists, regexp, and aggregation queries; each Elasticsearch DSL query was paired with an equivalent ClickHouse SQL query.
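As an illustration only, the Python sketch below pairs hypothetical Elasticsearch DSL bodies with approximate ClickHouse SQL for each query type, plus a small median-of-N timing helper. Field values such as "app" and "some.host" are invented, the .keyword sub-fields assume Elasticsearch's default dynamic mapping for strings, and SQL equality only approximates the full-text semantics of match and multi_match.

```python
import statistics
import time
from typing import Callable

# Illustrative pairs only (values invented). Elasticsearch returns 10 hits, or
# 10 terms buckets, by default, so each SQL query is limited to 10 rows too.
QUERIES: dict[str, tuple[dict, str]] = {
    "match_all": (
        {"query": {"match_all": {}}},
        "SELECT * FROM syslog LIMIT 10",
    ),
    "match": (
        {"query": {"match": {"application": "app"}}},
        "SELECT * FROM syslog WHERE application = 'app' LIMIT 10",
    ),
    "multi_match": (
        {"query": {"multi_match": {"query": "app", "fields": ["application", "hostname"]}}},
        "SELECT * FROM syslog WHERE application = 'app' OR hostname = 'app' LIMIT 10",
    ),
    "term": (
        {"query": {"term": {"hostname.keyword": "some.host"}}},
        "SELECT * FROM syslog WHERE hostname = 'some.host' LIMIT 10",
    ),
    "range": (
        {"query": {"range": {"version": {"gte": 1, "lte": 2}}}},
        "SELECT * FROM syslog WHERE version >= 1 AND version <= 2 LIMIT 10",
    ),
    "exists": (
        # A non-Nullable String column stores missing values as ''.
        {"query": {"exists": {"field": "application"}}},
        "SELECT * FROM syslog WHERE application != '' LIMIT 10",
    ),
    "regexp": (
        # Elasticsearch regexp patterns are implicitly anchored; match() is not.
        {"query": {"regexp": {"hostname.keyword": "some.*"}}},
        "SELECT * FROM syslog WHERE match(hostname, '^some.*$') LIMIT 10",
    ),
    "aggregation": (
        {"size": 0, "aggs": {"per_app": {"terms": {"field": "application.keyword"}}}},
        "SELECT application, count() AS c FROM syslog GROUP BY application "
        "ORDER BY c DESC LIMIT 10",
    ),
}


def time_query(run: Callable[[], object], repeat: int = 5) -> float:
    """Call a zero-argument function `repeat` times; return the median seconds."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

With the elasticsearch-py 7.x client and clickhouse-driver, for example, a measurement could compare time_query(lambda: es.search(index="syslog-*", body=dsl)) with time_query(lambda: client.execute(sql)); both client libraries are assumptions, not necessarily what the original benchmark used. Taking the median of several runs reduces the influence of a cold cache on the first call.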

[Figure: benchmark performance results, Elasticsearch vs ClickHouse]

In these tests ClickHouse outperformed Elasticsearch on most query types, particularly aggregations, even without Bloom filter skip indexes or other optimizations enabled.

Conclusion

The comparison shows that ClickHouse performs very well in basic query scenarios, which helps explain why many companies are moving from Elasticsearch to ClickHouse, although Elasticsearch still offers richer search capabilities.

Reference: https://zhuanlan.zhihu.com/p/353296392


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


