
Elasticsearch vs ClickHouse: Performance Comparison for Log Analytics

This article compares Elasticsearch and ClickHouse as log‑analytics solutions, detailing their architectures, node roles, data ingestion pipelines, query capabilities, and benchmark results, ultimately showing ClickHouse’s superior performance in most tested scenarios.

Efficient Ops

Introduction

Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often deployed together with Logstash and Kibana (the ELK stack) for end‑to‑end log analysis. ClickHouse, originally developed at Yandex, is a column‑oriented relational database designed for OLAP workloads that has grown rapidly in popularity in recent years.

While Elasticsearch remains the most widely adopted solution for large‑scale log search, many companies (e.g., Ctrip, Kuaishou) have begun migrating to ClickHouse for their logging pipelines.

Architecture and Design Comparison

Elasticsearch relies on Lucene’s inverted index and Bloom filters to solve search problems on massive data sets. It uses sharding and replication to achieve high performance and availability in a distributed cluster.

Elasticsearch nodes can assume different roles:

Client Node (coordinating node) – routes client requests to data nodes and merges their results; it stores no data and manages no cluster state.

Data Node – stores the index shards and executes data‑related operations (indexing, search, aggregation).

Master Node – manages cluster‑wide state such as index creation and shard allocation; it does not store data.
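In the Elasticsearch 7.4 release used below, these roles are assigned per node with boolean flags in `elasticsearch.yml`; a minimal illustrative sketch (not taken from the test setup, which runs a single node with all roles):

```yaml
# elasticsearch.yml – illustrative role assignment (ES 7.x boolean style)

# Dedicated master-eligible node: coordinates the cluster, holds no data
node.master: true
node.data: false
node.ingest: false

# A data node would instead set:
#   node.master: false
#   node.data: true

# A coordinating-only "client" node disables all three:
#   node.master: false
#   node.data: false
#   node.ingest: false
```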

ClickHouse follows an MPP (massively parallel processing) architecture for distributed ROLAP. Every node is a peer: each processes its own portion of the data and shares nothing with the others. Data is stored column‑wise, which speeds up queries by reducing the amount of data scanned and by enabling effective compression. ClickHouse also relies on its MergeTree storage engine, sparse primary indexes, SIMD‑vectorized execution, and ZooKeeper for coordinating distributed tables.

ClickHouse also supports Bloom‑filter skip indexes for text search.
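As an illustration (not part of the original benchmark, which ran without any such index), a token‑based Bloom‑filter skip index could be added to the `syslog` table created later in this article; the index parameters here are example values, not tuned recommendations:

```sql
-- Illustrative only: tokenbf_v1 (Bloom filter over word tokens) skip index
-- on the message column. Parameters are (bitmap size in bytes, number of
-- hash functions, seed) – example values, not tuning advice.
ALTER TABLE default.syslog
    ADD INDEX message_tokens message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- Build the index for data parts that already exist
ALTER TABLE default.syslog MATERIALIZE INDEX message_tokens;
```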

Practical Query Comparison

To compare basic query capabilities, a test suite (https://github.com/gangtao/esvsch) was created. The test environment consists of two Docker Compose stacks: one for Elasticsearch (a single‑node Elasticsearch container plus Kibana) and one for ClickHouse (a single‑node ClickHouse container plus the Tabix web client).

<code>version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local</code>
<code>version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M</code>

Data ingestion uses Vector (a log pipeline tool comparable to Fluentd) to generate syslog data and feed both stacks. The Vector pipeline generates syslog messages, clones the raw message into a separate field, parses individual fields with a regex, coerces field types, and then ships the result to both ClickHouse and Elasticsearch.

<code>[sources.in]
  type = "generator"
  format = "syslog"
  interval = 0.01
  count = 100000

[transforms.clone_message]
  type = "add_fields"
  inputs = ["in"]
  fields.raw = "{{ message }}"

[transforms.parser]
  type = "regex_parser"
  inputs = ["clone_message"]
  field = "message"
  patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']

[transforms.coercer]
  type = "coercer"
  inputs = ["parser"]
  types.timestamp = "timestamp"
  types.version = "int"
  types.priority = "int"

[sinks.out_console]
  type = "console"
  inputs = ["coercer"]
  target = "stdout"
  encoding.codec = "json"

[sinks.out_clickhouse]
  host = "http://host.docker.internal:8123"
  inputs = ["coercer"]
  table = "syslog"
  type = "clickhouse"
  encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
  encoding.timestamp_format = "unix"

[sinks.out_es]
  type = "elasticsearch"
  inputs = ["coercer"]
  compression = "none"
  endpoint = "http://host.docker.internal:9200"
  index = "syslog-%F"
  healthcheck.enabled = true</code>

After creating the ClickHouse table:

<code>CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
    PARTITION BY toYYYYMMDD(timestamp)
    ORDER BY timestamp
    TTL timestamp + toIntervalMonth(1);</code>

the same data is ingested into both systems, and a series of queries are executed on each stack. Example queries include match‑all, field match, multi‑field match, term search, range queries, existence checks, regex searches, and aggregations. The queries are expressed in Elasticsearch DSL and ClickHouse SQL.
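As a flavor of what such query pairs look like (these are illustrative examples against the `syslog` table above, not queries copied from the test repository; the Elasticsearch counterparts would be a `match` query and a `terms` aggregation in the DSL):

```sql
-- Term-style search: count log lines whose message contains "error"
SELECT count()
FROM default.syslog
WHERE message ILIKE '%error%';

-- Aggregation: event count per application, busiest first
SELECT application, count() AS events
FROM default.syslog
GROUP BY application
ORDER BY events DESC
LIMIT 10;
```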

Each query was run ten times per system using the Python SDKs. The results show that ClickHouse consistently outperforms Elasticsearch across most query types, including regex and term queries. Aggregation scenarios highlight the advantage of ClickHouse's columnar engine most clearly.
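The article does not include the harness itself; a minimal sketch of the per‑query timing loop might look like the following, where `run_query` is a hypothetical stand‑in for a call through either Python client (e.g. an Elasticsearch search or a ClickHouse execute):

```python
import time
from statistics import mean

def time_query(run_query, runs=10):
    """Run a zero-argument query callable `runs` times and return
    (best, average) wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # e.g. es.search(...) or clickhouse client execute(...)
        samples.append(time.perf_counter() - start)
    return min(samples), mean(samples)

# Usage with a stand-in workload; a real run would call the
# Elasticsearch or ClickHouse Python client here instead.
best, avg = time_query(lambda: sum(range(10_000)), runs=10)
print(f"best={best:.6f}s avg={avg:.6f}s")
```

Reporting the best of N runs reduces noise from cold caches and scheduler jitter, which matters when comparing two systems on the same machine.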

Note that the tests were run without any tuning and without enabling ClickHouse's Bloom‑filter indexes, suggesting that ClickHouse delivers strong performance for many search‑oriented workloads out of the box. Elasticsearch, however, still offers a richer query DSL for cases that are awkward to express in SQL.

Conclusion

The benchmark indicates that ClickHouse delivers markedly better performance than Elasticsearch on the basic log‑analysis queries tested, which helps explain why many organizations are migrating their logging pipelines to ClickHouse.

Tags: Docker, Elasticsearch, Performance Benchmark, ClickHouse, Vector, Log Analytics
Written by Efficient Ops

The Efficient Ops public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on the operations profession and aim to accompany you throughout your operations career.
