
Performance and Feature Comparison between Elasticsearch and ClickHouse for Log Analytics

This article compares Elasticsearch and ClickHouse for log analytics across architecture, query capabilities, and performance. It walks through the test setup, Docker Compose configurations, and example queries, and presents benchmark results showing that ClickHouse outperforms Elasticsearch in most basic query scenarios.


Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log processing. ClickHouse, developed by Yandex, is a column‑oriented relational database designed for OLAP workloads and has become a popular alternative for large‑scale log analytics.

Architecture and Design Comparison

Elasticsearch relies on inverted indexes and Bloom filters to support fast full‑text search, using a shard‑and‑replica model for scalability and high availability. Its node roles include client, data, and master nodes.

ClickHouse follows an MPP architecture in which each node processes a portion of the data independently. It stores data column-wise and uses vectorized execution, log-structured merge trees, sparse indexes, and ZooKeeper for coordination; it also supports Bloom filters for search.
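To make the sparse-index idea concrete, here is a minimal sketch (illustrative Python, not ClickHouse code; the granularity and names are made up for the example) of how keeping only every Nth sort key in memory still narrows a lookup to a single granule of rows:

```python
from bisect import bisect_right

# Sketch of a sparse primary index, the idea behind ClickHouse's MergeTree
# index: only every Nth sort key (a "mark") is kept, and a lookup narrows
# the scan to one granule of rows starting at that mark's offset.
GRANULARITY = 4  # ClickHouse's default index granularity is 8192 rows

def build_marks(sorted_keys):
    # Keep (row_offset, key) for every GRANULARITY-th key.
    return [(i, sorted_keys[i]) for i in range(0, len(sorted_keys), GRANULARITY)]

def locate_granule(marks, key):
    # Binary-search the marks; the scan starts at the returned row offset
    # and covers at most one granule.
    keys = [k for _, k in marks]
    idx = bisect_right(keys, key) - 1
    return marks[max(idx, 0)][0]

keys = list(range(0, 100, 5))   # e.g. sorted timestamps
marks = build_marks(keys)       # 5 marks instead of 20 keys
start = locate_granule(marks, 42)
```

The trade-off is the same one ClickHouse makes: the index stays tiny and fits in memory, at the cost of scanning up to one granule of rows per lookup.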

Test Setup

A Docker‑Compose environment was created with four stacks: an Elasticsearch stack (single‑node Elasticsearch container and Kibana), a ClickHouse stack (single‑node ClickHouse container and TabixUI client), a data‑ingestion stack using Vector.dev, and a test‑control stack using Jupyter notebooks and the Python SDKs for both systems.

Elasticsearch stack deployment:

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:
    driver: local

ClickHouse stack deployment:

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

Data ingestion uses Vector.dev to generate synthetic syslog data (100,000 records) and send it simultaneously to both Elasticsearch and ClickHouse. The table creation in ClickHouse is performed with the following SQL:

CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);
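As a rough illustration of what the PARTITION BY and TTL clauses do, the following Python sketch mimics toYYYYMMDD (one partition per day) and a one-month expiry; note that toIntervalMonth(1) is calendar-aware, so the 30-day delta here is only an approximation:

```python
from datetime import datetime, timedelta

def to_yyyymmdd(ts: datetime) -> int:
    # Mimics ClickHouse toYYYYMMDD(): 2023-05-17 -> 20230517.
    # Rows sharing this value land in the same daily partition.
    return ts.year * 10000 + ts.month * 100 + ts.day

def is_expired(ts: datetime, now: datetime) -> bool:
    # Mimics the TTL clause: rows older than roughly one month are dropped
    # (the real toIntervalMonth(1) follows calendar months).
    return now - ts > timedelta(days=30)

key = to_yyyymmdd(datetime(2023, 5, 17, 9, 30))
```

Daily partitions keep each merge and TTL drop local to one day of data, which is why time is the usual partition key for logs.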

Vector configuration (vector.toml) defines sources, transforms, and sinks to route data to both back‑ends, handling field extraction, type coercion, and output formatting.

[sources.in]
  type = "generator"
  format = "syslog"
  interval = 0.01
  count = 100000

[transforms.clone_message]
  type = "add_fields"
  inputs = ["in"]
  fields.raw = "{{ message }}"

[transforms.parser]
  type = "regex_parser"
  inputs = ["clone_message"]
  field = "message"
  patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']

[transforms.coercer]
  type = "coercer"
  inputs = ["parser"]
  types.timestamp = "timestamp"
  types.version = "int"
  types.priority = "int"
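The parse-and-coerce steps above can be sketched in plain Python. The sample line and the exact order of the named groups are assumptions based on the RFC 5424 syslog layout and the ClickHouse table columns, not taken verbatim from Vector's output:

```python
import re
from datetime import datetime

# Named-group pattern for the synthetic syslog format; group names mirror
# the ClickHouse table columns (assumed field order per RFC 5424).
PATTERN = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)

def parse_and_coerce(raw: str) -> dict:
    fields = PATTERN.match(raw).groupdict()
    fields["raw"] = raw                           # clone_message transform
    fields["priority"] = int(fields["priority"])  # coercer: int
    fields["version"] = int(fields["version"])    # coercer: int
    fields["timestamp"] = datetime.strptime(      # coercer: timestamp
        fields["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    )
    return fields

line = "<13>1 2023-05-17T09:30:00.123Z host.local vector 4242 ID42 - hello"
record = parse_and_coerce(line)
```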

[sinks.out_console]
  type = "console"
  inputs = ["coercer"]
  target = "stdout"
  encoding.codec = "json"

[sinks.out_clickhouse]
  host = "http://host.docker.internal:8123"
  inputs = ["coercer"]
  table = "syslog"
  type = "clickhouse"
  encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
  encoding.timestamp_format = "unix"

[sinks.out_es]
  type = "elasticsearch"
  inputs = ["coercer"]
  compression = "none"
  endpoint = "http://host.docker.internal:9200"
  index = "syslog-%F"
  healthcheck.enabled = true

Query Comparison

Both systems were queried using equivalent requests for common operations such as match‑all, single‑field match, multi‑field match, term search, range queries, existence checks, regex searches, and aggregations. Example queries include:

# ES match_all
{ "query": { "match_all": {} } }

# ClickHouse match_all
SELECT * FROM syslog;

# ES term query
{ "query": { "term": { "message": "pretty" } } }

# ClickHouse term query
SELECT * FROM syslog WHERE lowerUTF8(raw) LIKE '%pretty%';
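Since ClickHouse exposes an HTTP interface on port 8123 (published in the Compose file above), such queries can also be issued without the SDK. A minimal sketch of building the request URL, with the host assumed:

```python
from urllib.parse import urlencode

def ch_query_url(sql: str, host: str = "http://localhost:8123") -> str:
    # ClickHouse's HTTP interface accepts the SQL statement in the
    # `query` URL parameter.
    return f"{host}/?{urlencode({'query': sql})}"

url = ch_query_url("SELECT count() FROM syslog")
# With the stack running, urllib.request.urlopen(url).read() returns the result.
```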

Performance tests were run ten times per query using the Python SDKs, and response time distributions were recorded.
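The measurement loop itself is generic; a sketch like the following could wrap each SDK call (the placeholder callable stands in for a real query, which is not shown here):

```python
import time
from statistics import median

def benchmark(run_query, runs: int = 10) -> dict:
    # Runs a query callable `runs` times and summarizes the latencies.
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    return {"min": min(latencies), "median": median(latencies), "max": max(latencies)}

stats = benchmark(lambda: sum(range(1000)))  # placeholder instead of an SDK call
```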

Results

The benchmark shows ClickHouse consistently achieving lower latency than Elasticsearch across most query types, including regex and term queries. Aggregation queries especially benefit from ClickHouse’s columnar storage, delivering significantly faster results.

Even without tuning (e.g., Bloom filters disabled), ClickHouse demonstrated superior performance, indicating its suitability for many log‑search scenarios, while Elasticsearch still offers richer query features for more complex use cases.

Conclusion

The comparative study shows that ClickHouse covers the basic log-analytics queries tested here while delivering consistently better speed, which helps explain why many companies are migrating from Elasticsearch to ClickHouse for such workloads.

Tags: SQL, Search Engine, Elasticsearch, ClickHouse, Log Analytics, Performance Comparison
Written by Selected Java Interview Questions, a professional Java tech channel sharing common knowledge to help developers fill gaps.