Performance Comparison of Elasticsearch and ClickHouse for Log Search and Analytics
This article compares Elasticsearch and ClickHouse for log search and analytics. It describes their architectures, presents Docker-based test stacks with code for deployment, data ingestion, and querying, and reports benchmark results showing that ClickHouse generally outperforms Elasticsearch in log-analytics scenarios.
Elasticsearch is a real‑time distributed search engine built on Lucene, often used with Logstash and Kibana (ELK) for log analytics, while ClickHouse is a column‑oriented MPP database from Yandex optimized for OLAP workloads.
The article compares their architectures, node roles, and underlying technologies such as inverted indexes, Bloom filters, and columnar storage, and explains why many companies are migrating log solutions from ES to ClickHouse.
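To make the architectural contrast concrete, here is a minimal Python sketch. It is illustrative only, not how either engine is implemented internally: an inverted index (Elasticsearch/Lucene-style) maps each term to the documents containing it, so a term search is a dictionary lookup, while a columnar store (ClickHouse-style) keeps each field as a contiguous array, so an aggregation scans only the column it needs.

```python
from collections import defaultdict

docs = [
    {"app": "nginx", "priority": 3, "message": "connection refused"},
    {"app": "nginx", "priority": 6, "message": "request completed"},
    {"app": "sshd",  "priority": 3, "message": "connection refused"},
]

# Inverted index: term -> set of doc ids containing that term.
index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for term in doc["message"].split():
        index[term].add(doc_id)

# A term search is a direct lookup, independent of total document count.
matching = index["refused"]  # -> {0, 2}

# Columnar layout: one array per field instead of one record per document.
columns = {
    "app": [d["app"] for d in docs],
    "priority": [d["priority"] for d in docs],
}

# An aggregation touches only the column it needs, never whole documents.
avg_priority = sum(columns["priority"]) / len(columns["priority"])  # -> 4.0
```

This is why the two systems favor different workloads: the lookup structure answers "which documents contain X" quickly, while the columnar layout makes "aggregate field Y over everything" cheap.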
A test environment is built using Docker Compose: an ES stack (Elasticsearch + Kibana) and a ClickHouse stack (ClickHouse + TabixUI). The compose files are shown below.
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

The ClickHouse stack is defined as:
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

A ClickHouse table for syslog data is created with the following SQL:
CREATE TABLE default.syslog (
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

Data is generated with Vector.dev, parsed, transformed, and sent to both ES and ClickHouse using the configuration shown below.
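Before wiring up the ingestion pipeline, the table can be smoke-tested through ClickHouse's HTTP interface on port 8123, which the compose file publishes. The sketch below is a minimal illustration, not part of the original article: it assumes the stack above is running on localhost, and `to_jsoneachrow` is a helper name chosen here for serializing rows into the JSONEachRow format that the INSERT expects.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def to_jsoneachrow(rows):
    """Serialize rows as newline-delimited JSON (ClickHouse JSONEachRow format)."""
    return "\n".join(json.dumps(row) for row in rows)

rows = [{
    "application": "test", "hostname": "host1.example", "message": "hello",
    "mid": "ID1", "pid": "42", "priority": 3, "raw": "<3>1 ...",
    "timestamp": "2020-01-01 00:00:00", "version": 1,
}]

body = to_jsoneachrow(rows)

if __name__ == "__main__":
    # POST the payload; ClickHouse answers HTTP 200 on a successful insert.
    query = quote("INSERT INTO default.syslog FORMAT JSONEachRow")
    req = Request("http://localhost:8123/?query=" + query,
                  data=body.encode(), method="POST")
    with urlopen(req) as resp:
        print("insert status:", resp.status)
```

Using the HTTP interface keeps the check dependency-free; the native protocol on port 9000 would need a client library such as clickhouse-driver.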
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true

Various queries are executed on both systems, including match_all, term, multi_match, regex, range, exists, and aggregation queries. Example ES DSL and ClickHouse SQL snippets are provided for each case.
# ES match_all
{
"query": {
"match_all": {}
}
}
# ClickHouse
SELECT * FROM syslog;

Each query is run ten times through the Python SDKs of both systems. The results show that ClickHouse outperforms Elasticsearch in most scenarios, especially aggregations, while remaining competitive in term and regex searches.
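The ten-run timing loop can be sketched as follows. This is a hedged reconstruction, not the article's actual harness: the client setup assumes the `elasticsearch` and `clickhouse-driver` Python packages and the endpoints published by the compose files above, while the `bench` helper itself is plain stdlib and works for any callable.

```python
import statistics
import time

def bench(fn, runs=10):
    """Run fn `runs` times; return (min, median) wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)

if __name__ == "__main__":
    # Package names and endpoints here are assumptions based on the stacks above.
    from elasticsearch import Elasticsearch   # pip install elasticsearch
    from clickhouse_driver import Client      # pip install clickhouse-driver

    es = Elasticsearch("http://localhost:9200")
    ch = Client(host="localhost", port=9000)

    es_min, es_med = bench(
        lambda: es.search(index="syslog-*", body={"query": {"match_all": {}}}))
    ch_min, ch_med = bench(
        lambda: ch.execute("SELECT * FROM syslog LIMIT 10000"))

    print(f"ES match_all: min={es_min:.4f}s median={es_med:.4f}s")
    print(f"CH select:    min={ch_min:.4f}s median={ch_med:.4f}s")
```

Reporting the minimum and median rather than the mean reduces the influence of cold caches and one-off scheduling noise across the ten runs.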
The article concludes that ClickHouse is a highly efficient alternative for log search and analytics workloads, offering superior performance due to its columnar storage and MPP architecture, though Elasticsearch still provides richer query capabilities.