Comparative Performance and Feature Analysis of Elasticsearch vs ClickHouse
This article presents a practical comparison between Elasticsearch and ClickHouse, detailing their architectures, Docker‑Compose deployment, data ingestion pipelines, a series of representative queries, and benchmark results that show ClickHouse generally outperforms Elasticsearch in basic search and aggregation scenarios.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log processing. ClickHouse, developed by Yandex, is a column‑oriented relational database optimized for OLAP workloads and has become very popular in recent years.
The author sets up two Docker‑Compose stacks: an Elasticsearch stack (single‑node Elasticsearch container plus Kibana) and a ClickHouse stack (single‑node ClickHouse container plus Tabix UI). The Docker‑Compose files are provided below.
# docker-compose.yml for the Elasticsearch stack
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

# docker-compose.yml for the ClickHouse stack
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

Data ingestion uses Vector.dev as a flexible pipeline. A ClickHouse table is created to store syslog‑style logs:
CREATE TABLE default.syslog(
application String,
hostname String,
message String,
mid String,
pid String,
priority Int16,
raw String,
timestamp DateTime('UTC'),
version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

The Vector configuration generates 100,000 syslog records, parses them with a regex, coerces field types, and then sinks the data to both Elasticsearch and ClickHouse:
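Outside of Vector, rows can also be pushed into this table directly over ClickHouse's HTTP interface using the JSONEachRow input format. A minimal sketch, assuming the compose stack above is listening on localhost:8123; the sample field values are invented for illustration:

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

def syslog_row_payload(row: dict) -> bytes:
    """Serialize one record as a JSONEachRow line for ClickHouse."""
    return (json.dumps(row, separators=(",", ":")) + "\n").encode("utf-8")

# Hypothetical sample record matching the default.syslog schema above.
# The DateTime column accepts a unix timestamp (seconds) in JSONEachRow.
row = {
    "application": "vector",
    "hostname": "for.org",
    "message": "example log line",
    "mid": "ID1",
    "pid": "42",
    "priority": 34,
    "raw": "<34>1 example raw line",
    "timestamp": int(datetime.now(timezone.utc).timestamp()),
    "version": 1,
}

payload = syslog_row_payload(row)

def insert_row(payload: bytes, host: str = "http://localhost:8123") -> None:
    """POST the row; the query string selects the table and input format."""
    query = urllib.parse.quote("INSERT INTO default.syslog FORMAT JSONEachRow")
    req = urllib.request.Request(host + "/?query=" + query,
                                 data=payload, method="POST")
    urllib.request.urlopen(req)  # raises on non-2xx responses

# With the ClickHouse container from the compose file running:
#   insert_row(payload)
```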
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
host = "http://host.docker.internal:8123"
inputs = ["coercer"]
table = "syslog"
type = "clickhouse"
encoding.only_fields = ["application", "hostname", "message", "mid", "pid", "priority", "raw", "timestamp", "version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
compression = "none"
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
healthcheck.enabled = true

After launching the stacks and feeding data, a series of representative queries are executed on both systems. Example queries include match_all, field match, multi‑field match, term query, range query, existence check, regex query, and aggregations. The queries are shown in JSON for Elasticsearch and in SQL for ClickHouse.
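The regex_parser transform can be reproduced in plain Python to sanity-check the pattern before wiring it into Vector; the sample line below is invented in the shape Vector's syslog generator emits:

```python
import re

# Same named-group pattern as the Vector regex_parser transform.
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)

# Invented sample line for illustration.
line = "<34>1 2020-01-01T12:00:00.000Z for.org vector 6042 ID960 - hello world"

match = SYSLOG_RE.match(line)
fields = match.groupdict() if match else None
print(fields)
```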
# ES match_all
{ "query": { "match_all": {} } }
# ClickHouse
SELECT * FROM syslog;

# ES match hostname
{ "query": { "match": { "hostname": "for.org" } } }
# ClickHouse
SELECT * FROM syslog WHERE hostname='for.org';

# ES term query on message
{ "query": { "term": { "message": "pretty" } } }
# ClickHouse
SELECT * FROM syslog WHERE lowerUTF8(raw) LIKE '%pretty%';

# ES aggregation count version
{ "aggs": { "version_count": { "value_count": { "field": "version" } } } }
# ClickHouse
SELECT count(version) FROM syslog;

Each query is run ten times using the Python SDK for both stacks, and response-time distributions are collected. The results show that ClickHouse consistently outperforms Elasticsearch in most basic query scenarios, including term and regex searches, and excels dramatically in aggregation workloads due to its columnar storage engine.
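The per-query timing loop can be sketched as follows. run_benchmark is a hypothetical helper, not the author's harness: it times any callable, so the same code works for an elasticsearch-py client and a clickhouse-driver client alike.

```python
import statistics
import time

def run_benchmark(query_fn, runs: int = 10) -> dict:
    """Time a query callable `runs` times and summarize the distribution."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query_fn()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return {
        "min_ms": min(samples),
        "max_ms": max(samples),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
    }

# Usage with real clients (assumed set up elsewhere, not run here):
#   es_stats = run_benchmark(lambda: es.search(index="syslog-*", body=query))
#   ch_stats = run_benchmark(lambda: ch.execute("SELECT count(version) FROM syslog"))

# Self-contained demo against a trivial in-memory "query".
stats = run_benchmark(lambda: sum(range(10_000)))
print(stats)
```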
In conclusion, the benchmark demonstrates that ClickHouse is a highly efficient alternative to Elasticsearch for many log‑search and analytical use cases, especially when aggregation performance is critical. However, Elasticsearch still offers richer query capabilities and a mature ecosystem for full‑text search, which may be required for more complex scenarios.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.