Why eBay Switched Its Ad Analytics from Druid to ClickHouse – A Deep Dive
eBay’s ad data platform, originally built on a custom SQL engine and later migrated to Druid, was re-engineered on ClickHouse. This article covers the challenges involved (massive data volume, atomic offline replacement, schema design, compression, and operational simplification) and the resulting performance and scalability gains for advertisers.
Introduction
eBay’s first‑party advertising platform provides sellers with real‑time and historical metrics on traffic, conversions, and revenue. The platform originally used a custom distributed SQL engine, migrated to Druid three years ago, and later moved to ClickHouse to satisfy growing data volume and operational requirements.
Background
The system must ingest hundreds of billions of rows per day, with peak insert rates approaching 1 million rows per second. Financial reports require sub-10-second end-to-end latency and strict data integrity. Offline data for the most recent one to two days must be replaceable without disrupting queries, which requires the replacement to be atomic across all ClickHouse nodes.
Druid vs. ClickHouse
Druid is an open‑source columnar OLAP engine that offers high availability, horizontal scaling, and rapid data‑source configuration. Its architecture relies on six node types (Overlord, Coordinator, Middle Manager, Indexer, Broker, Historical) and external services such as MySQL, ZooKeeper, and HDFS.
ClickHouse, developed by Yandex, is a full DBMS with native SQL support, columnar storage, high compression (up to 10×), vectorized query execution, and a simple peer‑to‑peer node design. Replication is handled via ZooKeeper when the Replicated* engines are used.
Why Migrate
Operational complexity: Druid’s six node roles and external dependencies increase maintenance overhead.
Ingestion limits: Druid’s real-time indexing creates immutable segments, imposing a 3-hour write window after which data cannot be updated.
Atomic offline replacement: ClickHouse allows per-partition DETACH / ATTACH / REPLACE, enabling transparent data swaps without extra merge logic.
System Architecture
The ClickHouse‑based architecture consists of four logical components:
Real‑time data acquisition from eBay’s behavior and transaction message streams.
Offline data replacement that ingests cleaned data from the internal data warehouse.
ClickHouse cluster with surrounding services (ZooKeeper, MySQL for metadata, monitoring).
Reporting layer exposing internal and public APIs.
Schema Design
Table engine: Use ReplicatedSummingMergeTree for the high-volume impression and click tables (≈60% storage reduction) and ReplicatedMergeTree for sales tables, which need aggregations other than plain sums.
Primary key / sorting key: Base the ORDER BY clause on seller_id. In ClickHouse the primary key is a sorting key, not a uniqueness constraint; sorting by seller_id improves compression and lets seller-centric queries skip irrelevant data.
Compression: LZ4 is the default; LZ4HC or ZSTD can be selected for higher compression ratios. Low-cardinality columns should use the LowCardinality type to shrink storage further.
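The schema choices above can be sketched as ClickHouse DDL rendered from Python. This is only an illustration: the table name `ad_events`, every column, the ZooKeeper path, and the `{shard}` / `{replica}` macros are assumptions, not eBay's actual schema. Only the engine name, `LowCardinality`, `CODEC`, and the `PARTITION BY` / `ORDER BY` clauses are standard ClickHouse syntax.

```python
def render_event_table_ddl(table: str = "ad_events") -> str:
    """Render a CREATE TABLE statement for a replicated, summed event table.

    All identifiers are hypothetical and exist only for illustration.
    """
    columns = [
        "event_date Date",
        "seller_id UInt64",
        "campaign_id UInt64",
        # Few distinct values, so dictionary encoding should pay off.
        "site LowCardinality(String)",
        # Metric columns: summed when rows share the same sorting key.
        "impressions UInt64 CODEC(ZSTD(3))",
        "clicks UInt64 CODEC(ZSTD(3))",
    ]
    # seller_id leads the sorting key to favor seller-centric queries.
    sorting_key = ["seller_id", "campaign_id", "event_date", "site"]
    column_sql = ",\n    ".join(columns)
    return (
        f"CREATE TABLE {table}\n(\n    {column_sql}\n)\n"
        "ENGINE = ReplicatedSummingMergeTree("
        f"'/clickhouse/tables/{{shard}}/{table}', '{{replica}}')\n"
        "PARTITION BY toYYYYMMDD(event_date)\n"
        f"ORDER BY ({', '.join(sorting_key)})"
    )


ddl = render_event_table_ddl()
print(ddl)
```

Partitioning by day is also an assumption; it is chosen here because the offline workflow replaces data one date at a time.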
Offline Data Replacement
Daily offline data approaches 1 TB. The replacement workflow uses ClickHouse partitions as independent units:
Create a temporary partition for the target date.
Run Spark jobs (submitted via Livy) that aggregate raw offline files, shard them according to the partition topology, and write the shards to Hadoop.
Lock the partition topology through a service API to prevent concurrent topology changes.
After Spark finishes, compute checksums and row counts for each shard.
Call the ClickHouse REPLACE PARTITION API: the temporary partition atomically replaces the target partition.
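The verification and swap steps above might look roughly like the following sketch. It assumes the "temporary partition" lives in a separate staging table, since ClickHouse's `REPLACE PARTITION ... FROM` copies a partition from one table into another. The manifest structure, the SHA-256 checksum, and the table names are hypothetical; the source does not say how eBay computes its checksums.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardManifest:
    """Row count and checksum for one shard (hypothetical structure)."""
    shard: int
    row_count: int
    checksum: str


def checksum_rows(rows):
    """Hash rows in order; SHA-256 is an assumption, not the source's choice."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def verify_shards(expected, loaded):
    """Return ids of shards that are missing or differ in count or checksum."""
    loaded_by_shard = {m.shard: m for m in loaded}
    return [m.shard for m in expected if loaded_by_shard.get(m.shard) != m]


def replace_partition_sql(target, staging, partition_id):
    """Build the swap statement; table names are illustrative."""
    return (
        f"ALTER TABLE {target} "
        f"REPLACE PARTITION ID '{partition_id}' FROM {staging}"
    )


rows = ["1001,5,2", "1002,7,1"]
expected = [ShardManifest(0, len(rows), checksum_rows(rows))]
loaded = [ShardManifest(0, len(rows), checksum_rows(rows))]

mismatched = verify_shards(expected, loaded)
if not mismatched:
    # Only swap once every shard has passed verification.
    print(replace_partition_sql("ad_events", "ad_events_staging", "20240101"))
```

Gating the swap on an empty mismatch list keeps a partially written or corrupted shard from ever becoming visible to queries.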
Spring Batch orchestrates per‑day sub‑tasks, guaranteeing that only one task processes a given date at a time and allowing manual re‑processing of historical ranges.
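The one-task-per-date guarantee can be illustrated with a small in-process registry. This is a Python sketch of the idea only, not Spring Batch's mechanism; `DateTaskRegistry` and `run_for_date` are hypothetical names, and a multi-process deployment would need the registry in shared storage.

```python
import threading
from datetime import date


class DateTaskRegistry:
    """Tracks dates being processed so each date has at most one active task."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def try_acquire(self, day: date) -> bool:
        """Claim a date; return False if another task already holds it."""
        with self._lock:
            if day in self._in_progress:
                return False
            self._in_progress.add(day)
            return True

    def release(self, day: date) -> None:
        with self._lock:
            self._in_progress.discard(day)


def run_for_date(registry, day, task):
    """Run task(day) only if no other task is processing that date."""
    if not registry.try_acquire(day):
        return False
    try:
        task(day)
        return True
    finally:
        # Release even on failure so the date can be re-processed manually.
        registry.release(day)


registry = DateTaskRegistry()
day = date(2024, 1, 1)
processed = []

first_claim = registry.try_acquire(day)   # the date is free
second_claim = registry.try_acquire(day)  # already held by the first claim
registry.release(day)

ran = run_for_date(registry, day, processed.append)
```

Releasing the claim in a `finally` block is what makes manual re-processing of a failed date possible without clearing stale state by hand.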
Data Query
ClickHouse exposes SQL over HTTP and TCP. Clients include a JDBC driver, the command-line client, and integrations for visualization tools. By default a single query uses half of the node’s CPU cores, so a handful of concurrent queries can saturate a node; query size and concurrency are therefore throttled.
The internal API routes queries through a thread pool sized to the ClickHouse cluster. The public API queues queries, persists results in a relational DB, and serves downloadable reports.
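A minimal sketch of the throttling described above, assuming a bounded thread pool caps concurrency and a row estimate caps query size. `ThrottledQueryRunner`, its parameters, and the limits shown are hypothetical; the source does not specify how eBay sizes its pool or measures query size.

```python
from concurrent.futures import ThreadPoolExecutor


class ThrottledQueryRunner:
    """Caps concurrent queries and rejects queries estimated to be too large."""

    def __init__(self, run_query, max_concurrent, max_estimated_rows):
        self._run_query = run_query
        self._max_estimated_rows = max_estimated_rows
        # The pool size bounds how many queries reach the cluster at once;
        # additional submissions wait in the executor's queue.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)

    def submit(self, sql, estimated_rows):
        if estimated_rows > self._max_estimated_rows:
            raise ValueError("query exceeds the configured size limit")
        return self._pool.submit(self._run_query, sql)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_run_query(sql):
    """Stand-in for a real ClickHouse client call."""
    return f"result of: {sql}"


runner = ThrottledQueryRunner(
    fake_run_query, max_concurrent=4, max_estimated_rows=1_000_000
)
future = runner.submit("SELECT 1", estimated_rows=1)
result = future.result()

try:
    runner.submit("SELECT * FROM huge_table", estimated_rows=10**9)
    rejected = False
except ValueError:
    rejected = True

runner.shutdown()
```

In practice `max_concurrent` would be derived from the cluster's node count and per-query thread usage rather than hard-coded.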
Atomicity and Consistency Guarantees
During replacement a temporary partition is created; after successful checksum verification the temporary partition replaces the target partition in a single atomic operation. Each partition carries a version number; queries always read a single version, preventing mixed‑old/new reads.
SQL statements include a WITH clause that injects the current partition version and a PREWHERE filter to exclude stale data.
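One way such a query might be assembled is sketched below. The `data_version` column, the `current_version` alias, and the table and metric names are assumptions made for illustration; the source does not show how the version is stored or looked up.

```python
def build_versioned_query(table: str, seller_id: int, partition_version: int) -> str:
    """Build a query that reads only rows tagged with the given version.

    Column and alias names are hypothetical.
    """
    if not isinstance(seller_id, int) or not isinstance(partition_version, int):
        # Only integers are interpolated, which avoids injecting raw text.
        raise TypeError("seller_id and partition_version must be integers")
    return (
        f"WITH {partition_version} AS current_version\n"
        "SELECT event_date, sum(impressions) AS impressions, sum(clicks) AS clicks\n"
        f"FROM {table}\n"
        # PREWHERE filters on the version column before other columns are read.
        "PREWHERE data_version = current_version\n"
        f"WHERE seller_id = {seller_id}\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date"
    )


query = build_versioned_query("ad_events", seller_id=42, partition_version=7)
print(query)
```

Because every query pins exactly one version, flipping the current version after a successful swap moves all readers to the new data together.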
Testing and Rollout
After deployment, a dual‑write mode inserted both real‑time and offline data into ClickHouse until parity with Druid was reached. Validation involved mirroring production queries, comparing ClickHouse responses against Druid, and gradually shifting traffic to ClickHouse while keeping Druid as a fallback.
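The response comparison used during validation could be sketched as follows. The row layout (a key tuple mapped to a dict of metrics) and the tolerance are assumptions; the source does not describe the actual comparison logic.

```python
import math


def diff_results(druid_rows, clickhouse_rows, rel_tol=1e-9):
    """Compare two result sets keyed by (seller_id, date).

    Returns a list of (key, metric, druid_value, clickhouse_value) tuples
    for every row or metric that is missing or differs beyond rel_tol.
    """
    mismatches = []
    for key in sorted(set(druid_rows) | set(clickhouse_rows)):
        left = druid_rows.get(key)
        right = clickhouse_rows.get(key)
        if left is None or right is None:
            # The whole row exists on one side only.
            mismatches.append((key, "missing", left, right))
            continue
        for metric in sorted(set(left) | set(right)):
            a, b = left.get(metric), right.get(metric)
            if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
                mismatches.append((key, metric, a, b))
    return mismatches


druid = {(1001, "2024-01-01"): {"clicks": 10, "revenue": 25.5}}
clickhouse = {
    (1001, "2024-01-01"): {"clicks": 10, "revenue": 25.5},
    (1002, "2024-01-01"): {"clicks": 3, "revenue": 4.0},
}

mismatches = diff_results(druid, clickhouse)
print(mismatches)
```

A relative tolerance is used for floating-point metrics such as revenue, where the two engines may sum values in a different order.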
Monitoring tracks ingestion latency, query latency, CPU/memory usage, and checksum mismatches to detect regressions early.
Conclusion
The migration to ClickHouse delivered lower operational complexity, higher ingestion throughput (no 3‑hour write window), atomic offline data swaps, and up to 60 % storage savings for high‑volume event tables. The platform now serves real‑time and historical ad analytics at eBay with sub‑10‑second latency and strong consistency guarantees.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
