Big Data 19 min read

Scaling Real‑Time & Offline Analytics with Druid: Architecture, Optimizations, and Lessons

This article explains how Beike adopted the Druid OLAP engine to handle massive real‑time and offline query workloads, detailing its four‑component architecture, key technologies such as deep storage and metadata storage, practical optimizations for data ingestion, query caching, dynamic throttling, timeout control, and a roadmap for future enhancements.

dbaplus Community

Jul 11, 2021

Scaling Real‑Time & Offline Analytics with Druid: Architecture, Optimizations, and Lessons

Background

Beike generates massive real‑time and offline data across many business lines. Efficient query and analysis are essential for product development and business expansion. Single‑technology solutions could not meet diverse query requirements, leading the team to adopt Apache Druid as the OLAP engine.

Druid Architecture and Key Technologies

Architecture

Druid consists of four main components:

Master Servers – run the Coordinator and Overlord processes to manage data availability and ingestion.

Query Servers – run Broker (and optional Router) processes to handle external client queries.

Data Servers – run Historical and MiddleManager processes to ingest workloads and store all queryable data.

External Dependencies – Deep Storage, Metadata Storage, and Zookeeper provide shared object storage, system metadata, and service discovery/leader election.

External Dependencies

Deep Storage is a shared object store (e.g., S3, HDFS, or a network‑mounted filesystem) used for raw data backup and inter‑process transfer. Historical nodes read pre‑materialized segments from local disks for fast query response, so sufficient disk space must be provisioned both in Deep Storage and on Historical nodes.

Metadata Storage holds segment availability, task information, and other system metadata, typically in a relational database such as PostgreSQL or MySQL.

Zookeeper provides service discovery, coordination, and leader election, enabling large‑scale cluster expansion and fault tolerance.

Key Technologies

Druid achieves high‑performance queries through pre‑aggregations, bitmap encoding, and inverted indexes. An example dataset contains four fields: house_type (0 = commercial, 1 = public, 2 = affordable), city_name, look_cnt, and trade_cnt. The following SQL demonstrates a typical aggregation:

SELECT city_name, SUM(look_cnt)
FROM table_sample
WHERE house_type = 0 OR house_type = 1
GROUP BY city_name;

The query flow traverses the Broker, Historical nodes, and (if needed) Deep Storage.

Practice and Optimizations

Offline Scenario

The production Druid cluster runs on more than 20 nodes, handling >30 million queries per day with peak QPS around 1.2 k. It serves 18+ business lines (brand, store operations, overseas, renovation dashboards, etc.).

Data Source Ingestion Optimization

Data sources are Hive tables split into full‑load (year‑scale) and incremental (daily) tables. Ingestion uses Hadoop index jobs. To accelerate large incremental partitions, the team increases the number of map containers by generating more Parquet files and sets numShards to increase reduce containers. Example: for 100 million low‑cardinality rows, the data is repartitioned into 20 shards and numShards=5, reducing ingestion time noticeably.

Shard Merging

After ingestion, each segment may contain multiple shards (recommended size 300‑700 MB). When shards are too small, they are compacted to reduce CPU waste on Historical nodes. In practice, most segments stay under 500 M rows, so the team limits each segment to 500 M rows before merging.

Engineering Implementation

An automated pipeline adjusts ingestion steps (e.g., map/reduce container count, shard merging) based on data size and business line, ensuring optimal resource usage.

Cluster Load Protection

Query Cache – a three‑layer cache (metric‑level API cache, queryEngine cache, and Druid segment cache) reduces load on Brokers and Historical nodes.

Dynamic Throttling – the Broker monitors Historical CPU load and applies tiered throttling based on business importance; limits are released automatically when load drops.

Timeout Control – the default Druid query timeout (5 min) is reduced to ~15 s in production. The team extended QueryResource to adjust timeout dynamically according to Historical CPU usage, improving latency during peaks.

Real‑Time Scenario

Real‑time deduplication is required for metrics such as GMV and GSV. Native Druid lacks efficient real‑time deduplication, so the team implemented a Redis‑backed dictionary in Druid 0.18.1.

Two encoding strategies were evaluated: a global auto‑increment ID stored in Redis (rejected due to high QPS pressure) and Snowflake‑generated IDs (chosen). Each Kafka index job runs a local Snowflake ID generator; concurrency is limited to 1024 nodes because Snowflake uses a 10‑bit machine identifier.

A new metric type commonUnique stores values in a 64‑bit Roaring64NavigableMap (instead of the 32‑bit RoaringBitmap), allowing >2 billion distinct values.

Key classes added: CommonUniqueComplexMetricSerde – registers the new metric type and encodes values into Redis. CommonUniqueAggregator – implements aggregation logic (buffer and non‑buffer versions). CommonUniqueAggregatorFactory – creates aggregators for query planning.

Ingestion configuration example:

{
  "metricsSpec": [
    {"type": "count", "name": "count"},
    {"type": "commonUnique", "name": "date_agent_ucid_unique", "fieldName": "date_agent_ucid"},
    {"type": "commonUnique", "name": "date_open_id_unique", "fieldName": "date_open_id"}
  ]
}

Query example (groupBy):

{
  "queryType": "groupBy",
  "dataSource": {"type": "table", "name": "user_action_sa_cube_v1"},
  "intervals": {"type": "intervals", "intervals": ["2020-09-04T00:00:00.000Z/2020-09-07T00:00:00.000Z"]},
  "granularity": {"type": "all"},
  "dimensions": ["entity_id"],
  "aggregations": [{"type": "commonUnique", "name": "date_open_id_unique", "fieldName": "date_open_id_unique"}]
}

Platformization

The team built a platform named vili on top of Druid to provide a unified OLAP service. The platform offers six capabilities:

Data‑warehouse modeling for offline (Hive) and real‑time (Flink → Kafka) sources.

Cube modeling inspired by Kylin, enabling automatic scheduling for offline cubes and Kafka index jobs for real‑time cubes.

Metric definition and enrichment, abstracting engine‑specific details.

Unified query API that hides differences between underlying engines.

Tools for metadata refresh, data back‑fill, and cross‑engine comparison.

Operational dashboards for metadata, task status, and logs.

Future Plans

Deepen real‑time metric support in Druid, potentially adding update capabilities to compete with ClickHouse and Doris.

Accelerate offline data ingestion by exploring Spark‑based pipelines and parallel index imports with global dictionary support.

Optimize query performance for high‑cardinality dimensions by creating dedicated pre‑aggregation indexes and routing queries to the most appropriate index.

Continue platformization by moving engine‑specific metadata, data‑source management, and API rate‑limiting to the platform layer, thereby decoupling business lines from the underlying engines.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization Big Data real-time analytics OLAP Druid

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.