Big Data 12 min read

Druid OLAP Platform Practice at YouZan: Architecture, Features, and Challenges

YouZan adopted MetaMarket’s Druid OLAP platform—featuring millisecond‑level interactive queries, high availability, horizontal scalability, and rich SQL/API query types—by configuring simple ingestion tasks that automatically manage real‑time and batch data, tiered hot/cold storage, and monitoring, while still facing ingestion limits, lack of joins, and occasional latency spikes.

Youzan Coder

Feb 13, 2019

Druid OLAP Platform Practice at YouZan: Architecture, Features, and Challenges

This article introduces Druid, a high-performance OLAP (Online Analysis Processing) data storage and analysis system developed by MetaMarket, now incubating at Apache Foundation. Druid is designed for massive datasets and offers sub-second query response times.

Druid's Key Features:

Interactive Query: Millisecond-level query latency after events are created, thanks to columnar storage and optimized query execution that reads only necessary data

High Availability: Uses HDFS/S3 for deep storage with segments replicated across 2 Historical nodes; supports multi-replica data ingestion

Horizontal Scalability: All deployment components can scale horizontally to improve data ingestion and query performance

Parallel Processing: Enables parallel query processing across the entire cluster

Rich Query Capabilities: Supports Scan, TopN, GroupBy, Approximate queries via both API and SQL

Why YouZan Uses Druid: As a SaaS company with massive real-time and offline data, YouZan previously used SparkStreaming/Storm for OLAP scenarios, requiring separate real-time tasks and carefully designed storage. With Druid, developers only need to fill in a data ingestion configuration specifying dimensions and metrics. A real-time task can be created in about 10 minutes.

Druid Architecture (Lambda Architecture):

Coordinator: Manages cluster segments and load balancing

Overlord: Accepts tasks, coordinates task distribution, and collects task status

MiddleManager: Receives indexing tasks from Overlord and creates Peon instances to execute them

Broker: Receives query requests from clients and forwards them to Historical and MiddleManager nodes

Historical: Loads non-real-time window segments

Router: Optional API gateway for improved concurrent query capabilities

YouZan OLAP Platform Features:

DataSource Management

Tranquility Configuration and Instance Management: Automatically deploys Tranquility instances based on traffic and server resources

Data Compensation: Solves late-arriving data issues through offline batch processing

Druid SQL Query Integration

Monitoring and Alerting: Includes basic monitoring (service, cluster status, machine info) and business monitoring (real-time tasks, TPS, latency, RT/QPS)

Hot/Cold Data Separation: Using Historical Tier grouping and load rules - recent 30-day data on SSD (hot tier), 180-day data on SATA (default tier), older data dropped from Historical but retained in HDFS.

Challenges and Future Work:

Data Ingestion: Tranquility has limited ingestion methods and lacks monitoring capabilities

Druid doesn't support JOIN queries across multiple DataSources

Hourly query RT spikes when new Index tasks are created

Historical data automatic Roll-Up for storage optimization

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Platform OLAP Druid Apache Druid Lambda architecture Youzan

Written by

Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.