Building a Unified Real‑Time and Offline OLAP Platform with DorisDB at Yuanfudao
The article describes how Yuanfudao's data middle platform built a high‑performance OLAP service using the MPP HOLAP engine DorisDB to unify real‑time and batch analytics, meet low‑latency and high‑concurrency requirements, and support diverse education‑industry use cases such as live‑stream monitoring, advertising, and order analytics.
Yuanfudao's data middle platform provides standardized data sets (OneData) and unified services (OneService) for multiple education products, requiring a reliable OLAP platform that can serve both real‑time and offline queries.
Business Background and Requirements
The platform must handle large daily data volumes, compute metrics such as user activity, order revenue, channel conversion, and renewal rates, and deliver low‑latency, real‑time insights. It therefore needs to ingest both streaming and batch data, handle complex multi‑table joins and high query concurrency, and expose easy‑to‑use SQL.
OLAP Engine Requirements
Second‑ or millisecond‑level query latency.
Efficient handling of wide tables and multi‑table joins.
High‑concurrency support.
Streaming and batch data ingestion.
Standardized SQL with low learning cost.
Accurate deduplication.
Online scale‑out with low operational cost.
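The accurate-deduplication requirement is commonly met with bitmaps: each user ID sets one bit, bitmaps for different slices are merged with a bitwise OR, and the number of set bits is the exact distinct count. The sketch below illustrates the idea with plain Python integers standing in for bitmaps. It is not DorisDB's internal implementation, and the function names and sample IDs are hypothetical.

```python
def to_bitmap(user_ids):
    """Build a bitmap (a Python int) with one bit set per non-negative integer ID."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid  # setting the same bit twice has no effect
    return bitmap


def union(*bitmaps):
    """Merge bitmaps with bitwise OR; IDs present in several inputs collapse."""
    merged = 0
    for b in bitmaps:
        merged |= b
    return merged


def cardinality(bitmap):
    """Exact distinct count = number of set bits."""
    return bin(bitmap).count("1")


# Hypothetical per-day active-user IDs, assumed to be encoded as small integers.
monday = to_bitmap([1, 2, 3, 3, 5])
tuesday = to_bitmap([3, 5, 8])

# Users active on either day, counted exactly once each.
weekly_active = cardinality(union(monday, tuesday))
```

Because the union is associative, per-day or per-channel bitmaps can be stored once and combined at query time for any range, which is what makes this approach attractive compared with re-scanning raw rows for every distinct count.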
Technology Selection and Comparison
The team evaluated MOLAP (e.g., Druid, Kylin), ROLAP (e.g., Presto, ClickHouse), and HOLAP solutions. MOLAP offers pre‑aggregation but lacks flexibility; ROLAP is flexible but can be unstable for complex queries. HOLAP combines both advantages, with DorisDB emerging as the best fit due to strong performance, MySQL compatibility, and low operational overhead.
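The MOLAP/ROLAP trade-off can be made concrete with a toy example. In this minimal sketch (hypothetical order rows and function names, not taken from the team's evaluation), a pre-aggregated cube answers a revenue-by-channel question with a lookup, while a question on a dimension the cube does not contain has to fall back to scanning raw rows.

```python
from collections import defaultdict

# Hypothetical order rows: (channel, grade, revenue).
ORDERS = [
    ("app", "grade1", 100),
    ("app", "grade2", 50),
    ("web", "grade1", 70),
]


def build_cube(rows):
    """MOLAP-style: pre-aggregate revenue by channel once, at load time."""
    cube = defaultdict(int)
    for channel, _grade, revenue in rows:
        cube[channel] += revenue
    return dict(cube)


def scan_group_by(rows, key_index):
    """ROLAP-style: aggregate raw rows at query time on any chosen dimension."""
    result = defaultdict(int)
    for row in rows:
        result[row[key_index]] += row[2]
    return dict(result)


cube = build_cube(ORDERS)
by_channel = cube                     # answered by lookup, no scan at query time
by_grade = scan_group_by(ORDERS, 1)   # grade is not in the cube, so scan raw rows
```

The cube is fast but only as flexible as the dimensions chosen in advance; the scan is flexible but pays the full cost on every query. A HOLAP engine, as described above, aims to offer both paths in one system.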
Application Scenarios
Real‑time Live‑Stream Quality Monitoring: Minute‑level metrics such as network quality, packet loss, and audio/video availability are served by DorisDB.
Offline Interactive Queries and BI Reports: Migrating from MySQL to DorisDB reduced query latency by several orders of magnitude and simplified JDBC integration.
Near‑real‑time Order and Renewal Data: Hive historical data and binlog streams are ingested via Flink SQL into DorisDB, enabling fast cross‑team analytics.
Real‑time Advertising Strategy: Minute‑level ad performance data is streamed into DorisDB for unified reporting.
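In the order and renewal scenario, a historical snapshot and a stream of binlog changes have to converge on one current table. The team does this with Flink SQL; the Python sketch below only illustrates the merge semantics (apply changes in order, keyed by primary key). The event format, field names, and sample rows are assumptions made for illustration.

```python
def apply_binlog(snapshot, events):
    """Apply change events, in order, to a copy of a snapshot keyed by primary key."""
    table = {key: dict(row) for key, row in snapshot.items()}
    for event in events:
        op, key = event["op"], event["id"]
        if op in ("insert", "update"):
            # Upsert: create the row if missing, then overwrite changed columns.
            table.setdefault(key, {}).update(event["row"])
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return table


# Hypothetical offline snapshot of orders (for example, exported from Hive).
snapshot = {
    1: {"status": "paid", "amount": 100},
    2: {"status": "created", "amount": 50},
}

# Hypothetical binlog events that arrived after the snapshot was taken.
events = [
    {"op": "update", "id": 2, "row": {"status": "paid"}},
    {"op": "insert", "id": 3, "row": {"status": "created", "amount": 80}},
    {"op": "delete", "id": 1},
]

current = apply_binlog(snapshot, events)
```

The sketch assumes events arrive in commit order per key. In a real pipeline, ordering and the hand-off point between snapshot and stream are the parts that need the most care.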
Monitoring and Operations
Key cluster health metrics (FE/BE node loss, disk failures, CPU usage, memory pressure) and query‑level alerts (large scans, slow queries >2 min, connection spikes) are tracked. An audit platform captures DDL operations and slow queries, feeding logs into Elasticsearch for analysis.
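The query-level alerts can be expressed as simple rules over audit records. The following is a minimal sketch: the 2-minute slow-query threshold comes from the text, while the record fields, the large-scan threshold, and the sample records are hypothetical and would need to match whatever the audit platform actually logs.

```python
SLOW_QUERY_SECONDS = 120          # "slow queries >2 min" from the text
LARGE_SCAN_ROWS = 100_000_000     # assumed threshold; the article does not give one


def classify(record):
    """Return the names of the alerts triggered by one audit record."""
    alerts = []
    if record["duration_s"] > SLOW_QUERY_SECONDS:
        alerts.append("slow_query")
    if record["scan_rows"] > LARGE_SCAN_ROWS:
        alerts.append("large_scan")
    return alerts


# Hypothetical audit records, as they might look after parsing a log line.
records = [
    {"query_id": "q1", "duration_s": 3.2, "scan_rows": 10_000},
    {"query_id": "q2", "duration_s": 185.0, "scan_rows": 2_000_000},
    {"query_id": "q3", "duration_s": 45.0, "scan_rows": 900_000_000},
    {"query_id": "q4", "duration_s": 300.0, "scan_rows": 500_000_000},
]

# Keep only the queries that triggered at least one alert.
flagged = {}
for r in records:
    alerts = classify(r)
    if alerts:
        flagged[r["query_id"]] = alerts
```

Keeping the rules as plain data-in, labels-out functions makes them easy to run both on live logs and on historical records already indexed in Elasticsearch.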
Ecosystem Integration
Custom Flink connectors, Stream Load/Broker Load pipelines, and a Presto‑DorisDB catalog were built to enable cross‑source queries and seamless data ingestion.
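For readers unfamiliar with Stream Load, it is an HTTP-based ingestion path: the client sends a PUT request with the data as the body and load options as headers. The sketch below only builds such a request with the Python standard library and does not send it. The endpoint path and header names follow the Stream Load interface as publicly documented for Doris-family engines and should be checked against the version in use; the host, port, database, table, credentials, and label are all hypothetical.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host, http_port, db, table, user, password,
                              label, csv_bytes, column_separator=","):
    """Build (but do not send) a Stream Load PUT request for a CSV payload."""
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # A unique label lets the server reject an accidental re-send of the same batch.
        "label": label,
        "column_separator": column_separator,
    }
    return urllib.request.Request(url, data=csv_bytes, headers=headers, method="PUT")


# Hypothetical batch of order rows in CSV form.
payload = b"1,app,100\n2,web,70\n"

request = build_stream_load_request(
    fe_host="fe.example.internal", http_port=8030,
    db="demo_db", table="orders",
    user="loader", password="secret",
    label="orders_20240101_001", csv_bytes=payload,
)
```

Actually sending the request needs a client that follows the redirect from the frontend node to a backend node while re-sending the body and credentials; `urllib` does not do this for PUT out of the box, so a production loader would typically use a different HTTP client or the engine's own connector.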
Future Outlook
Planned extensions include bitmap‑based multi‑dimensional analysis, a generic event‑analysis platform, and further automation of cluster scaling and upgrades.
Conclusion
By adopting DorisDB, Yuanfudao achieved a unified streaming‑batch OLAP engine that delivers low‑latency, high‑throughput analytics, strengthens the OneData/OneService ecosystem, and provides a solid foundation for future data‑platform evolution.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.