
Building an Integrated Metric Data Service Platform with Apache Doris: Architecture Evolution and Millisecond‑Level Query Performance

This article describes how Financial One Account, a technology service arm of Ping An, migrated from a Hadoop‑Presto‑Kylin stack to an Apache Doris‑based data platform, detailing the architectural evolution, OLAP engine selection, metric system design, performance optimizations, and future roadmap for real‑time analytics.

DataFunTalk

Jul 25, 2023


Financial One Account, a technology‑as‑a‑service provider of the Ping An Group, faced inconsistent metric definitions, duplicated calculations, and low delivery efficiency in traditional reporting. To address these pain points, it built an integrated metric data service platform based on Apache Doris, aiming to centralize metric construction, reduce ETL effort, and achieve millisecond‑level query responses.

The first‑generation architecture (Hadoop + Presto + Apache Kylin) relied on pre‑computed cubes, which could not deliver flexible analysis or consistently fast queries and were costly to operate. After evaluating several OLAP engines, the team selected Apache Doris for its MySQL compatibility, distributed join capabilities, materialized view support, and simple two‑role (FE/BE) deployment.

In the second‑generation architecture, Doris replaced Kylin and Presto, using a Duplicate‑Key model with range‑partitioned tables to store detailed data. The MPP engine enabled high concurrency and low‑latency processing, supporting complex multi‑table joins and achieving query times as low as 63 ms in large‑scale scenarios.
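As a rough illustration of the table layout described above, the following sketch generates a Doris `CREATE TABLE` statement that uses the Duplicate Key model with one range partition per month. The table name `metric_detail`, the columns, the bucket count, and the replication setting are all hypothetical; the article does not give the actual schema.

```python
from __future__ import annotations

from datetime import date


def month_starts(first: date, count: int) -> list[date]:
    """Return count + 1 consecutive first-of-month dates, starting at first's month."""
    starts = []
    year, month = first.year, first.month
    for _ in range(count + 1):
        starts.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts


def duplicate_key_ddl(table: str, first_month: date, months: int, buckets: int = 8) -> str:
    """Build a Duplicate Key table definition with one range partition per month."""
    bounds = month_starts(first_month, months)
    # Each partition holds rows whose dt is below the start of the following month.
    partitions = ",\n".join(
        f'    PARTITION p{lo:%Y%m} VALUES LESS THAN ("{hi.isoformat()}")'
        for lo, hi in zip(bounds, bounds[1:])
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    dt DATE NOT NULL,\n"          # hypothetical partition column
        "    org_id BIGINT NOT NULL,\n"    # hypothetical dimension / bucket column
        "    amount DECIMAL(18, 2)\n"      # hypothetical measure
        ")\n"
        "DUPLICATE KEY(dt, org_id)\n"
        "PARTITION BY RANGE(dt) (\n"
        f"{partitions}\n"
        ")\n"
        f"DISTRIBUTED BY HASH(org_id) BUCKETS {buckets}\n"
        'PROPERTIES ("replication_num" = "3");'
    )


ddl = duplicate_key_ddl("metric_detail", date(2023, 1, 1), months=3)
print(ddl)
```

The Duplicate Key model keeps every loaded row without aggregation, which suits detail data, while range partitions on the date column let queries that filter by date skip irrelevant partitions.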

Metric construction follows a three‑tier model: atomic metrics (raw fields), derived metrics (e.g., YoY, MoM), and composite metrics (custom calculations). The platform provides metadata management, dimension selection, and publishing workflows, complemented by AI‑driven anomaly detection, root‑cause analysis, and value‑scoring for metric governance.
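To make the relationship between an atomic metric and comparison metrics such as YoY and MoM concrete, here is a minimal sketch that derives both from a monthly series. The function names, the `"YYYY-MM"` key format, and the sample values are assumptions for illustration; the platform's actual metric definitions are not given in the article.

```python
from __future__ import annotations


def shift_month(month: str, delta: int) -> str:
    """Shift a 'YYYY-MM' string by delta months (negative values go back in time)."""
    year, mon = map(int, month.split("-"))
    index = year * 12 + (mon - 1) + delta
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def growth_rate(current: float, baseline: float | None) -> float | None:
    """Relative change against a baseline; None when the baseline is missing or zero."""
    if baseline is None or baseline == 0:
        return None
    return (current - baseline) / baseline


def derive(series: dict[str, float], month: str) -> dict[str, float | None]:
    """Compute MoM and YoY growth for one month of an atomic metric series."""
    current = series[month]
    return {
        "value": current,
        "mom": growth_rate(current, series.get(shift_month(month, -1))),
        "yoy": growth_rate(current, series.get(shift_month(month, -12))),
    }


# Hypothetical monthly totals of one atomic metric.
monthly_amount = {"2022-06": 100.0, "2023-05": 120.0, "2023-06": 150.0}
result = derive(monthly_amount, "2023-06")
print(result)
```

Defining the comparison once against the atomic series, rather than re-implementing it per report, is the kind of reuse a centralized metric layer is meant to provide.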

Performance tuning involved data preparation (stream load of CSV files), cluster monitoring with Prometheus and Grafana, and two optimization schemes: colocation join and wide‑table redesign. The wide‑table approach, combined with SQL caching, reduced most query latencies to single‑digit milliseconds, achieved 300 TPS under concurrency, and saved over 30 % of ETL development effort.
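The idea behind a colocation join can be sketched with a toy model: when two tables are bucketed on the join key with the same bucket count, matching rows always land in the same bucket, so each bucket can be joined locally without moving data between nodes. In Doris, tables are placed in a colocation group through the `colocate_with` table property; the code below is only a simplified simulation, with modulo standing in for the engine's internal hash and with hypothetical sample rows.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # both tables must use the same bucket count


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Group rows by bucket, using the first field as the bucketing (join) key."""
    buckets = defaultdict(list)
    for row in rows:
        # Modulo is a stand-in for the engine's hash function (an assumption).
        buckets[row[0] % num_buckets].append(row)
    return buckets


def colocated_join(left_rows, right_rows):
    """Join two tables bucket by bucket; no row ever crosses a bucket boundary."""
    left_buckets = bucketize(left_rows)
    right_buckets = bucketize(right_rows)
    joined = []
    for bucket_id in range(NUM_BUCKETS):
        # Build a local hash table from the right side of this bucket only.
        lookup = defaultdict(list)
        for right in right_buckets.get(bucket_id, []):
            lookup[right[0]].append(right)
        # Probe it with the left side of the same bucket.
        for left in left_buckets.get(bucket_id, []):
            for right in lookup.get(left[0], []):
                joined.append(left + right[1:])
    return joined


# Hypothetical rows: (org_id, amount) and (org_id, region).
orders = [(1, 100), (2, 50), (5, 70)]
orgs = [(1, "east"), (5, "north"), (6, "west")]
rows = colocated_join(orders, orgs)
print(rows)
```

The wide‑table redesign goes one step further by removing the join from query time altogether, at the cost of denormalizing the data during loading.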

Future work includes real‑time analysis with Flink CDC and Apache Iceberg, materialized view enhancements for multi‑table joins, and migration of other middle‑platform products (e.g., label platform) to Doris.

Architecture 1.0: Hadoop + Presto + Apache Kylin – limited flexibility and query performance, with high maintenance cost.

Architecture 2.0: Apache Doris – high‑performance OLAP, simple deployment, and low operational overhead.

Key benefits: millisecond‑level query response, reduced ETL workload, unified metric governance, and scalable real‑time analytics.
