Evolution of OLAP at Xingyun Retail Credit Using Apache Doris
This article details how Xingyun Retail Credit transitioned from traditional data warehouses to an Apache Doris‑based OLAP solution, covering data demand generation, OLAP engine selection challenges, multi‑stage implementation, performance optimizations, data‑warehouse construction, real‑world use cases, and future roadmap.
As business scale and data volume grew, traditional data warehouses could no longer meet Xingyun Retail Credit's analytical needs, prompting the team to explore an OLAP solution based on Apache Doris for more efficient and accurate data processing.
The evolution is presented in five parts: how the data demands arose, the dilemmas of OLAP engine selection, hands-on practice with Apache Doris, future planning, and a Q&A session.
Initially, data was scattered across multiple OLTP systems (MySQL, Oracle, PostgreSQL), creating silos that made cross-system, end-to-end queries impossible; an analytical processing (AP) system was needed, but selecting the right OLAP engine proved challenging.
Three implementation stages were described: (1) using Kettle for offline ETL, which lacked real-time query capability; (2) evaluating Trino for federated queries over heterogeneous sources, which ran into memory pressure and poor point-query performance; (3) adopting Apache Doris, which offered high performance, standard-SQL and MySQL-protocol compatibility, an integrated storage-compute architecture, and a smooth migration path from MySQL.
With Doris, the team accelerated concurrent queries by choosing appropriate table models, partitioning schemes, and colocation joins, achieving millisecond-level point queries and high-throughput reporting, especially for risk-control and credit-scoring workloads.
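To make the optimizations above concrete, here is a minimal sketch (not the team's actual schema) of the kind of Doris DDL that enables them: a duplicate-key table model, date-range partitioning for pruning, and a colocation group so that joins between tables sharing the group avoid network shuffles. The table, column, and group names (`risk_events`, `user_id`, `dt`, `risk_group`) are illustrative.

```python
# Build hypothetical Doris CREATE TABLE DDL demonstrating a duplicate-key
# model, range partitioning by date, and a colocation group.
def build_risk_table_ddl(table: str, colocate_group: str) -> str:
    """Return Doris DDL with partitioning and colocation (illustrative names)."""
    return f"""
CREATE TABLE {table} (
    user_id BIGINT,
    dt DATE,
    score INT
)
DUPLICATE KEY(user_id, dt)
PARTITION BY RANGE(dt) (
    PARTITION p202401 VALUES LESS THAN ('2024-02-01'),
    PARTITION p202402 VALUES LESS THAN ('2024-03-01')
)
DISTRIBUTED BY HASH(user_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    "colocate_with" = "{colocate_group}"
);
""".strip()

ddl = build_risk_table_ddl("risk_events", "risk_group")
print(ddl)
```

Tables that join frequently on `user_id` would share the same `colocate_with` group and bucketing column, letting Doris execute the join locally on each backend.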
A data‑warehouse foundation was built on Doris using Dolphin Scheduler, DataX, JDBC catalog, and Flink CDC for both batch (T+1) and near‑real‑time ingestion, complemented by Grafana‑Prometheus‑Loki monitoring.
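As a hedged sketch of the near-real-time path, the snippet below assembles the Flink SQL statements for a pipeline that reads a MySQL binlog via Flink CDC and writes into Doris through the flink-doris-connector. The SQL is built as strings only; host names, table names, and credentials are placeholders, and exact connector options should be verified against the connector versions actually in use.

```python
# Assemble illustrative Flink SQL for a MySQL-CDC -> Doris pipeline.
def mysql_cdc_source_ddl() -> str:
    # Source table backed by the MySQL binlog (mysql-cdc connector).
    return """
CREATE TABLE orders_src (
    order_id BIGINT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'mysql-host',
    'port' = '3306',
    'username' = 'cdc_user',
    'password' = 'cdc_pass',
    'database-name' = 'trade',
    'table-name' = 'orders'
);
""".strip()

def doris_sink_ddl() -> str:
    # Sink table writing to Doris via the flink-doris-connector.
    return """
CREATE TABLE orders_sink (
    order_id BIGINT,
    amount DECIMAL(10, 2)
) WITH (
    'connector' = 'doris',
    'fenodes' = 'doris-fe:8030',
    'table.identifier' = 'trade.orders',
    'username' = 'flink',
    'password' = 'flink_pass',
    'sink.label-prefix' = 'orders_cdc'
);
""".strip()

pipeline = [
    mysql_cdc_source_ddl(),
    doris_sink_ddl(),
    "INSERT INTO orders_sink SELECT order_id, amount FROM orders_src;",
]
print("\n\n".join(pipeline))
```

The batch (T+1) path would instead run scheduled DataX or JDBC-catalog jobs under Dolphin Scheduler; only the continuous-ingestion leg is sketched here.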
Several business scenarios were implemented: a risk‑control reporting platform serving hundreds of reports, large‑scale log storage and analysis (using JSONB), and real‑time data collection via a custom Flume sink with Doris StreamLoad.
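The custom Flume sink mentioned above ultimately hands batches to Doris over the Stream Load HTTP API. Below is a minimal sketch of building such a request: an HTTP `PUT` to `/api/{db}/{table}/_stream_load` with basic auth, an idempotency label, and JSON format headers. Host, database, table, credentials, and the label value are placeholders, and the request is only constructed, not sent.

```python
import base64
import json
import urllib.request

def build_stream_load_request(host: str, db: str, table: str,
                              user: str, password: str,
                              records: list) -> urllib.request.Request:
    """Build an HTTP PUT request for Doris Stream Load in JSON format."""
    url = f"http://{host}:8030/api/{db}/{table}/_stream_load"
    body = json.dumps(records).encode("utf-8")
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {auth}",
        "Expect": "100-continue",          # required by Stream Load
        "label": f"{table}_batch_001",     # idempotency label (illustrative)
        "format": "json",
        "strip_outer_array": "true",       # body is a JSON array of rows
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")

req = build_stream_load_request("doris-fe", "logs_db", "app_logs",
                                "loader", "secret",
                                [{"ts": "2024-01-01T00:00:00", "msg": "ok"}])
print(req.get_method(), req.full_url)
```

In a real sink, the label would be derived from the batch (e.g. a transaction or offset id) so that a retried load with the same label is deduplicated by Doris.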
The post‑implementation benefits include reduced operational overhead thanks to Doris's self‑healing capabilities, significant performance gains (up to 4× faster queries), lower storage costs, and faster data refresh for dashboards.
Future plans involve developing an intelligent data gateway for heterogeneous source integration, unified data archiving on Doris, and further real‑time ETL pipelines.
The Q&A addressed log‑query performance (millisecond‑level with fuzzy matching), comparison with ELK, refresh intervals for risk‑control dashboards, and high‑availability mechanisms based on Doris's internal replication.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.