
Case Study: Migrating to DorisDB for High‑Performance Query Engine at Kuayue Group

The article details how Kuayue Group's big‑data center replaced Presto and ClickHouse with DorisDB, achieving sub‑second query latency, simplifying architecture, and enabling both online and real‑time OLAP analytics across millions of daily requests.

DataFunTalk

Kuayue Group, a Chinese logistics unicorn founded in 2007, operates a big‑data center that serves over 5,000 employees and handles more than ten million query calls per day. It uses DorisDB as a unified query engine to meet sub‑second response requirements.

Business Background : The original offline data warehouse routed data from MySQL databases through ETL clusters (Hadoop) to Hive, Spark, and Presto, then exposed a unified API gateway for front‑end tools and ERP systems.

Business Pain Points : The existing engines (Presto, Impala+Kudu, ClickHouse) could not meet the 1‑second TP99 target, that is, 99% of queries returning within one second. Running several engines side by side also made maintenance costly and component integration complex.
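To make the TP99 target concrete, here is a minimal sketch of how such a check could be computed from recorded query latencies. It assumes the nearest-rank definition of the 99th percentile; the article does not say how Kuayue measures TP99, and the sample values are hypothetical.

```python
def tp99(latencies_s):
    """Nearest-rank 99th percentile: the smallest sample such that at
    least 99% of all samples are less than or equal to it."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.99 * n); avoids floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(latencies_s, target_s=1.0):
    """True when the TP99 latency is within the target (1 second by default)."""
    return tp99(latencies_s) <= target_s


# Hypothetical samples (seconds): 1% slow queries vs. 1.5% slow queries.
mostly_fast = [0.2] * 990 + [3.0] * 10
too_many_slow = [0.2] * 985 + [3.0] * 15

print(tp99(mostly_fast), meets_target(mostly_fast))      # 0.2 True
print(tp99(too_many_slow), meets_target(too_many_slow))  # 3.0 False
```

The second sample set shows why TP99 is a strict target: only 1.5% of queries are slow, yet the percentile lands on a slow query and the check fails.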

OLAP Engine Selection : The selection went through three stages: 1) Presto (2019) proved insufficient as data grew; 2) ClickHouse (2020) improved performance but required many wide tables and was operationally complex; 3) DorisDB (2021) offered superior single‑table and multi‑table join performance, MySQL protocol compatibility, easy deployment, and rich data import features.
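MySQL protocol compatibility means that ordinary MySQL client libraries can talk to the engine without a dedicated driver. The sketch below illustrates the idea with a helper that works on any Python DB-API 2.0 connection. It runs against an in-memory SQLite database as a stand-in so that it is self-contained; the `waybill` table, its rows, and the connection parameters in the comment are hypothetical and not taken from the article.

```python
import sqlite3
import time
from contextlib import closing


def timed_query(conn, sql):
    """Run `sql` on any DB-API 2.0 connection; return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        rows = cur.fetchall()
    return rows, time.perf_counter() - start


# Stand-in database so the sketch runs anywhere. Against a MySQL-compatible
# engine one would instead pass a MySQL client connection, for example
# (assumed values, third-party PyMySQL package):
#   pymysql.connect(host="<fe-host>", port=9030, user="<user>",
#                   password="<password>", database="<db>")
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE waybill (id INTEGER, city TEXT)")
demo.executemany(
    "INSERT INTO waybill VALUES (?, ?)",
    [(1, "Shenzhen"), (2, "Beijing")],
)

rows, elapsed = timed_query(
    demo, "SELECT city, COUNT(*) FROM waybill GROUP BY city ORDER BY city"
)
print(rows, f"{elapsed:.4f}s")
```

Because the helper depends only on the DB-API cursor interface, the same function could time a query against either backend, which is the kind of side-by-side latency comparison the case studies below report.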

DorisDB in Kuayue : The company gradually migrated analytical workloads to DorisDB, using it as the primary query engine for both online and OLAP scenarios.

Online Scenario Application : Daily query volume exceeds ten million. Prior to DorisDB, 8–9 different engines (Elasticsearch, MySQL, Presto, Impala+Kudu, etc.) were used. After migration, Presto and Impala+Kudu workloads were replaced, and DorisDB now handles the majority of queries.

Online Case Study : A query over a 200‑field wide table, written as a 600‑line SQL statement, was moved from a 5‑node Presto cluster to a 10‑node DorisDB cluster. Latency fell from 5.7 seconds to 1 second, a speedup of roughly 5.7 times.

OLAP Scenario Application : The in‑house BI platform switched its backend from Presto to DorisDB, delivering noticeable performance gains and enabling broader user adoption across the group.

Offline Analysis Performance : In a customer‑centric offline analysis use case, TP99 dropped from 4.5 seconds to 1.7 seconds, a speedup of about 2.6 times, with most queries now returning within one second.

Real‑Time OLAP : By capturing binlog changes into Kafka and loading them into DorisDB via Routine Load, the company turned a two‑hour batch update into a near‑real‑time pipeline, keeping data fresh to within seconds and allowing greater analytical flexibility.
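The Kafka-to-DorisDB step of this pipeline could be set up with a statement like the one generated below. This is a sketch under stated assumptions rather than Kuayue's actual job definition: the database, job, table, broker, and topic names are hypothetical, the messages are assumed to be flat JSON, and the property names follow the Routine Load syntax documented for Apache Doris and StarRocks, so they should be checked against the deployed version.

```python
def routine_load_sql(db, job, table, brokers, topic, concurrency=3):
    """Build a CREATE ROUTINE LOAD statement that continuously consumes
    JSON messages from a Kafka topic into `table`.

    All argument values are illustrative; nothing here comes from the article.
    """
    return (
        f"CREATE ROUTINE LOAD {db}.{job} ON {table}\n"
        "PROPERTIES (\n"
        f'    "desired_concurrent_number" = "{concurrency}",\n'
        '    "format" = "json"\n'
        ")\n"
        "FROM KAFKA (\n"
        f'    "kafka_broker_list" = "{brokers}",\n'
        f'    "kafka_topic" = "{topic}"\n'
        ");"
    )


# Hypothetical names for a binlog-derived topic.
sql = routine_load_sql(
    db="dw",
    job="waybill_job",
    table="waybill",
    brokers="kafka1:9092",
    topic="waybill_binlog",
)
print(sql)
```

The generated statement would be submitted once through any MySQL client; after that the engine itself keeps pulling from the topic, which is what removes the two-hour batch window.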

Future Plans : Deploy multiple DorisDB clusters to isolate workloads from one another, migrate the remaining Hive external‑table queries, implement lightweight ETL jobs on DorisDB, and enable the cost‑based optimizer (CBO) for further performance improvements.

Finally, Kuayue thanks DorisDB provider DingShi for delivering a high‑performance, feature‑rich query engine and ongoing technical support.
