Evolution of OLAP Engines at Lenovo Liancheng Zhida and DorisDB Adoption
The article chronicles Lenovo Liancheng Zhida’s three‑stage evolution of OLAP engines—from early SQL Server scripts, through a Hadoop‑based Presto solution, to the adoption of DorisDB—detailing architecture, tool comparisons, implementation practices, and the performance and operational benefits achieved.
Lenovo Liancheng Zhida, a subsidiary of Lenovo Group, builds a smart logistics ecosystem and introduced various OLAP engines to meet growing data demands. DorisDB, a fully vectorized MPP database, was selected for its strong performance and used to create a unified data service platform, reducing development complexity and improving BI efficiency.
OLAP Engine Evolution at Lenovo Liancheng Zhida
Stage 1 (pre-2018): Data volume was modest. A traditional relational database (SQL Server) held the data, and there was no data-warehouse layer; data requests were met with hand-written SQL scripts. As the business grew rapidly, query latency rose from minutes to hours and storage capacity became a bottleneck.
Stage 2 (2019): A Hadoop/Hive data warehouse was built and all ETL jobs moved to the Hadoop cluster. Dozens of Presto nodes handled OLAP analysis, sharing Hive's metadata and the same physical storage. Tableau connected directly to Presto for BI visualisation.
Stage 3 (2021): To support low-latency BI reports, complex ad-hoc queries, and high-throughput detail queries, the team introduced DorisDB. DorisDB combines Presto-like ad-hoc multi-table queries with ClickHouse-level single-table performance, which makes fast BI analytics possible.
Data Analysis Architecture
The system consists of four layers: data collection, storage and computation, query and analysis, and application. Data is ingested into Hive via Sqoop, Flume, and web crawlers. Offline processing uses Hive for ETL, and the processed results are loaded into RDBMS or MPP databases in the query layer, which serve Tableau, fixed reports, and ad-hoc analysis. The application layer serves management and operations dashboards with low-latency requirements.
OLAP Tool Comparison
ClickHouse – Strengths: excellent single-table performance, a rich family of MergeTree engines, well suited to massive log ingestion. Weaknesses: no true row-level delete/update, limited join capabilities, lower concurrency, and MergeTree merges that run in the background and may be incomplete at query time.
DorisDB – Strengths: strong performance for both single-table and multi-table queries, high concurrency, real-time micro-batch ETL, robust streaming and batch writes, MySQL-compatible protocol. Weaknesses: limited capacity for large-scale ETL and imperfect resource isolation.
DorisDB in the SEC Data Center
The SEC (Channel Warehouse Management) core data comes from consumer and SMB businesses. Before DorisDB, the solution relied on many Hive jobs, with data stored in Hive, MySQL, or SQL Server, and Presto for BI queries, which suffered from slow response times.
Key technical pain points were fragmented data logic and slow Presto performance on complex Tableau reports. The new requirements demanded high‑throughput writes, sub‑100 ms multi‑dimensional queries, strong multi‑table joins, and single‑table queries on billions of rows within 100 ms.
After evaluating the options, the team chose DorisDB for its efficient query engine, clear architecture, and ability to serve as the final layer where data business logic is consolidated. Benefits include unified data standards, distributed MPP aggregation, and good Tableau compatibility.
Solution Based on DorisDB
The data model is built primarily on detail tables, which are partitioned and bucketed to accelerate queries over historical inventory. Materialized views at several grains (item SN, product type, warehouse, distributor) precompute aggregates to speed up queries.
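To make the modeling approach concrete, here is a minimal Python sketch that assembles DDL for a detail table partitioned by month and bucketed by hash, plus one aggregate materialized view. The table name `inventory_detail`, its columns, the partition bounds, and the bucket count are hypothetical and not taken from the article; the SQL follows Doris-family syntax and should be checked against the DorisDB version in use.

```python
from datetime import date


def month_partitions(start: date, months: int) -> list[str]:
    """Return one RANGE partition clause per calendar month, starting at `start`."""
    clauses = []
    year, month = start.year, start.month
    for _ in range(months):
        # The upper bound is the first day of the following month (exclusive).
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        upper = date(next_year, next_month, 1).isoformat()
        clauses.append(f'PARTITION p{year}{month:02d} VALUES LESS THAN ("{upper}")')
        year, month = next_year, next_month
    return clauses


def detail_table_ddl(partitions: list[str], buckets: int = 16) -> str:
    """Build DDL for a hypothetical detail (duplicate-key) inventory table."""
    partition_sql = ",\n    ".join(partitions)
    return (
        "CREATE TABLE inventory_detail (\n"
        "    stat_date DATE,\n"
        "    item_sn VARCHAR(64),\n"
        "    product_type VARCHAR(32),\n"
        "    warehouse VARCHAR(32),\n"
        "    distributor VARCHAR(64),\n"
        "    qty BIGINT\n"
        ")\n"
        "DUPLICATE KEY(stat_date, item_sn)\n"
        "PARTITION BY RANGE(stat_date) (\n"
        f"    {partition_sql}\n"
        ")\n"
        f"DISTRIBUTED BY HASH(item_sn) BUCKETS {buckets};"
    )


def rollup_view_ddl(name: str, dims: list[str]) -> str:
    """Build DDL for a materialized view that sums qty at the given grain."""
    cols = ", ".join(dims)
    return (
        f"CREATE MATERIALIZED VIEW {name} AS\n"
        f"SELECT {cols}, SUM(qty)\n"
        "FROM inventory_detail\n"
        f"GROUP BY {cols};"
    )


if __name__ == "__main__":
    print(detail_table_ddl(month_partitions(date(2021, 4, 1), 3)))
    print(rollup_view_ddl("mv_warehouse_type", ["warehouse", "product_type"]))
```

Partitioning by date lets a historical inventory query touch only the relevant months, while a view per grain lets a dashboard read pre-aggregated rows instead of scanning detail data.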
Data import uses two methods: (1) Broker Load to import Hive tables into DorisDB, and (2) DataX to load data from SQL Server and MySQL.
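The first import path can be sketched as a small Python helper that renders a Broker Load statement for files backing a Hive table. This is an illustration under assumptions, not the team's actual job: the database, table, HDFS path, and broker name are hypothetical, ORC is assumed as the Hive storage format, and the statement shape follows Doris-family Broker Load syntax, which should be verified against the deployed DorisDB version. The DataX path is not shown.

```python
from datetime import date


def load_label(table: str, run_date: date) -> str:
    """Derive a per-table, per-day label; load labels must be unique within a database."""
    return f"{table}_{run_date:%Y%m%d}"


def broker_load_sql(
    db: str,
    table: str,
    hdfs_path: str,
    broker: str,
    run_date: date,
    file_format: str = "orc",  # assumption: the Hive table is stored as ORC
    timeout_s: int = 3600,
) -> str:
    """Render a Broker Load statement that pulls HDFS files into a DorisDB table."""
    label = load_label(table, run_date)
    return (
        f"LOAD LABEL {db}.{label} (\n"
        f'    DATA INFILE("{hdfs_path}")\n'
        f"    INTO TABLE {table}\n"
        f'    FORMAT AS "{file_format}"\n'
        ")\n"
        f"WITH BROKER '{broker}'\n"
        f'PROPERTIES ("timeout" = "{timeout_s}");'
    )


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(
        broker_load_sql(
            db="sec",
            table="inventory_detail",
            hdfs_path="hdfs://namenode:8020/warehouse/sec.db/inventory_detail/*",
            broker="hdfs_broker",
            run_date=date(2021, 4, 1),
        )
    )
```

Deriving the label from the table name and run date makes a daily job idempotent in practice: resubmitting the same day's load reuses the label, so a batch that already succeeded is rejected rather than loaded twice.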
DorisDB Outcomes
• Flexible modeling improved development efficiency, combining wide tables and star schemas, and exposing MySQL tables as external DorisDB tables to avoid data migration.
• BI experience became excellent: Tableau dashboards load instantly after migrating data to DorisDB via DataX and DorisDB-Writer.
• Operational cost lowered: DorisDB's simple FE/BE architecture with data replication ensures high availability and online elastic scaling without downtime.
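The external-table approach mentioned in the first outcome can be illustrated with a short Python sketch that renders the DDL for mapping a MySQL table into DorisDB. The table, column, and connection values are hypothetical placeholders, and the property keys follow the Doris-family MySQL external table syntax; treat it as a sketch to check against the DorisDB documentation rather than the team's actual definition.

```python
def mysql_external_table_ddl(
    name: str,
    columns: list[tuple[str, str]],
    host: str,
    port: int,
    user: str,
    password: str,
    database: str,
    table: str,
) -> str:
    """Render DDL that exposes an existing MySQL table as a DorisDB external table."""
    col_sql = ",\n    ".join(f"{col} {typ}" for col, typ in columns)
    props = {
        "host": host,
        "port": str(port),
        "user": user,
        "password": password,
        "database": database,
        "table": table,
    }
    prop_sql = ",\n    ".join(f'"{key}" = "{value}"' for key, value in props.items())
    return (
        f"CREATE EXTERNAL TABLE {name} (\n"
        f"    {col_sql}\n"
        ")\n"
        "ENGINE=mysql\n"
        "PROPERTIES (\n"
        f"    {prop_sql}\n"
        ");"
    )


if __name__ == "__main__":
    # Placeholder values; in practice the password would come from a secret store.
    print(
        mysql_external_table_ddl(
            name="dim_distributor_ext",
            columns=[("distributor_id", "BIGINT"), ("distributor_name", "VARCHAR(128)")],
            host="mysql.example.internal",
            port=3306,
            user="report_reader",
            password="<secret>",
            database="sec",
            table="dim_distributor",
        )
    )
```

With such a mapping in place, a small dimension table can stay in MySQL and still be joined against detail tables inside DorisDB, which is the migration saving the first outcome describes.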
Conclusion
Since April 2021, DorisDB has replaced a large Presto cluster while using a quarter of the resources. It provides unified data services, simplifies the offline pipelines, and meets the strict latency requirements. The team thanks DingShi Technology and expects DorisDB to keep leading as a high-performance MPP database.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.