How We Built an Intelligent Data Warehouse on Alibaba Cloud MaxCompute
This article details the business background, technical challenges, and the step‑by‑step implementation of an intelligent data warehouse on Alibaba Cloud MaxCompute, covering offline data pipelines, metric calculation, data analysis, and future plans for data lake and AI‑driven analytics.
Business Background
Chuangjie Tong Information Technology Co., Ltd., founded in March 2010 under the Yonyou Group, focuses on digital transformation solutions. Key development stages include traditional software (2005‑2012), SaaS transformation (2013), breakthrough in intelligent finance (2014‑2016), intelligent commerce (2018‑2019), and a cloud‑native platform with the "Good Business Finance" product (2019‑present).
Rapid business expansion surfaced six data challenges and requirements: severe data silos, large and complex data volumes, heavy processing demands, a need for serverless cloud‑native capabilities, strict data security and reliability requirements, and only modest real‑time requirements.
Consequently, Alibaba Cloud MaxCompute (MC) was selected as the core data‑warehouse platform.
Technical Architecture
The current data‑warehouse architecture connects data sources to data applications through both real‑time and offline pipelines. The offline pipeline uses DTS, DataHub, DataWorks, and DataX to ingest data into MC, where it is layered, processed, and abstracted for downstream consumption.
Case Study
Metric Calculation
Business systems generate data that is synchronized from PolarDB via DTS to DataHub and from there loaded into MC. The DataHub step could be omitted, but it is kept because it satisfies the real‑time data needs of certain systems. After ingestion, a daily log merge produces the raw layer, followed by the detail, summary, and application (ADS) layers. The ADS layer in turn feeds real‑time warehouses such as StarRocks or Hologres.
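The daily log merge can be pictured as applying one day's change records to the previous day's full snapshot. The following is a minimal Python sketch of that idea, not the team's actual job (which would run inside MC). The record shape (`id`, `op`, `seq`, `row`) and the function name `merge_daily_log` are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def merge_daily_log(snapshot: Dict[int, Row], change_log: Iterable[Row]) -> Dict[int, Row]:
    """Apply one day's change log to yesterday's snapshot and return today's snapshot.

    Assumed (hypothetical) change record shape:
      {"id": <primary key>, "op": "insert" | "update" | "delete",
       "seq": <position in the log>, "row": <full row, absent for deletes>}
    """
    merged = dict(snapshot)  # copy so yesterday's snapshot stays untouched
    # Replay changes in log order so the latest change per key wins.
    for change in sorted(change_log, key=lambda c: c["seq"]):
        key = change["id"]
        if change["op"] == "delete":
            merged.pop(key, None)
        else:  # insert and update both carry the full new row
            merged[key] = change["row"]
    return merged
```

Keeping the merge a pure function of (previous snapshot, change log) means a failed day can be rerun without side effects, which is one common reason to structure the raw layer this way.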
The "Financial Advisor" product exemplifies metric calculation, offering internal and industry‑benchmark analyses across profitability, expense, cash flow, asset, and tax dimensions.
Profitability analysis
Expense analysis
Cash flow analysis
Asset analysis
Tax burden analysis
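As an illustration of how an internal metric and an industry benchmark might be combined, here is a small Python sketch. The article does not give the product's formulas, so the margin definition, the use of the industry median as the benchmark, and the function names are all assumptions.

```python
from statistics import median
from typing import Sequence


def profit_margin(revenue: float, cost: float) -> float:
    """Return (revenue - cost) / revenue, an assumed definition of profit margin."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero to compute a margin")
    return (revenue - cost) / revenue


def benchmark_gap(company_margin: float, industry_margins: Sequence[float]) -> float:
    """Return the company's margin minus the industry median (assumed benchmark)."""
    return company_margin - median(industry_margins)
```

A positive gap would indicate that the company is above the industry midpoint on this dimension. The same pattern could apply to the expense, cash flow, asset, and tax dimensions with different inputs.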
Data Analysis
Data analysis supports strategic decision‑making and business process improvement. By collecting user behavior via event tracking and combining it with user profiles, companies can optimize product design, personalize services, and refine marketing strategies. The workflow involves DataWorks, DataX, SLS logs, and DataHub to gather, clean, and compute data into wide tables and metrics, which are then stored in MC, written back to business databases, or loaded into real‑time warehouses.
North Star system – unified operational data
Darwin system – partner customer management
SCRM – customer relationship management
Delivery system – product delivery data
Open platform – third‑party integrations
Tag system – user profiling and personalization
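The wide-table step in this workflow, where tracked events are combined with user profiles, can be sketched in Python as follows. This is a simplified in-memory illustration rather than the production job. The field names (`user_id`, `event`), the `cnt_` column prefix, and the function name `build_wide_table` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, List


def build_wide_table(
    events: Iterable[Dict[str, Any]],
    profiles: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Produce one wide row per user: profile attributes plus per-event counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        counts[event["user_id"]][event["event"]] += 1

    wide_rows = []
    for user_id in sorted(counts):
        row: Dict[str, Any] = {"user_id": user_id}
        # Users without a profile still get a row, just without profile columns.
        row.update(profiles.get(user_id, {}))
        for event_name, n in sorted(counts[user_id].items()):
            row[f"cnt_{event_name}"] = n
        wide_rows.append(row)
    return wide_rows
```

Rows of this shape can then be stored in MC, written back to a business database, or loaded into a real-time warehouse, as described in the workflow.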
Future Outlook
Data Lake Exploration
To handle growing data volume and complexity, the team will deepen data‑lake practices, exploring lake‑warehouse integration patterns such as "lake‑on‑warehouse", "warehouse‑on‑lake", and "big lake, small warehouse". Offline computing will continue to use MC, leveraging materialized views, while real‑time computing will rely on StarRocks.
Metric Platform
AI and large‑model technologies will be integrated to build an intelligent metric platform, using semantic models, attribution analysis, lineage, and impact analysis to automate business processes.
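Impact analysis over lineage amounts to asking which downstream tables or metrics are affected when one node changes. The Python sketch below shows one way to answer that with a breadth-first traversal. The lineage mapping format, the table names in the usage note, and the function name `downstream_impact` are assumptions for illustration and are not taken from the planned platform.

```python
from collections import deque
from typing import Dict, Iterable, Set


def downstream_impact(lineage: Dict[str, Iterable[str]], changed: str) -> Set[str]:
    """Return every node reachable downstream of `changed` in the lineage graph.

    `lineage` maps a node to the nodes that read from it directly.
    """
    impacted: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, ()):
            # Skip nodes already visited so cycles cannot loop forever.
            if child not in impacted and child != changed:
                impacted.add(child)
                queue.append(child)
    return impacted
```

For example, with a hypothetical lineage of `ods_orders -> dwd_orders -> dws_sales -> ads_profit_margin`, a change to `dwd_orders` would flag both `dws_sales` and `ads_profit_margin` for review.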
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
