How Semir Group Cut Costs 40% with MaxCompute, Hologres & DataWorks
Semir Group’s senior data warehouse manager explains how the company consolidated several legacy data warehouses onto Alibaba Cloud’s MaxCompute, Hologres, and DataWorks. The move stabilized nightly data production, improved data quality, shortened ETL time, and cut annual data platform costs from over 3 million yuan to about 1.8 million yuan.
01 Lecturer Introduction
Jin Yinlong, Senior Manager of Data Warehouse at Zhejiang Semir Group, presents a case study on consolidating several self‑built data warehouse systems into one unified platform built on Alibaba Cloud MaxCompute, Hologres, and DataWorks. The consolidation reduced annual data warehouse spending from over 3 million yuan to about 1.8 million yuan.
02 Company Overview
Semir, founded in 1996, focuses on young, fashionable, value‑for‑money casual apparel. In 2023 its revenue exceeded 10 billion yuan, and it operates multiple brands and subsidiaries. Its first brand was listed in 2010, and its e‑commerce arm was established in 2014. Between 2022 and 2023 the company merged the data platforms of the listed group and the e‑commerce business.
03 Data‑Cloud Exploration
Initially, retail data from 3,000 to 4,000 stores was analyzed in SQL Server. Later, during the SAP implementation, the team used an ERP‑integrated data suite until 2015. As data volume grew, the latency from extraction to presentation reached 12 hours. In 2015 the team evaluated Hadoop, Spark, and commercial MPP databases and chose SAP HANA, which it used for a period. By 2022, with the merger, the goal shifted to a cloud‑based commercial platform; open‑source solutions were rejected because of their high migration and operational costs.
04 Legacy “Chimney” Architecture
By 2022‑2023 the data volume had reached 15 TB across more than ten brands and over twenty source databases, feeding several separate warehouses (ClickHouse (CK), SAP HANA, Oracle, and Hologres). Three independent data‑flow chains existed: (1) data synced to cloud A, processed with Spark, then pushed to ClickHouse; (2) SAP HANA‑based retail orders synced into HANA and then into CK for analysts; (3) e‑commerce data synced and processed with Hive + Impala. These fragmented pipelines failed frequently and relied on phone alerts from Airflow and DataWorks followed by manual intervention.
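A rough way to see why long, fragmented chains break often: if every hop in a nightly chain must succeed, the chance that the whole run completes is the product of the per‑hop success rates. The sketch below assumes hops fail independently and uses hypothetical rates and hop counts, not figures from Semir.

```python
from math import prod


def chain_success_rate(hop_success_rates):
    """Probability that a nightly run completes when every hop must succeed.

    Assumes hops fail independently, which is a simplification.
    """
    return prod(hop_success_rates)


# Hypothetical per-hop nightly success rates; not measured Semir figures.
legacy_chain = [0.99] * 5   # e.g. extract, sync to cloud A, Spark, push to ClickHouse, refresh reports
unified_chain = [0.99] * 3  # e.g. extract, MaxCompute layers, Hologres serving

for name, chain in [("legacy", legacy_chain), ("unified", unified_chain)]:
    p = chain_success_rate(chain)
    # Expected number of failed nights in a 30-night month.
    print(f"{name}: success={p:.3f}, failed nights per month={30 * (1 - p):.2f}")
```

With these made‑up numbers, the five‑hop chain fails on about 1.5 nights a month and the three‑hop chain on fewer than one; the legacy setup ran three such chains side by side.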
05 Data‑Middle‑Platform Goals
The main objectives were to unify the technology stack, adopt a commercial platform with reliable vendor support, enable data‑lake capabilities (batch and real‑time processing, structured and unstructured data), enforce data governance, and shorten the nightly ETL run to under 7 hours. A further target was to cut annual costs from over 3 million yuan to about 1.8 million yuan.
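Whether a nightly run fits the 7‑hour target depends on its critical path, the longest chain of dependent tasks, rather than on the sum of all task durations. A minimal sketch, assuming a hypothetical task graph, made‑up durations, and unlimited parallelism (DataWorks does its own scheduling; nothing here is its API):

```python
# Hypothetical nightly tasks: duration in hours and upstream dependencies (must be acyclic).
DURATIONS = {
    "stg_extract": 1.5, "ods_load": 1.0, "dwd_clean": 1.5,
    "dws_aggregate": 1.0, "cdp_tags": 2.0, "ads_publish": 0.5,
}
DEPENDS_ON = {
    "stg_extract": [], "ods_load": ["stg_extract"], "dwd_clean": ["ods_load"],
    "dws_aggregate": ["dwd_clean"], "cdp_tags": ["dwd_clean"],
    "ads_publish": ["dws_aggregate", "cdp_tags"],
}


def critical_path(durations, depends_on):
    """Return (makespan, tasks on the longest dependency chain), assuming unlimited parallelism."""
    finish, prev = {}, {}

    def visit(task):
        # Earliest finish = latest upstream finish + own duration (memoized).
        if task not in finish:
            start, best = 0.0, None
            for up in depends_on.get(task, []):
                if visit(up) > start:
                    start, best = finish[up], up
            finish[task] = start + durations[task]
            prev[task] = best
        return finish[task]

    last = max(durations, key=visit)
    path, node = [], last
    while node is not None:
        path.append(node)
        node = prev[node]
    return finish[last], path[::-1]


total, path = critical_path(DURATIONS, DEPENDS_ON)
print(f"critical path: {' -> '.join(path)} ({total} h); within 7 h: {total <= 7}")
```

In this toy graph the path through the hypothetical `cdp_tags` task sets the 6.5‑hour finish, so speeding up `dws_aggregate` would not move the end time; only tasks on the critical path do.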
06 Solution: MaxCompute + Hologres + DataWorks
The final architecture uses MaxCompute for offline (batch) computation, Hologres as the OLAP engine (one primary instance plus three replicas), and DataWorks as the unified development environment. Data lands in STG, moves into ODS, and then flows through DWD and DWS to ADS, which serves BI and mobile queries. Hologres handles all analytical requests, isolating each department’s workload while every department reads the same underlying data.
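The layering can be illustrated with a toy pipeline. On the real platform each step would be a SQL job on MaxCompute, developed and scheduled in DataWorks, with ADS results served from Hologres; the Python below only mimics what each layer is responsible for, using hypothetical order rows and made‑up table and field names.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical ODS rows: copied from the source system as-is, duplicates and all.
ods_orders = [
    {"order_id": "A1", "store": "S01", "amount": "299.00", "status": "PAID"},
    {"order_id": "A1", "store": "S01", "amount": "299.00", "status": "PAID"},  # duplicate sync
    {"order_id": "A2", "store": "S01", "amount": "159.50", "status": "PAID"},
    {"order_id": "A3", "store": "S02", "amount": "89.00", "status": "CANCELLED"},
]


def to_dwd(rows):
    """DWD: deduplicate, cast types, and keep only valid (paid) orders."""
    seen, clean = set(), []
    for r in rows:
        if r["order_id"] in seen or r["status"] != "PAID":
            continue
        seen.add(r["order_id"])
        clean.append({**r, "amount": Decimal(r["amount"])})
    return clean


def to_dws(dwd_rows):
    """DWS: reusable store-level summary shared by all downstream consumers."""
    sales, orders = defaultdict(Decimal), defaultdict(int)
    for r in dwd_rows:
        sales[r["store"]] += r["amount"]
        orders[r["store"]] += 1
    return {store: {"sales": sales[store], "orders": orders[store]} for store in sales}


def to_ads(dws):
    """ADS: output shaped for one consumer, here a store sales ranking for BI or mobile."""
    rows = [{"store": store, **metrics} for store, metrics in dws.items()]
    return sorted(rows, key=lambda row: row["sales"], reverse=True)


print(to_ads(to_dws(to_dwd(ods_orders))))
```

Keeping the DWS summary generic and the ADS output specific to one consumer is what lets several departments query their own views of the same data.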
07 Construction Process
From 15 source systems, more than 1,800 tables were extracted into the ODS layer (STG → ODS). Core sources include SAP HANA (finance, procurement, inventory) and various EMR systems. The e‑commerce side uses MySQL, which feeds a common data model (CDM) layer covering the full value chain (orders, procurement, inventory, sales, and member data). Six core modules (order, procurement, wholesale, inventory, retail, and CDP) underpin more than 1,500 historical tables and over 500 ADS tables serving digital stores and other tools.
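The article does not say how STG data becomes ODS tables. One common pattern, shown here purely as a hypothetical sketch, merges yesterday’s full ODS snapshot with today’s incremental rows from STG and keeps the newest version of each record by primary key; the key and timestamp column names are assumptions.

```python
from datetime import datetime


def merge_snapshot(prev_snapshot, stg_increment, key="id", ts="updated_at"):
    """Today's ODS snapshot = yesterday's snapshot, overwritten by newer STG rows per key."""
    merged = {row[key]: row for row in prev_snapshot}
    for row in stg_increment:
        current = merged.get(row[key])
        # Skip late-arriving rows that are older than the version already held.
        if current is None or row[ts] >= current[ts]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda row: row[key])


# Hypothetical inventory rows.
yesterday = [
    {"id": 1, "qty": 10, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "qty": 5, "updated_at": datetime(2024, 1, 1, 9, 0)},
]
stg_today = [
    {"id": 2, "qty": 7, "updated_at": datetime(2024, 1, 2, 10, 0)},
    {"id": 3, "qty": 1, "updated_at": datetime(2024, 1, 2, 11, 0)},
    {"id": 2, "qty": 6, "updated_at": datetime(2024, 1, 2, 9, 30)},  # older change arriving late
]

for row in merge_snapshot(yesterday, stg_today):
    print(row["id"], row["qty"])
```

Comparing timestamps instead of trusting arrival order keeps a late, older change from overwriting a newer one.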
08 Achievements
After a two‑month rollout (December 2023 to February 2024), incidents in the nightly data runs dropped from roughly one a week to almost none, so store managers could rely on timely reports. ETL duration fell from over 10 hours to about 6 hours, meeting the 7‑hour target. Consolidating the three warehouses eliminated redundant resources, cutting total annual cloud and big‑data costs to roughly 1.8 million yuan.
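The headline figures can be checked with simple arithmetic. Taking the bounds the article states (3 million yuan before and 1.8 million after; 10 hours before and 6 hours after), both cost and ETL time fell by about 40%. Because costs were “over” 3 million yuan, the actual saving is at least that.

```python
def pct_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


# Figures from the article; "over 3 million" and ">10 hours" are taken at their lower bounds.
cost_before, cost_after = 3.0, 1.8   # million yuan per year
etl_before, etl_after = 10.0, 6.0    # hours per nightly run
ETL_TARGET_HOURS = 7

print(f"cost reduction: {pct_reduction(cost_before, cost_after):.0f}%")
print(f"ETL time reduction: {pct_reduction(etl_before, etl_after):.0f}%")
print(f"meets {ETL_TARGET_HOURS} h target: {etl_after < ETL_TARGET_HOURS}")
```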
09 Future Outlook
Plans include a unified tech stack on the cloud, real‑time Flink models for dashboards, open data APIs for analysts, a stronger data‑service layer, stricter data‑quality governance, and integration of large language models for conversational analytics. The roadmap also envisions extending the platform into a data lake that handles semi‑structured data as well as video and audio, further enriching the enterprise’s data assets.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
