
How Semir Group Cut Costs 40% with MaxCompute, Hologres & DataWorks

Semir Group's senior data manager explains how the company consolidated multiple legacy data warehouses onto Alibaba Cloud's MaxCompute, Hologres, and DataWorks. The result: stable data production, better data quality, shorter ETL runs, and annual data platform costs cut from over 3 million yuan to around 1.8 million yuan.

Alibaba Cloud Big Data AI Platform

01 Lecturer Introduction

Jin Yinlong, Senior Manager of Data Warehouse at Zhejiang Semir Group, presents a case study on consolidating several self-built data warehouse systems into a single platform built on Alibaba Cloud MaxCompute, Hologres, and DataWorks. The consolidation reduced annual warehouse spending from over 3 million yuan to about 1.8 million yuan.

02 Company Overview

Semir, founded in 1996, focuses on young, fashionable, high-value casual apparel. By 2023 its revenue exceeded 10 billion yuan, and it operates multiple brands and subsidiaries. The first brand was listed in 2010, and the e-commerce arm was created in 2014. Between 2022 and 2023 the company merged the data platforms of the listed group and the e-commerce arm.

03 Data‑Cloud Exploration

Initially, retail data from 3,000 to 4,000 stores was analyzed in SQL Server. During the SAP implementation the team moved to an ERP-linked data suite, which stayed in use until 2015. As data volume grew, latency from extraction to presentation reached 12 hours. In 2015 the team evaluated Hadoop, Spark, and commercial MPP databases and ultimately chose SAP HANA, which it used for a period. By 2022, with the merger under way, the goal shifted to a cloud-based commercial platform; open-source solutions were rejected because of their high migration and operational costs.

04 Legacy “Chimney” Architecture

By 2022-2023 data volume had reached 15 TB, spread across more than ten brands and over twenty databases and feeding multiple warehouses (ClickHouse, abbreviated CK below; HANA; Oracle; Hologres). Three data-flow chains existed: (1) data synced to cloud A, processed with Spark, then pushed to ClickHouse; (2) SAP-based retail orders synced to HANA and then to CK for analysts; (3) e-commerce data synced and served via Hive + Impala. These fragmented pipelines failed frequently, and the team depended on phone alerts from Airflow and DataWorks followed by manual intervention.
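To make the fragmentation concrete, the three chains can be written down as data and the distinct processing and serving engines counted. This is only an illustrative sketch: the chain labels are hypothetical names invented here, "CK" is read as ClickHouse, and only the engines named in the paragraph above are listed.

```python
# Engines named in the three legacy chains described above.
# The dictionary keys are hypothetical labels, not names used by Semir.
LEGACY_CHAINS = {
    "chain_1_cloud_a": ["Spark", "ClickHouse"],
    "chain_2_retail_orders": ["SAP HANA", "ClickHouse"],
    "chain_3_ecommerce": ["Hive", "Impala"],
}


def distinct_engines(chains):
    """Return the sorted set of engines that appear in any chain."""
    return sorted({engine for stages in chains.values() for engine in stages})


legacy_engines = distinct_engines(LEGACY_CHAINS)
print(legacy_engines)
print(f"{len(legacy_engines)} engines to operate and monitor")
```

Each engine in that list brings its own scheduling, monitoring, and failure modes, which is one way to read the "chimney" label.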

05 Data‑Middle‑Platform Goals

The main objectives were to unify the technology stack, adopt a commercial platform with reliable vendor support, enable data-lake capabilities (batch, real-time, structured, and unstructured data), enforce data governance, and shorten the ETL chain to under 7 hours. The team also targeted a reduction in annual cost from over 3 million yuan to about 1.8 million yuan.

06 Solution: MaxCompute + Hologres + DataWorks

The final architecture uses MaxCompute for offline computation, Hologres as the OLAP engine (one primary instance plus three replicas), and DataWorks as the unified development environment. Data lands in a staging area (STG), moves into the ODS layer, then flows to the DWD detail layer, the DWS summary layer, and finally the ADS application layer, which serves BI and mobile queries. Hologres handles all analytical requests, isolating workloads by department while every department reads from the same data source.
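The layer ordering described above can be sketched as a small dependency graph. This is a minimal illustration using Python's standard `graphlib` module, not how DataWorks schedules jobs internally; only the layer names come from the article, and the helper name `layer_run_order` is made up for this example.

```python
from graphlib import TopologicalSorter

# Each layer maps to the set of layers it reads from (its upstream layers).
# Layer names follow the article; the graph itself is an illustrative assumption.
LAYER_DEPENDENCIES = {
    "STG": set(),
    "ODS": {"STG"},
    "DWD": {"ODS"},
    "DWS": {"DWD"},
    "ADS": {"DWS"},
}


def layer_run_order(dependencies):
    """Return the layers in an order where each one runs after its upstream."""
    return list(TopologicalSorter(dependencies).static_order())


run_order = layer_run_order(LAYER_DEPENDENCIES)
print(" -> ".join(run_order))  # STG -> ODS -> DWD -> DWS -> ADS
```

Because the chain is strictly linear, the end-to-end ETL time is the sum of the layer run times, which is why shortening the chain matters for the 7-hour target.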

07 Construction Process

More than 1,800 tables from 15 source systems were extracted into the ODS layer (STG → ODS). Core sources include SAP HANA (finance, procurement, inventory) and various EMR systems. The e-commerce side runs on MySQL. Together these feed a common data model (CDM) layer that covers the full value chain: orders, procurement, inventory, sales, and member data. Six core modules (order, procurement, wholesale, inventory, retail, CDP) form the basis for 1,500+ historical tables and 500+ ADS tables that serve digital stores and other tools.

08 Achievements

After a two-month rollout (Dec 2023 – Feb 2024), overnight data incidents fell from roughly one a week to almost none, so store managers could rely on reports arriving on time. ETL duration dropped from more than 10 hours to about 6 hours, meeting the 7-hour target. Consolidating the three warehouses eliminated redundant resources and cut total annual cloud and big-data costs to roughly 1.8 million yuan.
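The headline figures can be checked with simple arithmetic. The sketch below takes the article's "over 3 million" and "more than 10 hours" at their lower bounds, so the computed reductions are minimums; the variable and function names are chosen here for illustration.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) * 100 / before


# Figures from the article; the "before" values are lower bounds.
COST_BEFORE_YUAN = 3_000_000   # "over 3 million"
COST_AFTER_YUAN = 1_800_000    # "roughly 1.8 million"
ETL_BEFORE_HOURS = 10          # "more than 10 hours"
ETL_AFTER_HOURS = 6            # "about 6 hours"
ETL_TARGET_HOURS = 7

cost_cut = percent_reduction(COST_BEFORE_YUAN, COST_AFTER_YUAN)
etl_cut = percent_reduction(ETL_BEFORE_HOURS, ETL_AFTER_HOURS)
meets_target = ETL_AFTER_HOURS <= ETL_TARGET_HOURS

print(f"Cost reduction: at least {cost_cut:.0f}%")      # at least 40%
print(f"ETL time reduction: at least {etl_cut:.0f}%")   # at least 40%
print(f"Meets the 7-hour target: {meets_target}")       # True
```

The 40% in the title therefore corresponds to the lower bound of the cost figures.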

09 Future Outlook

Plans include a unified cloud tech stack, real-time Flink models for dashboards, open data APIs for analysts, a stronger data-service layer, stricter data-quality governance, and large-model AI for conversational analytics. The roadmap also envisions expanding into a data lake that handles semi-structured data as well as video and audio, further enriching the company's data assets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Hologres, Alibaba Cloud, ETL optimization
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
