How Alibaba Built Its Full‑Domain Data Platform: Architecture & Lessons
In this detailed account, Alibaba senior technologist Zhang Lei explains the concept of a full‑domain data platform, its “four‑horizontal and three‑vertical” architecture, the OneData ecosystem, cost‑saving strategies, data quality tools, and practical challenges of building and operating massive big‑data infrastructure across the Alibaba ecosystem.
Data Middle Platform
In 2016 Alibaba Group introduced the concept of a middle platform. The Data Technology and Product Department is responsible for the group's data middle platform, whose core is building full‑domain big data. What exactly is full‑domain data and how is it built?
At the Alibaba Cloud Conference big data forum, senior technical expert Zhang Lei gave a detailed answer. Below is the full transcript.
Alibaba Data Middle Platform Positioning
Alibaba's data middle platform manages the group's most critical base data. Technically, it covers every stage from data collection and processing to data services and applications, providing end‑to‑end data services for businesses, users, and SMEs within the Alibaba ecosystem. For example, the dazzling data screen shown on Double 11 is run by this department.
The architecture presents a “four‑horizontal and three‑vertical” structure, with the underlying infrastructure from Alibaba Cloud.
Four Horizontals
From bottom to top: data from the business lines (Taobao, Tmall, Hema, etc.) is collected and ingested into the computing platform; OneData then builds a public data center organized by “business segment + analysis dimension”. On top of the public data center, consumer, enterprise, and content data systems are built and processed further, then delivered to products and businesses through the unified data service middleware OneService.
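The layering above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the event fields, the `build_public_layer` helper, and the `DataService` class are hypothetical and do not reflect the real OneData or OneService interfaces. It only shows the idea of grouping data from many business lines by (business segment, analysis dimension) and exposing it through one query entry point.

```python
from collections import defaultdict

# Hypothetical raw events as they might arrive from different business lines.
RAW_EVENTS = [
    {"source": "taobao", "segment": "trade", "dimension": "buyer", "amount": 30.0},
    {"source": "tmall", "segment": "trade", "dimension": "buyer", "amount": 70.0},
    {"source": "hema", "segment": "trade", "dimension": "store", "amount": 15.0},
]


def build_public_layer(events):
    """Group events from every business line by (business segment, analysis dimension)."""
    public = defaultdict(list)
    for event in events:
        public[(event["segment"], event["dimension"])].append(event)
    return dict(public)


class DataService:
    """Toy stand-in for a unified service layer: one entry point for all consumers."""

    def __init__(self, public_layer):
        self._public = public_layer

    def total_amount(self, segment, dimension):
        # Unknown (segment, dimension) pairs simply yield an empty result.
        rows = self._public.get((segment, dimension), [])
        return sum(row["amount"] for row in rows)


public_layer = build_public_layer(RAW_EVENTS)
service = DataService(public_layer)
print(service.total_amount("trade", "buyer"))  # 100.0
```

In this sketch, consumers never see which business line a row came from; they query only by segment and dimension, which is the point of putting a shared layer between sources and products.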
Alibaba's internal data products number in the dozens, serving tens of thousands of employees daily; the unified data product platform “Business Advisor” serves over 20 million merchants.
Three Verticals
To ensure fast, efficient, high‑quality data ingestion at this scale, Alibaba uses an intelligent data development platform with a suite of tools and processes, guaranteeing standardized data construction across teams and business units. A unified data quality management platform addresses cost concerns as data volume grows.
What Is Full‑Domain Data?
Alibaba's ecosystem includes core e‑commerce (Taobao, Tmall, Juhuasuan), media (Youku, Tudou, UC Browser), local services (Koubei, Ele.me), as well as Ant Group, Cainiao, Alibaba Mama, Alibaba Cloud, etc. All these data are centrally stored and managed, forming the scope of full‑domain data.
Each business line provides data sources, and high‑quality data is parsed, processed, and fed back to drive business, aiming to use full‑domain data to create greater value.
Why Build Full‑Domain Data?
1. Reduce cost: Consolidating data infrastructure reduces hardware, network, and software expenses. For example, after integrating Youku‑Tudou (YouTu) into Alibaba's platform, data construction cost dropped to less than 50% of the original.
2. Technical empowerment: Rapid data system migration enables ecosystem companies to gain Alibaba‑level big‑data capabilities.
3. Data connection: Eliminating data silos by connecting data across the ecosystem.
4. Business enablement: Unified data allows faster, more accurate decision‑making, rapid experimentation, and lowers innovation barriers.
How to Build Full‑Domain Data?
Challenges include migrating data across regions, keeping services running during migration (“changing wheels on a moving plane”), and long project cycles.
Infrastructure: Alibaba’s years of e‑commerce experience have built robust data centers, networks, servers, middleware, computing platforms, data platforms, and algorithm platforms.
The first step is to integrate ecosystem company data at the infrastructure level.
Data components: the bottom layer handles data collection; the middle layer holds the compute and storage platforms (Blink for real‑time computing, MaxCompute for offline).
Data is collected from user web behavior (PC and wireless, i.e. mobile) and fed into the real‑time and offline platforms, which provide programmable capabilities (SQL, Graph). On top of these sit development tools, product services, and BI tools.
With strong infrastructure, many developers (≈20,000) and about 10,000 daily users work on the data platform.
Key Systems
Traffic system: Alibaba’s traffic distribution centers (e.g., Taobao) collect traffic data using a “Super Position Model” that tracks traffic hierarchically by page, block, and position, enabling comprehensive traffic analysis.
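A minimal sketch of hierarchical page, block, and position tracking follows. The dotted `page.block.position` code format, the `Position` type, and the `rollup` helper are assumptions for illustration only; the source does not describe the real encoding. The sketch shows why a hierarchical code is useful: one click record can be counted at every level of the hierarchy.

```python
from collections import Counter
from typing import NamedTuple


class Position(NamedTuple):
    page: str
    block: str
    slot: str


def parse_position(code: str) -> Position:
    """Split a hypothetical 'page.block.position' code into its three levels."""
    parts = code.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected page.block.position, got {code!r}")
    return Position(*parts)


def rollup(codes):
    """Count clicks at page, block, and position granularity."""
    counts = Counter()
    for code in codes:
        pos = parse_position(code)
        counts[(pos.page,)] += 1
        counts[(pos.page, pos.block)] += 1
        counts[(pos.page, pos.block, pos.slot)] += 1
    return counts


clicks = ["home.banner.1", "home.banner.2", "home.feed.1", "detail.recommend.3"]
traffic = rollup(clicks)
print(traffic[("home",)])           # 3
print(traffic[("home", "banner")])  # 2
```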
Computation Componentization
Engineering capabilities allow configuration rather than custom code for many requirements, improving reuse and simplifying integration of new businesses.
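As a rough illustration of "configuration rather than custom code", the sketch below defines metrics as data and runs them through one generic function. The `METRIC_CONFIG` schema, the aggregator names, and `compute_metric` are all hypothetical; the source does not specify how Alibaba's components are configured.

```python
# Hypothetical metric definitions: adding a metric means adding an entry, not code.
METRIC_CONFIG = {
    "paid_order_count": {"filter": {"status": "paid"}, "field": None, "agg": "count"},
    "paid_amount": {"filter": {"status": "paid"}, "field": "amount", "agg": "sum"},
}

# The reusable part: a small, fixed set of aggregation components.
AGGREGATORS = {
    "count": lambda values: len(values),
    "sum": lambda values: sum(values),
    "max": lambda values: max(values),
}


def compute_metric(name, rows, config=METRIC_CONFIG):
    """Evaluate one configured metric over a list of row dictionaries."""
    spec = config[name]
    # Keep only rows that match every key/value pair in the filter.
    matched = [
        row for row in rows
        if all(row.get(key) == value for key, value in spec["filter"].items())
    ]
    # With no field configured, the aggregator works on the rows themselves.
    values = [row[spec["field"]] for row in matched] if spec["field"] else matched
    return AGGREGATORS[spec["agg"]](values)


orders = [
    {"status": "paid", "amount": 20},
    {"status": "paid", "amount": 5},
    {"status": "cancelled", "amount": 9},
]
print(compute_metric("paid_order_count", orders))  # 2
print(compute_metric("paid_amount", orders))       # 25
```

Onboarding a new business in this style would mean writing a new configuration entry against the same generic function, which is where the reuse comes from.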
OneData System
The OneData system underpins full‑domain data construction, covering data ingestion, definition, processing, validation, and stability, forming the end‑to‑end data development workflow.
Tools such as OneClick automate data ingestion; OneDefine enforces naming and modeling standards; SQLscan checks code quality, performance, and compliance; “On the Other Side” provides regression testing for data changes.
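The kind of checks these tools perform can be sketched as follows. This is not the OneDefine or SQLscan implementation: the naming convention (a layer prefix such as `ods`, `dwd`, `dws`, or `ads` followed by lowercase underscore-separated parts) and the two lint rules are assumptions chosen only to show what enforcing standards automatically might look like.

```python
import re

# Hypothetical convention: <layer>_<part>_<part>..., all lowercase.
TABLE_NAME = re.compile(r"^(ods|dwd|dws|ads)_[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def check_table_name(name):
    """Return True if the table name follows the assumed naming convention."""
    return bool(TABLE_NAME.match(name))


def scan_sql(sql):
    """Return a list of warnings for two simple, illustrative lint rules."""
    warnings = []
    # Collapse whitespace and lowercase so the checks ignore formatting.
    normalized = " ".join(sql.lower().split())
    if re.search(r"select\s+\*", normalized):
        warnings.append("avoid SELECT *: list the columns explicitly")
    if " where " not in f" {normalized} ":
        warnings.append("no WHERE clause: query may scan every partition")
    return warnings


print(check_table_name("dwd_trade_order_di"))  # True
print(check_table_name("OrderTable"))          # False
print(scan_sql("SELECT * FROM dwd_trade_order_di"))
```

A production scanner would parse the SQL rather than match strings; the string checks here are deliberately naive and would, for example, be confused by the word "where" inside a string literal.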
With data volumes exceeding an exabyte and more than a million tables, Alibaba continues to explore ways to move beyond traditional data‑warehouse ETL architectures, focusing on compute‑storage separation and hybrid online‑offline processing.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.