How Alibaba Built an EB-Scale, Real-Time Big Data Platform
Alibaba senior data expert Yao Bin Hui explains how the company built a standardized, end-to-end big data ecosystem, from low-level data collection and AI algorithms up to data services and product platforms. The result supports petabyte-scale data integration with responses in seconds, powering both internal operations and millions of external users.
Overview of Alibaba’s Big Data Ecosystem
Alibaba’s ecosystem spans six business sectors (e-commerce, logistics, health, entertainment, finance, and cloud computing), serves billions of users, and has accumulated data at the exabyte level.
Full‑Domain Data System
The system is built in layers: full-domain data at the bottom, then foundational data construction (which includes AI algorithms), then data services, and finally the data product development platform at the top.
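To make the layering concrete, here is a minimal sketch in which each layer consumes only the output of the layer below it. Everything in it is hypothetical and illustrative: the `Event` record, the `build_foundation`, `query_metric`, and `render_card` functions, and the metric names are not Alibaba's, and the AI-algorithm part of the foundational layer is left out for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Bottom layer (hypothetical): one raw record from a single business domain."""
    user_id: str
    domain: str  # e.g. "ecommerce" or "logistics"
    amount: float


def build_foundation(events: list[Event]) -> dict[str, dict[str, float]]:
    """Foundational layer: per-user amounts broken down by domain."""
    table: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for e in events:
        table[e.user_id][e.domain] += e.amount
    return table


def query_metric(foundation: dict[str, dict[str, float]], user_id: str, metric: str) -> float:
    """Data service layer: expose named metrics over the foundational data."""
    per_domain = foundation.get(user_id, {})  # .get does not insert missing users
    if metric == "total_amount":
        return sum(per_domain.values(), 0.0)
    if metric == "active_domains":
        return float(len(per_domain))
    raise KeyError(f"unknown metric: {metric}")


def render_card(foundation: dict[str, dict[str, float]], user_id: str) -> str:
    """Product layer: compose service calls into a user-facing summary."""
    total = query_metric(foundation, user_id, "total_amount")
    domains = int(query_metric(foundation, user_id, "active_domains"))
    return f"{user_id}: {total:.2f} across {domains} domain(s)"


events = [
    Event("u1", "ecommerce", 120.0),
    Event("u1", "logistics", 15.5),
    Event("u2", "ecommerce", 40.0),
]
foundation = build_foundation(events)
print(render_card(foundation, "u1"))  # u1: 135.50 across 2 domain(s)
```

The point of the separation is that the product layer never touches raw events; it only calls the service layer, so the layers below can change without breaking products built on top.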
Relationship Between Data Services and Data Product Platform
Data services and the data product platform are interdependent like gears; they jointly transform processed data and algorithms into applications that empower businesses and users.
Core Data Service Capabilities
Basic data service: provides cross‑domain access to tens of thousands of metrics.
Tag/portrait service: offers hundreds of user tags.
Audience insight service: refines tags for marketing scenarios such as audience segmentation.
Algorithm model service: exposes AI models as services.
These services are accessed through a portal where business units can discover which metrics are available and see how other units use them.
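The interplay between the tag/portrait service, the audience insight service, and the portal can be sketched as follows. This is a minimal illustration under stated assumptions: tags are modeled as sets of user IDs, and an audience segment is the intersection of the required tags minus any excluded ones. The tag names and the `discover`, `users_with`, and `segment` functions are hypothetical.

```python
from collections.abc import Iterable

# Hypothetical tag store: tag name -> set of user IDs carrying that tag.
TAGS: dict[str, set[str]] = {
    "age_18_24": {"u1", "u2", "u5"},
    "likes_sports": {"u2", "u3", "u5"},
    "recent_buyer": {"u1", "u5"},
}


def discover(keyword: str) -> list[str]:
    """Portal-style discovery: list the tags whose names contain a keyword."""
    return sorted(tag for tag in TAGS if keyword in tag)


def users_with(tag: str) -> set[str]:
    """Tag/portrait service: the users carrying one tag (empty if unknown)."""
    return TAGS.get(tag, set())


def segment(include: Iterable[str], exclude: Iterable[str] = ()) -> set[str]:
    """Audience insight: users with ALL `include` tags and NONE of `exclude`."""
    include = list(include)
    if not include:
        return set()
    # set.intersection always returns a new set, so TAGS is never mutated below.
    result = set.intersection(*(users_with(tag) for tag in include))
    for tag in exclude:
        result -= users_with(tag)
    return result


print(discover("buy"))                                # ['recent_buyer']
print(sorted(segment(["age_18_24", "likes_sports"])))  # ['u2', 'u5']
print(sorted(segment(["age_18_24", "likes_sports"], exclude=["recent_buyer"])))  # ['u2']
```

Set algebra is only one plausible way to refine tags into segments; a production system would likely use bitmaps or a query engine, but the include/exclude logic is the same.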
Data Service Architecture
The data service architecture includes a DSL layer that standardizes data access. Its core components are QueryEngine (queries), PushEngine (real-time push), DAG visual orchestration, and Algorithm Engine (AI capabilities).
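DAG orchestration ties these components together: a declarative definition lists the steps and their dependencies, and a runner executes each step only after its upstream steps finish. The sketch below assumes a plain dict can stand in for the DSL and uses the standard library's `graphlib` (Python 3.9+) for topological ordering. The step names and stand-in functions for QueryEngine, Algorithm Engine, and PushEngine are hypothetical, not the real components' APIs.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Results = dict[str, Any]

# Hypothetical declarative spec standing in for the DSL: step -> upstream steps.
SPEC: dict[str, list[str]] = {
    "query_orders": [],
    "average_order": ["query_orders"],
    "push_alert": ["average_order"],
}


def query_orders(results: Results) -> list[float]:
    # Stand-in for a QueryEngine call; returns fixed sample data.
    return [30.0, 50.0, 100.0]


def average_order(results: Results) -> float:
    # Stand-in for an Algorithm Engine step that consumes upstream output.
    orders = results["query_orders"]
    return sum(orders) / len(orders)


def push_alert(results: Results) -> str:
    # Stand-in for a PushEngine message built from the computed metric.
    return f"avg order value: {results['average_order']:.1f}"


REGISTRY: dict[str, Callable[[Results], Any]] = {
    "query_orders": query_orders,
    "average_order": average_order,
    "push_alert": push_alert,
}


def run_dag(spec: dict[str, list[str]], registry: dict[str, Callable[[Results], Any]]) -> Results:
    """Run every step after all of its upstream steps (raises CycleError on cycles)."""
    sorter = TopologicalSorter()
    for step, upstream in spec.items():
        sorter.add(step, *upstream)
    results: Results = {}
    for step in sorter.static_order():
        results[step] = registry[step](results)
    return results


print(run_dag(SPEC, REGISTRY)["push_alert"])  # avg order value: 60.0
```

Keeping the graph as data rather than code is what makes visual orchestration possible: a UI can edit `SPEC` without anyone touching the step implementations.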
Key Application Scenarios
Internal Alibaba use cases such as search, recommendation, and marketing.
Data dashboards (e.g., Double 11 live screens).
Commercial data products like “Business Advisor” for merchants.
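Live screens like the Double 11 dashboards are the kind of consumer a real-time push component serves: instead of each screen polling for new numbers, updates are pushed to subscribers as events arrive. The following is a minimal in-process sketch of that idea, assuming running totals per metric. `PushHub` and its methods are hypothetical and say nothing about how Alibaba's PushEngine is actually implemented.

```python
from collections import defaultdict
from typing import Callable

Listener = Callable[[str, float], None]


class PushHub:
    """Toy push engine: keeps running totals and pushes each change to subscribers."""

    def __init__(self) -> None:
        self._totals: defaultdict[str, float] = defaultdict(float)
        self._listeners: defaultdict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, metric: str, listener: Listener) -> None:
        self._listeners[metric].append(listener)

    def ingest(self, metric: str, delta: float) -> None:
        """Apply one incoming event and notify every screen watching this metric."""
        self._totals[metric] += delta
        for listener in self._listeners[metric]:
            listener(metric, self._totals[metric])


# A "screen" that simply records what it was shown.
screen: list[tuple[str, float]] = []
hub = PushHub()
hub.subscribe("gmv", lambda metric, value: screen.append((metric, value)))
hub.ingest("gmv", 100.0)
hub.ingest("gmv", 250.0)
hub.ingest("orders", 3.0)  # no subscribers, so nothing is pushed
print(screen)  # [('gmv', 100.0), ('gmv', 350.0)]
```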
Data Product Development Platform
The platform enables non‑technical users to build data products, offering four main capabilities: data analysis, self‑service report configuration, product configuration for non‑developers, and advanced custom development for developers.
Its architecture consists of an application side and a service side, both driven by DSL definitions. The DSL describes both applications and services; rendering and execution engines then run these definitions, so the same data product can be deployed on PC, mobile, and large-screen devices.
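The benefit of a DSL-driven design is that one definition can target several devices. The sketch below assumes a simple split: the execution engine fetches the data each widget is bound to, and the rendering engine decides layout per device. The `APP_DSL` structure, the column counts in `COLUMNS`, and the `execute` and `render` functions are all hypothetical illustrations, not the platform's actual DSL.

```python
from typing import Any

# Hypothetical app definition standing in for the platform's DSL: it says WHAT
# to show, not how to lay it out on any particular device.
APP_DSL: dict[str, Any] = {
    "title": "Daily Sales",
    "widgets": [
        {"type": "kpi", "metric": "gmv"},
        {"type": "line", "metric": "orders_by_hour"},
        {"type": "table", "metric": "top_items"},
    ],
}

# Assumed grid width per target device.
COLUMNS = {"pc": 3, "mobile": 1, "large_screen": 4}


def execute(dsl: dict[str, Any], source: dict[str, Any]) -> dict[str, Any]:
    """Execution-engine stand-in: fetch the data each widget is bound to."""
    return {w["metric"]: source[w["metric"]] for w in dsl["widgets"]}


def render(dsl: dict[str, Any], device: str) -> list[list[str]]:
    """Rendering-engine stand-in: arrange the same widgets into rows for a device."""
    cols = COLUMNS[device]
    cells = [f"{w['type']}:{w['metric']}" for w in dsl["widgets"]]
    return [cells[i:i + cols] for i in range(0, len(cells), cols)]


print(render(APP_DSL, "pc"))      # [['kpi:gmv', 'line:orders_by_hour', 'table:top_items']]
print(render(APP_DSL, "mobile"))  # [['kpi:gmv'], ['line:orders_by_hour'], ['table:top_items']]
```

Because the definition is device-neutral, adding a new target means adding an entry to the layout rules rather than rebuilding the product.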
Benefits of Combining Data Services and Platform
Breaks down data silos between business lines, letting data flow like water across domains.
Meets diverse and changing data requirements.
Lets data circulate across all domains and supports on-demand self-service without requiring professional developers.
Primary Users
Business operations staff creating self‑service products.
Decision analysts developing analytical products for strategic guidance.
Backend marketing teams building marketing‑focused products.
Merchants using “Business Advisor”, which now serves tens of millions of merchants.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
