
Real-Time Data Engineering Practices for Alibaba 1688 Business

This article explains how Alibaba 1688 achieves real‑time recommendation, advertising, and product statistics through a robust middle‑platform foundation, streaming engines like Blink, data synchronization tools, and scalable storage, illustrating three concrete engineering cases and the end‑to‑end real‑time data service pipeline.

DataFunTalk

The presentation introduces the significance of real‑time data engineering for Alibaba 1688, a massive e‑commerce platform handling tens of millions of visits and billions of transactions daily, where fast, accurate data drives search, recommendation, and advertising.

It outlines the middle‑platform architecture that enables efficient, low‑cost real‑time solutions. The platform consists of an online service system, a real‑time compute engine (Blink, an enhanced Flink), data synchronization tools (DataX), and scalable storage (ODPS, TDDL).

Online Service System: includes the HA3 search engine, the BE recommendation engine, the RTP online scoring service, and the igraph graph index for user behavior.

Real‑Time Compute Engine & Data Sync: Blink serves as the primary streaming engine; DataX synchronizes heterogeneous data sources via ODPS.

Data Storage: ODPS provides petabyte‑scale processing; TDDL offers transparent sharding and scaling similar to MySQL.
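
To make "transparent sharding" concrete, here is a minimal sketch of hash‑based shard routing. It is not TDDL's API or algorithm, which the talk does not describe; `shard_for` and `ShardedStore` are hypothetical names, and CRC32 is only an assumed hash. The caller reads and writes by key and never sees which shard holds the row.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (assumed scheme: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedStore:
    """Hypothetical key-value store that hides shard routing from callers."""

    def __init__(self, shard_count: int):
        # Each dict stands in for one physical database shard.
        self._shards = [dict() for _ in range(shard_count)]

    def _shard(self, key: str) -> dict:
        return self._shards[shard_for(key, len(self._shards))]

    def put(self, key: str, value) -> None:
        self._shard(key)[key] = value

    def get(self, key: str, default=None):
        return self._shard(key).get(key, default)

    def shard_sizes(self) -> list:
        return [len(s) for s in self._shards]


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"offer-{i}", {"price": i})
```

Plain modulo routing moves most keys when the shard count changes, so a system that scales out would need a more careful scheme than this one.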

Engineering Practices (Case Studies):

1. Real‑Time Product Statistics: Implements differentiated solutions for high‑precision front‑end displays and lower‑latency ranking, combining full‑batch and incremental updates to keep search results up‑to‑date.

2. Real‑Time Recommendation Refresh: Shifts from offline recall to real‑time recall during major sales events, allowing operators to configure recall and scoring for rapid product turnover.

3. Ad‑Recommendation Data Sync: Uses Blink batch to refresh millions of ad items every five minutes, feeding a curated set of high‑quality products into the recommendation engine.
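
The first case combines a full‑batch snapshot with incremental updates. One way to picture that, as a rough sketch rather than the team's actual job, is to overlay the deltas that arrived after the batch cut‑off onto the snapshot. The `StatDelta` schema, the field names, and `apply_incremental` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatDelta:
    """One incremental change to a product's counters (hypothetical schema)."""
    product_id: str
    ts: int          # event time, seconds since epoch
    sales: int = 0
    views: int = 0


def apply_incremental(snapshot, snapshot_ts, deltas):
    """Overlay deltas newer than the full snapshot onto a copy of it.

    snapshot: {product_id: {"sales": int, "views": int}} from the batch job.
    snapshot_ts: cut-off time of that batch; older deltas are already in it.
    """
    merged = {pid: dict(stats) for pid, stats in snapshot.items()}
    for d in deltas:
        if d.ts <= snapshot_ts:
            continue  # already counted by the full batch
        stats = merged.setdefault(d.product_id, {"sales": 0, "views": 0})
        stats["sales"] += d.sales
        stats["views"] += d.views
    return merged


full = {"p1": {"sales": 100, "views": 2000}}
deltas = [
    StatDelta("p1", ts=90, sales=1),             # before the cut-off: skipped
    StatDelta("p1", ts=120, sales=2, views=30),  # after the cut-off: applied
    StatDelta("p2", ts=130, views=5),            # product first seen after the batch
]
result = apply_incremental(full, snapshot_ts=100, deltas=deltas)
```

Filtering on the cut‑off time keeps an event from being counted twice when it appears in both the batch and the stream. In this sketch the full batch remains the source of truth and the stream only covers the gap until the next batch.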

The end‑to‑end real‑time data service captures user actions (clicks, favorites, purchases), tags them via Blink, aggregates across user, product, and scene dimensions, and exposes unified APIs for downstream applications such as personalized ranking, CTR/CVR estimation, and traffic allocation.
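
The tag‑then‑aggregate flow can be sketched in plain Python as below. This only illustrates the idea and is not Blink code. The event fields, the scene‑tagging rule in `tag`, and the `get_stats` lookup are assumptions, since the talk does not specify them.

```python
from collections import Counter, defaultdict

DIMENSIONS = ("user", "product", "scene")


def tag(event):
    """Attach a scene tag to a raw event (hypothetical rule based on the page path)."""
    scene = "search" if event["page"].startswith("/search") else "recommend"
    return {**event, "scene": scene}


def aggregate(events):
    """Count actions per key along each of the user, product, and scene dimensions."""
    counters = {dim: defaultdict(Counter) for dim in DIMENSIONS}
    for e in map(tag, events):
        for dim in DIMENSIONS:
            counters[dim][e[dim]][e["action"]] += 1
    return counters


def get_stats(counters, dim, key):
    """Single lookup shape for all dimensions, standing in for the unified API."""
    return dict(counters[dim].get(key, {}))


events = [
    {"user": "u1", "product": "p1", "action": "click", "page": "/search?q=x"},
    {"user": "u1", "product": "p2", "action": "purchase", "page": "/home"},
    {"user": "u2", "product": "p1", "action": "click", "page": "/search?q=y"},
]
counters = aggregate(events)
```

Here `get_stats(counters, "product", "p1")` returns the click counts for product `p1`, and the same call shape serves the user and scene dimensions. Downstream consumers such as ranking or CTR estimation therefore need only one lookup interface.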

Overall, the architecture demonstrates how a well‑designed middle platform and streaming infrastructure enable Alibaba 1688 to deliver low‑latency, data‑driven experiences at massive scale.

Written by

DataFunTalk
