Real‑Time Computing System for Alibaba Search: Architecture, Online Learning, and Strategy Optimization

The article presents Alibaba's real‑time computing platform for search, detailing its micro‑ and macro‑level architectures, online learning frameworks, point‑wise and pair‑wise ranking models, bandit‑based strategy optimization, and PID‑controlled traffic regulation, and reports significant performance gains during the Double‑11 shopping festival.

Architect

Jan 16, 2016

0. Introduction

The article introduces a dual‑link real‑time computing system. The micro‑level link processes fine‑grained user, shop, and item data and updates the underlying models in real time. The macro‑level link handles real‑time strategy optimization (bandit learning) and traffic balancing (PID control).

1. Search Real‑Time Computing System

1.1 System Architecture

The overall platform is built on Alibaba's iStream engine, running on Hadoop YARN and HBase. (The original article shows an overview diagram here.)

1.2 Key Components

Pora

Pora is a real‑time computation and online‑learning system for Taobao search, built on iStream and HBase. It supports analysis of massive user‑behavior and item data within seconds, online learning on a distributed parameter server, and applications such as personalized search, anti‑fraud, and traffic optimization. During Double‑11 it processed 130 billion messages with a peak QPS of 5 million.

iGraph

iGraph provides a real‑time online graph storage and query service, handling large‑scale KV/KKV data with peak QPS of 2.45 million for queries and up to 2.8 million QPS for real‑time updates.

SP (Search Planner)

SP is a unified front‑end service interface that generates query plans across back‑end systems (QP, iGraph, ISearch5) and returns results to the front end, reducing front‑end complexity and improving performance.

ISearch5 Engine

ISearch5 is the latest search engine platform. It supports real‑time indexing and data updates within seconds for multiple business lines.

Real‑Time Reporting System

Based on the Galaxy platform, this system aggregates multi‑dimensional business metrics (exposure, click, purchase, etc.) at minute granularity, providing immediate feedback for algorithm, product, and operation teams during promotions.
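The minute-granularity aggregation described above can be illustrated with a small in-memory sketch. It is not the Galaxy implementation; the event tuple layout `(timestamp, bucket_id, metric)` and the function names are assumptions made for this example.

```python
from collections import Counter

def minute_bucket(ts_seconds):
    """Floor a Unix timestamp (in seconds) to the start of its minute."""
    return ts_seconds - ts_seconds % 60

def aggregate(events):
    """Count events per (minute, bucket id, metric).

    `events` is an iterable of (timestamp, bucket_id, metric) tuples,
    a hypothetical log layout chosen for this sketch.
    """
    counts = Counter()
    for ts, bucket_id, metric in events:
        counts[(minute_bucket(ts), bucket_id, metric)] += 1
    return counts

def ctr(counts, minute, bucket_id):
    """Click-through rate for one bucket in one minute (0.0 if no exposure)."""
    exposures = counts[(minute, bucket_id, "exposure")]
    clicks = counts[(minute, bucket_id, "click")]
    return clicks / exposures if exposures else 0.0
```

Keying the counters by minute and bucket keeps each metric directly comparable across A/B buckets, which is the kind of feedback the text says BtsServer consumes.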

BtsServer

BtsServer manages bucket‑level A/B tests and, leveraging real‑time reporting data, implements real‑time strategy optimization and traffic control modules.

2. Online Learning

This section describes why online learning is needed: models must adapt continuously to rapidly changing data distributions, especially during high‑traffic events like Double‑11. It also explains why model updates within seconds are essential.

2.1 Online Learning Framework

Sample Worker: generates training samples from logs and fetches the latest model for gradient computation.

FeatureHQ: aggregates gradients of the same feature to a single Feature Worker.

Feature Worker: receives gradients and updates the model.

HBase: stores model parameters (e.g., w for LR, z/n for FTRL, user/item vectors for matrix factorization).

The framework is asynchronous, parallel, and platform‑based, allowing developers to plug in custom implementations for sample generation, gradient calculation, and weight updating.
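The data flow between the four components can be sketched in a single process. This is a simplified, synchronous illustration, not Pora's code: a plain dict stands in for the HBase parameter table, logistic regression is the plugged-in model, and all function names (`shard_of`, `sample_worker`, `feature_hq`, `feature_worker`, `run_round`) are hypothetical.

```python
import math
import zlib

def shard_of(feature, num_workers):
    """Stable hash so one feature always maps to the same Feature Worker."""
    return zlib.crc32(feature.encode("utf-8")) % num_workers

def sample_worker(sample, weights):
    """Turn one (features, label) sample into per-feature LR gradients,
    using the latest weights read from the parameter store."""
    features, label = sample
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    pred = 1.0 / (1.0 + math.exp(-score))
    return {f: (pred - label) * v for f, v in features.items()}

def feature_hq(gradients, num_workers):
    """Group gradients by feature and sum them per owning Feature Worker."""
    shards = [{} for _ in range(num_workers)]
    for grad in gradients:
        for f, g in grad.items():
            shard = shards[shard_of(f, num_workers)]
            shard[f] = shard.get(f, 0.0) + g
    return shards

def feature_worker(shard, weights, learning_rate=0.1):
    """Apply the aggregated gradients to the stored parameters."""
    for f, g in shard.items():
        weights[f] = weights.get(f, 0.0) - learning_rate * g

def run_round(samples, weights, num_workers=2):
    """One pass: samples -> gradients -> routing -> parameter update."""
    gradients = [sample_worker(s, weights) for s in samples]
    for shard in feature_hq(gradients, num_workers):
        feature_worker(shard, weights)
```

In the real system these stages run asynchronously and in parallel. The sketch only shows why routing by feature matters: all gradients for one parameter reach a single writer.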

2.2 Point‑wise Models

2.2.1 LR/FTRL

Logistic Regression and its FTRL variant are used as the first online learning algorithms. Experiments show that asynchronous training converges with accuracy comparable to synchronous offline training, and that model stability can be improved by averaging weights over recent iterations.
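As an illustration of the z/n state mentioned in 2.1, the following is a minimal sketch of the standard per-coordinate FTRL-Proximal update for logistic regression. It is the textbook form of the algorithm, not necessarily the variant used in production; the hyperparameter defaults and the class name `FTRLProximal` are assumptions.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature accumulated (shifted) gradients
        self.n = {}  # per-feature accumulated squared gradients

    def weight(self, f):
        """Derive the weight lazily from z and n; L1 keeps small ones at 0."""
        z = self.z.get(f, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(f, 0.0)
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        score = sum(self.weight(f) * v for f, v in features.items())
        score = max(min(score, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, label):
        """One online step on a single (features, label) sample."""
        p = self.predict(features)
        for f, v in features.items():
            g = (p - label) * v
            n = self.n.get(f, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[f] = self.z.get(f, 0.0) + g - sigma * self.weight(f)
            self.n[f] = n + g * g
```

Only z and n need to be stored per feature, since the weight is recomputed on demand. That matches the text's note that HBase holds z/n for FTRL rather than w.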

2.2.2 Online AUC Optimization

The system implements a one‑pass AUC maximization algorithm (based on Wei Gao et al., AIJ 2014) to optimize ranking quality directly, instead of optimizing a per‑sample classification loss.
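For orientation, the least-squares surrogate commonly used in one-pass AUC methods can be written as follows. This is a sketch under assumptions, not a transcription of the production objective: $\lambda$, $\eta$, and the running statistics $c_t^-$ (mean of negatives seen so far), $\Gamma_t^-$ (their covariance), and $T_t^-$ (their count) are notation chosen here. For a positive sample $x_t$:

```latex
\begin{aligned}
L_t(w) &= \frac{\lambda}{2}\lVert w\rVert^2
  + \frac{1}{2\,T_t^-}\sum_{i<t,\; y_i=-1}\bigl(1 - w^\top (x_t - x_i)\bigr)^2 \\
\nabla L_t(w) &= \lambda w - (x_t - c_t^-)
  + \Bigl[(x_t - c_t^-)(x_t - c_t^-)^\top + \Gamma_t^-\Bigr] w \\
w_{t+1} &= w_t - \eta\,\nabla L_t(w_t)
\end{aligned}
```

The gradient depends on past negatives only through $c_t^-$ and $\Gamma_t^-$, so no samples need to be stored, which is what makes a single pass possible. The case of a negative sample is symmetric, using the statistics of the positives and the opposite sign on the linear term.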

2.3 Pair‑wise Models

Motivated by the need to optimize relative item ordering, two pair‑wise algorithms are described: a real‑time matrix factorization model and a real‑time Bayesian personalized bilinear model. Both models incorporate static and dynamic item/user features, hinge loss, and regularization terms, and are trained on billions of samples daily.

2.3.1 Real‑Time Matrix Factorization

The model defines user and item latent vectors and bias terms. Its loss function is built on triplet preferences extracted from search logs, with additional Laplacian regularization to preserve purchase‑based similarity.
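A minimal sketch of one SGD step on a preference triplet (user, preferred item, other item) with hinge loss and L2 regularization is shown below. It omits the Laplacian regularizer and the static and dynamic features described in the text; the class name `PairwiseMF`, the dimensions, and the hyperparameters are assumptions for illustration.

```python
import random

class PairwiseMF:
    """Pair-wise matrix factorization trained with hinge loss (sketch)."""

    def __init__(self, dim=4, learning_rate=0.05, reg=0.01, seed=0):
        self.dim, self.lr, self.reg = dim, learning_rate, reg
        self.rng = random.Random(seed)
        self.user_vecs, self.item_vecs, self.item_bias = {}, {}, {}

    def _vec(self, table, key):
        """Fetch a latent vector, creating a small random one on first use."""
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return table[key]

    def score(self, user, item):
        p = self._vec(self.user_vecs, user)
        q = self._vec(self.item_vecs, item)
        return sum(a * b for a, b in zip(p, q)) + self.item_bias.get(item, 0.0)

    def update(self, user, preferred, other):
        """One SGD step; returns the hinge loss before the update."""
        margin = self.score(user, preferred) - self.score(user, other)
        if margin >= 1.0:
            return 0.0  # triplet already ranked with enough margin
        p = self.user_vecs[user]
        qi, qj = self.item_vecs[preferred], self.item_vecs[other]
        for k in range(self.dim):
            pk, qik, qjk = p[k], qi[k], qj[k]  # use pre-update values
            p[k] += self.lr * ((qik - qjk) - self.reg * pk)
            qi[k] += self.lr * (pk - self.reg * qik)
            qj[k] += self.lr * (-pk - self.reg * qjk)
        bi = self.item_bias.get(preferred, 0.0)
        bj = self.item_bias.get(other, 0.0)
        self.item_bias[preferred] = bi + self.lr * (1.0 - self.reg * bi)
        self.item_bias[other] = bj + self.lr * (-1.0 - self.reg * bj)
        return 1.0 - margin
```

In this sketch the update touches only one user vector and two item vectors, which is what makes a per-triplet step cheap enough to run on a live log stream.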

2.3.2 Real‑Time Bayesian Bilinear Model

This model extends matrix factorization by modeling static and temporal item features, user‑specific bias, and implicit feedback. Its training objective is derived as a MAP (maximum a posteriori) estimate.

2.3.3 Experiments

Online metrics (NDCG, MRR, CTR) improve by over 20 % after model convergence, demonstrating the effectiveness of the real‑time pair‑wise approaches.

3. Macro Real‑Time (Strategy Optimization & Traffic Balancing)

3.1 System Architecture

3.2 Real‑Time Strategy Optimization

Traditional learning‑to‑rank (LTR) faces two challenges: feature distributions shift in real time, and the offline training objective differs from the online business objective. The solution combines Multi‑Armed Bandit (MAB) to select the best discrete strategy and Zero‑Order Optimization to fine‑tune continuous parameters, using bandit feedback from online buckets.

3.2.1 Algorithm Flow

First, MAB selects a promising strategy from a finite set; after convergence, an extra‑gradient method refines the strategy in continuous space.
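The continuous refinement stage can be illustrated with a plain two-point zeroth-order gradient estimate, which needs only reward measurements and no analytic gradient. This is a simpler stand-in for the extra-gradient refinement the text mentions, not that method itself. The function name `zeroth_order_step`, the `reward` callback (which in practice would be a metric measured on online buckets), and the step sizes are assumptions.

```python
import random

def zeroth_order_step(reward, theta, rng, delta=0.05, learning_rate=0.05):
    """One ascent step using a two-point random-direction gradient estimate.

    reward: callable mapping a parameter list to a scalar to maximize.
    theta:  current continuous strategy parameters.
    rng:    a random.Random instance supplying the probe direction.
    """
    # Random Gaussian probe direction.
    u = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + delta * d for t, d in zip(theta, u)]
    minus = [t - delta * d for t, d in zip(theta, u)]
    # Central difference approximates the directional derivative along u.
    slope = (reward(plus) - reward(minus)) / (2.0 * delta)
    # Move along u in proportion to the measured slope.
    return [t + learning_rate * slope * d for t, d in zip(theta, u)]
```

Each step costs two reward evaluations, which maps naturally onto a pair of experiment buckets running the perturbed parameters side by side.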

3.2.2 Multi‑Armed Bandit

The problem is defined over N strategies with unknown reward distributions. The selection probabilities πᵢ,ₜ are updated from the observed gains g(i,Yₜ), and the highest‑probability strategy is periodically promoted to the acceptance bucket.
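The summary does not give the exact probability update, so the sketch below uses EXP3, a standard exponential-weights bandit, as a stand-in. The class name `Exp3`, the exploration rate `gamma`, and the assumption that gains are scaled to [0, 1] are all choices made for this example.

```python
import math
import random

class Exp3:
    """EXP3 bandit over a fixed set of strategies (sketch)."""

    def __init__(self, num_arms, gamma=0.1, rng=None):
        self.n = num_arms
        self.gamma = gamma
        self.weights = [1.0] * num_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        """Selection probabilities: exponential weights mixed with uniform."""
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        """Sample the strategy to serve next."""
        return self.rng.choices(range(self.n), weights=self.probabilities())[0]

    def update(self, arm, gain):
        """Reweight the played arm using an importance-weighted gain in [0, 1]."""
        p = self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * (gain / p) / self.n)
        top = max(self.weights)  # rescale to avoid overflow
        self.weights = [w / top for w in self.weights]

    def best(self):
        """Index of the highest-probability strategy (promotion candidate)."""
        probs = self.probabilities()
        return max(range(self.n), key=probs.__getitem__)
```

A caller would alternate `select()` and `update()` per feedback window and periodically read `best()` to decide which strategy to promote.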

3.3 Real‑Time Traffic Control

3.3.1 Keyword Red Packet Traffic Control

A PID controller regulates the issuance speed of keyword red‑packet rewards, balancing contract fulfillment against smooth delivery.
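A textbook discrete PID controller is sketched below to make the control loop concrete. The gains and the class name `PIDController` are assumptions; the article does not state how the production controller is tuned or what signal it measures.

```python
class PIDController:
    """Discrete PID controller (sketch)."""

    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured):
        """Return the control output for one sampling interval."""
        error = setpoint - measured
        # Integral term accumulates past error and removes steady-state offset.
        self.integral += error * self.dt
        # Derivative term reacts to the error trend; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For pacing, one plausible mapping is to use the planned cumulative issuance as `setpoint`, the actual cumulative issuance as `measured`, and the output as an adjustment to the issuance rate for the next interval.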

3.3.2 Search Traffic Balancing

PID‑based control is also applied to long‑term platform goals, such as balancing traffic among marketplaces and seller tiers.

4. Double‑11 Practical Results

Real‑time computing contributed to a total transaction increase of over 20 billion RMB, with PC/Hand‑held search up 11 % (pre‑heat) and 8 % (day‑of), Tmall search up 7 %, and in‑store search up 3.4 %.

5. Conclusion

The system proved critical for Double‑11 success and will continue to evolve, promising further innovations in real‑time computation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
