How Alibaba’s Pora Powers Real‑Time Personalization at Massive Scale

Pora (Personal Offline Realtime Analyze) is a high‑throughput, low‑latency platform that captures user behavior in real time, enabling Alibaba’s search engine to deliver personalized results, support online learning, and run 24/7 with massive data volumes.

21CTO

Jan 25, 2016

Pora (Personal Offline Realtime Analyze) is a system designed to capture user behavior in real time and feed it to personalization algorithms, enabling Alibaba’s search to deliver results tailored to individual preferences.

Overview

Pora sits in the offline part of the search engine pipeline. It is built to process massive data volumes with high throughput and low latency, and it also serves as a platform on which algorithm modules can run, be adjusted, and take effect immediately.

Figure 1 shows Pora’s position in the real‑time search computation architecture.

Search real‑time computation architecture

Pora Overall Architecture

Pora has evolved through several stages since its inception: it was initially built on a Storm cluster, migrated in 2014 to a Yarn cluster with iStream as the stream processing framework, and underwent the Tec refactor in 2015.

Its functionality expanded from simple user‑gender personalization to item‑level personalization. It now supports real‑time anti‑fraud, recommendation, and ranking services, along with online learning frameworks and algorithms such as logistic regression (LR), Follow‑the‑Regularized‑Leader (FTRL), and real‑time matrix factorization.
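To make the FTRL reference concrete, here is a minimal sketch of the commonly published per‑coordinate FTRL‑Proximal update applied to logistic regression. It is not Pora's implementation, which the article does not show; the class name FTRLProximal and the hyperparameters alpha, beta, l1, and l2 are illustrative assumptions.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (illustrative sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        # Weights are derived lazily from z and n; L1 keeps small ones at zero.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        # x is a sparse feature dict: {feature_name: value}
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        # One online step on a single labeled event (y is 0 or 1).
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss for coordinate i
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p
```

Each event updates only the coordinates it touches, which is what makes this family of algorithms a natural fit for sparse, high‑volume behavior logs.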

Figure 2 illustrates the current overall architecture.

Key Changes in 2015

1. Tec Refactor

The core layer of Pora was extracted into the lightweight Tec framework, which simplifies the interfaces for business logic and algorithms. Tec enables rapid development of high‑throughput, low‑latency real‑time applications. For AliExpress search, for example, it was used to turn hourly batch jobs into real‑time processing with latency measured in seconds.

2. Online Learning

In 2014, Pora already updated data and models in real time. In 2015, an online learning framework based on the Parameter Server architecture was built on Pora, allowing models to be updated asynchronously and in parallel across the platform. Feature Workers and HBase Model Storage together constitute the Parameter Server.
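The pull/compute/push loop behind a Parameter Server can be sketched as follows. This is a toy illustration under stated assumptions, not Pora's code: ModelStore is an in‑memory stand‑in for the HBase model storage, feature_worker and run_workers are hypothetical names, and the model is a plain linear model trained with squared loss.

```python
import threading
from collections import defaultdict


class ModelStore:
    """In-memory stand-in for the shared model storage (HBase in the article)."""

    def __init__(self):
        self._weights = defaultdict(float)
        self._lock = threading.Lock()

    def pull(self, keys):
        # Return the current weights for the requested feature keys.
        with self._lock:
            return {k: self._weights[k] for k in keys}

    def push(self, deltas):
        # Apply additive updates; workers never overwrite whole weights.
        with self._lock:
            for k, d in deltas.items():
                self._weights[k] += d


def feature_worker(store, events, lr=0.1):
    """Consume (features, label) events: pull weights, compute, push deltas."""
    for features, label in events:
        w = store.pull(features.keys())
        pred = sum(w[k] * v for k, v in features.items())
        err = pred - label  # gradient of 0.5 * (pred - label) ** 2 w.r.t. pred
        store.push({k: -lr * err * v for k, v in features.items()})


def run_workers(store, shards, lr=0.1):
    """Run one worker thread per shard; workers do not wait for each other."""
    threads = [
        threading.Thread(target=feature_worker, args=(store, shard, lr))
        for shard in shards
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because workers only push additive deltas, they can proceed without coordinating with each other. The trade‑off is that a worker may compute its update from slightly stale weights when several workers touch the same feature.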

3. 24/7 Operation

Since iStream 0.9, configuration can be hot‑switched without restarting jobs. In 2015 the main search dump moved to round‑the‑clock incremental updates, which removed the nightly downtime window and keeps the latency of personalization and recommendation data close to zero.
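One common way to switch configuration without a restart is to publish immutable snapshots and swap the reference atomically. The sketch below illustrates that general pattern only; it is not iStream's API, and HotConfig, score, and the "boost" key are hypothetical names chosen for the example.

```python
import threading
from types import MappingProxyType


class HotConfig:
    """Holds a read-only config snapshot that can be replaced at runtime."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._snapshot = MappingProxyType(dict(initial))

    def get(self):
        # Readers grab the current snapshot; it never changes under them.
        return self._snapshot

    def swap(self, changes):
        # Build a new snapshot and publish it; old snapshots stay valid.
        with self._lock:
            merged = dict(self._snapshot)
            merged.update(changes)
            self._snapshot = MappingProxyType(merged)


def score(value, config):
    """Example processing step that reads the config on every event."""
    return value * config.get()["boost"]
```

A processing loop that calls config.get() per event picks up the new values on the next event after swap() returns, while any event already in flight finishes against the snapshot it started with.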

4. Performance Optimizations

Pora’s peak QPS grew from 940,000 during Double‑11 in 2014 to 5,010,000 in 2015, roughly a 5.3x increase. Optimizations by the offline teams and across the infrastructure (Yarn, HDFS, iStream, HQueue, HBase, and the CM8/ET2 clusters) contributed to this growth.

Business optimization: streamlined search PV log processing and storage I/O, direct htable reads, and trimmed unnecessary logs.

System optimization: added client‑side cache to htable (85% hit rate during peak), tuned HQueue and htable configurations, and mitigated hotspot machines.
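The client‑side cache and its hit rate can be illustrated with a small LRU wrapper around a table lookup. This is a sketch under assumptions: the article does not describe the cache's eviction policy or API, so the LRU policy, the CachedTableClient name, and the fetch callback are illustrative choices.

```python
from collections import OrderedDict


class CachedTableClient:
    """LRU cache in front of a slower key-value lookup (illustrative sketch)."""

    def __init__(self, fetch, capacity=1024):
        self._fetch = fetch  # function key -> value, e.g. a remote table read
        self._capacity = capacity
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._fetch(key)  # only misses reach the backing store
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

At an 85% hit rate, only about 15 of every 100 reads reach the backing table. A production cache for data that is written in real time would also need an expiry or invalidation strategy, which this sketch omits.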
