How Alibaba’s Pora Powers Real‑Time Personalization at Massive Scale
Pora (Personal Offline Realtime Analyze) is a high‑throughput, low‑latency platform that captures user behavior in real time, enabling Alibaba’s search engine to deliver personalized results, support online learning, and run 24/7 with massive data volumes.
Pora captures user behavior in real time and feeds it to personalization algorithms, so that Alibaba’s search can return results tailored to each user’s preferences.
Overview
Pora sits in the offline part of the search engine pipeline. It is built to process massive data volumes with high throughput and low latency, and it also serves as a platform on which algorithm modules can run, be tuned, and take effect immediately.
Figure 1 shows Pora’s position in the real‑time search computation architecture.
Pora Overall Architecture
Pora has evolved through several stages since its inception: it was first built on a Storm cluster, migrated in 2014 to a Yarn cluster with iStream as the stream processing framework, and was refactored around Tec in 2015.
Its functionality expanded from simple user‑gender personalization to item‑level personalization. It now supports real‑time anti‑fraud, recommendation, and ranking services, along with online learning frameworks and algorithms such as LR (logistic regression), FTRL (Follow‑The‑Regularized‑Leader), and real‑time matrix factorization.
Figure 2 illustrates the current overall architecture.
Key Changes in 2015
1. Tec Refactor
The core layer of Pora was extracted into the lightweight Tec framework, which simplifies the interfaces for business logic and algorithms. Tec supports rapid development of high‑throughput, low‑latency real‑time applications; for AliExpress search, for example, it was used to convert hourly batch jobs into processing with second‑level latency.
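The move from hourly batches to second‑level processing can be pictured as replacing a recompute‑everything job with per‑event incremental updates. The sketch below is a generic illustration and not Tec's API: `batch_click_counts`, `IncrementalClickCounts`, and the `(timestamp, item_id)` event shape are hypothetical names chosen for the example.

```python
from collections import Counter


def batch_click_counts(events):
    """Batch style: needs the full set of events before it can produce counts."""
    return Counter(item_id for _, item_id in events)


class IncrementalClickCounts:
    """Streaming style: counts are current after every single event."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, timestamp, item_id):
        # Update state in place so readers see the new value right away.
        self.counts[item_id] += 1
        return self.counts[item_id]


# Hypothetical click events as (timestamp, item_id) pairs.
events = [(1, "item_a"), (2, "item_b"), (3, "item_a")]
stream = IncrementalClickCounts()
for ts, item in events:
    stream.on_event(ts, item)
```

Both paths end with the same counts; the difference is that the incremental version has a usable answer after each event instead of once per batch window.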
2. Online Learning
In 2014, data and models were updated in real time. In 2015, an online learning framework based on a Parameter Server architecture was built on Pora, allowing asynchronous, parallel, platform‑wide model updates. Together, Feature Workers and HBase Model Storage make up the Parameter Server.
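To make the worker and model‑storage split concrete, here is a minimal sketch of online logistic regression trained with the per‑coordinate FTRL‑Proximal update, with workers that pull state from and push deltas to a shared store. It is an illustration under stated assumptions, not Pora's implementation: `ModelStore` is an in‑memory stand‑in for HBase Model Storage, `FtrlWorker` is a hypothetical name for a Feature Worker, and the hyperparameters and feature names are made up.

```python
import math
from collections import defaultdict


class ModelStore:
    """In-memory stand-in for shared model storage (assumption: HBase in Pora)."""

    def __init__(self):
        self.z = defaultdict(float)  # FTRL accumulated adjusted gradients
        self.n = defaultdict(float)  # FTRL accumulated squared gradients

    def pull(self, keys):
        return {k: (self.z[k], self.n[k]) for k in keys}

    def push(self, deltas):
        # Workers send deltas, not absolute values, so updates from several
        # workers add up. This sketch is single-threaded and takes no locks.
        for k, (dz, dn) in deltas.items():
            self.z[k] += dz
            self.n[k] += dn


class FtrlWorker:
    """Hypothetical worker applying per-coordinate FTRL-Proximal updates."""

    def __init__(self, store, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.store, self.alpha, self.beta = store, alpha, beta
        self.l1, self.l2 = l1, l2

    def _weight(self, z, n):
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps small coordinates at zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def _score(self, features, state):
        s = sum(self._weight(*state[k]) * x for k, x in features.items())
        s = max(-35.0, min(35.0, s))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-s))

    def predict(self, features):
        return self._score(features, self.store.pull(features.keys()))

    def update(self, features, label):
        state = self.store.pull(features.keys())
        p = self._score(features, state)
        deltas = {}
        for k, x in features.items():
            z, n = state[k]
            g = (p - label) * x  # gradient of log loss for this coordinate
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            deltas[k] = (g - sigma * self._weight(z, n), g * g)
        self.store.push(deltas)
        return p


store = ModelStore()
worker = FtrlWorker(store)
for _ in range(50):
    worker.update({"category:shoes": 1.0}, 1)  # clicked
    worker.update({"category:hats": 1.0}, 0)   # not clicked
```

Because all model state lives in the store, a second `FtrlWorker` created against the same `ModelStore` sees the same model, which is the property a Parameter Server design relies on.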
3. 24/7 Operation
Since iStream 0.9, hot configuration switching has made restarts unnecessary for configuration changes. In 2015 the main search dump moved to 24‑hour incremental updates, which removed the nightly downtime and allowed personalization and recommendation data to be delivered with near‑zero latency.
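The source does not describe how iStream implements hot configuration switching. One common pattern, sketched here under that caveat, is to have the processing loop read a configuration snapshot per record while an operator swaps in a new configuration atomically. `HotConfig`, `process`, and the `boost` setting are hypothetical names for illustration.

```python
import threading


class HotConfig:
    """Holds the active configuration and lets it be replaced at runtime."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._current = dict(initial)
        self._version = 1

    def snapshot(self):
        # Return the version and config together so a record is never
        # processed with a mix of old and new settings.
        with self._lock:
            return self._version, self._current

    def swap(self, new_values):
        # Replace the whole dict instead of mutating it, so snapshots that
        # are already in use stay consistent.
        with self._lock:
            self._current = dict(new_values)
            self._version += 1
            return self._version


def process(records, config):
    """Generator that scores each record with whatever config is active."""
    for record in records:
        version, conf = config.snapshot()
        yield record, record * conf["boost"], version


config = HotConfig({"boost": 1.0})
stream = process([1, 2, 3], config)
first = next(stream)          # processed under version 1
config.swap({"boost": 2.0})   # switch config without stopping the stream
rest = list(stream)           # remaining records use version 2
```

The stream keeps running across the swap; only records read after the swap pick up the new setting.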
4. Performance Optimizations
Pora’s QPS grew from 940,000 at the 2014 Double‑11 peak to 5,010,000 in 2015, a roughly 5.3x increase. Optimizations across the offline teams and infrastructure (Yarn, HDFS, iStream, HQueue, HBase, CM8/ET2 clusters) contributed to this growth.
Business optimization: streamlined search PV log processing and storage I/O, switched to direct htable reads, and trimmed unnecessary logs.
System optimization: added a client‑side cache in front of htable (85% hit rate during peak), tuned HQueue and htable configurations, and mitigated hotspot machines.
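The client‑side cache mentioned above can be sketched as a read‑through LRU cache that tracks its own hit rate. This is a minimal illustration, not the cache Pora uses: `CachedTableClient` and the `fetch` callback (standing in for a remote htable read) are hypothetical, and the sketch omits expiry and invalidation, which a real‑time system would need in order to bound staleness.

```python
from collections import OrderedDict


class CachedTableClient:
    """Read-through LRU cache in front of a slower row lookup."""

    def __init__(self, fetch, capacity=1024):
        self._fetch = fetch          # hypothetical remote read, e.g. a table get
        self._capacity = capacity
        self._cache = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def get(self, row_key):
        if row_key in self._cache:
            self._cache.move_to_end(row_key)  # mark as most recently used
            self.hits += 1
            return self._cache[row_key]
        self.misses += 1
        value = self._fetch(row_key)
        self._cache[row_key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


backend_calls = []


def fetch(row_key):
    # Stand-in for a remote read; records each call that reaches the backend.
    backend_calls.append(row_key)
    return row_key.upper()


client = CachedTableClient(fetch, capacity=2)
results = [client.get(k) for k in ["a", "a", "b", "c", "a"]]
```

Every hit is a read that never reaches the storage layer, which is why a high hit rate at peak translates directly into lower load on the backing table.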
21CTO
