Overview and Architecture of Pora: A Real‑Time Personalization Analytics Platform
The article introduces Pora, an offline real‑time analytics system for personalized search. It combines high‑throughput, low‑latency stream processing with online learning algorithms and a modular architecture, runs continuously 24/7, and has been optimized for performance at large scale.
Pora (Personal Offline Realtime Analyze) is a system designed to capture user behavior in real time and enable algorithms to instantly update personalized information for users and items, supporting search personalization at Alibaba.
Within the search pipeline, Pora sits in the offline part of the engine but is built to handle streaming data with both massive throughput and low latency, acting as a platform where many algorithm modules can run, be adjusted, and take effect instantly.
Pora Overall Architecture
Since its inception, Pora has evolved from a Storm‑based application into a service that runs on a Yarn cluster on top of the iStream stream‑processing framework; in 2015 its core was refactored into the Tec framework. Its functionality expanded from simple user‑gender personalization to item‑level personalization, and it now supports multiple scenarios such as real‑time anti‑fraud, recommendation, and ranking. Online learning frameworks (LR/FTRL, real‑time matrix factorization) were added in 2015.
Key Changes in 2015
1. Tec Refactor
The core layer was extracted into the lightweight real‑time computing framework Tec, simplifying business logic and algorithm interfaces. Tec enables rapid development of high‑throughput, low‑latency applications, e.g., converting AliExpress offline database dump jobs from hourly to second‑level real‑time processing.
2. Online Learning
In 2014, only data was updated in real time. In 2015, an online learning framework based on a Parameter Server (Feature Worker + HBase Model Storage) was built, following Google’s Downpour SGD, so that models are also updated asynchronously and in parallel across the whole platform.
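The parameter-server pattern described above can be illustrated with a minimal single-process sketch. It assumes plain logistic-regression SGD, an in-memory dictionary standing in for the HBase model storage, and threads standing in for feature workers; the names ParameterServer, worker, and train are hypothetical and are not Pora's actual interfaces.

```python
import math
import threading
from collections import defaultdict


class ParameterServer:
    """In-memory stand-in for the shared model store (HBase in the article)."""

    def __init__(self, learning_rate=0.1):
        self._weights = defaultdict(float)
        self._lock = threading.Lock()
        self._lr = learning_rate

    def pull(self, keys):
        # Workers fetch only the weights for the features they see.
        with self._lock:
            return {k: self._weights.get(k, 0.0) for k in keys}

    def push(self, gradients):
        # Gradients are applied as they arrive, with no barrier between workers.
        with self._lock:
            for k, g in gradients.items():
                self._weights[k] -= self._lr * g


def sigmoid(x):
    x = max(-35.0, min(35.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def worker(server, samples):
    """Hypothetical feature worker: pull weights, compute gradient, push it."""
    for features, label in samples:
        weights = server.pull(features.keys())  # may be stale by push time
        p = sigmoid(sum(weights[k] * v for k, v in features.items()))
        server.push({k: (p - label) * v for k, v in features.items()})


def train(server, shards):
    """Run one worker thread per data shard, all sharing one server."""
    threads = [threading.Thread(target=worker, args=(server, s)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each worker reads weights that other workers may already have changed, which is the trade-off an asynchronous scheme accepts in exchange for not waiting on a global synchronization step.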
3. 24/7 Operation
Since iStream 0.9, configuration can be hot‑swapped without restarting jobs. In 2015 the main search dump moved to continuous incremental updates, which removed the nightly downtime. Processing latency is at the second level, and algorithm configuration changes take effect without delay.
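The source does not describe how iStream implements hot‑swapping. As a rough illustration of the general idea only, the sketch below keeps the active configuration behind a single reference that is replaced atomically, so processing code picks up new values on its next read without a restart. HotSwapConfig, score, and the click_weight key are hypothetical names.

```python
import threading
from types import MappingProxyType


class HotSwapConfig:
    """Holds the active configuration and lets it be replaced at runtime."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._current = MappingProxyType(dict(initial))  # read-only view
        self._version = 0

    def snapshot(self):
        # Readers get a consistent (version, config) pair.
        with self._lock:
            return self._version, self._current

    def swap(self, new_config):
        # Replace the whole configuration in one step; returns the new version.
        with self._lock:
            self._current = MappingProxyType(dict(new_config))
            self._version += 1
            return self._version


def score(event, config):
    """Hypothetical algorithm module that re-reads its config per event."""
    _, cfg = config.snapshot()
    return event["clicks"] * cfg["click_weight"]
```

Because each event takes its own snapshot, a swap never changes the configuration halfway through processing a single event.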
4. Performance Optimization
Pora’s QPS grew from 940,000 during Double 11 in 2014 to 5.01 million in 2015, roughly a 5.3‑fold increase, while end‑to‑end latency at peak traffic stayed at the second level. Optimizations covered business‑level log processing, HTable client caching (an 85 % hit rate, so only about 15 % of lookups still reach HBase), HQueue tuning, and resource improvements on the Yarn, HDFS, iStream, and HBase clusters.
Business optimization: streamlined search PV log pipeline, direct HTable reads, and trimmed unnecessary logs.
System optimization: client‑side HTable cache, HQueue/HTable configuration tweaks, and hotspot mitigation for mixed workloads.
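A client‑side cache of the kind mentioned above can be sketched as a small LRU wrapper around a lookup function. This is an illustration under assumptions, not the HTable client: CachedTableClient and its fetch callback are hypothetical, and a production cache would also need an expiry or invalidation policy so that cached rows do not go stale.

```python
from collections import OrderedDict


class CachedTableClient:
    """LRU cache in front of a slower key-value lookup, tracking hit rate."""

    def __init__(self, fetch, capacity=10000):
        self._fetch = fetch          # hypothetical callback that reads the table
        self._capacity = capacity
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._fetch(key)          # only misses reach the backing store
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used entry
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a skewed key distribution, where a small set of hot users or items accounts for most lookups, a bounded cache like this can absorb most reads, which is consistent with the 85 % hit rate reported in the article.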
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
