
Alibaba’s Secrets to High‑Throughput Full‑Load and Low‑Latency Search Processing

This article details how Alibaba migrated its Taobao‑Tmall search workload to the search offline platform. The migration had to cope with very large data volumes, one‑to‑many joins, and hotspot sellers. A series of performance optimizations (local joins, salt‑based data sharding, dynamic aggregation jobs, and asynchronous processing) made it possible to achieve high‑throughput full loads and low‑latency incremental updates.

Alibaba Cloud Developer

Jan 20, 2020

Introduction

In Alibaba's search engineering system, online services handle millisecond‑level user requests, while offline systems ingest and process source data before feeding the search engine. The offline platform is a critical link that directly impacts downstream user experience.

Search Offline Platform Basics

The platform consists of a synchronization layer and a data‑processing layer, each handling full and incremental flows. It supports hundreds of business lines, with the largest data set coming from Taobao‑Tmall product reviews, reaching billions of rows.

Supported Business Scenarios

Data from MySQL or ODPS sources is transformed offline and finally loaded into HA3 or Elasticsearch for online search.

Technology Stack

Data is stored on HDFS/Pangu, resource scheduling relies on YARN or Hippo, and the unified compute framework is Flink/Blink.

Full Load (Batch) Processing

Full load must achieve high throughput to process billions of records within a limited time window. The job involves synchronizing many source tables and performing extensive joins, UDTFs, and MultiGets, ultimately producing HDFS files.

Initial attempts suffered from severe performance bottlenecks in multi‑PK dimension joins, which were synchronous and slow.

Local Join & Sort‑Merge Join

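The costly dimension joins were replaced with a LocalJoin, which relies on both tables being partitioned identically in HoloStore, followed by a Sort‑Merge Join. Because matching rows already sit in the same partition, each partition can be joined on its own, which reduces shuffle overhead.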
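The idea can be illustrated with a minimal in-memory sketch. It is not the Blink or HoloStore implementation: the function names `partition`, `sort_merge_join`, and `local_join`, and the use of `hash(key) % n` as the partitioning rule, are assumptions made for illustration only.

```python
def partition(rows, n):
    """Split (key, value) rows into n partitions using the same rule for every table."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[hash(key) % n].append((key, value))
    return parts


def sort_merge_join(left, right):
    """Inner join of two lists of (key, value) pairs by sorting both sides and merging."""
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit every pairing within the two runs.
            for li in range(i, i_end):
                for rj in range(j, j_end):
                    out.append((lk, left[li][1], right[rj][1]))
            i, j = i_end, j_end
    return out


def local_join(left_parts, right_parts):
    """Join partition i of one table only with partition i of the other (no shuffle)."""
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(sort_merge_join(lp, rp))
    return result


items = [(1, "item-a"), (2, "item-b"), (1, "item-c")]
sellers = [(1, "seller-1"), (2, "seller-2"), (3, "seller-3")]
joined = local_join(partition(items, 2), partition(sellers, 2))
```

The local join is only correct because both tables go through the same `partition` rule, so rows with equal keys always land in partitions with the same index.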

Salt‑Based Data Sharding

Hot sellers caused data skew. Top‑seller IDs were identified and salted before the join, then the salt was removed after Sort‑Merge Join, effectively balancing the workload.
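A common way to realize this kind of salting is sketched below, under stated assumptions. The article does not say how the other side of the join was handled; here the smaller (seller) side is replicated once per salt value, which is one standard approach. A dictionary-based join stands in for the Sort‑Merge Join to keep the sketch short, and all names (`salt_big_side`, `salt_small_side`, `skew_safe_join`, `n_salts`) are hypothetical.

```python
import random
from collections import defaultdict


def salt_big_side(rows, hot_keys, n_salts, rng):
    """Attach a random salt to rows with a hot key; other rows get salt 0."""
    return [((k, rng.randrange(n_salts) if k in hot_keys else 0), v) for k, v in rows]


def salt_small_side(rows, hot_keys, n_salts):
    """Replicate hot-key rows once per salt so every salted row finds its match."""
    out = []
    for k, v in rows:
        salts = range(n_salts) if k in hot_keys else range(1)
        out.extend(((k, s), v) for s in salts)
    return out


def hash_join(left, right):
    """Simple inner join on the first tuple element (stand-in for sort-merge join)."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


def skew_safe_join(big, small, hot_keys, n_salts=4, seed=0):
    """Salt both sides, join on (key, salt), then strip the salt from the result."""
    rng = random.Random(seed)
    joined = hash_join(
        salt_big_side(big, hot_keys, n_salts, rng),
        salt_small_side(small, hot_keys, n_salts),
    )
    return [(k, lv, rv) for (k, _salt), lv, rv in joined]


# Seller 1 is "hot": it owns most of the items.
big = [(1, f"item{i}") for i in range(8)] + [(2, "item8")]
small = [(1, "hot seller"), (2, "small seller")]
result = skew_safe_join(big, small, hot_keys={1})
```

After salting, the rows of a hot seller are spread over `n_salts` distinct join keys, so a partitioner that hashes the join key no longer sends them all to one task, while the final result is the same as an unsalted join.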

Dynamic Nested Aggregation Jobs

FullDynamicNestedAggregation (a Blink batch job) pre‑aggregates one‑to‑many tables into the main dimension tables, eliminating the need for multi‑PK scans during the final join.

Incremental (Streaming) Processing

Incremental jobs read change messages from SwiftQueue, join with mirror tables, apply UDTFs, and write updates back to SwiftQueue and Holo tables. The goal is second‑level latency while guaranteeing at‑least‑once semantics.

Stream‑Batch Fusion

A combination of batch‑style aggregation and streaming processing was applied, adding deduplication windows to reduce duplicate updates.

Solving One‑to‑Many Join Challenges

One‑to‑many tables stored in multi‑PK Holo tables caused synchronous scans. The solution grouped records by the first PK, converted fields to JSON, and wrote them back to the main table, enabling true asynchronous processing.
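The grouping step might look like the following sketch. It is an illustration only: the table and column names (`item_id`, `sku_id`, `skus`) and the helper functions are hypothetical, and plain dictionaries stand in for Holo tables.

```python
import json
from collections import defaultdict


def aggregate_children(child_rows, first_pk, second_pk):
    """Group child rows by their first PK and serialize each group as one JSON string."""
    grouped = defaultdict(list)
    for row in child_rows:
        grouped[row[first_pk]].append(row)
    return {
        pk: json.dumps(sorted(rows, key=lambda r: r[second_pk]), sort_keys=True)
        for pk, rows in grouped.items()
    }


def write_back(main_table, aggregated, column):
    """Store each JSON blob in a column of the matching main-table row."""
    for pk, blob in aggregated.items():
        if pk in main_table:
            main_table[pk][column] = blob


main_table = {100: {"title": "phone"}, 200: {"title": "laptop"}}
sku_rows = [
    {"item_id": 100, "sku_id": 2, "price": 10},
    {"item_id": 100, "sku_id": 1, "price": 12},
    {"item_id": 200, "sku_id": 7, "price": 99},
]
write_back(main_table, aggregate_children(sku_rows, "item_id", "sku_id"), "skus")
```

Once the child rows live in a single column of the main row, the downstream join becomes a single-key lookup instead of a range scan over a multi‑PK table.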

Truncation Optimization

To avoid large JSON blobs causing Full GC, scans were limited to a configurable number of rows per PK, with whitelist support per table.
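A minimal sketch of such a limit follows. The article does not spell out what the whitelist does; this sketch assumes whitelisted tables are exempt from truncation. The limit value, the table names, and the function `scan_rows` are all hypothetical.

```python
from itertools import islice

DEFAULT_MAX_ROWS_PER_PK = 3          # hypothetical configurable limit
UNTRUNCATED_TABLES = {"item_sku"}    # hypothetical whitelist of exempt tables


def scan_rows(table, rows, default_limit=DEFAULT_MAX_ROWS_PER_PK,
              whitelist=UNTRUNCATED_TABLES):
    """Return at most default_limit rows for one PK, unless the table is whitelisted."""
    limit = None if table in whitelist else default_limit
    # islice stops reading after `limit` rows, so the remainder is never materialized.
    return list(islice(rows, limit))


truncated = scan_rows("item_tag", iter(range(10)))
full = scan_rows("item_sku", iter(range(10)))
```

Capping the row count before serialization bounds the size of each JSON blob, which is what keeps individual records from growing large enough to put pressure on the heap.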

Windowed Deduplication

A 30‑minute deduplication window reduced duplicate updates by over 90%.
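The effect of such a window can be sketched as follows. The article gives only the window length, so this sketch assumes a tumbling window that keeps the last update per key; the event shape `(timestamp_seconds, key, payload)` and the function name are hypothetical.

```python
WINDOW_SECONDS = 30 * 60  # 30-minute window, as stated in the article


def dedup_by_window(events, window_seconds=WINDOW_SECONDS):
    """Keep only the last payload per key within each tumbling window.

    events: iterable of (timestamp_seconds, key, payload), in arrival order.
    Returns a list of (window_start, key, payload) sorted by window and key.
    """
    latest = {}
    for ts, key, payload in events:
        window_start = ts - ts % window_seconds
        # A later update to the same key in the same window overwrites the earlier one.
        latest[(window_start, key)] = payload
    return [(w, k, p) for (w, k), p in sorted(latest.items())]


events = [
    (0, "item1", "v1"),
    (60, "item1", "v2"),
    (120, "item2", "a"),
    (1800, "item1", "v3"),
]
deduped = dedup_by_window(events)
```

Here four incoming updates collapse to three emitted ones; the more often a key changes within a window, the larger the saving, at the cost of delaying updates until the window closes.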

Conclusion

Through a series of targeted optimizations (local joins, salt sharding, dynamic aggregation jobs, and asynchronous handling of one‑to‑many data), Alibaba’s main search workload was successfully migrated to the offline platform. The migration achieved both high‑throughput full loads and low‑latency incremental updates, and the platform performed robustly during peak traffic events such as Double 11.
