Building a High‑Performance, High‑Availability Advertising System at Alibaba: Architecture, Challenges, and Practices
This talk presents Alibaba's intelligent marketing platform, detailing its three ad systems, the massive data and low‑latency requirements, and the architectural solutions—including cold‑hot data separation, OLDB storage, full‑link tracing, elastic compute, and platform‑level componentization—that achieve high performance, high availability, and cost‑effective operation.
The presentation introduces Alibaba's Intelligent Marketing Platform, which powers over 100 million daily active users and more than 100,000 active advertisers, generating billions in revenue across media such as search, information flow, and app distribution.
The platform consists of three major ad systems: Wolong (search ads), Huichuan (feed ads), and Application Distribution, each handling distinct media scenarios and supporting various pricing models like CPT, CPC, CPM, oCPC, oCPM, and CPA.
A key challenge is handling massive data scale—billions of ad materials, hundreds of millions of user interactions, and extensive creative assets—while delivering sub‑200 ms response times and ensuring 24/7 availability.
To address data scale, the team implemented a cold‑hot separation strategy: high‑value materials are stored in a hot cluster with many replicas and longer retention, while low‑value materials reside in a cold cluster with fewer replicas, reducing storage cost and improving retrieval efficiency.
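The routing decision described above can be sketched as follows. This is a minimal illustration, not the production logic: the value threshold, replica counts, and retention windows are assumptions chosen for readability, and `route_material` is a hypothetical helper name.

```python
# Sketch of cold-hot data separation for ad materials.
# Replica counts, retention windows, and the 0.7 threshold are
# illustrative assumptions, not the values used in production.
from dataclasses import dataclass

@dataclass(frozen=True)
class ClusterPolicy:
    name: str
    replicas: int        # more replicas -> higher availability and read throughput
    retention_days: int  # how long materials stay before eviction

HOT = ClusterPolicy("hot", replicas=3, retention_days=90)
COLD = ClusterPolicy("cold", replicas=1, retention_days=14)

def route_material(value_score: float, threshold: float = 0.7) -> ClusterPolicy:
    """Place high-value materials in the hot cluster, the rest in cold."""
    return HOT if value_score >= threshold else COLD
```

The key design point is that the split is driven by a per-material value score rather than by age alone, so a frequently served legacy ad can stay hot while a fresh but low-performing one goes cold.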
The storage layer, called OLDB, replaces mixed middleware (TAIR‑MDB, Redis, TAIR‑LDB) with a unified SSD‑based solution that offers low cost, millisecond‑level read/write latency, and support for complex data structures (KV, list, counter) with high throughput (≈172 billion requests/day, 44 MB/s write, 120 MB/s read).
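To make the consolidation concrete, the sketch below shows how a single client interface can cover the KV, list, and counter workloads that previously required separate middleware. The class and method names are assumptions for illustration (loosely modeled on Redis-style verbs), and the in-memory dictionaries stand in for the SSD-backed engine.

```python
# Hypothetical client interface illustrating how one unified store
# (OLDB in the talk) can subsume separate KV, list, and counter systems.
# In-memory dicts stand in for the actual SSD-backed storage engine.
from collections import defaultdict

class OLDBClient:
    def __init__(self):
        self._kv = {}
        self._lists = defaultdict(list)
        self._counters = defaultdict(int)

    # KV operations (replacing a KV cache such as TAIR-MDB / Redis)
    def put(self, key, value):
        self._kv[key] = value

    def get(self, key):
        return self._kv.get(key)

    # list operations (e.g. per-user behavior sequences)
    def rpush(self, key, value):
        self._lists[key].append(value)

    def lrange(self, key, start, stop):
        return self._lists[key][start:stop]

    # counter operations (e.g. impression / budget counting)
    def incr(self, key, delta=1):
        self._counters[key] += delta
        return self._counters[key]
```

Unifying the interface this way is what lets one storage team operate a single cluster instead of three middleware stacks with different failure modes.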
Performance is further enhanced through component optimization (RPC, indexing, jemalloc, protobuf arena), multi‑threaded and asynchronous processing, caching, and logical sharding for parallel retrieval, achieving high compute power and low latency.
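The logical-sharding idea can be sketched in a few lines: partition the index into N shards, scan them in parallel, and merge the per-shard hits. This is a toy model assuming an in-memory list of ads and a thread pool; the real system would fan out over RPC to index servers.

```python
# Minimal sketch of logical sharding for parallel retrieval.
# The in-memory index and keyword match are stand-ins for a real
# inverted index queried over RPC; shard count is an assumption.
from concurrent.futures import ThreadPoolExecutor

def retrieve(index, query, num_shards=4):
    """Split the index into logical shards, scan them concurrently,
    and merge the per-shard results into one candidate list."""
    shards = [index[i::num_shards] for i in range(num_shards)]

    def scan(shard):
        return [ad for ad in shard if query in ad["keywords"]]

    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        results = list(pool.map(scan, shards))
    return [ad for part in results for ad in part]
```

Because each shard holds 1/N of the index, per-shard scan latency drops roughly in proportion to the shard count, which is how sharding trades parallel compute for end-to-end latency.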
Reliability is ensured by a full‑link TRACE system that provides fine‑grained latency monitoring at the RPC and function level, enabling rapid identification of hotspots and guiding performance tuning.
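A function-level trace point of this kind can be approximated with a decorator that records elapsed time per span. This is a toy illustration only: the span names, the in-process `SPANS` list, and the `traced` helper are assumptions, not the TRACE system's actual API, and a real collector would ship spans out-of-band rather than append to a list.

```python
# Toy illustration of function-level latency tracing.
# SPANS stands in for a trace collector; names are hypothetical.
import functools
import time

SPANS = []  # a real system would ship these to a trace backend

def traced(span_name):
    """Record wall-clock latency of the wrapped function as a trace span."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                SPANS.append({"span": span_name, "ms": elapsed_ms})
        return inner
    return wrap

@traced("rank.score")
def score(ad):
    return ad["bid"] * ad["ctr"]
```

Aggregating such spans by name is what lets an operator see, for example, that 80% of a request's budget is spent in one ranking function and tune there first.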
An elastic compute framework dynamically allocates resources based on traffic value, using offline‑trained query value models, fraud detection, and configurable elasticity tiers to prioritize high‑value traffic while controlling resource consumption.
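The tiering logic can be sketched as a simple budget function: traffic is bucketed by an offline-trained query value score, fraudulent traffic is dropped, and each tier gets a retrieval budget. The tier boundaries and budget numbers below are illustrative assumptions, as is the `candidate_budget` name.

```python
# Sketch of value-based elastic compute: each query's value score
# (from an offline-trained model) maps to a candidate-retrieval budget.
# Tier boundaries and budgets are illustrative assumptions.
def candidate_budget(value_score: float, is_fraud: bool) -> int:
    if is_fraud:
        return 0       # fraud detection gates the pipeline entirely
    if value_score >= 0.8:
        return 2000    # full retrieval depth for high-value traffic
    if value_score >= 0.4:
        return 800     # reduced depth for mid-value traffic
    return 200         # minimal depth keeps low-value traffic cheap
```

The point of configurable tiers is that under load the system degrades selectively: low-value queries lose retrieval depth first, while high-value traffic keeps its full compute allocation.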
Iteration efficiency is achieved through componentization, service‑ification, and platform‑ization: abstracting common business logic, consolidating code bases, and providing unified platforms for feature engineering, model training, vector retrieval, and service governance.
The platform’s model engineering system centralizes feature extraction, offline processing, training, evaluation, and one‑click online deployment, while a service governance platform built on Docker and Kubernetes offers automated scaling, fault tolerance, and cost optimization.
Overall, these architectural decisions have unified the storage layer, saved the cost of more than 1,000 machines, sustained roughly 172 billion requests per day, and maintained high availability, demonstrating a scalable, efficient, and cost‑effective advertising infrastructure.
DataFunTalk
DataFunTalk is dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. It regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.