
Dynamic Compute Allocation and RT Control in Alibaba's Display Advertising Engine

Alibaba’s display advertising engine ‘Transformers’ introduces a four‑generation compute‑efficiency framework that allocates compute dynamically based on request value and controls real‑time latency through a three‑step, feedback‑driven process, delivering deeper retrieval, lower P99 latency, higher CTR and RPM while maintaining stable recall‑chain performance.

Alimama Tech

In the context of green computing, efficient and intelligent compute allocation is becoming increasingly important for large‑scale online advertising systems. This article introduces recent work on compute allocation optimization in Alimama’s display advertising engine, named “Transformers”, which aims to make the engine as flexible and powerful as a Transformer.

The engine has evolved through four generations of compute‑efficiency systems:

Compute Efficiency 1.0 (2018): single‑point engineering optimizations and model slimming.

Compute Efficiency 2.0 (2019): combined model design and engineering optimization via algorithm‑engineering co‑design.

Compute Efficiency 3.0 (2020): system‑wide “personalized” compute, improving flexibility and cost‑performance.

Compute Efficiency 4.0 (2021): global‑view fine‑grained measurement of compute cost‑performance across the whole pipeline.

Transformers integrates this perspective, treating compute allocation as a design dimension. Instead of a uniform compute budget per request, it differentiates compute based on the incremental business value of each request, enabling “flexible” compute scaling under resource constraints.
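To make the idea of value‑based allocation concrete, the following is a minimal sketch, not the engine's actual algorithm. It assumes each request comes with a hypothetical estimate of business value per compute tier and that tiers have fixed costs; the names `Request`, `TIER_COST`, and `allocate_tiers` are illustrative only. Every request starts at the cheapest tier, and the remaining budget is spent greedily on the upgrade with the best marginal value per unit of cost.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    # Hypothetical estimated business value at each compute tier (index 0 = cheapest).
    value_by_tier: list[float]


# Hypothetical cost of each tier, in arbitrary compute units.
TIER_COST = [1.0, 2.0, 4.0]


def allocate_tiers(requests, budget):
    """Greedy allocation: give everyone tier 0, then buy the best-value upgrades."""
    tiers = {r.request_id: 0 for r in requests}
    by_id = {r.request_id: r for r in requests}
    remaining = budget - TIER_COST[0] * len(requests)
    if remaining < 0:
        raise ValueError("budget cannot cover the base tier for all requests")

    while True:
        best = None  # (value gain per unit cost, request id, extra cost)
        for rid, tier in tiers.items():
            if tier + 1 >= len(TIER_COST):
                continue  # already at the top tier
            extra_cost = TIER_COST[tier + 1] - TIER_COST[tier]
            if extra_cost > remaining:
                continue  # upgrade does not fit in the remaining budget
            values = by_id[rid].value_by_tier
            gain = values[tier + 1] - values[tier]
            ratio = gain / extra_cost
            if gain > 0 and (best is None or ratio > best[0]):
                best = (ratio, rid, extra_cost)
        if best is None:
            break  # no affordable upgrade adds value
        _, rid, extra_cost = best
        tiers[rid] += 1
        remaining -= extra_cost
    return tiers


requests = [
    Request("high", [1.0, 3.0, 6.0]),  # value grows strongly with compute
    Request("low", [1.0, 1.1, 1.2]),   # value barely changes with compute
]
print(allocate_tiers(requests, budget=5.0))
```

Under a tight budget, the high‑value request receives the top tier while the low‑value request stays at the base tier, which is the differentiated behavior described above.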

The core practice is divided into three steps:

Step 1 – Precise RT control at a single functional point: users are grouped by estimated RT (response time), and each group’s compute tier is adjusted to keep the average RT within a target.

Step 2 – RT allocation across multiple functional points within a module: a central controller distributes per‑point RT budgets, and agents enforce them using the mechanism from Step 1.

Step 3 – Full‑link adaptive RT control: each business scenario defines a target RT; the system pre‑adjusts all modules’ RT budgets and then fine‑tunes them via real‑time feedback.
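Steps 1 and 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the production mechanism: the `Group` structure, the per‑tier RT estimates, the downgrade rule (lower the tier of the group that is currently slowest), and the weight‑proportional split in `split_budget` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    traffic_share: float      # fraction of requests that fall in this group
    rt_by_tier: list[float]   # assumed RT estimate in ms per tier, tier 0 = cheapest
    tier: int = 0


def average_rt(groups):
    """Traffic-weighted average RT at the groups' current tiers."""
    return sum(g.traffic_share * g.rt_by_tier[g.tier] for g in groups)


def fit_tiers(groups, target_rt_ms):
    """Step 1 sketch: start at the top tier, downgrade the slowest group until the target holds."""
    for g in groups:
        g.tier = len(g.rt_by_tier) - 1
    while average_rt(groups) > target_rt_ms:
        candidates = [g for g in groups if g.tier > 0]
        if not candidates:
            break  # every group is already at the cheapest tier
        slowest = max(candidates, key=lambda g: g.rt_by_tier[g.tier])
        slowest.tier -= 1
    return {g.name: g.tier for g in groups}


def split_budget(module_budget_ms, weights):
    """Step 2 sketch: a controller splits a module's RT budget across functional points."""
    total = sum(weights.values())
    return {point: module_budget_ms * w / total for point, w in weights.items()}


groups = [
    Group("fast", 0.5, [5.0, 10.0, 20.0]),
    Group("slow", 0.5, [10.0, 30.0, 60.0]),
]
print(fit_tiers(groups, target_rt_ms=20.0))
print(split_budget(30.0, {"retrieval": 2.0, "scoring": 1.0}))
```

In this sketch, each functional point would call `fit_tiers` with the budget it receives from `split_budget`, which mirrors the controller and agent roles in Step 2.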

Key mechanisms include layer‑by‑layer real‑time feedback, global pre‑adjustment based on historical data, and coordination between internal function RT control and external service RT control.
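The combination of pre‑adjustment and feedback can be pictured with a small control loop. The sketch below is an assumption‑laden illustration: it pre‑allocates a scenario's target RT across modules in proportion to their historical RT, then applies a damped proportional correction from observed RT. The function names, the module names, the `gain` parameter, and the sample numbers are hypothetical and are not taken from the engine.

```python
def pre_adjust(target_rt_ms, historical_rt_ms):
    """Pre-adjustment: split the scenario target across modules by historical RT share."""
    total = sum(historical_rt_ms.values())
    return {m: target_rt_ms * rt / total for m, rt in historical_rt_ms.items()}


def feedback_step(budgets, observed_rt_ms, target_rt_ms, gain=0.5):
    """Feedback: scale every budget by a damped share of the end-to-end relative error."""
    observed_total = sum(observed_rt_ms.values())
    error = (target_rt_ms - observed_total) / target_rt_ms  # negative when too slow
    scale = 1.0 + gain * error
    return {m: b * scale for m, b in budgets.items()}


# Hypothetical scenario: 130 ms end-to-end target, two modules.
budgets = pre_adjust(130.0, {"recall": 60.0, "rank": 40.0})
print(budgets)

# Observed RT overshoots the target, so the budgets are tightened slightly.
budgets = feedback_step(budgets, {"recall": 90.0, "rank": 53.0}, target_rt_ms=130.0)
print(budgets)
```

A gain below 1 corrects only part of the error on each step, a common way to keep a feedback loop from oscillating; the right value would depend on how noisy the RT measurements are.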

Operational results show significant improvements. Compared with static control, the dynamic compute mode delivered +10% average retrieval depth, a 6 ms reduction in P99 RT, +0.18% CTR, and +0.70% RPM. The upgraded monitoring view visualizes tier rankings, RT control tiers, and per‑scenario/module tier details.

In daily operation, the system maintains stable recall‑chain RT (~130 ms) and quickly detects anomalies (e.g., data updates, capacity issues). During the 11.11 promotion, dynamic compute automatically adjusted tiers, reducing degradation and improving CPU utilization and business metrics.

Future work will focus on optimizing global RT allocation strategies, joint modeling of compute, capacity, and business effect, and further research on the relationship between compute, energy consumption, and carbon management.
