
Personalized Computing‑Power Allocation for Alibaba Display Advertising: Transformers Engine and DCAF Algorithm

The article presents the three-stage evolution of computing-power efficiency in Alibaba's display-advertising team. It introduces DCAF, a personalized power-allocation algorithm built on a Lagrangian formulation, and describes AllSpark, a dynamic-control framework. Together they enable a flexible, resource-aware Transformers engine that delivered significant business gains during high-traffic events.

DataFunTalk

Jan 4, 2021


Background and Motivation

Over the past two years, the gains that deep learning brought to recommendation systems have begun to saturate. Computing power has become the new bottleneck, and further model improvements yield diminishing returns. Alibaba's display-advertising team has therefore focused on computing-power efficiency as a way to keep performance growing.

Three-Stage Power-Efficiency Evolution

The team describes its work in three stages:

Power-Efficiency 1.0 (2018): point-wise engineering optimizations.

Power-Efficiency 2.0 (2019): co-design of models and engineering, which yielded large gains.

Power-Efficiency 3.0 (2020): system-wide, “personalized” power allocation that makes the engine flexible and maximizes business value under a fixed power budget.

Transformers Engine

The resulting system is named Transformers, a nod to the shape-shifting robots of the same name. Power allocation is built into the engine's design rather than bolted on, so the computation spent on each request can be adjusted dynamically according to its value-to-cost ratio.

Problem Formalization

Power allocation is modeled at two levels: the micro level (individual requests) and the macro level (traffic that varies over time). For each request i and candidate power tier j, the system is assumed to have estimates of the expected business value value_{i,j} and the power cost cost_{i,j}. The goal is to assign tiers to requests so that total value is maximized while total power consumption stays within the budget C.
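Written as a 0-1 program, the static problem and the Lagrangian relaxation of its budget constraint look as follows. The indicator x_{i,j} (1 if request i is served at tier j) is notation introduced here, and the sketch assumes every request is served at exactly one tier, possibly the cheapest.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i}\sum_{j} x_{i,j}\,\mathrm{value}_{i,j} \\
\text{s.t.}\quad & \sum_{i}\sum_{j} x_{i,j}\,\mathrm{cost}_{i,j} \le C, \\
& \sum_{j} x_{i,j} = 1 \;\; \forall i, \qquad x_{i,j} \in \{0,1\}, \\[4pt]
\mathcal{L}(x,\lambda) \;=\; & \sum_{i}\sum_{j} x_{i,j}\bigl(\mathrm{value}_{i,j} - \lambda\,\mathrm{cost}_{i,j}\bigr) + \lambda C, \qquad \lambda \ge 0, \\[4pt]
j_i^{*}(\lambda) \;=\; & \arg\max_{j}\,\bigl(\mathrm{value}_{i,j} - \lambda\,\mathrm{cost}_{i,j}\bigr).
\end{aligned}
```

Because the term λC does not depend on x, maximizing the Lagrangian for a fixed λ splits into one independent argmax per request.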

Static Optimal Allocation – DCAF Algorithm

DCAF (Dynamic Computation Allocation Framework) treats this as a constrained optimization problem and relaxes the budget constraint with a Lagrange multiplier λ, which acts as a threshold on marginal value per unit of power. For a fixed λ, the dual problem decomposes by request: each request independently picks the tier j that maximizes value_{i,j} − λ·cost_{i,j}. Equivalently, a more expensive tier j is preferred over a cheaper tier j' exactly when its marginal return (value_{i,j} − value_{i,j'}) / (cost_{i,j} − cost_{i,j'}) exceeds λ. Because total consumption never increases as λ rises, λ is tuned by binary search until total power consumption fits the budget C.
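A minimal Python sketch of this static step, assuming the value and cost estimates are already available as per-request lists. The function names (`choose_tier`, `allocate`, `solve_lambda`) and the toy numbers are hypothetical and not taken from the DCAF paper or Alibaba's code; the sketch only illustrates the per-request argmax rule and the binary search on λ.

```python
from typing import List, Sequence, Tuple


def choose_tier(values: Sequence[float], costs: Sequence[float], lam: float) -> int:
    """Per-request rule: tier j maximizing values[j] - lam * costs[j] (ties go to the cheaper tier)."""
    best, best_score = 0, values[0] - lam * costs[0]
    for j in range(1, len(values)):
        score = values[j] - lam * costs[j]
        if score > best_score or (score == best_score and costs[j] < costs[best]):
            best, best_score = j, score
    return best


def allocate(values: Sequence[Sequence[float]], costs: Sequence[Sequence[float]],
             lam: float) -> Tuple[List[int], float]:
    """Apply the rule to every request independently; return chosen tiers and total cost."""
    tiers = [choose_tier(v, c, lam) for v, c in zip(values, costs)]
    return tiers, sum(c[j] for c, j in zip(costs, tiers))


def solve_lambda(values: Sequence[Sequence[float]], costs: Sequence[Sequence[float]],
                 budget: float, iters: int = 60) -> Tuple[float, List[int]]:
    """Bisection for (approximately) the smallest lam >= 0 whose allocation fits the budget.

    Works because total cost never increases as lam grows.
    """
    if budget < sum(min(c) for c in costs):
        raise ValueError("budget is below the cheapest possible allocation")
    tiers, total = allocate(values, costs, 0.0)
    if total <= budget:                 # budget not binding: every request takes its best tier
        return 0.0, tiers
    lo, hi = 0.0, 1.0                   # invariant: lo is over budget, hi will be within it
    while allocate(values, costs, hi)[1] > budget:
        lo, hi = hi, hi * 2.0           # grow the bracket until hi is feasible
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if allocate(values, costs, mid)[1] <= budget:
            hi = mid                    # feasible: try a smaller lam (more compute)
        else:
            lo = mid                    # over budget: raise the price of compute
    return hi, allocate(values, costs, hi)[0]


if __name__ == "__main__":
    # Toy example: two requests, three tiers costing 1, 2 and 4 power units (made-up numbers).
    costs = [[1, 2, 4], [1, 2, 4]]
    values = [[1.0, 1.8, 2.4], [0.5, 1.5, 2.0]]
    lam, tiers = solve_lambda(values, costs, budget=5)
    print(round(lam, 3), tiers)         # 0.3 [1, 1]
```

In the toy run, the second request steps down from its top tier once λ passes 0.25, and the first once λ passes 0.3; the budget of 5 is first satisfied just above λ = 0.3, with both requests on the middle tier for a total cost of 4. The unused unit of budget reflects the discreteness of the tiers.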

Dynamic Allocation – AllSpark Framework

A λ tuned for one traffic level is no longer optimal when traffic shifts. To keep λ near its optimum as traffic fluctuates, the AllSpark system applies real-time feedback control. It consists of four components:

Regulation chain: a central Controller collects module-level metrics (failure rate, latency, CPU usage) and issues average-tier suggestions; lightweight Agents embedded in each module pull these suggestions and enforce them.

Feedback chain: metrics are scraped by Prometheus and stored in a time-series database, so the Controller sees them with roughly 5 s of delay.

Control panel: a UI for configuring targets, strategies, and tier policies per module or control unit, with versioning and gradual rollout.

Monitoring view: visualizes real-time and historical tier values, alerts on abnormal changes, and provides observability.
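The regulation chain can be pictured as a small control loop. The article does not give AllSpark's control law, so the sketch below assumes an incremental proportional-integral (PI) controller that steers a module's suggested average tier toward a hypothetical CPU-utilization target, and that drops the module to the cheapest tier when its failure rate exceeds a threshold (mirroring the failure isolation described in the next section). The Agent turns the fractional suggestion into per-request tiers by randomized rounding. Class names, gains, thresholds, and the toy load model are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Controller:
    """Hypothetical controller: incremental PI loop on a module's average-tier suggestion."""
    target_cpu: float                  # assumed CPU-utilization target, e.g. 0.6
    max_tier: int                      # highest tier the module supports
    kp: float = 1.0                    # proportional gain (illustrative)
    ki: float = 1.0                    # integral gain (illustrative)
    max_failure_rate: float = 0.01     # assumed threshold for failure isolation
    tier: float = 0.0                  # current suggestion, kept in [0, max_tier]
    _prev_error: Optional[float] = field(default=None, repr=False)

    def update(self, cpu: float, failure_rate: float = 0.0) -> float:
        """Consume one round of metrics and return the new average-tier suggestion."""
        if failure_rate > self.max_failure_rate:
            # Isolate a failing module by throttling it to the cheapest tier.
            self.tier, self._prev_error = 0.0, None
            return self.tier
        error = self.target_cpu - cpu  # > 0 means spare capacity, so raise the tier
        prev = error if self._prev_error is None else self._prev_error
        # Velocity-form PI: change = kp * (change in error) + ki * error.
        self.tier += self.kp * (error - prev) + self.ki * error
        self.tier = min(max(self.tier, 0.0), float(self.max_tier))
        self._prev_error = error
        return self.tier


class Agent:
    """Hypothetical in-module agent: enforces the Controller's suggestion per request."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.suggestion = 0.0

    def pull(self, controller: Controller) -> None:
        self.suggestion = controller.tier

    def tier_for_request(self) -> int:
        # Randomized rounding: the expected tier equals the suggestion,
        # so the mean over many requests approaches it.
        base = math.floor(self.suggestion)
        return base + (1 if self._rng.random() < self.suggestion - base else 0)


if __name__ == "__main__":
    ctrl = Controller(target_cpu=0.6, max_tier=4)
    for _ in range(200):
        cpu = 0.1 + 0.15 * ctrl.tier   # toy load model: CPU grows linearly with tier
        ctrl.update(cpu)
    agent = Agent(seed=1)
    agent.pull(ctrl)
    print(round(ctrl.tier, 2))         # 3.33
```

The velocity (incremental) form of the PI law was chosen because clamping the output then needs no separate anti-windup logic; with a real ~5 s metrics delay, the gains would need to be tuned against measured module behavior rather than the linear toy model used here.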

System Flexibility in Practice

During the 2020 Double 11 (Singles' Day) promotion, the AllSpark-enabled Transformers engine absorbed a ten-fold traffic surge on a single baseline deployment. It adjusted power tiers automatically to keep latency low and RPM (revenue per mille) high, delivering incremental revenue on the order of millions of RMB. The framework also isolates local failures by throttling power to problematic modules, which preserves overall system stability.

Beyond Power Allocation

The same co-design principles extend to resource sharing across multiple scenarios (OneEngine), turning the engine into a “brain” that balances algorithmic performance, power usage, and system metrics across the entire online service.

Reference: Jiang et al., “DCAF: A Dynamic Computation Allocation Framework for Online Serving System”, arXiv:2006.09684.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


