How ComRecycle Cuts CPU/GPU Use by 23% in Taobao Ads: An Intelligent Computation Recycling Framework
This paper introduces ComRecycle, an intelligent computation recycling framework for Taobao's display advertising system that caches and reuses ad candidates across recall, coarse‑ranking, and fine‑ranking stages, achieving up to 23% CPU and 22% GPU savings while maintaining recommendation quality.
Abstract
As a key piece of infrastructure supporting the core commercial value of the Taobao platform, the display advertising system processes billions of high‑concurrency ad requests daily. Repeated user requests lead to redundant execution of recall, coarse‑ranking, and fine‑ranking stages, wasting significant compute resources. We propose ComRecycle, an intelligent computation recycling framework that caches and reuses ad sets across these stages, enabling fine‑grained compute scheduling while preserving recommendation effectiveness. Modeling the compute‑recycling decision as an online constrained optimization problem and solving it with user‑interest modeling and Lagrangian dual methods, ComRecycle reduces CPU usage by 23% and GPU usage by 22% in online experiments, offering a new paradigm for sustainable large‑scale e‑commerce ad systems. The paper was accepted to KDD ’25.
Introduction
Online advertising efficiency hinges on model capability and compute resources. While industry focuses on improving model prediction accuracy, the trade‑off between incremental compute/storage costs and model gains is often overlooked. Analysis of Taobao’s feed ad system reveals that many users issue repeated requests (e.g., frequent page refreshes), resulting in redundant multi‑stage computations and low ad exposure, thus wasting compute resources. To address this, we develop a compute‑recycling framework that caches potentially reusable, unexposed ads and reuses them for repeated requests.
Data Analysis
3.1 Basic Data Analysis
We find that 74% of ad requests are repeated requests (a request is considered repeated if the same user has at least one prior request on the same day). Most repeated requests have short intervals: 46% occur within 10 seconds and 75% within 2 minutes. High‑frequency refreshes cause QPS spikes and substantial compute load, while also reducing effective ad exposure—41% of requests receive no ad exposure and nearly 70% expose fewer ads than the candidate set size.
3.2 Redundancy Analysis
Caching unexposed ads can significantly cut resource consumption, but naïve reuse may ignore user interest shifts, degrading ad delivery efficiency. We define redundancy metrics such as ad coverage rate (ACR), pCTR difference, and bid difference to quantify overlap between initial and repeated requests across recall, coarse‑ranking, and fine‑ranking stages.
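Of the metrics above, ad coverage rate is the most directly computable. The paper summary does not spell out its exact formula, so the sketch below assumes one plausible form: the fraction of a repeated request's candidates already present in the initial request's cached candidate set.

```python
def ad_coverage_rate(initial_ads, repeat_ads):
    """Fraction of a repeated request's candidates that already appear
    in the initial request's candidate set. One plausible form of ACR;
    the paper's exact definition may differ."""
    if not repeat_ads:
        return 0.0
    initial = set(initial_ads)
    return sum(1 for ad in repeat_ads if ad in initial) / len(repeat_ads)
```

A high ACR between an initial and a repeated request suggests the cached ad set is still representative, making reuse cheap in efficiency terms; pCTR and bid differences refine that signal at the item level.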
ComRecycle Computation Recycling System
4.1 Overall Framework
Figure 3 shows the overall architecture. For an initial request, the system runs the full multi‑stage pipeline and caches the ad sets generated at each stage. For a repeated request, a real‑time user‑interest model decides whether and which cached stage to reuse. Four strategies are supported: no reuse, recall‑cache reuse, coarse‑ranking‑cache reuse, and fine‑ranking‑cache reuse.
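The per-stage cache and four-strategy dispatch can be sketched as follows. This is an illustrative skeleton only: the production store, keying scheme, and TTLs are not described in this summary, and the class and strategy names are invented for the sketch.

```python
STRATEGIES = ("no_reuse", "recall_cache", "coarse_cache", "fine_cache")

class ComRecycleCache:
    """Per-user cache of the ad sets produced at each pipeline stage
    (hypothetical in-memory stand-in for the production store)."""

    def __init__(self):
        self._store = {}

    def put(self, user_id, stage_sets):
        # stage_sets: {"recall": [...], "coarse": [...], "fine": [...]}
        # written once per initial (fully computed) request
        self._store[user_id] = stage_sets

    def serve(self, user_id, strategy):
        """Return the cached ad set for the chosen strategy,
        or None to signal that the full pipeline must run."""
        if strategy == "no_reuse" or user_id not in self._store:
            return None
        stage = {"recall_cache": "recall",
                 "coarse_cache": "coarse",
                 "fine_cache": "fine"}[strategy]
        return self._store[user_id][stage]
```

Reusing the fine‑ranking cache skips the most compute; reusing the recall cache skips the least but stays closest to a fresh result, which is why strategy selection is the crux of the framework.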
4.2 Problem Modeling
The core challenge is selecting the optimal reuse strategy for each repeated request under a constraint on recommendation quality. We formulate this as an online convex optimization problem: minimize total compute cost while keeping efficiency above a threshold. Each request i can be assigned a strategy s∈{0,1,2,3} with known compute cost c_{i,s} and efficiency decay d_{i,s}. Using Lagrangian multipliers, we derive near‑optimal online decisions.
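One plausible rendering of this formulation, using the symbols from the text (the decay budget $D$ and the exact constraint form are assumptions on my part):

```latex
\min_{\{s_i\}} \; \sum_i c_{i,s_i}
\quad \text{s.t.} \quad \sum_i d_{i,s_i} \le D,
\qquad s_i \in \{0,1,2,3\}.
```

Relaxing the constraint with a multiplier $\lambda \ge 0$ decouples the problem across requests, yielding a simple per-request decision rule:

```latex
s_i^{*} = \arg\min_{s} \; \bigl( c_{i,s} + \lambda \, d_{i,s} \bigr),
```

so each repeated request can be decided online once $\lambda$ is known.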
4.3 Uplift‑Based Efficiency Decay Estimation
We train an uplift model to predict marginal efficiency loss for each strategy. A multi‑task architecture combines GRU‑based user‑behavior modeling with shared bottom layers (MMoE) and outputs four logits: baseline (no reuse) and uplift logits for recall, coarse‑ranking, and fine‑ranking reuse. The treatment logits are summed with the baseline and passed through an activation to estimate efficiency, from which decay is computed.
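The head-combination step described above (baseline logit plus per-strategy uplift logit, then an activation) can be sketched without the GRU/MMoE backbone. The sketch below assumes a sigmoid activation and defines decay as baseline efficiency minus treated efficiency; both are my assumptions, as the summary does not name the activation or the decay formula.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def efficiency_decays(base_logit, uplift_logits):
    """Estimate each reuse strategy's efficiency decay versus no reuse.

    base_logit: the no-reuse head's output logit.
    uplift_logits: one uplift logit per reuse strategy (recall,
    coarse-ranking, fine-ranking), added to the baseline logit
    before the activation, mirroring the architecture described."""
    base_eff = sigmoid(base_logit)
    return [base_eff - sigmoid(base_logit + u) for u in uplift_logits]
```

A more negative uplift logit means reuse is expected to hurt efficiency more, producing a larger decay estimate d_{i,s} for the optimizer.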
4.4 Lagrangian Multiplier Solution
Assuming diminishing returns of efficiency with added compute, we solve the dual problem via binary search (Algorithm 1) and update the multiplier every 15 minutes to adapt to traffic fluctuations, achieving near‑optimal offline performance with minimal online overhead.
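The dual binary search can be sketched as below: for a candidate multiplier, each request greedily picks the strategy minimizing cost plus penalized decay, and the multiplier is tightened until total decay fits the budget. Variable names and the budget form are illustrative, not the paper's.

```python
def pick_strategy(c, d, lam):
    """Per-request Lagrangian decision: minimize cost + lam * decay."""
    return min(range(len(c)), key=lambda s: c[s] + lam * d[s])

def solve_lambda(costs, decays, budget, lo=0.0, hi=1e6, iters=50):
    """Binary-search the multiplier so that total efficiency decay under
    the induced decisions stays within budget. A sketch of the dual
    update the paper re-runs every 15 minutes; larger lam penalizes
    decay harder, so total decay is non-increasing in lam."""
    def total_decay(lam):
        return sum(d[pick_strategy(c, d, lam)]
                   for c, d in zip(costs, decays))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_decay(mid) > budget:
            lo = mid   # too much decay: raise the penalty
        else:
            hi = mid   # feasible: try a smaller penalty
    return hi          # smallest feasible multiplier found
```

Because each evaluation is a single pass over (cached) per-request costs and decays, refreshing the multiplier on a 15‑minute cadence adds negligible serving overhead.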
Experiments
5.1 Offline Experiments
We evaluate ComRecycle using response time (RT) and ad coverage rate (ACR) as proxies for compute cost and efficiency. Results (Table 3, Figure 4) show that the online policy closely matches the theoretical optimum and outperforms static strategies in resource utilization.
5.2 Online Experiments
Deployed in Taobao’s display ad system, ComRecycle’s A/B test demonstrates that, while maintaining comparable delivery efficiency to the baseline, it saves 23% CPU and 22% GPU inference resources.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.
