
MaRCA: Multi‑Agent Reinforcement Learning Computation Allocation for Full‑Chain Advertising Systems

The article presents MaRCA, a multi‑agent reinforcement learning framework that models user value, compute consumption, and action reward to allocate limited computation resources across the entire advertising recommendation pipeline, achieving higher ad revenue while keeping system load stable under fluctuating traffic and diverse request values.

JD Tech

With the rapid growth of JD’s external advertising business, billions of user requests must be processed within sub‑second latency, creating a severe computation‑resource challenge when traffic fluctuates and request values differ widely.

To address this, the authors formulate the full‑chain compute‑allocation problem as a multi‑agent reinforcement learning (MARL) task: given the system state $s_t$, choose an action combination $a_t$ that maximizes the reward $R(s_t, a_t) = Q(s_t, a_t) - \lambda C(s_t, a_t)$ while respecting the per‑module load constraints $C_m$. Here $Q$ is the estimated value of the action, $C$ is its estimated compute consumption, and $\lambda$ is the trade‑off factor between the two. The state space includes user features, traffic characteristics, and IDC information; actions are categorized as link‑selection, switch, and queue decisions.
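To make the reward concrete, here is a minimal sketch of how a single request could be scored under it. The `Action` class, the candidate names, and all numbers are hypothetical illustrations, not values from the article; the only thing taken from the text is the rule of picking the action combination with the largest Q − λC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str       # hypothetical label for an action combination
    q_value: float  # estimated value Q(s_t, a_t) of taking this action
    cost: float     # estimated compute consumption C(s_t, a_t)


def select_action(candidates, lam):
    """Return the candidate that maximizes Q - lam * C."""
    return max(candidates, key=lambda a: a.q_value - lam * a.cost)


# Illustrative candidates only; real estimates would come from the
# value and compute estimators.
candidates = [
    Action("full_chain", q_value=10.0, cost=8.0),
    Action("light_chain", q_value=7.0, cost=3.0),
    Action("skip", q_value=0.0, cost=0.0),
]

for lam in (0.1, 1.0, 3.0):
    print(lam, select_action(candidates, lam).name)
```

As λ grows, compute becomes more expensive relative to value, so the same request shifts from the heaviest chain toward cheaper options.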

MaRCA consists of four tightly coupled modules: (1) a user‑value estimator that predicts ad‑revenue potential per request, (2) a compute‑estimator that predicts CPU consumption for each action, (3) an action‑value estimator that predicts advertising consumption, and (4) a load‑aware decision module that dynamically adjusts the trade‑off factor λ based on real‑time CPU load and elastic‑degradation signals.
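The summary does not give the update rule used by the load‑aware decision module. As a rough sketch only, assuming a simple proportional rule, λ could be nudged up when CPU load exceeds a target and down when there is headroom. The function name, `target_load`, `step`, and the bounds are all hypothetical, and elastic‑degradation signals are not modeled here.

```python
def update_lambda(lam, cpu_load, target_load=0.7, step=0.5,
                  lam_min=0.0, lam_max=10.0):
    """Proportional adjustment of the trade-off factor (assumed rule).

    cpu_load and target_load are utilization fractions in [0, 1].
    Load above target raises lam (compute gets pricier); load below
    target lowers it. The result is clamped to [lam_min, lam_max].
    """
    lam_new = lam + step * (cpu_load - target_load)
    return min(lam_max, max(lam_min, lam_new))


# Overloaded system: lam increases.
print(update_lambda(1.0, cpu_load=0.9))
# Idle system with a small lam: the result is clamped at lam_min.
print(update_lambda(0.05, cpu_load=0.3))
```

A controller like this reacts only after load has changed, which is one reason a predictive adjustment of λ is attractive.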

The optimization is expressed as a constrained linear program and solved via Lagrangian duality, yielding closed‑form expressions for the optimal action values and the optimal λ‑adjustment policy.
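The derivation itself is not reproduced in this summary. The following is a sketch of how such a dual argument typically goes, under simplifying assumptions that are not from the article: a single aggregate compute budget $\bar{C}$ instead of per‑module constraints $C_m$, requests indexed by $i$, candidate actions indexed by $j$, and relaxed assignment variables $x_{ij}$.

```latex
% Assumed notation (illustrative): Q_{ij} and C_{ij} are the estimated value
% and compute cost of action j on request i; \bar{C} is the total budget.
\begin{align}
\max_{x}\quad & \sum_{i}\sum_{j} x_{ij}\, Q_{ij} \\
\text{s.t.}\quad & \sum_{i}\sum_{j} x_{ij}\, C_{ij} \le \bar{C}, \qquad
                   \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad x_{ij} \ge 0 .
\end{align}
% Dualizing only the budget constraint with a multiplier \lambda \ge 0:
\begin{align}
L(x,\lambda) &= \sum_{i}\sum_{j} x_{ij}\,\bigl(Q_{ij} - \lambda C_{ij}\bigr)
                + \lambda \bar{C}, \\
j_i^{*}(\lambda) &= \arg\max_{j}\,\bigl(Q_{ij} - \lambda C_{ij}\bigr).
\end{align}
```

For a fixed $\lambda$, the Lagrangian decouples across requests, so each request independently takes the action with the largest $Q - \lambda C$. This matches the form of the reward $R = Q - \lambda C$ and explains why adjusting $\lambda$ alone is enough to steer total compute usage.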

Extensive offline and online experiments, including the 2024 618 and 11.11 shopping festivals, demonstrate that MaRCA improves ad consumption by 14.93% without increasing compute resources, while significantly reducing system risk and improving reliability.

Future work plans to incorporate model‑predictive control for proactive λ prediction, expand the action space with model‑selection and filtering decisions, and generalize the approach to other recommendation pipelines facing tight compute budgets.
