
EdgeRec: Edge Computing in Recommendation Systems

EdgeRec explores how moving recommendation system components to the edge—leveraging real‑time user behavior, heterogeneous action modeling, on‑device reranking, mixed‑ranking, and personalized “thousand‑person‑one‑model” training—can reduce latency, improve relevance, and boost business metrics compared to traditional cloud‑centric pipelines.

Sohu Tech Products

Introduction

Recommendation systems are critical to modern internet products, but traditional cloud-centric pipelines suffer from bandwidth constraints, latency, and high operational costs. EdgeRec proposes shifting key recommendation components to the edge (e.g., mobile devices) to address these challenges.

Edge Computing Background

Edge computing offers stable, high-bandwidth, low-latency, and privacy-preserving computation by leveraging the increasing compute and storage capabilities of user devices. It reduces cloud storage pressure, mitigates central-point failures, and improves user experience.

Rethinking Information-Flow Recommendation

A typical industrial recommendation flow involves client-side paging requests, then cloud-side recall, coarse-ranking, fine-ranking, and final reranking or mixing. This architecture suffers from delayed feedback, coarse user behavior modeling, and sub-optimal ranking due to pagination.

New Architecture: EdgeRec

By moving the final decision stage to the edge, EdgeRec introduces real-time user perception, heterogeneous behavior sequence modeling, and a Context-aware Reranking with Behavior Attention Networks (CRBAN) that incorporates both item features and user action features. The model treats user behavior as <item, action> pairs, encodes item and action separately, and fuses them for attention-based scoring.
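As a rough illustration of the <item, action> modeling idea, the sketch below fuses a toy item embedding with its action embedding by simple concatenation, then applies dot-product attention from a candidate item over the fused behavior sequence. All vectors, dimensions, and function names here are hypothetical; CRBAN itself uses learned neural encoders rather than this hand-rolled pooling.

```python
import math

def encode_pair(item_vec, action_vec):
    """Fuse an <item, action> pair by concatenation (one simple choice)."""
    return item_vec + action_vec  # list concatenation -> fused behavior vector

def attend(candidate, behaviors):
    """Dot-product attention of a candidate over fused behavior vectors."""
    logits = [sum(c * b for c, b in zip(candidate, beh)) for beh in behaviors]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]      # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    # attention-pooled behavior representation for downstream scoring
    return [sum(w * beh[i] for w, beh in zip(weights, behaviors))
            for i in range(len(behaviors[0]))]

# toy example: 2-dim item embeddings fused with 2-dim action embeddings
behaviors = [encode_pair([1.0, 0.0], [0.5, 0.0]),   # e.g. <item_a, click>
             encode_pair([0.0, 1.0], [0.0, 0.5])]   # e.g. <item_b, expose>
candidate = [1.0, 0.0, 0.5, 0.0]  # candidate represented in the fused space
pooled = attend(candidate, behaviors)
```

Because the candidate aligns with the first behavior, attention weights (and hence the pooled vector) are dominated by it; in the real model this pooled context would feed a reranking score head.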

System Design for On-Device Reranking

EdgeRec splits large embedding tables between cloud and device, fetching only the necessary item embeddings at inference time. The on-device inference engine (MNN) enables deployment of large neural networks on mobile phones while keeping model size manageable.
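A minimal sketch of the table-splitting idea: the full embedding table stays in the cloud, and each page response carries just the embeddings for the items it contains, which a small device-side cache serves to the reranker. The class and payload shape below are illustrative assumptions, not EdgeRec's actual protocol.

```python
class DeviceEmbeddingCache:
    """Device-side cache holding only embeddings for recently served items.

    The large table lives in the cloud; each page response is assumed to
    ship an {item_id: embedding} payload alongside the item list.
    """

    def __init__(self, dim=4):
        self.dim = dim
        self._table = {}

    def ingest(self, page_payload):
        """Store the embeddings that arrived with the current page."""
        self._table.update(page_payload)

    def lookup(self, item_id):
        """Serve an embedding; unknown items fall back to a zero vector
        rather than triggering a blocking cloud round-trip."""
        return self._table.get(item_id, [0.0] * self.dim)

cache = DeviceEmbeddingCache(dim=4)
cache.ingest({"item_1": [0.1, 0.2, 0.3, 0.4]})
vec = cache.lookup("item_1")
missing = cache.lookup("item_99")
```

The zero-vector fallback is one pragmatic choice for on-device inference, trading a little accuracy on cold items for bounded latency.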

Business Impact

Deploying on-device reranking yields significant CTR improvements, especially by mitigating the decay caused by pagination. Experimental results published at CIKM-2020 demonstrate measurable gains in both overall metrics and per-slot CTR.

Generative Ranking (Reranking 2.0)

To overcome the limitations of greedy scoring, EdgeRec introduces a Generator-Evaluator framework. The Generator (Pointer-Net) proposes a set of K items from N candidates, while the Evaluator scores the generated sequence. Training uses policy-gradient reinforcement learning, achieving notable performance boosts reported at SIGKDD-2019.
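The loop below is a toy score-function (REINFORCE-style) sketch of the Generator-Evaluator interaction: a pointer-style decoder samples K distinct items from per-item logits, a stand-in evaluator scores the sequence, and the reward nudges the logits of the chosen items. It assumes a trivial linear "policy" instead of the actual Pointer-Net, and a position-discounted utility sum instead of the learned Evaluator.

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def generate(logits, k):
    """Pointer-style decoding: repeatedly sample an unused index."""
    chosen = []
    for _ in range(k):
        masked = [l if i not in chosen else -1e9 for i, l in enumerate(logits)]
        probs = softmax(masked)
        r, acc = random.random(), 0.0
        pick = max(range(len(probs)), key=probs.__getitem__)  # rounding guard
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                pick = i
                break
        chosen.append(pick)
    return chosen

def evaluate(sequence, utility):
    """Evaluator stand-in: position-discounted sum of per-item utilities."""
    return sum(utility[i] / (pos + 1) for pos, i in enumerate(sequence))

def reinforce_step(logits, utility, k, lr=0.5):
    seq = generate(logits, k)
    reward = evaluate(seq, utility)
    for i in seq:                 # crude score-function update:
        logits[i] += lr * reward  # push up logits of rewarded items
    return logits

utility = [0.1, 0.9, 0.2, 0.8]   # toy per-item utilities
logits = [0.0] * 4
for _ in range(200):
    logits = reinforce_step(logits, utility, k=2)
```

After a few hundred updates the high-utility items (indices 1 and 3) dominate the sampling distribution; a real system would add a baseline and train both networks jointly.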

Sequence Retrieval System

The reranking pipeline is reframed as a sequence retrieval problem, consisting of a "recall stage" (candidate sequence generation via beam search or Pointer-Net) and a "fine-ranking stage" (supervised learning with a simulator that predicts exposure-level metrics). This architecture mirrors the classic recall-plus-ranking paradigm but operates entirely on the device.
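The recall stage's beam search can be sketched as follows: sequences are extended one item at a time, each partial sequence is scored, and only the top few beams survive each step. The toy simulator here (a position-discounted utility sum) is a placeholder assumption for the learned exposure-level predictor.

```python
def beam_search(candidates, score_fn, k, beam_width=3):
    """Recall stage: expand sequences item by item, keeping the top beams."""
    beams = [([], 0.0)]
    for _ in range(k):
        expanded = []
        for seq, _ in beams:
            for item in candidates:
                if item in seq:          # each item appears at most once
                    continue
                new_seq = seq + [item]
                expanded.append((new_seq, score_fn(new_seq)))
        expanded.sort(key=lambda b: -b[1])
        beams = expanded[:beam_width]    # prune to the beam width
    return beams

# toy "simulator": rewards placing high-utility items early in the sequence
UTIL = {"a": 0.9, "b": 0.5, "c": 0.8, "d": 0.1}

def simulator(seq):
    return sum(UTIL[x] / (pos + 1) for pos, x in enumerate(seq))

beams = beam_search(list(UTIL), simulator, k=2)
best_seq, best_score = beams[0]
```

The surviving beams then go to the fine-ranking stage, where a stronger model picks the final sequence.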

On-Device Mixing System

EdgeRec also tackles mixed-ranking, where heterogeneous content (videos, ads, articles) must be interleaved. The Edge Surrounding-Aware Network (ESAN) predicts click-through rates for candidate items given their surrounding context and real-time user behavior. A constrained dynamic-knapsack planner then selects positions respecting PV-share and dispersion rules, yielding a 7% CTR lift in short-video feeds.
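One way to realize such a constrained planner is a small dynamic program: given per-slot predicted values, pick at most `max_items` slots whose positions are at least `min_gap` apart (a stand-in for the dispersion rule), maximizing total value. This is an illustrative simplification; the real planner also enforces PV-share constraints across content types.

```python
def plan_slots(values, max_items, min_gap):
    """Pick at most max_items slot positions maximizing total predicted
    value, with chosen positions at least min_gap apart (dispersion rule)."""
    T = len(values)
    NEG = float("-inf")
    # f[t][m]: best total value using the first t slots with m items placed
    f = [[NEG] * (max_items + 1) for _ in range(T + 1)]
    for t in range(T + 1):
        f[t][0] = 0.0
    for t in range(1, T + 1):
        for m in range(1, max_items + 1):
            f[t][m] = f[t - 1][m]               # option 1: leave slot t empty
            prev = max(t - min_gap, 0)          # last compatible prefix
            if f[prev][m - 1] != NEG:           # option 2: place at slot t
                f[t][m] = max(f[t][m], f[prev][m - 1] + values[t - 1])
    return max(f[T])

# predicted values for 5 feed slots; place up to 2 items, gap >= 2 positions
best = plan_slots([0.3, 0.9, 0.4, 0.8, 0.2], max_items=2, min_gap=2)
```

Here the optimum places items at slots 2 and 4 (values 0.9 and 0.8), since adjacent slots would violate the gap constraint.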

On-Device Training and Thousand-Person-One-Model

To personalize models per user while preserving privacy, EdgeRec trains a shared base model in the cloud and fine-tunes a private model on the device using local interaction data. Meta-learning (MAML) and a personalized learning-rate variant (PAML) address the few-shot nature of per-user data and the long-tail user distribution, improving GAUC for low-activity users.
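The meta-learning loop can be sketched with a deliberately tiny model: each "user" is a 1-D linear task y = slope * x, the inner loop is the on-device fine-tune from the shared initialization, and the outer loop is a first-order MAML (FOMAML) update. This simplification, with analytic gradients and synthetic users, is mine, not the paper's PAML formulation.

```python
import random

random.seed(1)

def loss_grad(w, data):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def adapt(w, data, inner_lr=0.05, steps=3):
    """Inner loop: user-private fine-tune from the shared initialization."""
    for _ in range(steps):
        w -= inner_lr * loss_grad(w, data)
    return w

def fomaml(users, meta_lr=0.02, epochs=100):
    """Outer loop: first-order MAML meta-update with the post-adaptation
    gradient, approximating the full second-order MAML update."""
    w = 0.0
    for _ in range(epochs):
        for support, query in users:
            w_user = adapt(w, support)
            w -= meta_lr * loss_grad(w_user, query)
    return w

def make_user(slope, n=8):
    """Synthetic user task with a support/query split of its local data."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    data = [(x, slope * x) for x in xs]
    return data[:n // 2], data[n // 2:]

# meta-train the shared initialization across a few synthetic users
users = [make_user(s) for s in (1.5, 2.0, 2.5)]
w0 = fomaml(users)

# a new low-activity user adapts from w0 with only a few interactions
support, query = make_user(2.0)
w_new = adapt(w0, support)
```

The point of the meta-trained `w0` is that a handful of on-device gradient steps moves it measurably closer to the new user's optimum, which is exactly the few-shot regime of long-tail users.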

Conclusion

EdgeRec demonstrates that edge-centric recommendation pipelines—covering real-time perception, on-device reranking, generative ranking, mixed-ranking, and personalized on-device training—can substantially reduce latency, enhance relevance, and drive business growth compared to traditional cloud-only solutions.

Tags: personalization, edge computing, mobile AI, recommendation systems, real-time ranking, meta-learning
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
