Unified Model Compression Framework (UMEC) for Efficient Recommendation Systems

The paper introduces UMEC, a unified model compression framework that jointly optimizes the feature embedding and prediction modules of a recommendation model under resource constraints. It reports up to three‑fold compression without sacrificing accuracy and outperforms baselines such as ECC and Group Lasso on benchmark datasets.

Kuaishou Tech

Model compression, originally developed for vision models, aims to improve inference efficiency on servers or mobile devices. Recognizing the growing need for efficient recommendation systems, Kuaishou, together with researchers from Texas A&M, the University of Rochester, and UT Austin, proposes a unified model compression framework (UMEC) that compresses benchmark recommendation models by up to three times without loss of accuracy.

Background and Motivation: Modern recommendation systems rely on neural network models composed of a feature embedding module and a prediction module, both of which consume significant resources. Existing work compresses these two modules independently, missing opportunities for joint optimization.

Method Overview: UMEC treats the embedding and prediction modules as a single entity and formulates a resource‑constrained joint optimization problem. The objective minimizes the original training loss while enforcing constraints on overall computational budget (e.g., FLOPs) and structured sparsity levels for both modules.
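The constrained problem described above can be written down roughly as follows. This is a sketch under assumed notation, not the paper's exact statement: W^{(k)} denotes the weights of layer k (with the embedding input treated as one of the K layers), \ell is the original training loss, s^{(k)} is the number of groups (for example, columns) allowed to stay nonzero in layer k, \|\cdot\|_{2,0} counts nonzero groups, and R_{\text{flops}} and R_{\text{budget}} are the resource model and its budget. All of these symbols are introduced here for illustration.

```latex
\begin{aligned}
\min_{W,\,s}\quad & \ell(W) \\
\text{s.t.}\quad & R_{\text{flops}}(s) \le R_{\text{budget}}, \\
& \|W^{(k)}\|_{2,0} \le s^{(k)}, \qquad k = 1, \dots, K .
\end{aligned}
```

Because the resource term depends only on how many groups survive in each layer, a single budget can trade capacity between the embedding and prediction modules instead of fixing a separate sparsity target for each.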

Optimization Technique: The problem is transformed into a minimax formulation using Lagrange multipliers. An ADMM‑based algorithm solves the minimax problem, with proximal‑SGD updating model weights and a Straight‑Through Estimator handling nondifferentiable sparsity terms. Dual variables are optimized via gradient ascent.
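The primal-dual pattern described above (a proximal step on the weights, gradient ascent on a dual variable) can be illustrated on a toy problem. The sketch below is not the authors' ADMM algorithm: it assumes a simple quadratic loss, swaps the structured sparsity constraint for a continuous group-norm surrogate, and therefore needs no Straight‑Through Estimator. The names `prox_group_l2`, `resource`, and `primal_dual_compress` are hypothetical.

```python
import numpy as np


def prox_group_l2(W, tau):
    """Column-wise soft-thresholding: prox of tau * sum_j ||W[:, j]||_2."""
    norms = np.linalg.norm(W, axis=0)
    # Columns whose norm is at most tau are set exactly to zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


def resource(W):
    """Toy resource proxy (assumption): sum of column norms, standing in for FLOPs."""
    return np.linalg.norm(W, axis=0).sum()


def primal_dual_compress(A, budget, lr=0.5, dual_lr=0.1, steps=500):
    """Minimize 0.5 * ||W - A||_F^2 subject to resource(W) <= budget."""
    W = A.copy()
    z = 0.0  # dual variable (Lagrange multiplier) for the budget constraint
    for _ in range(steps):
        grad = W - A                               # gradient of the toy loss
        W = prox_group_l2(W - lr * grad, lr * z)   # proximal gradient step on weights
        z = max(0.0, z + dual_lr * (resource(W) - budget))  # projected gradient ascent
    return W, z


# Three columns with norms 5, 1, and 0.1; the budget forces the weak ones out.
A = np.array([[3.0, 1.0, 0.1],
              [4.0, 0.0, 0.0]])
W, z = primal_dual_compress(A, budget=3.0)
kept = np.linalg.norm(W, axis=0) > 0
print("kept columns:", kept, "resource:", resource(W), "multiplier:", z)
```

The dual variable grows while the constraint is violated and shrinks otherwise, so the effective pruning threshold is learned rather than hand-tuned. On this toy input only the strongest column survives once the budget is met.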

Experimental Results: On the open‑source DLRM benchmark and Criteo AI Labs datasets, UMEC consistently outperforms state‑of‑the‑art baselines such as ECC and Group Lasso. It achieves up to 65% reduction in computation while preserving original prediction accuracy, and demonstrates superior trade‑offs across various compression scales.
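As a quick sanity check on how these figures relate, assume (this is an assumption, since the summary does not say so explicitly) that the compression ratio c and the fractional reduction r both refer to the same resource, such as FLOPs. Then the two are linked by a simple identity:

```latex
r = 1 - \frac{1}{c}, \qquad
c = 3 \;\Rightarrow\; r = \tfrac{2}{3} \approx 66.7\%, \qquad
r = 0.65 \;\Rightarrow\; c = \frac{1}{1 - 0.65} \approx 2.86 .
```

Under that reading, a 65% reduction in computation corresponds to just under three-fold compression, which is consistent with the "up to three times" figure quoted earlier.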

Conclusion: UMEC provides a systematic solution for jointly compressing the feature embedding and prediction modules under resource constraints, with extensive experiments confirming its effectiveness. The paper includes links to the full manuscript and source code for further exploration.
