
Model-Agnostic Self-Sampling and Bias‑Proxy Decoupling Frameworks for Debiasing Recommendation Systems

This article presents two model-agnostic debiasing solutions for recommendation systems: a self-sampling, self-training, self-evaluation framework (SSTE) that balances prediction accuracy and unbiasedness, and a bias-proxy representation decoupling framework that leverages expert-selected proxy features to remove harmful bias while preserving useful signals. Both are validated with extensive offline and online evaluations in financial product recommendation scenarios.

DataFunSummit

Background

Recommendation systems suffer from closed-loop feedback bias, which degrades model generalization and robustness. The article first introduces causal inference concepts (association, intervention, counterfactual) and explains how various bias types arise in recommendation pipelines.

Debiasing Overview

Existing methods are categorized into heuristic, IPS-based, unbiased data-augmentation, and theoretical-tool approaches, with examples such as PAL, EXMF, IPS-MF, Doubly Robust, Auto-Debias, KDCRec, DIB, and MACR.

1. Self‑Sampling Debias Framework (SSTE)

Problem analysis shows that some bias can actually be beneficial; SSTE therefore separates harmful bias from beneficial bias. It consists of three modules:

Self-sampling module: Generates subsets with different bias levels by truncating inverse-propensity-score (IPS) weights. The original training set D_tr and validation set D_val are sampled into subsets A_tr and A_val using thresholds (e.g., 0.6, 0.8) to adjust sampling probabilities.
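
As a rough sketch of the idea (not the paper's exact sampler), a lower-bias subset can be drawn by down-sampling over-exposed items, with the threshold playing the role of the IPS-weight truncation point; the function below and its keep-probability rule are illustrative assumptions:

```python
import random

def self_sample(dataset, propensities, threshold, seed=0):
    """Draw a lower-bias subset: items with high exposure propensity are
    kept with reduced probability (inverse-propensity style), with the
    IPS weight effectively truncated so keep probabilities stay in [0, 1].
    Hypothetical simplification of SSTE's self-sampling module."""
    rng = random.Random(seed)
    subset = []
    for example, p in zip(dataset, propensities):
        # Over-exposed items (large p) are kept with probability threshold/p,
        # capped at 1 so rarely exposed items are always retained.
        keep_prob = min(threshold / p, 1.0) if p > 0 else 1.0
        if rng.random() < keep_prob:
            subset.append(example)
    return subset
```

Lowering the threshold yields a less biased but smaller subset, which is why SSTE generates several subsets at different thresholds rather than committing to one.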

Self-training module: Shares parameters θ_s across all data while keeping separate parameters for the original and sampled subsets, preventing over-correction.
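
The parameter-sharing idea can be illustrated with a toy linear scorer; here `theta_s` stands for the shared parameters θ_s and the per-subset heads are hypothetical stand-ins for the subset-specific parameters (the actual backbones are MF/NCF models):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # illustrative feature dimension

# Shared parameters theta_s are trained on all data; each subset also
# keeps a private head, so the low-bias subset corrects the shared model
# rather than replacing it (guarding against over-correction).
theta_s = rng.normal(size=d)
heads = {"original": rng.normal(size=d), "sampled": rng.normal(size=d)}

def predict(x, subset):
    """Score = shared component + subset-specific component."""
    return float(x @ theta_s + x @ heads[subset])
```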

Self-evaluation module: Evaluates the model on all subsets, computes the maximum metric gap α, and penalizes instability to obtain a comprehensive score.
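
A minimal version of such a composite score, assuming the penalty is a simple weighted subtraction of the maximum gap (the article does not spell out the exact combination rule, so the `penalty` weight is a placeholder):

```python
def self_evaluate(subset_metrics, penalty=1.0):
    """Composite score: mean metric across the bias-level subsets, minus a
    penalty on the maximum metric gap alpha, so that models whose quality
    is unstable across bias levels are scored down."""
    mean = sum(subset_metrics) / len(subset_metrics)
    alpha = max(subset_metrics) - min(subset_metrics)  # maximum metric gap
    return mean - penalty * alpha
```

Under this rule a model scoring 0.8 on every subset beats one scoring 0.9 on the biased subset but 0.7 on the debiased one, even though their averages match.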

Experiments on Yahoo!R3 and Tencent Wealth Management fund recommendation data show that SSTE improves AUC, nDCG, Precision, and Recall across MF and NCF backbones, and yields significant online gains in click‑through, conversion, and revenue.

2. Bias‑Proxy Representation Decoupling Framework

Motivation: Directly separating biased and unbiased representations is hard in industrial settings with rich features. The proposed framework selects a set of bias‑proxy features P (e.g., exposure position, packaging) and decouples their representation C from other features Z using three strategies:

Regularization: Adds a cosine similarity loss that pushes C and Z toward orthogonality.
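
A minimal sketch of such a loss term, assuming C and Z are batches of per-example representation vectors (the function name and batch layout are illustrative):

```python
import numpy as np

def cosine_decorrelation_loss(C, Z, eps=1e-8):
    """Mean |cos(C_i, Z_i)| over the batch: zero when the bias-proxy
    representation C and the remaining representation Z are orthogonal,
    so minimizing it pushes the two apart."""
    num = np.sum(C * Z, axis=1)
    denom = np.linalg.norm(C, axis=1) * np.linalg.norm(Z, axis=1) + eps
    return float(np.mean(np.abs(num / denom)))
```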

Feature projection: Projects Z onto C and subtracts the projected component, yielding a purified Z_{pure} that is orthogonal to the bias proxy.
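
The projection-and-subtraction step can be sketched directly (batched per-example, with a small epsilon assumed for numerical safety):

```python
import numpy as np

def purify(Z, C, eps=1e-8):
    """Remove from each row of Z its projection onto the corresponding
    bias-proxy vector C, leaving Z_pure (approximately) orthogonal to C:
    Z_pure = Z - (<Z, C> / <C, C>) * C."""
    coef = np.sum(Z * C, axis=1, keepdims=True) / (
        np.sum(C * C, axis=1, keepdims=True) + eps
    )
    return Z - coef * C
```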

Mutual information constraint: Minimizes an upper-bound estimate of I(C; Z) (using CLUB or CLUB-SAMPLE) to force the two representations to be statistically independent.
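
A simplified CLUB-style estimate, assuming a unit-variance Gaussian variational distribution q(z|c) whose mean comes from a learned network (stubbed here as an arbitrary `predict_mean` map); the real method trains this network jointly:

```python
import numpy as np

def club_upper_bound(C, Z, predict_mean):
    """CLUB estimate of an upper bound on I(C; Z):
    E[log q(z_i|c_i)] over positive pairs minus E[log q(z_j|c_i)] over
    all pairs, with q(z|c) = N(predict_mean(c), I) so log-likelihoods
    reduce to negative squared distances (up to a shared constant)."""
    mu = predict_mean(C)                            # q(z|c) mean per example
    pos = -0.5 * np.sum((Z - mu) ** 2, axis=1)      # log q(z_i | c_i)
    diffs = Z[None, :, :] - mu[:, None, :]          # all (c_i, z_j) pairs
    neg = -0.5 * np.sum(diffs ** 2, axis=2)         # log q(z_j | c_i)
    return float(np.mean(pos) - np.mean(neg))
```

When the predictor carries no information about Z (e.g., it ignores C), positive and negative pairs look alike and the estimate collapses toward zero, which is the target of the constraint.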

Applying these methods to MF, MLP, and DCN models on the same datasets demonstrates superior offline metrics and consistent online improvements, with the mutual‑information approach achieving the best results.

Conclusion and Outlook

The two frameworks provide practical, model-agnostic ways to balance accuracy and unbiasedness in large-scale recommendation systems, especially in financial domains where over-correction can be harmful. Future work will explore deeper integration with multi-task learning, broader bias sources, and more comprehensive evaluation protocols.

Tags: Machine Learning, Recommendation Systems, Causal Inference, Debiasing, Bias Proxy, Self-Sampling
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
