
Multi-Objective Ranking with Deep Interest Transformer for Tabular Product Recommendation

The Dewu app’s new multi‑objective ranking model replaces the shallow ESMM baseline with a DeepFM‑based MLP and a Deep Interest Transformer that encodes up to 120 recent user actions. A dedicated bias network and a fusion of short‑ and long‑term interests yield modest CTR and CVR AUC improvements, with tab‑specific extensions planned as future work.

DeWu Technology

This document describes the design and deployment of a multi‑objective ranking model used in the classification‑TAB product stream of the Dewu app. The recommendation scenario is defined by the triple <userId, tabId, itemId>, but the current implementation models only the <userId, itemId> pair.

2. Model

2.1 Base ESMM – The baseline adopts the ESMM (Entire Space Multi‑Task Model) paradigm for joint CTR and CVR prediction, with the original ESMM towers replaced by a DeepFM‑based MLP layer to capture feature interactions. The network nevertheless remains shallow and does not fully exploit user representations.
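The defining property of ESMM is that pCTCVR = pCTR × pCVR is supervised on the full impression space, which sidesteps the sample‑selection bias of training a CVR model only on clicked items. A minimal NumPy sketch of the two‑head output (function names and the linear heads are illustrative, not the production architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def esmm_heads(shared_repr, w_ctr, w_cvr):
    """ESMM output heads on a shared representation.

    pCTR and pCVR are predicted by separate towers; their product
    pCTCVR is the click-and-convert probability, supervised on the
    entire impression space rather than on clicks only.
    """
    p_ctr = sigmoid(shared_repr @ w_ctr)
    p_cvr = sigmoid(shared_repr @ w_cvr)
    p_ctcvr = p_ctr * p_cvr
    return p_ctr, p_cvr, p_ctcvr
```

Because pCTCVR is a product of probabilities, it is always bounded above by pCTR, which the factorization guarantees by construction.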

2.2 Overall Architecture – The upgraded model keeps the ESMM learning paradigm while adding separate heads for CTR and CVR logits. User behavior sequences are introduced via a Deep Interest Transformer (DIT) to enrich sparse user embeddings. The overall flow is illustrated in the accompanying figures.

2.2.1 User Behavior Sequence Modeling – User actions (real‑time purchase, click‑to‑buy, favorite, etc.) within the last 7 days are merged into a sequence of up to 120 items (truncated or padded). A multi‑head self‑attention encoder captures intra‑sequence relations, and a target‑attention decoder uses the candidate item embedding as query to compute relevance with the encoded user sequence, producing a dynamic interest vector for each target item.
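The decoder step above can be sketched as single‑head target attention: the candidate item embedding acts as the query against the encoded behavior sequence, so each target item gets its own interest vector. This is a simplified NumPy illustration (function name and single‑head form are assumptions; the production model uses a multi‑head Transformer encoder):

```python
import numpy as np

def target_attention(seq_enc, target_emb):
    """Target-attention decoder over an encoded behavior sequence.

    seq_enc:    (L, d) sequence of encoded user actions
    target_emb: (d,)   candidate item embedding, used as the query
    Returns a (d,) dynamic interest vector: a softmax-weighted
    combination of the sequence positions most relevant to the target.
    """
    d = target_emb.shape[-1]
    scores = seq_enc @ target_emb / np.sqrt(d)   # scaled dot-product, (L,)
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ seq_enc                     # convex combination, (d,)
```

Because the output is a convex combination of sequence encodings, two different candidate items attending to the same sequence produce different interest vectors.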

2.2.2 Bias Net – Various biases (gender, device, region) are modeled by a dedicated bias network whose logits are added to the main network outputs. This isolates bias learning from the primary feature learning, improving robustness compared to feeding bias features directly into the main model.
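The additive fusion can be sketched as a small bias tower whose logit is summed with the main tower's logit before the sigmoid. The tower shape and function names below are illustrative assumptions, not the production configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bias_net_fusion(main_logit, bias_feats, w1, w2):
    """Dedicated bias tower with additive logit fusion.

    Bias features (e.g. gender/device/region embeddings) pass through
    their own small MLP; the resulting logit is added to the main
    network's logit, keeping bias learning isolated from the primary
    feature-interaction tower.
    """
    hidden = np.maximum(bias_feats @ w1, 0.0)   # one ReLU layer
    bias_logit = hidden @ w2
    return sigmoid(main_logit + bias_logit)
```

With the bias tower zeroed out, the prediction reduces exactly to the main tower's output, which makes the bias contribution easy to ablate.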

3. Long‑Term Behavior Modeling

3.1 Long‑Term Interest – Analysis shows that many user sequences contain a large proportion of default‑filled positions, weakening attention signals. By extending the lookback window (e.g., fetching the most recent 160 actions before truncating the sequence to 120), the median effective length rises to 120, yielding offline gains of +0.3% CTR AUC and +0.1% CVR AUC.
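The truncate‑or‑pad step can be sketched in a few lines; the fetch length, max length, and padding id below are illustrative defaults, not the production values beyond the 160/120 figures quoted above:

```python
def build_sequence(actions, fetch_len=160, max_len=120, pad_id=0):
    """Build a fixed-length behavior sequence.

    Fetch up to `fetch_len` most-recent actions, keep the newest
    `max_len`, and right-pad short sequences with `pad_id`. A deeper
    fetch window means fewer positions end up default-filled.
    """
    recent = actions[-fetch_len:]                 # widen the lookback
    recent = recent[-max_len:]                    # truncate to model input size
    return recent + [pad_id] * (max_len - len(recent))
```

Active users thus contribute 120 real actions, while low‑activity users still produce a valid fixed‑length input with explicit padding.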

3.2 Short‑ and Long‑Term Interest Fusion – Two fusion strategies are explored: (1) concatenating short‑term (Sv) and long‑term (Lv) interest vectors into a combined user vector Uv; (2) applying a gate network that learns a weighting coefficient a, producing Uv = a·Sv + (1‑a)·Lv. Both approaches achieve similar offline AUC improvements (~+0.1%).
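Strategy (2) can be sketched as a scalar gate over the concatenated interest vectors; the gate parameterization (a single linear layer plus sigmoid) and function names are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(sv, lv, w_gate):
    """Gate-network fusion of short- and long-term interest.

    sv, lv: (d,) short-term and long-term interest vectors
    w_gate: (2d,) gate weights over the concatenated input
    Computes a = sigmoid(w_gate . [Sv; Lv]) and returns
    Uv = a * Sv + (1 - a) * Lv.
    """
    a = sigmoid(np.concatenate([sv, lv]) @ w_gate)
    return a * sv + (1.0 - a) * lv
```

Unlike concatenation, the gate keeps the fused vector at the original dimension d and lets the model learn, per user, how much weight short‑term behavior should carry.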

4. Outlook

The current work concentrates on <userId, itemId> modeling. Future directions include incorporating tabId to capture TAB‑specific user behavior differences and modeling the item‑TAB relevance, which is analogous to query‑category relevance in search.

Tags: CTR, CVR, recommendation system, user behavior modeling, bias net, deep interest transformer, multi-objective learning
Written by

DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
