Search‑Based Interest Model (SIM): Long‑Term User Behavior Modeling for CTR Prediction

This article presents the Search‑Based Interest Model (SIM), a two‑stage retrieval framework that indexes a user's entire behavior history to enable long‑term interest modeling for click‑through‑rate prediction, demonstrating practical deployment and improved recommendation of long‑term interests in e‑commerce.

DataFunSummit

Background – Traditional recommendation models such as DIN, DIEN, and GRU4Rec operate only on short, recent behavior sequences. This limits their ability to capture interests across a user's full lifecycle, which matters especially in e‑commerce, where only end‑stage actions (clicks, purchases) are observable.

The authors propose building a searchable index of a user's lifelong behavior so that, at inference time, relevant historical actions can be efficiently retrieved for a given target item.

Search‑Based Interest Model (SIM)

SIM separates interest modeling from CTR estimation by introducing a two‑stage search process:

General search – narrows a raw behavior sequence of up to tens of thousands of actions down to a few hundred candidates relevant to the target item. Two implementations are described:

Parameter‑based (soft search): embed user actions and items as vectors, then use Maximum Inner Product Search (MIPS) to retrieve the top‑K actions most relevant to the target item.

Non‑parameterized (hard search): use item categories to build a hierarchical index (user → category → action list) offline; the index can be updated incrementally, and at request time the actions in the target item's category are looked up.

Exact search – a short‑sequence model (e.g., DIN/DIEN) that further refines the candidate set using the target item as a query, optionally incorporating time information.
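The two stages can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the SIM production code: behavior and item embeddings are assumed to be precomputed, an exact brute‑force inner product stands in for an approximate MIPS index such as ALSH, the exact‑search stage is simplified to single‑head scaled dot‑product target attention, the optional time information is simply added to each candidate embedding, and all function names (including the logistic `ctr_score` head) are hypothetical.

```python
import numpy as np


def general_search_soft(behavior_emb, target_emb, k):
    """Soft (parameter-based) general search, assuming precomputed embeddings.
    Exact brute-force MIPS: returns indices of the k behaviors with the
    largest inner product with the target, sorted by score (descending)."""
    scores = behavior_emb @ target_emb                 # (N,)
    k = max(1, min(k, scores.shape[0]))
    top = np.argpartition(-scores, k - 1)[:k]          # unordered top-k
    return top[np.argsort(-scores[top])]               # order by score


def exact_search_attention(cand_emb, target_emb, time_gap_emb=None):
    """Exact search, simplified to single-head target attention over the
    retrieved candidates. time_gap_emb (hypothetical) is added to each
    candidate to inject time information."""
    if time_gap_emb is not None:
        cand_emb = cand_emb + time_gap_emb
    logits = cand_emb @ target_emb / np.sqrt(target_emb.shape[0])
    weights = np.exp(logits - logits.max())            # stable softmax
    weights /= weights.sum()
    return weights @ cand_emb                          # (d,) interest vector


def ctr_score(interest, target_emb, w):
    """Hypothetical toy CTR head: logistic regression on [interest, target]."""
    x = np.concatenate([interest, target_emb])
    return 1.0 / (1.0 + np.exp(-(x @ w)))
```

For example, with `B` an (N, d) array of behavior embeddings and `t` a target embedding, `exact_search_attention(B[general_search_soft(B, t, 200)], t)` yields the long‑term interest vector passed to the CTR head. Only the few hundred retrieved rows reach the attention step, which is what keeps the second stage cheap.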

The non‑parameterized approach requires only a key‑value store and keeps raw IDs, preserving maximum information while keeping online latency low.
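A minimal in‑memory sketch of the non‑parameterized (hard‑search) index follows. It assumes that nested Python dictionaries stand in for the production key‑value store and that actions are appended in timestamp order; the `BehaviorIndex` and `Behavior` names and the per‑category cap `max_per_category` are hypothetical illustrations, not part of the original system description.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    item_id: int
    category_id: int
    timestamp: int


class BehaviorIndex:
    """Hypothetical in-memory stand-in for the key-value index:
    user_id -> category_id -> bounded list of that user's actions (raw IDs)."""

    def __init__(self, max_per_category: int = 200):
        self.max_per_category = max_per_category
        # Nested defaultdicts create the user and category levels on first write.
        self._index = defaultdict(
            lambda: defaultdict(lambda: deque(maxlen=self.max_per_category))
        )

    def append(self, user_id: int, behavior: Behavior) -> None:
        """Incremental update: append one new action to its category list.
        Assumes actions arrive in timestamp order; the oldest entry is
        evicted once the list exceeds max_per_category."""
        self._index[user_id][behavior.category_id].append(behavior)

    def search(self, user_id: int, target_category: int, k: int) -> list:
        """Hard search: up to k most recent actions in the target item's
        category, newest first. dict.get reads never create empty entries."""
        actions = self._index.get(user_id, {}).get(target_category, ())
        return list(actions)[::-1][:k]
```

Because each lookup is two key accesses followed by a slice, the cost of a request does not grow with the length of the user's full history, which matches the low online latency the approach is credited with.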

Implementation Details

With the non‑parameterized index, the general search has no learnable parameters: the index stays fixed during training, and the exact‑search and CTR layers are trained end‑to‑end on the candidates it retrieves. The authors also discuss the trade‑offs of parameterizing the general search, which can retrieve more accurately but adds system complexity.

The figure above shows the UIC‑MIMN architecture (Pi et al., KDD 2019), whose decoupling of user‑interest modeling from CTR estimation inspired SIM.

Results

SIM has been deployed in Alibaba’s online advertising system, handling behavior sequences up to 50,000 actions (with practical limits around 5,000). Online A/B tests show a noticeable increase in long‑term interest clicks compared to DIEN, confirming that the two‑stage search effectively surfaces older, relevant behaviors.

The chart illustrates that SIM’s recommended items have a higher proportion of long‑term interest clicks than DIEN.

Future Work

The authors aim to personalize the index construction (e.g., per‑user indexing strategies) and explore meta‑learning approaches to allocate separate model parameters per user, moving toward truly individualized recommendation systems.

References

Zhou et al., KDD 2018, Deep Interest Network for Click‑Through Rate Prediction.

Zhou et al., AAAI 2019, Deep Interest Evolution Network for Click‑Through Rate Prediction.

Hidasi et al., ICLR 2016, Session‑based Recommendations with Recurrent Neural Networks.

Pi et al., KDD 2019, Practice on Long Sequential User Behavior Modeling for Click‑Through Rate Prediction.

Shrivastava and Li, NeurIPS 2014, Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).

Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.