Applying Sequence Embedding for Car Model Preference Prediction

This article explains how sequence embedding can be applied to user browsing data on an automotive website to predict intended car models, addressing data sparsity and leveraging temporal information to improve prediction accuracy by 3%.

HomeTech

Series 49, 2019 Issue 23

Users of the automotive website browse many kinds of pages, such as the user account, articles, and forum reviews. Each browsing action is implicit feedback about the user's interests and can be used to predict the user's preferred car series.

User Browsing Records

Traditional Machine Learning Methods

Predicting a user's intended car series can be treated as a typical classification problem, which many machine-learning algorithms can solve. Users express preferences for different car series through their click and browsing behavior, so a model can be built to infer preference levels from historical actions.

Each record represents a single session where a user visits pages such as a configuration page, reputation page, price page, or video page. For example, User1 in Session1 visited Page1 and Page3 (1 = visited, 0 = not visited) and the stay time on each page is recorded in seconds.

We can directly feed the six features (page visits and stay times) into a prediction model, or we can aggregate page visit counts and total stay time and use those aggregates as features.
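The two feature layouts described above can be sketched as follows. This is a minimal illustration, not the article's actual pipeline: the three-page vocabulary `PAGES`, the function names, and the stay times are all assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical page vocabulary (assumption: three pages, giving six features).
PAGES = ["Page1", "Page2", "Page3"]


def raw_features(session: Dict[str, float]) -> List[float]:
    """Build per-session features: visit flags followed by stay times.

    `session` maps each visited page to its stay time in seconds.
    """
    visited = [1.0 if page in session else 0.0 for page in PAGES]
    stay = [float(session.get(page, 0.0)) for page in PAGES]
    return visited + stay


def aggregated_features(sessions: List[Dict[str, float]]) -> List[float]:
    """Build per-user features: visit counts followed by total stay times."""
    counts = [float(sum(1 for s in sessions if page in s)) for page in PAGES]
    totals = [float(sum(s.get(page, 0.0) for s in sessions)) for page in PAGES]
    return counts + totals


# Illustrative data only: two sessions for one user.
user1_sessions = [
    {"Page1": 30.0, "Page3": 12.0},  # Session1: visited Page1 and Page3
    {"Page1": 5.0, "Page2": 40.0},   # Session2: visited Page1 and Page2
]

session1_vector = raw_features(user1_sessions[0])
user1_vector = aggregated_features(user1_sessions)
```

Either vector could then be passed to a classifier. Note that the vector length grows with the number of pages, which is where the sparsity problem discussed next comes from.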

Methods such as Logistic Regression, GBDT, and Random Forest can be applied, but they face two problems: the large number of car‑related pages leads to data sparsity, and they ignore the sequential nature of browsing behavior. The solution is Sequence Embedding.

Sequence Embedding

Sequence embedding builds on word embedding, an NLP feature-learning technique that maps words to real-valued vectors so that similar words lie close together.

Thus any word can be represented as an n-dimensional vector, and the distance between two vectors reflects how similar the words are. For example, the vectors for "Microsoft" and "IBM" would lie close together.
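As a rough illustration of comparing two embedding vectors, the sketch below uses cosine similarity. The article does not say which distance measure it uses, so cosine similarity is an assumption here, and the three-dimensional vectors are invented toy values rather than learned embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors (made up for illustration, not real embeddings).
toy_vectors = {
    "Microsoft": [0.9, 0.1, 0.3],
    "IBM": [0.8, 0.2, 0.4],
    "banana": [0.0, 0.9, 0.1],
}

sim_related = cosine_similarity(toy_vectors["Microsoft"], toy_vectors["IBM"])
sim_unrelated = cosine_similarity(toy_vectors["Microsoft"], toy_vectors["banana"])
```

With these toy values, the two company vectors score much higher than the unrelated pair, which is the behavior a trained embedding is expected to show.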

Analogously, we represent each web page in the browsing sequence as an n-dimensional vector. The choice of n depends on the contextual information available, and the vectors themselves can come from pre-training or from a custom model.

When multiple pages appear in order, we apply sequence embedding to capture meaningful context. In the car‑preference scenario, a "word" is a page in the user's browsing sequence. To make the embedding more informative, we also consider attributes such as stay time and recency, normalizing the time information.

We obtain embeddings for each page in a user's browsing sequence, weight them by the normalized stay time, and compute a weighted average to form the user's overall browsing‑sequence embedding. This vector is then used to predict the intended car series.
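The weighted average described above can be sketched as follows. This is a minimal version under stated assumptions: stay times are normalized by dividing by their sum (the article does not specify the normalization), and the page names, two-dimensional vectors, and function name are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sequence_embedding(
    visits: Sequence[Tuple[str, float]],
    page_vectors: Dict[str, Sequence[float]],
) -> List[float]:
    """Average page vectors, weighting each by its share of total stay time.

    `visits` is an ordered list of (page, stay_seconds) pairs.
    """
    total_stay = sum(stay for _, stay in visits)
    if total_stay <= 0:
        raise ValueError("total stay time must be positive")

    dim = len(next(iter(page_vectors.values())))
    result = [0.0] * dim
    for page, stay in visits:
        weight = stay / total_stay  # normalized stay time (assumed scheme)
        for i, value in enumerate(page_vectors[page]):
            result[i] += weight * value
    return result


# Hypothetical page embeddings and one browsing sequence.
page_vectors = {
    "config": [1.0, 0.0],
    "price": [0.0, 1.0],
}
visits = [("config", 30.0), ("price", 10.0)]

user_vector = sequence_embedding(visits, page_vectors)
```

Here the configuration page accounts for three quarters of the stay time, so the user vector is pulled toward its embedding. The resulting vector has a fixed length regardless of how many pages exist, which is what lets it replace the sparse per-page features as classifier input.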

Conclusion

Sequence Embedding addresses data sparsity and fully leverages users' browsing‑sequence information, incorporating behavioral signals into the model; the prediction accuracy for intended car series improves by 3%, and the online model remains stable.
