From Zero to One: Building a Personalized E‑commerce Search with Easysearch
The article walks through constructing a fully personalized e‑commerce search system using Easysearch and Python Flask, detailing product modeling, behavior collection, profile building with time decay and LLM augmentation, and how to inject these signals into Elasticsearch DSL for real‑time, user‑specific ranking and recommendation.
Ever searched for "Bluetooth headphones" and been shown a page of semi‑in‑ear models you dislike, even though you are a sports enthusiast who prefers over‑ear designs? This guide shows how to build a search system that recognizes each user, so that the same query or homepage yields different results for different people.
1. Why a "recognizing" search is needed
An e‑commerce platform must be fast, but speed isn’t enough; it also needs to understand user preferences. For example, three users A, B, and C all search "headphones" but have distinct budgets, usage scenarios, and feature priorities. Returning the same ranking to all would be foolish; results must be re‑ranked per user.
2. System Overview
The SearchPersona system consists of behavior collection, profile building, and query-time personalization, all built on Easysearch (Elasticsearch‑compatible) with a Python Flask backend and plain HTML/JS frontend.
3. Product Modeling
To personalize, each product must expose weighted attributes beyond a plain title. During index initialization (scripts/init_indices.py), items are indexed with fields such as category (keyword), tags (keyword array), price_tier (bucketed price), and score_base (content quality).
{
"id": "rc_001",
"title": "真无线入耳式蓝牙耳机 主动降噪 低延迟游戏模式",
"category": "电子",
"sub_category": "音频设备",
"tags": ["蓝牙", "降噪", "低延迟"],
"price": 289.0,
"price_tier": "mid",
"score_base": 0.78,
"created_at": "2025-01-15"
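}
The sample title reads "True wireless in‑ear Bluetooth earphones, active noise cancellation, low‑latency gaming mode"; the category "电子" means electronics, the sub‑category "音频设备" means audio equipment, and the tags are Bluetooth ("蓝牙"), noise cancellation ("降噪"), and low latency ("低延迟").
Design points:
category is a keyword, so a user’s preference for "电子" can boost all electronics.
tags is a keyword array, so liking "蓝牙" or "降噪" adds weight to matching items.
price_tier buckets the price into five levels (budget, value, mid, upper, premium) to enable categorical price preferences.
score_base is a popularity/quality score that can be weighted with field_value_factor.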
A script (scripts/import_real_cases.py) loads 100 realistic products covering electronics, home, apparel, food, and books, each with detailed tags to make personalization observable.
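As a rough illustration of the modeling above, the index body and the price bucketing might look like the sketch below. The field types follow the article; the tier thresholds and the plain `text` type for `title` are assumptions made for this example (the article does not state them, and a Chinese analyzer would likely be configured for `title` in practice).

```python
# Hypothetical upper bounds per tier; the article only names the five tiers.
PRICE_TIERS = [(50, "budget"), (150, "value"), (500, "mid"), (1500, "upper")]


def price_tier(price):
    """Map a numeric price to one of the five categorical tiers."""
    for upper_bound, name in PRICE_TIERS:
        if price < upper_bound:
            return name
    return "premium"


# Index body in Elasticsearch-compatible mapping syntax.
ITEM_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # full-text searchable
            "category": {"type": "keyword"},    # exact-match boosts
            "sub_category": {"type": "keyword"},
            "tags": {"type": "keyword"},        # arrays need no special type
            "price": {"type": "float"},
            "price_tier": {"type": "keyword"},
            "score_base": {"type": "float"},    # used by field_value_factor
            "created_at": {"type": "date"},
        }
    }
}
```

With these assumed thresholds, the sample product priced at 289.0 falls into the mid tier, matching the document shown earlier.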
4. Behavior Collection
Each click or view triggers a POST to /api/event:
{
"event_type": "click",
"item_id": "rc_001",
"item_category": "电子",
"item_price_tier": "mid",
"item_tags": ["蓝牙", "降噪"],
"session_id": "sess_abc",
"user_id": "user_123"
}
The backend writes the event into the Easysearch behavior index (sp_behavior_events) and updates an in‑memory session context so that the current session can influence the very next search.
Elasticsearch‑compatible engines make newly indexed documents searchable only after an index refresh, which by default happens once per second. Relying solely on persisted events would therefore introduce a noticeable lag right after a click. The session cache closes that gap and provides sub‑second personalization; the plan is to move it to Redis later.
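A minimal sketch of the in‑memory session update, assuming a process‑local dictionary keyed by session_id. The function name, the rule that only clicks feed clicked_categories, and the deduplication of tags are assumptions for illustration; the article does not show this code.

```python
from collections import defaultdict

# Process-local cache; a real deployment would need eviction or Redis.
_sessions = defaultdict(lambda: {"clicked_categories": [], "viewed_tags": []})


def update_session(event, sessions=_sessions):
    """Fold one behavior event into the session context and return it."""
    ctx = sessions[event["session_id"]]
    # Repeated clicks are kept so that frequency can be counted later.
    if event.get("event_type") == "click" and event.get("item_category"):
        ctx["clicked_categories"].append(event["item_category"])
    # Tags are kept once each, in first-seen order (an assumption).
    for tag in event.get("item_tags", []):
        if tag not in ctx["viewed_tags"]:
            ctx["viewed_tags"].append(tag)
    return ctx
```

The /api/event handler would call this right after writing the event to the behavior index, so the next search in the same session already sees the click.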
Session context example:
{
"clicked_categories": ["电子", "电子", "家居"],
"viewed_tags": ["蓝牙", "降噪", "静音"]
}
5. User Profile Construction
Long‑term interests are aggregated from the behavior index in core/profile_builder.py via the build_from_events method. The pipeline:
Take the last N days (e.g., 30) of events.
Assign base weights: purchase = 3.0, click = 1.0, view = 0.5.
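Apply exponential decay: weight = base_weight * exp(-days / half_life), with half_life = 8.5 days and a lower bound of 0.17. Note that in this form 8.5 days acts as the e‑folding time constant rather than a true half‑life: the weight halves after 8.5 × ln 2 ≈ 5.9 days.
Sum weights per category, tag, and price tier, then normalize (min‑max).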
Resulting profile fields include category_weights, tag_weights, price_tier_pref, and a numeric price_sensitivity_score (0‑100).
{
"category_weights": {"电子": 1.0, "家居": 0.6},
"tag_weights": {"蓝牙": 1.0, "降噪": 0.8, "便携": 0.5},
"price_tier_pref": "mid",
"price_sensitivity_score": 62
}
During profile rebuild, the current session’s clicked categories are also added (boost = 1.2) to reflect the freshest interests.
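The aggregation pipeline described in this section can be sketched as follows. This is not the project's build_from_events; it is a simplified illustration that assumes each event carries a `timestamp` datetime (not shown in the article's event sample) and that the 0.17 lower bound applies to the decay factor.

```python
import math
from collections import defaultdict
from datetime import datetime

BASE_WEIGHTS = {"purchase": 3.0, "click": 1.0, "view": 0.5}
DECAY_DAYS = 8.5    # called half_life in the article
DECAY_FLOOR = 0.17  # assumed to be a floor on the decay factor


def decayed_weight(event_type, age_days):
    """Base weight times exponential decay, never below the floor."""
    decay = max(math.exp(-age_days / DECAY_DAYS), DECAY_FLOOR)
    return BASE_WEIGHTS.get(event_type, 0.0) * decay


def min_max(scores):
    """Scale values to [0, 1]; identical values all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (val - lo) / (hi - lo) for key, val in scores.items()}


def build_profile(events, now, window_days=30):
    """Aggregate recent events into category, tag, and tier preferences."""
    cats, tags, tiers = defaultdict(float), defaultdict(float), defaultdict(float)
    for ev in events:
        age = (now - ev["timestamp"]).total_seconds() / 86400
        if age < 0 or age > window_days:
            continue  # outside the look-back window
        w = decayed_weight(ev["event_type"], age)
        cats[ev["item_category"]] += w
        tiers[ev["item_price_tier"]] += w
        for tag in ev.get("item_tags", []):
            tags[tag] += w
    return {
        "category_weights": min_max(cats),
        "tag_weights": min_max(tags),
        "price_tier_pref": max(tiers, key=tiers.get) if tiers else None,
    }
```

One consequence of min‑max scaling is that the weakest category always maps to 0, so a profile with only two categories keeps just one of them above any positive threshold.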
5.2 LLM‑augmented profiling
Pure statistics can be too coarse, so recent events are fed to DeepSeek (OpenAI‑compatible) with a prompt that asks the model to infer a textual summary and extra interest tags. The model returns JSON such as:
{
"summary": "用户偏好中端降噪耳机与运动音频设备,对性价比敏感",
"interest_tags": ["降噪耳机", "运动耳机", "百元档"],
"category_hints": ["电子"],
"traits": {"price_sensitivity": "medium"}
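}
In this sample the summary reads "The user prefers mid‑range noise‑cancelling headphones and sports audio gear, and is sensitive to value for money"; the interest tags are noise‑cancelling headphones, sports headphones, and the hundred‑yuan price band.
These LLM‑generated tags (llm_interest_tags) and category hints (llm_category_hints) are stored in the profile and weighted by LLM_TAG_WEIGHT (default 1.25). Even a brand‑new user with only a few clicks receives a reasonable cold‑start profile.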
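Because model output is not guaranteed to be valid JSON, the merge step benefits from defensive parsing. The sketch below shows one way to fold a reply into the profile; the function name and the `llm_summary` key are hypothetical, and only llm_interest_tags and llm_category_hints are field names taken from the article.

```python
import json


def _string_list(value):
    """Keep only string items; tolerate a missing or malformed field."""
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, str)]


def merge_llm_output(profile, raw_reply):
    """Return a copy of the profile enriched with the LLM's JSON reply.

    If the reply cannot be parsed, the profile is returned unchanged so
    that statistical personalization keeps working.
    """
    try:
        data = json.loads(raw_reply)
    except (TypeError, ValueError):
        return profile
    if not isinstance(data, dict):
        return profile
    merged = dict(profile)
    merged["llm_summary"] = str(data.get("summary", ""))
    merged["llm_interest_tags"] = _string_list(data.get("interest_tags"))
    merged["llm_category_hints"] = _string_list(data.get("category_hints"))
    return merged
```

Falling back to the unmodified profile keeps an LLM outage or a malformed reply from breaking search.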
6. Search Ranking – Injecting the Profile
The core ranking logic lives in core/persona_ranker.py → build_query. The process:
Start with a basic multi_match query on title^2 and tags.
Wrap it in a function_score that adds multiple weighting functions derived from the profile.
6.1 Category Weighting
For each category whose weight exceeds 0.45, a filter‑weight pair is added. The weight is computed as profile_weight × PERSONA_CATEGORY_WEIGHT (default 2.0), plus an optional session boost (PERSONA_SESSION_BONUS = 0.5).
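A sketch of how these category functions could be generated. The constants come from the text; the function name and the rule that the session bonus applies when the category was clicked in the current session are assumptions.

```python
PERSONA_CATEGORY_WEIGHT = 2.0
PERSONA_SESSION_BONUS = 0.5
CATEGORY_THRESHOLD = 0.45


def category_functions(category_weights, session_categories=()):
    """Build function_score entries for sufficiently strong categories."""
    clicked = set(session_categories)
    functions = []
    for category, profile_weight in category_weights.items():
        if profile_weight <= CATEGORY_THRESHOLD:
            continue  # weak interests are not injected
        weight = profile_weight * PERSONA_CATEGORY_WEIGHT
        if category in clicked:
            weight += PERSONA_SESSION_BONUS  # assumed trigger for the bonus
        functions.append({
            "filter": {"term": {"category": category}},
            "weight": round(weight, 4),
        })
    return functions
```

For a profile of {"电子": 1.0, "家居": 0.6} with "电子" clicked in the session, this yields weights of 2.5 and 1.2.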
6.2 Price‑Tier Matching
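If the user’s preferred tier is "mid", items in that tier receive a weight of 1.5, while items in other tiers receive a lower weight of 0.8. Because the functions are combined with score_mode "sum", these values are additive contributions to the personalization factor rather than independent multipliers.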
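One possible way to express the two price‑tier weights as function_score entries is a term filter for the preferred tier and a bool/must_not filter for everything else. The article does not show how the 0.8 case is encoded, so the second entry is an assumption.

```python
def price_tier_functions(preferred_tier, match_weight=1.5, other_weight=0.8):
    """Build function_score entries for the preferred price tier."""
    if not preferred_tier:
        return []
    tier_term = {"term": {"price_tier": preferred_tier}}
    return [
        # Items in the preferred tier.
        {"filter": tier_term, "weight": match_weight},
        # Items in any other tier (assumed encoding of the 0.8 case).
        {"filter": {"bool": {"must_not": tier_term}}, "weight": other_weight},
    ]
```

Exactly one of the two filters matches any given item, so every item receives either 1.5 or 0.8 from this pair.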
6.3 Tag Weighting
All tags with profile weight > 0.3 (including LLM tags) receive a boost (e.g., 1.5 for "蓝牙", 1.25 for "降噪"). Session‑viewed tags get an additional small bonus.
6.4 Session Fallback
When a user is anonymous or has a sparse profile, the current session’s clicked categories and viewed tags are still injected with modest weights (PERSONA_SESSION_ONLY_CATEGORY_BOOST = 0.38, PERSONA_SESSION_ONLY_TAG_BOOST = 0.28) so that recent behavior can influence results.
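The fallback can be sketched as below, assuming one function per distinct category or tag. Whether repeated clicks on the same category stack is not stated in the article, so this example does not stack them.

```python
PERSONA_SESSION_ONLY_CATEGORY_BOOST = 0.38
PERSONA_SESSION_ONLY_TAG_BOOST = 0.28


def session_only_functions(session):
    """Build function_score entries from session context alone."""
    functions = []
    # dict.fromkeys removes duplicates while preserving first-seen order.
    for category in dict.fromkeys(session.get("clicked_categories", [])):
        functions.append({
            "filter": {"term": {"category": category}},
            "weight": PERSONA_SESSION_ONLY_CATEGORY_BOOST,
        })
    for tag in dict.fromkeys(session.get("viewed_tags", [])):
        functions.append({
            "filter": {"term": {"tags": tag}},
            "weight": PERSONA_SESSION_ONLY_TAG_BOOST,
        })
    return functions
```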
6.5 Content Quality Score
A field_value_factor on score_base (factor = 1.2, modifier = ln1p, missing = 0.1) boosts items with higher popularity/quality.
All functions use score_mode: "sum" and boost_mode: "multiply", so the final score equals the textual relevance multiplied by the summed personalization weight.
6.6 Final DSL (truncated example)
{
"query": {
"function_score": {
"query": {"multi_match": {"query": "耳机", "fields": ["title^2", "tags"], "type": "best_fields"}},
"functions": [
{"filter": {"term": {"category": "电子"}}, "weight": 2.4},
{"filter": {"term": {"price_tier": "mid"}}, "weight": 1.5},
{"filter": {"term": {"tags": "蓝牙"}}, "weight": 1.5},
{"filter": {"term": {"tags": "降噪"}}, "weight": 1.25},
{"filter": {"term": {"category": "家居"}}, "weight": 0.76},
{"field_value_factor": {"field": "score_base", "factor": 1.2, "modifier": "ln1p", "missing": 0.1}}
],
"score_mode": "sum",
"boost_mode": "multiply",
"min_score": 0.1
}
},
"from": 0,
"size": 20
}
Thus, for the same query "耳机" (headphones), user A (high "电子" and "蓝牙" weights) sees electronics with Bluetooth at the top, while user B (high "运动", i.e. sports, and "value" tier) gets sport‑oriented headphones first.
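To see how score_mode "sum" and boost_mode "multiply" combine, the following sketch emulates the scoring of the functions in the DSL above for a single document. It is a simplified model for checking arithmetic, not the engine's implementation: it handles only term filters and the ln1p modifier, and it takes the text relevance score as an input.

```python
import math

FUNCTIONS = [
    {"filter": {"term": {"category": "电子"}}, "weight": 2.4},
    {"filter": {"term": {"price_tier": "mid"}}, "weight": 1.5},
    {"filter": {"term": {"tags": "蓝牙"}}, "weight": 1.5},
    {"filter": {"term": {"tags": "降噪"}}, "weight": 1.25},
    {"filter": {"term": {"category": "家居"}}, "weight": 0.76},
    {"field_value_factor": {"field": "score_base", "factor": 1.2,
                            "modifier": "ln1p", "missing": 0.1}},
]


def term_matches(filter_clause, doc):
    """Evaluate a single-field term filter against a document dict."""
    (field, value), = filter_clause["term"].items()
    doc_value = doc.get(field)
    if isinstance(doc_value, list):
        return value in doc_value
    return doc_value == value


def field_value_factor(spec, doc):
    """Compute modifier(factor * field_value) for the ln1p modifier."""
    value = doc.get(spec["field"])
    if value is None:
        value = spec["missing"]
    if spec.get("modifier") != "ln1p":
        raise ValueError("only ln1p is modeled in this sketch")
    return math.log1p(spec.get("factor", 1.0) * value)


def personalized_score(relevance, functions, doc):
    """Sum the matching function values, then multiply by relevance."""
    total = 0.0
    for fn in functions:
        if "field_value_factor" in fn:
            total += field_value_factor(fn["field_value_factor"], doc)
        elif term_matches(fn["filter"], doc):
            total += fn["weight"]
    return relevance * total
```

For the sample product rc_001 (category "电子", tier mid, tags including "蓝牙" and "降噪", score_base 0.78), the summed factor is 2.4 + 1.5 + 1.5 + 1.25 + ln(1 + 1.2 × 0.78), or about 7.31, so its text relevance is multiplied by roughly 7.31.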
7. Persona‑Driven Feed (No Query)
On the homepage, when there is no explicit query, the system builds a keyword string from the profile: top tags (> 0.12), top categories (> 0.2), LLM‑generated interest tags, LLM category hints, and the current session’s clicks/tags. After deduplication, the string (e.g., "蓝牙 降噪 电子产品 运动耳机 家居", i.e. Bluetooth, noise cancellation, electronics, sports headphones, home) is fed to a multi_match query. If nothing is available, a fallback list such as "键盘 大米 咖啡 图书 家居 耳机" (keyboard, rice, coffee, books, home, headphones) is used.
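The keyword assembly can be sketched as follows. The thresholds and fallback string come from the text; the function name, the ordering by descending weight, and the absence of a cap on the number of terms are assumptions.

```python
FALLBACK_KEYWORDS = "键盘 大米 咖啡 图书 家居 耳机"


def feed_keywords(profile, session=None, tag_min=0.12, category_min=0.2):
    """Turn a profile and session context into a multi_match query string."""
    session = session or {}

    def above(weights, minimum):
        # Strongest interests first; sorted() is stable for ties.
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        return [name for name, weight in ranked if weight > minimum]

    terms = []
    terms += above(profile.get("tag_weights", {}), tag_min)
    terms += above(profile.get("category_weights", {}), category_min)
    terms += profile.get("llm_interest_tags", [])
    terms += profile.get("llm_category_hints", [])
    terms += session.get("clicked_categories", [])
    terms += session.get("viewed_tags", [])
    unique = list(dict.fromkeys(terms))  # dedupe, keep first occurrence
    return " ".join(unique) if unique else FALLBACK_KEYWORDS
```

The returned string would then be used as the query of the same multi_match clause that serves explicit searches.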
8. Visualization
A simple /persona page visualizes the profile with ECharts:
Rectangular treemap – tag weights.
Bar chart – category preferences (including session clicks).
Word cloud – tags viewed in the current session.
Polar chart – LLM‑inferred interest intensity.
Pie chart – distribution of category weights.
Text description – LLM‑generated user summary.
Refreshing the profile triggers POST /api/profile/rebuild, which re‑aggregates events from the Easysearch behavior index and calls the LLM.
9. Summary
The end‑to‑end SearchPersona project demonstrates a production‑grade personalized e‑commerce search pipeline:
Fine‑grained product modeling with keyword tags and bucketed price tiers.
Event collection + real‑time session context.
Profile building using weighted aggregation, exponential decay, and LLM augmentation.
Search ranking via Elasticsearch function_score that combines category, price, tag, session, and content‑quality signals.
Cold‑start recommendation by turning the profile into a multi‑term query.
Interactive visualization of the user’s persona.
The system is compact yet covers the core ideas of industrial‑scale personalized search, and can be adopted directly or used as a reference to add a similar function_score layer on top of an existing Elasticsearch deployment.
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
