How to Achieve One‑Line Semantic Search for Nearby Clean Coffee Shops with Elasticsearch
This article walks through a practical Elasticsearch demo in which a user types a single query such as “nearby clean coffee shop” and gets relevant results. The demo combines dense-vector semantic search, geo filtering, BM25, and a client-side RRF-style hybrid ranking. It supports LLM-based query structuring and falls back to a hash-based embedding when no embedding model is available.
Problem Background
Users often ask simple natural‑language queries such as “nearby clean coffee shop” or “newly opened bookstore”. These queries have geographic, temporal, and semantic dimensions that are hard to capture with pure BM25 or pure vector search.
Two Solution Paths
Solution 1 – Structured Rewrite: Use a large language model or a rule-based parser to convert the sentence into structured parameters (category, tags, geo, new), then run a standard Elasticsearch query.
Solution 2 – Vector Semantic Search: Embed the description text into a dense vector, store it in a dense_vector field, and run a k-NN search, optionally combined with geo, range, or term filters.
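As a rough illustration of the rule-based variant of Solution 1, the sketch below maps trigger words to structured parameters. The names StructuredQuery, parse_query, and the keyword tables are hypothetical. The 1 km radius and 180-day window are assumed defaults borrowed from the examples later in this article. An LLM-based rewrite could return the same shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword rules; the trigger words come from this article's examples.
CATEGORY_KEYWORDS = {"咖啡": "咖啡厅", "书店": "书店"}  # coffee shop, bookstore
TAG_KEYWORDS = ["干净", "安静"]  # clean, quiet


@dataclass
class StructuredQuery:
    category: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    radius_km: Optional[float] = None   # set when the query implies "nearby"
    within_days: Optional[int] = None   # set when the query implies "newly opened"


def parse_query(text: str) -> StructuredQuery:
    """Rule-based rewrite of a free-text query into structured parameters."""
    parsed = StructuredQuery()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in text:
            parsed.category = category
            break
    parsed.tags = [tag for tag in TAG_KEYWORDS if tag in text]
    if "附近" in text:   # "nearby": assumed default radius of 1 km
        parsed.radius_km = 1.0
    if "新开" in text:   # "newly opened": assumed window of 180 days
        parsed.within_days = 180
    return parsed
```

Substring rules like these are brittle, which is the main reason to pair them with the vector search of Solution 2.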
Index Design
The index places_demo includes the following mappings:
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "category": {"type": "keyword"},
      "description": {"type": "text"},
      "tags": {"type": "keyword"},
      "open_date": {"type": "date"},
      "location": {"type": "geo_point"},
      "description_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
Key fields:
category / tags: either manually labeled or filled in by Solution 1.
description_vector: stores the embedding used for k-NN search.
location / open_date: support the geographic distance filter and the “newly opened” range filter.
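A minimal sketch of creating this index from Python, assuming the official elasticsearch client in version 8 or later, where indices.create accepts a mappings keyword. The helper name ensure_index is hypothetical, and the demo's own index_ops.py may do this differently.

```python
INDEX_NAME = "places_demo"

# Same field definitions as the mapping shown above.
MAPPINGS = {
    "properties": {
        "name": {"type": "text"},
        "category": {"type": "keyword"},
        "description": {"type": "text"},
        "tags": {"type": "keyword"},
        "open_date": {"type": "date"},
        "location": {"type": "geo_point"},
        "description_vector": {
            "type": "dense_vector",
            "dims": 384,
            "index": True,
            "similarity": "cosine",
        },
    }
}


def ensure_index(es, index_name: str = INDEX_NAME) -> bool:
    """Create the index if it is missing; return True when it was created.

    `es` is assumed to be an elasticsearch.Elasticsearch client (8.x or later).
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, mappings=MAPPINGS)
    return True
```

Because the function only touches es.indices, it can be exercised with a stub object in tests without a running cluster.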
Demo Data
Four sample documents are loaded via index_ops.load_demo_docs:
熊猫精品咖啡 – quiet, hand‑brew, work‑friendly.
字里行间书店 – newly opened independent bookstore.
星际烘焙咖啡 – pet‑friendly, terrace, party.
拐角自习咖啡 – clean, spacious seats, many power outlets.
Vector Embedding and Hash Fallback
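The Embedder class first tries to load a sentence-transformers model (e.g., multilingual MiniLM). If the model is unavailable, it falls back to a deterministic hash-based vector: each token is SHA-256 hashed, normalized, and averaged. The fallback keeps the demo runnable without a model download, but it captures only exact token overlap, not meaning, so semantic quality depends on the real model.
To use the real model, install the package:
pip install sentence-transformers
Then set the model name in the .env file, e.g.:
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This model outputs 384-dimensional vectors, which matches dims in the mapping.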
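The hash fallback can be sketched roughly as follows. This is an assumption about how such a fallback might work, not the demo's actual Embedder code: the function names, the tokenizer (ASCII word runs plus individual non-ASCII characters, so Chinese text still yields tokens), and the way a 32-byte digest is stretched to 384 values are all illustrative choices.

```python
import hashlib
import math
import re
from typing import List

DIMS = 384  # must match "dims" in the index mapping


def _token_vector(token: str, dims: int = DIMS) -> List[float]:
    """Expand one token into `dims` deterministic values in [-1, 1] via SHA-256."""
    values: List[float] = []
    counter = 0
    while len(values) < dims:
        # Re-hash with a counter because one digest only yields 32 bytes.
        digest = hashlib.sha256(f"{token}:{counter}".encode("utf-8")).digest()
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    return values[:dims]


def hash_embed(text: str, dims: int = DIMS) -> List[float]:
    """Average the per-token hash vectors and L2-normalize the result."""
    # ASCII alphanumeric runs, plus each non-ASCII letter as its own token.
    tokens = re.findall(r"[a-z0-9]+|[^\W\d_a-z]", text.lower())
    if not tokens:
        tokens = [""]  # keep the vector non-zero; cosine similarity needs that
    summed = [0.0] * dims
    for token in tokens:
        for i, value in enumerate(_token_vector(token, dims)):
            summed[i] += value
    mean = [value / len(tokens) for value in summed]
    norm = math.sqrt(sum(value * value for value in mean))
    return [value / norm for value in mean] if norm else mean
```

Two texts that share tokens end up with correlated vectors, while synonyms do not, which is why this is only a stand-in for the sentence-transformers model.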
Query DSL
Pure Semantic k‑NN
qv = embedder.embed(params.query_text).vector
body = {
    "knn": {
        "field": "description_vector",
        "query_vector": qv,
        "k": max(params.size, 5),
        "num_candidates": max(params.size * 20, 50),
    },
    "_source": ["name", "category", "description", "tags", "open_date", "location"],
    "size": params.size,
}
k is the number of nearest neighbors to return, and num_candidates is the number of candidates examined on each shard before the top k are selected. A larger num_candidates improves recall at the cost of latency.
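The sizing rule above can be wrapped in a small helper so that every search mode derives k and num_candidates the same way. This is a sketch: build_knn_body and its knn_filter argument are hypothetical names, and the optional filter is the hook used by the filtered variants in the following sections.

```python
from typing import Any, Dict, List, Optional

SOURCE_FIELDS = ["name", "category", "description", "tags", "open_date", "location"]


def build_knn_body(
    query_vector: List[float],
    size: int = 10,
    knn_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a search request body for a k-NN query on description_vector."""
    knn: Dict[str, Any] = {
        "field": "description_vector",
        "query_vector": query_vector,
        "k": max(size, 5),                     # never ask for fewer than 5 neighbors
        "num_candidates": max(size * 20, 50),  # always >= k, so the request stays valid
    }
    if knn_filter is not None:
        knn["filter"] = knn_filter  # applied during the vector search
    return {"knn": knn, "_source": SOURCE_FIELDS, "size": size}
```

With size=3 this yields k=5 and num_candidates=60; the floor values only matter for very small result pages.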
Semantic + Geo Filter
# Named knn_filter so it does not shadow Python's built-in filter().
knn_filter = {
    "bool": {
        "must": [
            {"term": {"category": "咖啡厅"}},  # coffee shop
            {"term": {"tags": "干净"}},        # clean
            {"geo_distance": {
                "distance": "1km",
                "location": {"lat": 39.9042, "lon": 116.4074},
            }},
        ]
    }
}
body = {
    "knn": {
        "field": "description_vector",
        "query_vector": qv,
        "k": 5,
        "num_candidates": 50,
        "filter": knn_filter,  # restricts the k-NN candidates
    }
}
Semantic + Time Filter (New Bookstore)
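For “newly opened bookstore”, the k-NN clause stays the same and only the filter changes: the category is 书店 (bookstore) and open_date must fall within the last 180 days.
{
  "filter": {
    "bool": {
      "must": [
        {"term": {"category": "书店"}},
        {"range": {"open_date": {"gte": "now-180d/d"}}}
      ]
    }
  }
}
BM25 Keyword Search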
query = {
    "multi_match": {
        "query": params.query_text,
        "fields": ["name^2", "description", "tags^1.5"],
    }
}
Hybrid RRF‑Like Fusion (Client‑Side)
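Run the k-NN and BM25 queries separately, then merge the two result lists with the Reciprocal Rank Fusion formula:
score(doc) = Σ 1 / (k + rank_i)
Here rank_i is the document's 1-based position in result list i, and a list that does not contain the document contributes nothing. The k in this formula is a smoothing constant, commonly 60, and is unrelated to the k of the k-NN search. Fusing on the client avoids the paid RRF feature in Elasticsearch 9.x while providing comparable results for small-scale demos.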
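The fusion step can be sketched in a few lines of Python. The function name rrf_fuse is hypothetical, the default k = 60 is the commonly used constant rather than a value confirmed by the demo, and each input list is assumed to hold document IDs in the order Elasticsearch returned them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    where rank is its 1-based position in that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage: IDs from the k-NN hits and from the BM25 hits, in result order.
knn_ids = ["a", "b", "c"]
bm25_ids = ["b", "d", "a"]
fused = rrf_fuse([knn_ids, bm25_ids])
```

Because only ranks are used, the incompatible score scales of cosine similarity and BM25 never need to be normalized against each other.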
Front‑End Page
A simple Flask template templates/index.html presents a search box, an optional “Advanced Search” panel (category, mode, lat/lon, radius, within_days, tags, size), and a result list showing name, description, category, open date, tags, ES _score, and fused score.
The “Auto-Fill” JavaScript watches the query input and heuristically populates the advanced fields: “咖啡厅” (coffee shop) sets category, “附近” (nearby) sets radius_km = 1, “新开” (newly opened) triggers the within_days calculation, and “干净” (clean) is added to tags.
Demo Setup
Project structure (key files): app.py (Flask entry), config.py (ES URL, index name), es_client.py, embedding.py, index_ops.py, search_ops.py, and the HTML template.
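As an illustration of what config.py might contain, the sketch below reads settings from environment variables. Only ES_URL and EMBEDDING_MODEL are named in this article; ES_USERNAME, ES_PASSWORD, ES_INDEX, the Settings class, and the localhost default are assumptions. It also assumes the .env file has already been loaded into the environment, for example with python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    es_url: str
    es_username: Optional[str]
    es_password: Optional[str]
    index_name: str
    embedding_model: Optional[str]  # None means: use the hash fallback


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read demo settings from a mapping of environment variables."""
    return Settings(
        es_url=env.get("ES_URL", "http://localhost:9200"),  # assumed default
        es_username=env.get("ES_USERNAME"),                  # hypothetical name
        es_password=env.get("ES_PASSWORD"),                  # hypothetical name
        index_name=env.get("ES_INDEX", "places_demo"),       # hypothetical name
        embedding_model=env.get("EMBEDDING_MODEL"),
    )
```

Passing the mapping explicitly, rather than reading os.environ inside the function, makes the loader easy to test with a plain dict.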
cd d:\TraePrj\yuyiDemo
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
# edit .env with ES_URL, username, password
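python app.py
Open http://127.0.0.1:5000/, initialize the index, load the sample data, and try queries such as “附近干净的咖啡厅” (nearby clean coffee shop), “新开的书店” (newly opened bookstore), or “安静适合办公的咖啡店,最好有插座” (a quiet coffee shop suitable for working, ideally with power outlets).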
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
