How to Achieve One‑Line Semantic Search for Nearby Clean Coffee Shops with Elasticsearch

This article walks through building a practical Elasticsearch demo that lets users type a single query like “nearby clean coffee shop” and get results by combining dense‑vector semantic search, geo filtering, BM25, and a hybrid RRF‑style ranking, with both LLM‑based structuring and a fallback hash‑based embedding.

Mingyi World Elasticsearch

Problem Background

Users often ask simple natural‑language queries such as “nearby clean coffee shop” or “newly opened bookstore”. These queries have geographic, temporal, and semantic dimensions that are hard to capture with pure BM25 or pure vector search.

Two Solution Paths

Solution 1 – Structured Rewrite: Use a large language model or rule‑based parser to convert the sentence into structured parameters (category, tags, geo, new) and run a standard Elasticsearch query.

Solution 2 – Vector Semantic Search: Embed the description text into a dense vector, store it in a dense_vector field, and perform k‑NN search, optionally combined with geo, range, or term filters.

Index Design

The index places_demo includes the following mappings:

{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "category": {"type": "keyword"},
      "description": {"type": "text"},
      "tags": {"type": "keyword"},
      "open_date": {"type": "date"},
      "location": {"type": "geo_point"},
      "description_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

Key fields:

category / tags: either manually labeled or filled by Solution 1.

description_vector: stores the 384‑dimensional embedding used for k‑NN.

location / open_date: support geographic distance and “newly opened” range filters.
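
A minimal sketch of creating this index from Python is shown below. It assumes the official elasticsearch client (8.x), whose indices.exists and indices.create methods accept keyword arguments. The helper name ensure_index is hypothetical and is not taken from the demo's index_ops.py.

```python
def build_mappings(dims=384):
    """Return the mappings for places_demo as a plain dict."""
    return {
        "properties": {
            "name": {"type": "text"},
            "category": {"type": "keyword"},
            "description": {"type": "text"},
            "tags": {"type": "keyword"},
            "open_date": {"type": "date"},
            "location": {"type": "geo_point"},
            "description_vector": {
                "type": "dense_vector",
                "dims": dims,  # must match the embedding model's output size
                "index": True,
                "similarity": "cosine",
            },
        }
    }


def ensure_index(es, index_name="places_demo", dims=384):
    """Create the index if it does not exist yet.

    `es` is assumed to be an elasticsearch.Elasticsearch instance.
    Returns True when the index was created, False when it already existed.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, mappings=build_mappings(dims))
    return True
```

If the embedding model is swapped for one with a different output size, dims has to change with it, because a dense_vector field rejects vectors of the wrong length.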

Demo Data

Four sample documents are loaded via index_ops.load_demo_docs:

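熊猫精品咖啡 (Panda Specialty Coffee) – quiet, hand‑brew, work‑friendly.

字里行间书店 (Between the Lines Bookstore) – newly opened independent bookstore.

星际烘焙咖啡 (Interstellar Roast Coffee) – pet‑friendly, terrace, party.

拐角自习咖啡 (Corner Study Coffee) – clean, spacious seats, many power outlets.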

Vector Embedding and Hash Fallback

The Embedder class first tries to load a sentence‑transformers model (e.g., multilingual MiniLM). If unavailable, it falls back to a deterministic hash‑based vector: each token is SHA‑256 hashed, normalized, and averaged.

To use the real model, install the package:

pip install sentence-transformers

In the .env file, set the model name, e.g.:

EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
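
The hash fallback can be sketched as follows. This is an illustration of the idea (hash each token, normalize, average), not the demo's actual Embedder code. The tokenizer (ASCII words plus individual CJK characters) and the way a 32‑byte digest is stretched to 384 dimensions are assumptions.

```python
import hashlib
import math
import re

DIMS = 384  # matches the dense_vector mapping


def _l2_normalize(values):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def _token_vector(token, dims=DIMS):
    """Expand one token into `dims` floats in [-1, 1] using repeated SHA-256 digests."""
    values = []
    counter = 0
    while len(values) < dims:
        digest = hashlib.sha256(f"{token}:{counter}".encode("utf-8")).digest()
        values.extend(b / 127.5 - 1.0 for b in digest)  # byte 0..255 -> -1..1
        counter += 1
    return _l2_normalize(values[:dims])


def hash_embed(text, dims=DIMS):
    """Deterministic stand-in embedding: average of per-token hash vectors."""
    # Assumed tokenizer: lowercase ASCII words plus single CJK characters.
    tokens = re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower()) or [text]
    vectors = [_token_vector(t, dims) for t in tokens]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return _l2_normalize(mean)
```

Vectors built this way only reflect token overlap between texts, not meaning, so the fallback is useful for exercising the pipeline end to end rather than for judging search quality.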

Query DSL

Pure Semantic k‑NN

qv = embedder.embed(params.query_text).vector
body = {
  "knn": {
    "field": "description_vector",
    "query_vector": qv,
    "k": max(params.size, 5),
    "num_candidates": max(params.size * 20, 50)
  },
  "_source": ["name","category","description","tags","open_date","location"],
  "size": params.size
}

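k is the number of nearest neighbors the search returns, and num_candidates is the number of candidates examined on each shard before the top k are selected. Raising num_candidates improves recall at the cost of latency, so the two parameters together balance accuracy and performance.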
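
Elasticsearch answers this query with an approximate nearest‑neighbor search. With only four demo documents, an exact brute‑force version is a handy sanity check on what the query should return. The sketch below is pure Python and runs outside Elasticsearch; the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query_vector, doc_vectors, k=5):
    """Return the k most similar documents as (doc_id, similarity) pairs.

    `doc_vectors` maps a document id to its embedding.
    """
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

When comparing against Elasticsearch output, note that the _score reported for a cosine dense_vector field is, to my understanding, a rescaled value, (1 + cosine) / 2, so the ordering should match even though the numbers differ.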

Semantic + Geo Filter

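# The filter is applied as part of the k‑NN search, so only matching documents can be returned.
# Named knn_filter rather than filter to avoid shadowing the Python built-in.
knn_filter = {
  "bool": {
    "must": [
      {"term": {"category": "咖啡厅"}},  # coffee shop
      {"term": {"tags": "干净"}},         # clean
      {"geo_distance": {
        "distance": "1km",
        "location": {"lat": 39.9042, "lon": 116.4074}
      }}
    ]
  }
}

body = {
  "knn": {
    "field": "description_vector",
    "query_vector": qv,
    "k": 5,
    "num_candidates": 50,
    "filter": knn_filter
  }
}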
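
To check by hand which demo documents fall inside the 1 km radius, the distance test can be approximated on the client with the haversine formula. This is a rough stand‑in for what the geo_distance filter decides, assuming a spherical Earth with a 6371 km radius. Elasticsearch performs its own distance computation, so results right at the boundary may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, point, radius_km=1.0):
    """True if `point` lies within `radius_km` of `center` (dicts with lat/lon keys)."""
    distance = haversine_km(center["lat"], center["lon"], point["lat"], point["lon"])
    return distance <= radius_km


# Query center from the filter above; the second point is a made-up location for illustration.
center = {"lat": 39.9042, "lon": 116.4074}
print(within_radius(center, {"lat": 39.9100, "lon": 116.4074}))
```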

Semantic + Time Filter (New Bookstore)

{
  "filter": {
    "bool": {
      "must": [
        {"term": {"category": "书店"}},
        {"range": {"open_date": {"gte": "now-180d/d"}}}
      ]
    }
  }
}
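
Both filter variants follow the same pattern, so they can be produced by one builder that adds a clause only for the parameters that are present. The sketch below assumes parameter names matching the advanced‑search fields (category, tags, lat, lon, radius_km, within_days). The function names are hypothetical and not taken from search_ops.py.

```python
def build_knn_filter(category=None, tags=None, lat=None, lon=None,
                     radius_km=None, within_days=None):
    """Build the bool filter for a k-NN query, or None if nothing is constrained."""
    clauses = []
    if category:
        clauses.append({"term": {"category": category}})
    for tag in tags or []:
        clauses.append({"term": {"tags": tag}})
    if lat is not None and lon is not None and radius_km:
        clauses.append({"geo_distance": {
            "distance": f"{radius_km}km",
            "location": {"lat": lat, "lon": lon},
        }})
    if within_days:
        # Date math: now minus N days, rounded down to the start of that day.
        clauses.append({"range": {"open_date": {"gte": f"now-{within_days}d/d"}}})
    return {"bool": {"must": clauses}} if clauses else None


def build_knn_body(query_vector, size=5, knn_filter=None):
    """Assemble the search body, attaching the filter only when one exists."""
    knn = {
        "field": "description_vector",
        "query_vector": query_vector,
        "k": max(size, 5),
        "num_candidates": max(size * 20, 50),
    }
    if knn_filter:
        knn["filter"] = knn_filter
    return {"knn": knn, "size": size}
```

Calling build_knn_filter(category="书店", within_days=180) reproduces the "new bookstore" filter shown above.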

BM25 Keyword Search

query = {
  "multi_match": {
    "query": params.query_text,
    "fields": ["name^2", "description", "tags^1.5"]
  }
}

Hybrid RRF‑Like Fusion (Client‑Side)

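Run both k‑NN and BM25, then combine the two ranked result lists with a Reciprocal Rank Fusion formula:

score(doc) = Σ_i 1 / (k + rank_i)

Here rank_i is the document's 1‑based position in result list i, and k is a smoothing constant (commonly 60) that is unrelated to the k of the k‑NN query. Because the formula uses only ranks, the differently scaled BM25 and vector scores never need to be normalized. Fusing on the client avoids the paid RRF feature in Elasticsearch 9.x while providing comparable results for small‑scale demos.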
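
A minimal client‑side sketch of this fusion is shown below. It assumes each search has already been reduced to a list of document ids ordered best first. The function name rrf_fuse and the default constant of 60 are assumptions rather than details confirmed by the demo.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    `rankings` is a list of lists of document ids, each ordered best first.
    Returns (doc_id, fused_score) pairs sorted by fused score, highest first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ids standing in for the hits of the two searches.
knn_ids = ["a", "b", "c"]
bm25_ids = ["b", "c", "d"]
for doc_id, fused in rrf_fuse([knn_ids, bm25_ids]):
    print(doc_id, round(fused, 6))
```

Documents that appear in both lists accumulate two terms, so in this example "b" and "c" outrank "a" even though "a" is the top k‑NN hit.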

Front‑End Page

A simple Flask template templates/index.html presents a search box, an optional “Advanced Search” panel (category, mode, lat/lon, radius, within_days, tags, size), and a result list showing name, description, category, open date, tags, ES _score, and fused score.

The “Auto‑Fill” JavaScript watches the query input and heuristically populates fields (e.g., detects “咖啡厅” (coffee shop) → category, “附近” (nearby) → radius_km = 1, “新开” (newly opened) → within_days calculation, “干净” (clean) → tags).
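
The same keyword heuristics can be sketched in Python as a small rule‑based parser, which is also the simplest form of the structured rewrite in Solution 1. The original auto‑fill is JavaScript, so this is an illustration rather than the demo's code. The “书店” (bookstore) rule and the 180‑day window for “新开” are assumptions based on the filter examples earlier in the article.

```python
CATEGORY_KEYWORDS = ["咖啡厅", "书店"]  # coffee shop, bookstore (second one assumed)
TAG_KEYWORDS = ["干净"]                 # clean
NEARBY_KEYWORD = "附近"                 # nearby
NEW_KEYWORD = "新开"                    # newly opened
DEFAULT_NEW_WINDOW_DAYS = 180           # assumed, mirrors the now-180d/d example


def parse_query(text):
    """Turn a free-text query into structured search parameters via keyword rules."""
    params = {
        "query_text": text,
        "category": None,
        "tags": [],
        "radius_km": None,
        "within_days": None,
    }
    for keyword in CATEGORY_KEYWORDS:
        if keyword in text:
            params["category"] = keyword
            break  # first matching category wins
    params["tags"] = [tag for tag in TAG_KEYWORDS if tag in text]
    if NEARBY_KEYWORD in text:
        params["radius_km"] = 1
    if NEW_KEYWORD in text:
        params["within_days"] = DEFAULT_NEW_WINDOW_DAYS
    return params


print(parse_query("附近干净的咖啡厅"))
```

Substring rules like these are brittle (a synonym such as “咖啡店” is missed), which is the gap the LLM‑based rewrite and the vector search are meant to cover.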

Demo Setup

Project structure (key files): app.py (Flask entry), config.py (ES URL, index name), es_client.py, embedding.py, index_ops.py, search_ops.py, and the HTML template.

cd d:\TraePrj\yuyiDemo
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
# edit .env with ES_URL, username, password
python app.py

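Open http://127.0.0.1:5000/, initialize the index, load sample data, and try queries such as “附近干净的咖啡厅” (nearby clean coffee shop), “新开的书店” (newly opened bookstore), or “安静适合办公的咖啡店,最好有插座” (a quiet coffee shop suitable for working, preferably with power outlets).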
