How to Optimize Elasticsearch Queries for Precise Enterprise Search Results
This article walks through the practical steps of improving Elasticsearch relevance for an enterprise search platform, covering user requirements, index creation, analysis, scoring models, boost and filter techniques, function_score customizations, and post‑query interventions to deliver more accurate and business‑aligned results.
Introduction
The article presents a technical case study of how the "爱番番企业查询" (Aifanfan enterprise lookup) platform uses Elasticsearch (ES) to deliver precise enterprise search results, and details the optimisation process applied to meet user expectations.
User Requirements
High matching degree between query keywords and company name or legal representative.
Ability to retrieve enterprises that satisfy user‑specified conditions.
After matching, rank results by company health indicators such as operating status, registered capital and risk level.
Elasticsearch Fundamentals
ES is built on the Apache Lucene™ library. Its work divides into two main parts: building the index and querying it.
Index Creation
Documents are tokenised, analysed and stored in an inverted index. The index contains:
Term dictionary – maps each term to its posting list.
Posting list – records which documents contain each term.
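The two structures above can be illustrated with a toy inverted index. This is a minimal sketch, not how Lucene stores data on disk: the sample documents and the build_inverted_index helper are made up for illustration, and the analysis step is reduced to lower-casing and whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analysis: lower-case and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # The keys play the role of the term dictionary,
    # the values the role of the posting lists.
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical company names used only for this example.
docs = {
    1: "Acme Trading Company",
    2: "Acme Logistics",
    3: "Northwind Trading",
}
index = build_inverted_index(docs)
```

Looking up index["acme"] returns the IDs of every document that contains the term, which is the lookup a query performs for each of its tokens.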
Analysis Process
Analysis transforms raw text into a stream of tokens using three components:
Character filter – preprocesses the raw character stream before tokenisation, for example by stripping HTML tags.
Tokenizer – splits the text into tokens (e.g., on whitespace for English).
Token filter – lower‑cases tokens, removes stop‑words, adds synonyms, etc.
The _analyze API can be used to inspect tokenisation results.
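The three stages can be mimicked in a few lines of plain Python. This is only a sketch of the idea under simplifying assumptions: the stop-word list is illustrative rather than any analyzer's real default, and the function names are hypothetical, not Elasticsearch APIs.

```python
import re

# Illustrative stop words only; not the list used by any ES analyzer.
STOP_WORDS = {"the", "of", "and", "a"}


def char_filter(text):
    """Character filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Tokenizer: split the filtered text on whitespace."""
    return text.split()


def token_filter(tokens):
    """Token filter: lower-case every token, then drop stop words."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text):
    """Run the three stages in the order ES applies them."""
    return token_filter(tokenizer(char_filter(text)))


tokens = analyze("<b>The Bank</b> of Example Holdings")
```

On a real cluster, sending the same text to the _analyze API with a chosen analyzer shows the tokens that the index actually stores, which is the authoritative check.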
Scoring Model
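ES computes a relevance score (_score) for each hit using Lucene's practical scoring function, which combines TF/IDF, vector‑space concepts, a coordination factor and field‑length normalisation. Recent Elasticsearch versions default to the BM25 similarity rather than classic TF/IDF, so the exact factors depend on the version in use.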
By default results are sorted by descending _score.
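To make the TF/IDF and field-length factors concrete, the sketch below computes a simplified per-term contribution in the style of Lucene's classic similarity. It is an approximation for intuition only: it ignores query normalisation, boosts and the coordination factor, and clusters that use BM25 will produce different numbers.

```python
import math


def tf(freq):
    """Term frequency: more occurrences in the field raise the score."""
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length normalisation: shorter fields weigh more."""
    return 1.0 / math.sqrt(num_terms)


def term_score(freq, num_docs, doc_freq, num_terms):
    """Simplified contribution of one term to a document's score."""
    return tf(freq) * idf(num_docs, doc_freq) * field_norm(num_terms)
```

For a fixed index size, a term that appears in few documents, or that matches inside a short field such as a company name, contributes more than a common term matched inside a long field.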
Optimisation Techniques
1. Keyword Matching
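Use match_phrase to favour documents in which the query tokens appear together and in order, with a suitable slop value to tolerate small gaps between them. Combine match and match_phrase to balance recall and precision: match keeps recall broad, while match_phrase rewards the closer matches.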
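One way to combine the two clauses is a bool query in which match is required and match_phrase is an optional scoring bonus. The sketch below builds such a request body as a Python dict; the field name company_name and the slop of 2 are assumptions for illustration, not values taken from the article.

```python
def build_keyword_query(keyword, field="company_name", slop=2):
    """Build a request body that requires a match and rewards phrase matches.

    `field` and `slop` are hypothetical defaults chosen for this example.
    """
    return {
        "query": {
            "bool": {
                # match keeps recall: any analysed token may hit.
                "must": [{"match": {field: {"query": keyword}}}],
                # match_phrase adds score when tokens appear in order,
                # allowing up to `slop` position moves.
                "should": [
                    {"match_phrase": {field: {"query": keyword, "slop": slop}}}
                ],
            }
        }
    }


keyword_query = build_keyword_query("acme trading")
```

The resulting dict can be passed as the body of a search request by whichever client the platform uses; raising slop loosens the phrase requirement, lowering it tightens precision.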
2. Boosting Important Fields
Apply boost to the company name and legal representative fields, and use phrase matching for the legal representative to increase relevance.
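A possible shape for such a query is sketched below. The field names company_name and legal_representative and the boost values 3.0 and 2.0 are placeholders for illustration; suitable values would have to be tuned against real queries.

```python
def build_boosted_query(keyword, name_boost=3.0, legal_rep_boost=2.0):
    """Build a query that boosts company-name and legal-representative hits.

    Field names and boost values are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Token match on the company name, weighted highest.
                    {"match": {"company_name": {
                        "query": keyword, "boost": name_boost}}},
                    # Phrase match on the legal representative, so that a
                    # person's name only counts when it matches as a whole.
                    {"match_phrase": {"legal_representative": {
                        "query": keyword, "boost": legal_rep_boost}}},
                ],
                # At least one of the two clauses must match.
                "minimum_should_match": 1,
            }
        }
    }


boosted_query = build_boosted_query("acme")
```

Using match_phrase on the legal representative avoids rewarding documents that merely share one token of a personal name with the query.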
3. Adding Filters
Introduce filter clauses (bool → filter) for user‑specified criteria such as operating status, registered capital, or risk flags. Filters narrow the candidate set without affecting scoring and benefit from ES caching.
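The helper below sketches how user-selected criteria could be appended to the filter section of an existing bool query. The field names operating_status, registered_capital and risk_level are assumed for illustration and are not taken from the platform's actual mapping.

```python
import copy


def add_filters(query_body, status=None, min_capital=None, max_risk_level=None):
    """Return a copy of a bool query body with filter clauses appended.

    Only the criteria the user actually supplied are added.
    """
    body = copy.deepcopy(query_body)
    filters = body["query"]["bool"].setdefault("filter", [])
    if status is not None:
        filters.append({"term": {"operating_status": status}})
    if min_capital is not None:
        filters.append({"range": {"registered_capital": {"gte": min_capital}}})
    if max_risk_level is not None:
        filters.append({"range": {"risk_level": {"lte": max_risk_level}}})
    return body


base = {"query": {"bool": {"must": [{"match": {"company_name": "acme"}}]}}}
filtered = add_filters(base, status="active", min_capital=1_000_000)
```

Because the clauses sit under filter rather than must, they only include or exclude documents; the _score still comes entirely from the scoring clauses.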
4. Custom Scoring with function_score
Combine the original query score with additional factors using function_score. The article selects script_score for maximum flexibility: a custom script can read document fields (e.g., doc['field'].value) and compute its own score. The scores involved are:
Original query score (query_score).
Custom function score (func_score).
Final score (result_score) = query_score × func_score, since multiply is the default boost_mode.
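As a rough sketch, a function_score request with a single script_score function could be built as follows. The health_score field and the script formula are assumptions made up for this example; the article does not publish its actual script. The final_score helper simply restates the multiplication performed by the multiply boost mode.

```python
def wrap_with_script_score(inner_query, script_source):
    """Wrap a query in function_score with one script_score function."""
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "script_score": {"script": {"source": script_source}},
                # Stated explicitly for clarity; multiply is the default.
                "boost_mode": "multiply",
            }
        }
    }


def final_score(query_score, func_score):
    """Mirror of boost_mode=multiply, for reasoning about results offline."""
    return query_score * func_score


# Hypothetical script: assumes a numeric health_score field in 0..100
# and maps it to a multiplier between 1.0 and 2.0.
SCRIPT = "1 + doc['health_score'].value / 100.0"

scored_query = wrap_with_script_score(
    {"match": {"company_name": "acme"}}, SCRIPT
)
```

Keeping the multiplier in a narrow range such as 1.0 to 2.0 is one design choice that lets business metrics reorder close matches without overwhelming textual relevance.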
5. Active Interventions
Keyword Extraction: Extract key terms (e.g., the company name) into a dedicated field in the mapping and apply phrase matching to it to boost relevance.
Secondary Sorting: After the primary relevance sort, re‑rank a subset of the results by business metrics (e.g., health score) to push the most valuable enterprises to the top.
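The secondary sort can be sketched as a small post-processing step in application code. The hit structure, the health_score key and the window size are assumptions for illustration; the article does not specify how large the re-ranked subset is.

```python
def rerank_top(hits, top_n=10, key="health_score"):
    """Re-rank the first `top_n` hits by a business metric, highest first.

    `hits` is assumed to be already sorted by relevance. Hits beyond the
    window keep their original order. sorted() is stable, so hits with
    equal metric values stay in relevance order.
    """
    head = sorted(hits[:top_n], key=lambda hit: hit[key], reverse=True)
    return head + hits[top_n:]


# Hypothetical hits, already ordered by descending _score.
hits = [
    {"id": "a", "_score": 9.0, "health_score": 60},
    {"id": "b", "_score": 8.5, "health_score": 90},
    {"id": "c", "_score": 8.0, "health_score": 75},
    {"id": "d", "_score": 2.0, "health_score": 99},
]
reranked = rerank_top(hits, top_n=3)
```

Limiting the re-rank to a window of relevant hits keeps a weakly matching but healthy company, like hit "d" here, from jumping ahead of strong textual matches.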
Final Result
The combined approach (precise phrase matching, field boosting, filtered queries, custom script scoring, and post‑query re‑ranking) produces search results that better satisfy the three user requirements outlined earlier.
Conclusion
Search relevance optimisation is an iterative process that requires continuous monitoring, user‑feedback collection, and handling of edge cases. The described methods provide a practical roadmap for tailoring Elasticsearch to enterprise‑search scenarios.
