When to Use Synonyms in Elasticsearch: Index-Time vs Search-Time
This article explains how Elasticsearch tokenizes text, why small spelling errors or plural forms can cause missed matches, and how synonyms improve search recall. It compares the trade-offs of applying synonyms at index time versus at query time, with practical code examples and tips for managing large synonym lists.
Search engines break documents and queries into tokens (terms). A document matches only when a query term is exactly equal to an indexed term, so a small spelling error or a plural form is enough to prevent a match.
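To see the terms Elasticsearch would actually compare, the _analyze API can be run against any cluster. The request below is a minimal sketch using the built-in standard analyzer; the sample text is made up for illustration, and the # comment line assumes Kibana's Dev Tools Console.

```
# Ask the built-in standard analyzer to tokenize a sample string
GET _analyze
{
  "analyzer": "standard",
  "text": "Crude Oils"
}
```

The standard analyzer lowercases the text and splits it on word boundaries but does not stem, so the response contains the terms crude and oils. A query for "oil" produces the term oil, which equals neither of them, so the document would not match.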
Synonyms help by mapping different words with the same meaning to one another, which improves recall. For example, a query for “oil” can match documents containing “crude oil” or “petroleum”.
Index‑time vs Search‑time Synonym Usage
Applying synonyms at index time expands terms once and stores them, which increases index size, affects term statistics, and requires reindexing to change rules. The only advantage is a modest performance gain during query execution.
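For contrast with the search-time setup shown later, an index-time configuration might look like the sketch below. The index name, filter name, analyzer name, and the single rule are all hypothetical. The synonym filter type is used here because synonym_graph is intended for search analyzers.

```
# Hypothetical index whose synonyms are expanded while documents are indexed
PUT myindex_indextime
{
  "settings": {
    "analysis": {
      "filter": {
        "index_synonyms": {
          "type": "synonym",
          "synonyms": ["oil, petroleum"]
        }
      },
      "analyzer": {
        "index_synonym_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "index_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "index_synonym_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

With this mapping the expanded terms are written into the index itself. Documents indexed before a rule change keep their old terms, which is why a rule change requires reindexing.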
Using synonyms at search time avoids index growth, keeps term statistics unchanged, and allows rule updates without reindexing. The cost is that every query must be expanded when it runs. Multi-word synonyms are handled by the synonym_graph filter, which is designed for use in search analyzers.
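Historically, changing search-time synonym rules still required closing and reopening the index so that its analyzers were rebuilt. Elasticsearch 7.3 and later remove that step: a filter marked as updateable can be refreshed through the reload search analyzers API.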
Applying Synonyms at Query Time
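Example of creating an index with a custom analyzer that uses a synonym_graph filter. The content field is indexed with the standard analyzer and searched with my_analyzer, so the synonyms apply only at query time. The rules are in Chinese: the first maps 看月亮 ("watching the moon") and 吃月饼 ("eating mooncakes") to 中秋节 ("Mid-Autumn Festival"); the second maps 双十一 and 双11 ("Double Eleven", the 11 November shopping festival) to 购物 ("shopping"); the third treats 免费 ("free"), 免费版 ("free version"), 不要钱的 ("costs no money"), and 无偿 ("free of charge") as equivalents: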
PUT myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms": [
            "看月亮,吃月饼=>中秋节",
            "双十一,双11=>购物",
            "免费,免费版,不要钱的,无偿"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}

Sample search results show the synonyms being applied.
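One way to check the rules yourself, assuming the index above has been created, is sketched below. The document ID and its content are hypothetical, and the # comment lines assume Kibana's Dev Tools Console.

```
# Inspect how the search analyzer rewrites a query string
GET myindex/_analyze
{
  "analyzer": "my_analyzer",
  "text": "双11"
}

# Index a hypothetical document that never mentions 双11
PUT myindex/_doc/1?refresh=true
{
  "content": "购物清单"
}

# Search with the synonym; the query text is rewritten by my_analyzer
GET myindex/_search
{
  "query": {
    "match": { "content": "双11" }
  }
}
```

If the rules load as intended, the first response should show the terms from the right-hand side of the 双十一,双11=>购物 rule in place of the original tokens, and the search should return document 1 even though its content does not contain 双11.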
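For large synonym sets, store the rules in a file (e.g., synonyms.txt), one rule per line in the same format as the inline rules, and reference it with "synonyms_path". The path is resolved relative to the Elasticsearch config directory, and the file must be present on every node. Setting "updateable": true lets the rules be reloaded through the reload search analyzers API without closing the index. Updateable filters can only be used in search analyzers.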
PUT myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}

Search queries demonstrate how synonyms expand terms and affect scoring.
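After synonyms.txt has been edited on each node, the updateable filter can be refreshed with the reload search analyzers API. The requests below are a minimal sketch that assumes the index defined above; the # comment lines assume Kibana's Dev Tools Console.

```
# Re-read the synonym file for every updateable filter used by myindex
POST myindex/_reload_search_analyzers

# Drop cached responses that were computed with the old rules
POST myindex/_cache/clear?request=true
```

The reload response lists the analyzers that were reloaded and the nodes involved, which is a quick way to confirm that the new rules were picked up.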
MaGe Linux Operations