Boost Fuzzy Search in Elasticsearch: ngram vs Wildcard Field Explained
This article compares Elasticsearch's ngram analyzer and the newer wildcard field for fuzzy searching, detailing configuration steps, performance trade‑offs, storage impact, and practical test results to help engineers choose the optimal approach for their use case.
Background
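In production, Elasticsearch often needs to support fuzzy queries in addition to exact matches. Here "fuzzy" means substring matching in the style of SQL LIKE '%term%', not Elasticsearch's edit-distance fuzzy query.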
Solution 1 – ngram Analyzer
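The ngram tokenizer splits indexed text into overlapping substrings (n-grams) whose lengths range from min_gram to max_gram. Any query substring within that length range is therefore itself an indexed token and can be recalled quickly. The approach trades space for speed: the index grows considerably, and configuring it well requires a solid understanding of tokenizers.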
PUT test-005
{
  "settings": {
    "index.max_ngram_diff": 10,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {"keyword": {"type": "keyword"}}
      }
    }
  }
}
POST test-005/_bulk
{ "index": {"_id":1}}
{ "title":"英文官网承认刘强东一度被捕的原因是涉嫌性侵"}
{ "index": {"_id":2}}
{ "title":"别提了朋友哥哥刘强东窗事发了"}
{ "index": {"_id":3}}
{ "title":"刘强东施效颦,没想到竟然收获了流量"}
{ "index": {"_id":4}}
{ "title":"刘强东是谁?我不认识"}
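POST test-005/_search
{
  "query": {"match_phrase": {"title": "刘强东"}}
}
This query should return all four documents, because every title contains the substring 刘强东 (the name Liu Qiangdong). In titles 2 and 3, however, the text is really 刘强 (a different name) followed by the idioms 东窗事发 ("the plot is exposed") and 东施效颦 ("clumsy imitation"), so pure substring matching also recalls documents that are not about Liu Qiangdong.
Advantages: fast recall, low runtime cost.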
Disadvantages: significant storage overhead (the wider the min_gram to max_gram range, the more tokens are indexed and the more space is used), plus a learning curve for tokenizer configuration.
Empirical data shows the ngram‑based index can be up to ten times larger than a keyword index.
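To see where the storage goes, the sketch below approximates what the my_tokenizer configuration above emits for one of the sample titles. It is a simplified Python model, not Lucene's implementation: it assumes that token_chars ["letter", "digit"] behaves like Python's str.isalpha() and str.isdigit(), which may differ from Java's character classes for some edge-case characters. The function name ngram_tokens is hypothetical.

```python
def ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 10) -> list[str]:
    """Approximate an ngram tokenizer configured with token_chars [letter, digit].

    Simplified model (assumption): characters for which str.isalpha() or
    str.isdigit() is true are token characters; anything else splits the text.
    """
    # Split the text into runs of token characters.
    runs: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha() or ch.isdigit():
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    # Emit every substring of each run whose length is in [min_gram, max_gram].
    tokens: list[str] = []
    for run in runs:
        for start in range(len(run)):
            for n in range(min_gram, max_gram + 1):
                if start + n > len(run):
                    break  # longer grams would run past the end of this run
                tokens.append(run[start:start + n])
    return tokens


title = "英文官网承认刘强东一度被捕的原因是涉嫌性侵"
print(len(ngram_tokens(title)))        # grams with min_gram=3, max_gram=10
print(len(ngram_tokens(title, 3, 3)))  # fixed gram size of 3
print("刘强东" in ngram_tokens(title))  # the query string is an indexed token
```

For this 21-character title the sketch produces 124 grams with min_gram 3 and max_gram 10, and only 19 with a fixed gram size of 3, whereas the keyword sub-field stores a single term. Widening the gram range multiplies the number of indexed tokens, which is the mechanism behind the storage overhead described above.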
Solution 2 – Wildcard Query
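The wildcard query provides SQL LIKE-style functionality. Internally, Lucene compiles the pattern into a deterministic finite automaton (DFA) and intersects it with the field's term dictionary. Complex patterns make the automaton expensive to build, and a leading wildcard (for example *abc*) gives Lucene no literal prefix to seek to, so it must examine almost every term in the index.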
Advantages: simple to use, no extra storage required.
Disadvantages: high runtime cost, especially for leading wildcards on fields with many distinct terms; careless use can cause production incidents.
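The cost difference between wildcard patterns can be illustrated with a simplified model that treats the term dictionary as a sorted list of strings. This is not Lucene's code (Lucene intersects a compiled automaton with its terms index), the function names are hypothetical, and backslash escaping is ignored. What it shows is the general effect: a literal prefix lets the search jump to a narrow slice of the dictionary, while a leading * forces every term to be checked.

```python
import bisect
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (any sequence) and ? (any single character) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def literal_prefix(pattern: str) -> str:
    """Characters before the first wildcard; they bound the range of terms to scan."""
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern


def wildcard_search(sorted_terms: list[str], pattern: str) -> tuple[list[str], int]:
    """Return (matching terms, number of terms examined)."""
    regex = wildcard_to_regex(pattern)
    prefix = literal_prefix(pattern)
    start = bisect.bisect_left(sorted_terms, prefix)  # seek to the first candidate
    matches: list[str] = []
    examined = 0
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can share the prefix
        examined += 1
        if regex.fullmatch(term):
            matches.append(term)
    return matches, examined


terms = sorted(["apple", "application", "apply", "banana", "grape", "pineapple"])
print(wildcard_search(terms, "app*"))    # prefix pattern: examines 3 terms
print(wildcard_search(terms, "*apple"))  # leading wildcard: examines all 6 terms
```

With six terms the gap is trivial, but the examined count for a leading wildcard grows with the whole dictionary, which is why such queries become expensive on large keyword fields.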
Elasticsearch 7.9 introduced a dedicated wildcard field type to address fuzzy matching efficiently.
Wildcard Field Usage
Define a wildcard field in the mapping, index a document, and query it with wildcard patterns. The wildcard query additionally accepts a case_insensitive parameter (available since Elasticsearch 7.10).
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_wildcard": {"type": "wildcard"}
    }
  }
}
PUT my-index-000001/_doc/1
{ "my_wildcard": "This string can be quite lengthy" }
GET my-index-000001/_search
{
  "query": {
    "wildcard": {"my_wildcard": "*quite*lengthy"}
  }
}
GET my-index-000001/_search
{
  "query": {
    "wildcard": {
      "my_wildcard": {"value": "*Quite*lengthy", "case_insensitive": true}
    }
  }
}
Wildcard Implementation Details
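The new field stores two structures: an n-gram index of every three-character sequence in the value, used as a coarse filter to find candidate documents, and a compressed binary doc value holding the original string, used to verify each candidate against the full pattern. This combines fast candidate generation with compact storage.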
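The two-phase idea can be sketched in a few lines of Python. This is a hypothetical in-memory model, not Elasticsearch's implementation: a dict of trigram postings plays the role of the n-gram index, a plain dict of original strings stands in for the compressed binary doc values, and details such as normalization, compression, and the handling of patterns shorter than three characters are not modeled. Phase 1 keeps only documents that contain every trigram of the pattern's literal fragments; phase 2 checks each surviving candidate against the full pattern.

```python
import re
from collections import defaultdict


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * and ? wildcards into a regex (escaping is not handled)."""
    return re.compile("".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    ), re.DOTALL)


def trigrams(s: str) -> set[str]:
    """All three-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramWildcardIndex:
    """Hypothetical model: trigram filter plus full-value verification."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)  # trigram -> doc ids
        self.values: dict[int, str] = {}  # doc id -> original value ("doc values")

    def add(self, doc_id: int, value: str) -> None:
        self.values[doc_id] = value
        for gram in trigrams(value):
            self.postings[gram].add(doc_id)

    def candidates(self, pattern: str) -> set[int]:
        """Phase 1: docs containing every trigram of the pattern's literal fragments."""
        required: set[str] = set()
        for fragment in re.split(r"[*?]", pattern):
            required |= trigrams(fragment)
        if not required:
            return set(self.values)  # nothing to filter on: every doc is a candidate
        return set.intersection(*(self.postings.get(g, set()) for g in required))

    def search(self, pattern: str) -> list[int]:
        """Phase 2: verify each candidate against the full pattern."""
        regex = wildcard_to_regex(pattern)
        return sorted(d for d in self.candidates(pattern) if regex.fullmatch(self.values[d]))


index = TrigramWildcardIndex()
index.add(1, "This string can be quite lengthy")
index.add(2, "quite short")
index.add(3, "lengthy, but quite so")
print(index.candidates("*quite*lengthy"))
print(index.search("*quite*lengthy"))
```

For *quite*lengthy, the trigram filter discards document 2 but keeps documents 1 and 3, since both contain every trigram of "quite" and "lengthy". Only the verification step rejects document 3, where the words appear in the wrong order, so the final result is [1]. This is why the original value must be stored alongside the n-gram index.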
Performance Tests
Running the same queries against a keyword field and a wildcard field shows substantial speed gains for the wildcard type, with the largest gains when the query term is highly selective (matches few documents).
Query "红豆" ("red bean"): keyword 715 ms vs wildcard 71 ms
Query "006-612014": keyword 633 ms vs wildcard 22 ms
Query "55": keyword 584 ms vs wildcard 188 ms
Query "11": keyword 1359 ms vs wildcard 357 ms
Overall, in these tests wildcard fields cut query latency to roughly one-third for low-selectivity terms such as "55" and "11", and to one-tenth or less for highly selective terms such as "红豆" and "006-612014".
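The ratios behind this summary can be recomputed from the four measurements listed above. The snippet simply divides keyword latency by wildcard latency; the names RESULTS and speedups are illustrative.

```python
# Latencies in milliseconds from the tests above: (query, keyword, wildcard).
RESULTS = [
    ("红豆", 715, 71),
    ("006-612014", 633, 22),
    ("55", 584, 188),
    ("11", 1359, 357),
]


def speedups(results: list[tuple[str, int, int]]) -> dict[str, float]:
    """Keyword latency divided by wildcard latency, rounded to one decimal."""
    return {query: round(keyword / wildcard, 1) for query, keyword, wildcard in results}


for query, factor in speedups(RESULTS).items():
    print(f"{query}: wildcard is {factor}x faster")
```

This yields about 10.1x for "红豆", 28.8x for "006-612014", 3.1x for "55" and 3.8x for "11": the two short, common terms gain roughly threefold, while the selective terms gain an order of magnitude or more.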
Conclusion
Wildcard fields satisfy most fuzzy-search requirements: they query substantially faster than wildcard queries on keyword fields and consume far less storage than ngram analyzers. However, their efficiency still depends on how selective the query terms are, and developers should benchmark both approaches against their own workloads.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
