How to Achieve MySQL‑LIKE Style Fuzzy Search in Elasticsearch 8.x
This article walks through the challenge of implementing MySQL‑LIKE style front‑and‑back wildcard searches in Elasticsearch, comparing match, match_phrase, n‑gram, legacy wildcard queries, and the new wildcard field type introduced in ES 7.9+, with code samples, performance benchmarks, and practical recommendations for choosing the optimal solution.
Introduction
When a product manager asked for MySQL-style LIKE '%keyword%' search, the Elasticsearch team found that the request hid a deeper technical challenge.
Tokenization Basics
Elasticsearch matches queries against the tokens produced at index time rather than against the raw string, so understanding tokenization is essential for fuzzy search.
Tokenization Example
Original text: "苹果手机真香" ("The Apple phone is really great")
Tokens: ["苹果", "手机", "真", "香"] ("Apple", "phone", "really", "great")
Match Query and Its Limitation
GET /products/_search
{
  "query": {
    "match": {
      "name": "苹果手机"
    }
  }
}
Problem: The default match query uses the or operator, so it also returns loosely related results such as "苹果电脑" (Apple computer) or "华为手机" (Huawei phone), which share only one token with the query.
Match with Operator "and"
GET /products/_search
{
  "query": {
    "match": {
      "name": {
        "query": "苹果手机",
        "operator": "and"
      }
    }
  }
}
Result: Only documents containing both "苹果" and "手机" are returned, but the two tokens may appear in any order and need not be adjacent.
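The difference between the two operators can be sketched in a few lines of Python. This is only a toy model, not how Elasticsearch is implemented: the documents are tokenized by hand, and the names docs and match are made up for illustration.

```python
# Toy model of the match query's "or" and "and" operators.
# Documents are pre-tokenized by hand; this is an illustrative assumption,
# not the output of a real analyzer.
docs = {
    "苹果手机": ["苹果", "手机"],
    "苹果电脑": ["苹果", "电脑"],
    "华为手机": ["华为", "手机"],
    "手机壳 苹果专用": ["手机", "壳", "苹果", "专用"],
}

def match(query_tokens, operator="or"):
    """Return the names of documents whose tokens satisfy the operator."""
    combine = all if operator == "and" else any
    return [
        name
        for name, tokens in docs.items()
        if combine(tok in tokens for tok in query_tokens)
    ]

query = ["苹果", "手机"]
or_hits = match(query)          # one shared token is enough
and_hits = match(query, "and")  # every token required, in any order

print(or_hits)
print(and_hits)
```

With "or", all four documents match. With "and", only the two documents containing both tokens remain, including the one where the tokens are reversed and separated.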
Match_phrase
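GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "苹果手机"
    }
  }
}
Result: The tokens must appear adjacent and in the same order, but the query still matches whole tokens only, so it has no fuzzy capability: a substring such as "果手" finds nothing.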
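Phrase matching can be pictured as looking for the query tokens as a consecutive run inside the document's token list. The sketch below is a simplification under that assumption (real phrase queries work on token positions in the inverted index), and phrase_match is a hypothetical helper name.

```python
def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens as one consecutive run."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

query = ["苹果", "手机"]

# Tokens adjacent and in order: matches.
in_order = phrase_match(["苹果", "手机", "真", "香"], query)

# Same tokens, different order: no match.
reordered = phrase_match(["手机", "壳", "苹果", "专用"], query)

# "果手" straddles two tokens, so neither "果" nor "手" exists as a token.
substring = phrase_match(["苹果", "手机"], ["果", "手"])

print(in_order, reordered, substring)
```

The last case is the LIKE '%果手%' requirement: it fails because the match is token-based, not character-based.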
n‑gram + match_phrase (Pre‑7.9)
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      }
    }
  }
}
GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "果手"
    }
  }
}
Result: Successfully matches "苹果手机" via the substring "果手". This works only when the query text is analyzed into the same 2- and 3-character grams as the index; the standard analyzer would split "果手" into single characters, which are never indexed with min_gram 2.
✅ Supports substring matching anywhere.
❌ Index size grows roughly 3×.
❌ Query performance degrades.
❌ Requires careful n‑gram tuning.
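To see both why the substring match works and where the index cost comes from, here is a rough Python sketch of character n-gram generation. It assumes grams are emitted by start offset and then by length, and that the query is split with the same settings as the index; ngrams and contains_run are hypothetical helper names.

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Emit character n-grams ordered by start offset, then by length."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams

def contains_run(haystack, needle):
    """True if the needle list appears as a consecutive run in haystack."""
    n = len(needle)
    return any(
        haystack[i:i + n] == needle
        for i in range(len(haystack) - n + 1)
    )

indexed = ngrams("苹果手机真香")
print(len(indexed), indexed)

# The query is split with the same settings, then matched as a phrase.
short_hit = contains_run(indexed, ngrams("果手"))
long_hit = contains_run(indexed, ngrams("果手机真"))
miss = contains_run(indexed, ngrams("手果"))
print(short_hit, long_hit, miss)
```

A six-character string produces nine grams here, compared with four word tokens earlier, which gives a feel for why the index grows and why min_gram and max_gram need tuning.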
Wildcard Queries (Pre‑7.9)
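Before the wildcard field type existed, wildcard queries were run against keyword fields, which works but is risky on large indices.
GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*iPhone*",
        "case_insensitive": true
      }
    }
  }
}
A leading wildcard (*) means the term dictionary cannot be narrowed by prefix, so every term has to be checked, causing high CPU and memory usage. Note that the case_insensitive parameter requires Elasticsearch 7.10 or later.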
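The cost difference can be illustrated with a sorted list standing in for the term dictionary. This is a deliberately simplified model (Lucene's real term lookup is automaton-based), and the terms and function names below are made up for the example.

```python
import bisect
import fnmatch

# Hypothetical sorted term dictionary for a keyword field.
terms = sorted([
    "Galaxy S23",
    "iPhone 14",
    "iPhone 15 Pro",
    "Pixel 8",
    "Used iPhone 13",
])

def prefix_lookup(prefix):
    """Binary-search to the first candidate, then read only the matching run."""
    i = bisect.bisect_left(terms, prefix)
    hits, visited = [], 0
    while i < len(terms):
        visited += 1
        if not terms[i].startswith(prefix):
            break
        hits.append(terms[i])
        i += 1
    return hits, visited

def leading_wildcard_lookup(pattern):
    """No usable prefix, so every term is tested against the pattern."""
    hits = [t for t in terms if fnmatch.fnmatchcase(t, pattern)]
    return hits, len(terms)

print(prefix_lookup("iPhone"))              # visits only the matching run
print(leading_wildcard_lookup("*iPhone*"))  # visits every term
```

With five terms the difference is trivial, but the second function's work grows with the number of unique terms in the field, which is the concern on a large index.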
Wildcard Field Type (ES 7.9+)
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "wildcard"
      }
    }
  }
}
GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*果手*"
      }
    }
  }
}
Performance: ~25 ms latency, index size ~1.4×, low impact on the cluster.
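The wildcard field type is generally described as combining an n-gram index, used to narrow down candidates, with a check against the stored original value. The Python sketch below models that two-step idea only; it is an assumption-laden simplification, not the actual implementation, and ToyWildcardField and its methods are hypothetical names.

```python
def trigrams(text):
    """Set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class ToyWildcardField:
    """Toy model: trigram postings for filtering, stored values for verifying."""

    def __init__(self):
        self.postings = {}  # trigram -> set of doc ids
        self.values = {}    # doc id -> original string

    def add(self, doc_id, value):
        self.values[doc_id] = value
        for gram in trigrams(value):
            self.postings.setdefault(gram, set()).add(doc_id)

    def candidates(self, needle):
        """Step 1: docs that contain every trigram of the needle."""
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: nothing to narrow with.
            return set(self.values)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search_contains(self, needle):
        """Step 2: verify each candidate against the stored value."""
        return sorted(
            d for d in self.candidates(needle) if needle in self.values[d]
        )

field = ToyWildcardField()
field.add(1, "苹果手机真香")
field.add(2, "苹果电脑")
field.add(3, "abcd-bcde")
field.add(4, "xabcdey")

print(field.search_contains("果手"))
print(field.candidates("abcde"), field.search_contains("abcde"))
```

Document 3 contains every trigram of "abcde" without containing the string itself, which is why the verification step is needed after the n-gram filter.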
Comparison Summary
match: Simple, low precision.
match + operator "and": Better relevance, order‑independent.
match_phrase: Exact phrase, order‑sensitive.
n‑gram + match_phrase: Full fuzzy capability, high index cost.
Legacy wildcard: Easy to use, but leading wildcards perform poorly on large indices.
Wildcard field type: Best for front‑and‑back fuzzy matching with good performance.
Final Recommendation
Deploy an Elasticsearch 8.x cluster.
Use the wildcard field type for fuzzy matching requirements.
Keep traditional searches with match_phrase or other mature queries.
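When user input is turned into a wildcard pattern, any * or ? typed by the user would otherwise act as a wildcard. Below is a minimal sketch of a request-body builder, assuming backslash escaping as in Lucene's wildcard syntax (worth verifying against your Elasticsearch version); like_contains_query is a hypothetical helper, and the field name is taken from the examples above.

```python
import json

def like_contains_query(field, keyword):
    """Build a wildcard query body in the spirit of LIKE '%keyword%'."""
    # Escape the escape character first, then the two wildcard characters.
    escaped = (
        keyword.replace("\\", "\\\\")
               .replace("*", "\\*")
               .replace("?", "\\?")
    )
    return {"query": {"wildcard": {field: {"value": f"*{escaped}*"}}}}

body = like_contains_query("name", "果手")
print(json.dumps(body, ensure_ascii=False, indent=2))

# Wildcard characters typed by the user are treated as literals.
literal = like_contains_query("name", "50%*off?")
print(literal["query"]["wildcard"]["name"]["value"])
```

The resulting dictionary can be sent as the JSON body of GET /products/_search with whichever HTTP or Elasticsearch client you already use.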
Tip: If a product manager asks for deep pagination, remind them that even large platforms limit pages for usability.
