How to Achieve MySQL‑LIKE Style Fuzzy Search in Elasticsearch 8.x

This article walks through the challenge of implementing MySQL-style LIKE searches with leading and trailing wildcards in Elasticsearch. It compares match, match_phrase, n-gram, legacy wildcard queries, and the wildcard field type introduced in ES 7.9, with code samples, performance figures, and practical recommendations for choosing a solution.

Rare Earth Juejin Tech Community

Introduction

When a product manager asked for MySQL-style fuzzy search in the form LIKE '%keyword%', the team found that this seemingly simple request is a deeper technical challenge in Elasticsearch than it is in a relational database.

Tokenization Basics

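Elasticsearch does not search the raw text. At index time it splits each text field into tokens and stores those tokens in an inverted index, and queries are matched against the tokens. Understanding this tokenization step is essential for fuzzy search.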

Tokenization Example

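Original text: "苹果手机真香"
Tokens: ["苹果", "手机", "真", "香"]

This split is what a dictionary-based Chinese analyzer such as IK would produce; the built-in standard analyzer emits one token per Chinese character instead.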
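To make the consequence of tokenization concrete, here is a minimal Python sketch of an inverted index. It is an illustration only, not how Elasticsearch stores data internally; the documents and their token lists are hypothetical and simply reuse the example above.

```python
from collections import defaultdict

# Hypothetical, pre-tokenized documents (doc id -> tokens).
DOCS = {
    1: ["苹果", "手机", "真", "香"],
    2: ["苹果", "电脑"],
    3: ["华为", "手机"],
}


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


index = build_inverted_index(DOCS)

# A whole token is a direct dictionary lookup.
print(index.get("手机", set()))
# A substring that straddles two tokens is not a key in the index at all.
print(index.get("果手", set()))
```

The second lookup comes back empty even though document 1 contains the characters "果手" side by side, which is exactly why LIKE '%keyword%' semantics do not come for free.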

Match Query and Its Limitation

GET /products/_search
{
  "query": {
    "match": {
      "name": "苹果手机"
    }
  }
}

Problem: The default match query combines the query tokens with the or operator, so a document containing either "苹果" or "手机" matches. That returns many unrelated results such as "苹果电脑" or "华为手机".

Match with Operator "and"

GET /products/_search
{
  "query": {
    "match": {
      "name": {
        "query": "苹果手机",
        "operator": "and"
      }
    }
  }
}

Result: Only documents containing both "苹果" and "手机" are returned, but the two tokens may appear in any order and at any distance from each other.
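The difference between the two operators can be sketched in a few lines of Python. This is a simplified model of the boolean logic only (no scoring), and the documents, including the reversed-order document 4, are made up for illustration.

```python
# Hypothetical, pre-tokenized documents (doc id -> tokens).
DOCS = {
    1: ["苹果", "手机", "真", "香"],
    2: ["苹果", "电脑"],
    3: ["华为", "手机"],
    4: ["手机", "壳", "适用", "苹果"],  # both tokens, reversed order
}


def match(docs, query_tokens, operator="or"):
    """Return ids of documents matching any ("or") or all ("and") query tokens."""
    wanted = set(query_tokens)
    hits = []
    for doc_id, tokens in docs.items():
        present = wanted & set(tokens)
        matched = present == wanted if operator == "and" else bool(present)
        if matched:
            hits.append(doc_id)
    return hits


print(match(DOCS, ["苹果", "手机"], operator="or"))
print(match(DOCS, ["苹果", "手机"], operator="and"))
```

With "and", document 4 still matches: the operator checks that each token is present, not where it appears.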

Match_phrase

GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "苹果手机"
    }
  }
}

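Result: The tokens must appear adjacent and in the given order, so this is an exact phrase match. It is still token-based, however: a substring that crosses token boundaries, such as "果手", does not match, so there is no LIKE-style fuzzy capability.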
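A rough Python model of phrase matching, assuming the documents are already tokenized and ignoring slop and scoring, shows both the ordering constraint and the remaining limitation:

```python
def match_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occur in tokens as one contiguous, ordered run."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        tokens[i:i + n] == phrase_tokens
        for i in range(len(tokens) - n + 1)
    )


doc_in_order = ["苹果", "手机", "真", "香"]
doc_reversed = ["手机", "壳", "适用", "苹果"]  # hypothetical document

print(match_phrase(doc_in_order, ["苹果", "手机"]))  # adjacent and ordered
print(match_phrase(doc_reversed, ["苹果", "手机"]))  # same tokens, wrong order
print(match_phrase(doc_in_order, ["果手"]))          # "果手" is not a token
```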

n‑gram + match_phrase (Pre‑7.9)

PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
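      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      }
    }
  }
}

GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "果手"
    }
  }
}

Result: Successfully matches "苹果手机" via the substring "果手". The query string has to be analyzed with the same n-gram analyzer as the indexed text, so the mapping sets no separate search_analyzer. A standard search analyzer would split "果手" into single characters, and no gram shorter than min_gram exists in the index.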

✅ Supports substring matching anywhere.

❌ Index size grows roughly 3×.

❌ Query performance degrades.

❌ Requires careful n‑gram tuning.
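To see where the index growth comes from, the following Python sketch generates grams in roughly the way the ngram tokenizer with min_gram 2 and max_gram 3 would: for each start offset, every length from min to max. It is an approximation for illustration, not the Lucene implementation.

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Return all substrings of length min_gram..max_gram, ordered by start offset."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


text = "苹果手机"
grams = ngrams(text)
print(grams)
print(len(text), "characters ->", len(grams), "indexed terms")
```

Every 2- and 3-character substring becomes its own term, which is why "果手" is findable and also why the term count grows much faster than the text length as values get longer or the gram range widens.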

Wildcard Queries (Pre‑7.9)

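The classic wildcard query runs against keyword fields and needs no special mapping, but it is risky on large indices. Note that the case_insensitive parameter used here is only available from 7.10 onward.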

GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*iPhone*",
        "case_insensitive": true
      }
    }
  }
}

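A leading wildcard (*) rules out any prefix lookup in the term dictionary, so Elasticsearch has to enumerate every term in the field and test each one against the pattern. On high-cardinality fields this causes high CPU and memory usage.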
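The cost difference between a trailing and a leading wildcard can be modeled with a sorted term list in Python. This is a deliberately simplified picture of a term dictionary (the real structure is more sophisticated), and the product names are hypothetical.

```python
import bisect
from fnmatch import fnmatchcase

# Hypothetical keyword terms, kept sorted like a term dictionary.
TERMS = sorted([
    "galaxy s24", "iphone 15", "iphone 15 pro",
    "ipad air", "pixel 8", "refurbished iphone 13",
])


def prefix_lookup(terms, prefix):
    """Pattern "prefix*": jump to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(terms, prefix)
    hits, examined = [], 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        hits.append(term)
    return hits, examined


def leading_wildcard_lookup(terms, pattern):
    """Pattern "*x*": nothing to seek to, so every term has to be tested."""
    hits = [term for term in terms if fnmatchcase(term, pattern)]
    return hits, len(terms)


print(prefix_lookup(TERMS, "iphone"))
print(leading_wildcard_lookup(TERMS, "*iphone*"))
```

The prefix case touches only the neighbourhood of the match, while the leading-wildcard case always touches the whole list, and that gap widens with the number of distinct terms.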

Wildcard Field Type (ES 7.9+)

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "wildcard"
      }
    }
  }
}
GET /products/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*果手*"
      }
    }
  }
}

Performance: ~25 ms latency, index size ~1.4×, low impact on the cluster.
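The Elasticsearch documentation describes the wildcard field as indexing n-grams of the whole value as a rough filter and then checking the stored full values of the remaining candidates. The Python sketch below imitates that two-phase idea under simplifying assumptions: the class name, the trigram size, and the contains-only search are choices made for this illustration, not the actual implementation.

```python
from collections import defaultdict


def trigrams(text):
    """All 3-character substrings of text (empty set if text is shorter)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TinyWildcardIndex:
    """Toy model: trigram postings as a coarse filter, full values for verification."""

    def __init__(self):
        self.values = {}                  # doc id -> full original value
        self.postings = defaultdict(set)  # trigram -> doc ids

    def add(self, doc_id, value):
        self.values[doc_id] = value
        for gram in trigrams(value):
            self.postings[gram].add(doc_id)

    def search_contains(self, keyword):
        """Equivalent of the pattern *keyword*."""
        grams = trigrams(keyword)
        if grams:
            # Phase 1: only docs containing every trigram can possibly match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Keyword too short to filter on: every value must be verified.
            candidates = set(self.values)
        # Phase 2: verify the survivors against the stored full value.
        return sorted(d for d in candidates if keyword in self.values[d])


idx = TinyWildcardIndex()
idx.add(1, "苹果手机真香")
idx.add(2, "苹果电脑")
idx.add(3, "华为手机")
print(idx.search_contains("果手机"))
```

The filter phase keeps the expensive verification limited to a small candidate set, which is the intuition behind the better latency compared with scanning every term.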

Comparison Summary

match: Simple, low precision.

match + operator "and": Better relevance, order-independent.

match_phrase: Exact phrase, order-sensitive.

n-gram + match_phrase: Full fuzzy capability, high index cost.

Legacy wildcard: Easy to use but terrible performance.

Wildcard field type: Best for front-and-back fuzzy matching with good performance.

Final Recommendation

Deploy an Elasticsearch 8.x cluster.

Use the wildcard field type for fuzzy matching requirements.

Keep traditional searches with match_phrase or other mature queries.
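When wiring the wildcard field into an application, user input usually has to be turned into a *keyword* pattern without letting a literal * or ? act as a wildcard. The helper below is a minimal sketch: the function names are hypothetical, and it assumes backslash escaping as in Lucene's WildcardQuery, which is worth confirming against the version in use.

```python
import json


def escape_wildcard(text):
    """Backslash-escape the characters that are special in a wildcard pattern."""
    escaped = []
    for ch in text:
        if ch in ("\\", "*", "?"):
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


def like_contains_query(field, keyword):
    """Build a search body equivalent to: field LIKE '%keyword%'."""
    return {
        "query": {
            "wildcard": {
                field: {"value": f"*{escape_wildcard(keyword)}*"}
            }
        }
    }


# The resulting dict can be sent as the JSON body of GET /products/_search.
print(json.dumps(like_contains_query("name", "果手"), ensure_ascii=False, indent=2))
```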

Tip: If a product manager asks for deep pagination, remind them that even large platforms limit pages for usability.
Written by

Rare Earth Juejin Tech Community
