Understanding MySQL InnoDB Full-Text Search and Inverted Index
This article explains why traditional B‑Tree indexes fail for keyword searches, introduces the concept of inverted indexes, shows how to create and use MySQL InnoDB full‑text indexes with MATCH‑AGAINST in various modes, and covers index maintenance and query‑expansion techniques.
Traditional B‑Tree indexes become ineffective when using wildcard patterns like "%xx" for fuzzy searches, which are common in search engines and e‑commerce platforms that need to match keywords within large text fields. Full‑text search, based on similarity rather than exact value comparison, solves this problem.
Inverted Index
Full‑text search relies on an inverted index, a structure similar to B‑Tree that maps each word to the documents (and optionally positions) where it appears. Two forms are commonly used:
inverted file index: {word, list of document IDs}
full inverted index: {word, (document ID, position)}
The first form stores only document identifiers, while the second also records the exact position of the word, enabling more precise queries at the cost of additional storage.
The diagram above illustrates an inverted file index where the term "code" appears in documents 1 and 4. A full inverted index would additionally store the positions (e.g., word 6 in doc 1, word 8 in doc 4).
Creating a Full‑Text Index
When creating a table
1. Create table with a full‑text index:
CREATE TABLE table_name (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
author VARCHAR(200),
title VARCHAR(200),
content TEXT(500),
FULLTEXT full_index_name (col_name)
) ENGINE=InnoDB;Example query to list InnoDB tables:
SELECT table_id, name, space FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE name LIKE 'test/%';On an existing table
2. Add a full‑text index:
CREATE FULLTEXT INDEX full_index_name ON table_name(col_name);Using a Full‑Text Index
MySQL supports full‑text search on InnoDB and MyISAM tables for CHAR , VARCHAR , and TEXT columns. The basic syntax is:
MATCH(col1, col2, ...) AGAINST(expr[search_modifier])
search_modifier:
IN NATURAL LANGUAGE MODE
| IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
| IN BOOLEAN MODE
| WITH QUERY EXPANSIONThe MATCH() function lists the columns to search, while AGAINST() provides the search string and optional modifiers.
Natural Language Mode
Interprets the query as a natural language phrase. Example:
SELECT count(*) AS count
FROM fts_articles
WHERE MATCH(title, body) AGAINST('MySQL');Counting rows that contain the word "MySQL".
Boolean Mode
Allows operators to control required, prohibited, or optional terms, as well as relevance weighting. Common operators:
+ : term must be present
- : term must be absent
(no operator): term is optional but boosts relevance if present
@distance : proximity search, e.g., MATCH(context) AGAINST('"Pease hot"@30' IN BOOLEAN MODE)
< / > : lower or raise relevance
~ : term is allowed but reduces relevance
" : exact phrase
lik* : prefix matching
Example queries:
SELECT * FROM fts_articles
WHERE MATCH(title, body) AGAINST('+MySQL -YourSQL' IN BOOLEAN MODE);Find rows containing "MySQL" but not "YourSQL".
SELECT * FROM fts_articles
WHERE MATCH(title, body) AGAINST('MySQL IBM' IN BOOLEAN MODE);Both terms are optional; presence increases relevance.
SELECT * FROM fts_articles
WHERE MATCH(title, body) AGAINST('"MySQL Security"' IN BOOLEAN MODE);Exact phrase search.
Query Expansion
Extends the original query with related terms (automatic relevance feedback). Use WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION . The process runs in two stages: first a normal full‑text search, then a second search using terms extracted from the first result set.
SELECT * FROM fts_articles
WHERE MATCH(title, body) AGAINST('database' WITH QUERY EXPANSION);Deleting a Full‑Text Index
Two ways to drop a full‑text index:
DROP INDEX full_idx_name ON db_name.table_name;or
ALTER TABLE db_name.table_name DROP INDEX full_idx_name;Conclusion
This article combined theory and practice to introduce MySQL InnoDB full‑text indexes, covering inverted index structures, creation, usage in different search modes, query‑expansion, and index removal.
Author: 沸羊羊
Original link: juejin.cn/post/6989871497040887845
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.