Understanding MySQL InnoDB Full‑Text Search and Inverted Index
This article explains why InnoDB full‑text search is needed for keyword‑based queries, introduces inverted index structures, shows how to create, use, and delete full‑text indexes in MySQL, and demonstrates natural language, boolean, and query‑expansion search modes with practical SQL examples.
We all know that using "%xx" in InnoDB fuzzy queries disables indexes, but many scenarios such as search engines or e‑commerce sites require keyword‑based full‑text search, which B+Tree indexes cannot handle efficiently.
While most queries can be satisfied with numeric comparisons and range filters, keyword matching needs similarity‑based queries, and full‑text indexes are designed for this purpose.
Full‑text search (Full‑Text Search) retrieves any information from whole books or articles stored in a database, allowing extraction of chapters, sections, sentences, words, and supporting various statistics and analyses.
In early MySQL versions InnoDB did not support full‑text search; support was added starting from MySQL 5.6.
Inverted Index
Full‑text search typically uses an inverted index, which, like a B+Tree, is an index structure that stores a mapping between words and the documents (and positions) where they appear. It has two forms:
inverted file index: {word, document‑id list}
full inverted index: {word, (document‑id, position) list}
The diagram above shows an inverted file index where the word "code" appears in documents 1 and 4, enabling simple retrieval of matching documents. The full inverted index also stores the exact positions of the word, which consumes more space but allows finer‑grained queries.
Full‑Text Search
Create Full‑Text Index
1. Create a table with a full‑text index:
CREATE TABLE table_name (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
author VARCHAR(200),
title VARCHAR(200),
content TEXT(500),
FULLTEXT full_index_name (col_name)
) ENGINE=InnoDB;Enter a query statement (example):
SELECT table_id, name, space FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE name LIKE 'test/%';The six auxiliary index tables together form the inverted index. When a document is tokenized, each term and its position are stored, sorted by the first character's weight, and partitioned across the tables.
2. Add a full‑text index to an existing table:
CREATE FULLTEXT INDEX full_index_name ON table_name(col_name);Use Full‑Text Index
MySQL supports full‑text queries on InnoDB or MyISAM tables, and only on CHAR, VARCHAR, or TEXT columns.
The syntax is:
MATCH(col1, col2, ...) AGAINST(expr[search_modifier])
search_modifier:
IN NATURAL LANGUAGE MODE
| IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
| IN BOOLEAN MODE
| WITH QUERY EXPANSIONFull‑text search uses MATCH() to list the columns to search and AGAINST() to provide the search string and optional modifier. Three search modes are available: natural language, boolean, and query expansion.
Natural Language
Natural language mode treats the query as a human phrase; it is the default mode and returns documents containing the specified keywords.
Demo:
SELECT count(*) AS count
FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('MySQL');The same query can be written as:
SELECT count(IF(MATCH (title, body) AGAINST ('MySQL'), 1, NULL)) AS count
FROM `fts_articles`;The second form is faster because it avoids relevance sorting.
You can also retrieve relevance scores:
SELECT *, MATCH (title, body) AGAINST ('MySQL') AS Relevance
FROM fts_articles;Relevance depends on four factors: word presence, frequency, document frequency, and the number of documents containing the word.
For InnoDB, additional considerations include stopwords and token length limits (default min = 3, max = 84).
Boolean
Boolean mode allows special operators to control required, prohibited, or weighted terms. Examples: +: word must be present -: word must be absent
(no operator): optional word, higher relevance if present @distance: proximity search, e.g.,
MATCH(context) AGAINST('"Pease hot"@30' IN BOOLEAN MODE) ~: optional word with negative weight <: lower relevance *: wildcard prefix, e.g.,
lik* ": exact phrase
Demo 1 (+ -):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);Demo 2 (no operator):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('MySQL IBM' IN BOOLEAN MODE);Demo 3 (proximity):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('"DB2 IBM"@3' IN BOOLEAN MODE);Demo 4 (>, <):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('+MySQL +(>database <DBMS)' IN BOOLEAN MODE);Demo 5 (~):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('MySQL ~database' IN BOOLEAN MODE);Demo 6 (wildcard):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('My*' IN BOOLEAN MODE);Demo 7 (exact phrase):
SELECT * FROM `fts_articles`
WHERE MATCH (title, body) AGAINST ('"MySQL Security"' IN BOOLEAN MODE);Query Expansion
Query expansion modifies natural language search to include related terms when the original keyword is too short, providing implicit knowledge (e.g., searching "database" also matches "MySQL", "Oracle", "RDBMS"). It is enabled with WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION and works in two stages: initial full‑text search, then a second search using terms extracted from the first result set.
Example:
-- Create index
CREATE FULLTEXT INDEX title_body_index ON fts_articles(title, body);
-- Natural language query
SELECT * FROM `fts_articles` WHERE MATCH(title, body) AGAINST ('database');
-- With query expansion
SELECT * FROM `fts_articles` WHERE MATCH(title, body) AGAINST ('database' WITH QUERY EXPANSION);Query expansion can return many irrelevant results, so it should be used cautiously.
Delete Full‑Text Index
1. Drop index directly: DROP INDEX full_idx_name ON db_name.table_name; 2. Drop index using ALTER TABLE:
ALTER TABLE db_name.table_name DROP INDEX full_idx_name;Summary
This article combined theory and practice to introduce MySQL full‑text indexes, covering inverted index structures, creation, usage, various search modes, query expansion, and index removal.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
