Databases 17 min read

InnoDB Full-Text Index Architecture and Operations in MySQL 5.7

This article explains MySQL InnoDB full-text indexing, covering supported modes, auxiliary tables, creation, DML handling, query processing, transaction management, cache synchronization, optimization, background threads, monitoring, stopwords, plugins, and the built‑in n‑gram parser, with code references from MySQL 5.7.

Architect
Architect
Architect
InnoDB Full-Text Index Architecture and Operations in MySQL 5.7

Introduction

Since MySQL 5.6, the InnoDB engine supports full‑text indexes with syntax largely compatible with the older MyISAM full‑text mode. Full‑text indexing builds an inverted index to enable fast document matching. MySQL provides three search modes: natural language ( IN NATURAL LANGUAGE MODE), boolean ( IN BOOLEAN MODE) with operators such as "+" (must contain), "-" (must not contain), "*" (wildcard), and query expansion ( WITH QUERY EXPANSION), which runs a second search using the most relevant rows from the first pass. Full‑text indexes can be defined on CHAR, VARCHAR, and TEXT columns.

This article gives a brief analysis of the code modules involved in full‑text indexing and highlights new features in MySQL 5.7, based on the source of MySQL 5.7.8‑rc.

Creating a Full‑Text Index

A simple example of creating a table with a full‑text index is shown below (image omitted). On disk MySQL creates several auxiliary .ibd files, for example FTS_000000000000010b_0000000000000154_INDEX_1~6.ibd, which store the inverted index, token positions, and document IDs, partitioned by the first character of each token.

The naming convention for these files is FTS_{TABLE_ID}_{INDEX_ID}_INDEX_{N}.ibd. Additional auxiliary tables include: FTS_..._DELETED.ibd – stores document IDs that have been deleted but not yet removed from the index. FTS_..._DELETED_CACHE.ibd – in‑memory cache for the deleted IDs. FTS_..._BEING_DELETED.ibd and FTS_..._BEING_DELETED_CACHE.ibd – hold IDs that are in the process of being removed, used during OPTIMIZE TABLE. FTS_..._CONFIG.ibd – stores internal index metadata such as FTS_SYNCED_DOC_ID, which is used for crash recovery.

When creating a full‑text index on an existing table, InnoDB spawns multiple threads; the degree of parallelism is controlled by the parameter innodb_ft_sort_pll_degree. It is recommended to create the table and load data first, then add the full‑text index.

Key functions involved are row_merge_read_clustered_indexrow_fts_start_psort and the thread callback fts_parallel_tokenization.

Ordinary DML and Query Operations

Insert

During an INSERT, InnoDB checks INNODB_FT_INDEX_CACHE to see if the table has a full‑text index ( DICT_TF2_FTS). If so, the document ID is extracted via fts_get_doc_id_from_row and cached in the transaction object.

Delete

Delete statements do not immediately remove entries from the full‑text index; the token information remains accessible through INNODB_FT_INDEX_CACHE. The deleted document IDs are recorded in the auxiliary DELETED tables.

Update

Updating a non‑indexed column leaves the hidden FTS_DOC_ID unchanged. Updating an indexed column causes the old document to be deleted and a new one inserted, using the same internal mechanisms as INSERT and DELETE.

All DML operations funnel through fts_trx_add_op, which classifies the operation as FTS_INSERT or FTS_DELETE. Document IDs are stored in trx->fts_trx and applied to the index only at transaction commit time.

Query

Full‑text queries are performed in two steps: first, MySQL collects matching document IDs based on the search terms; second, it retrieves the corresponding rows using the hidden unique index dict_table_t::fts_doc_id_index built on FTS_DOC_ID. Results are ranked using a BM25/TF‑IDF algorithm similar to Sphinx. The ranking calculation uses functions fts_query_calculate_idf and fts_query_calculate_ranking.

Transaction Operations

During a rollback, InnoDB removes the document IDs recorded in trx->fts_trx->last_stmt from the global savepoints. On commit, fts_add_doc_by_id processes each cached document: it tokenizes the text, adds tokens to the in‑memory cache, and flushes the cache to disk when innodb_ft_cache_size or innodb_ft_total_cache_size thresholds are exceeded.

Deletion commits insert the removed document IDs into the DELETED auxiliary tables via fts_delete. Key classes involved in transaction handling are illustrated in the original diagram (image omitted).

Cache Synchronization

Synchronization (sync) writes cached data to the on‑disk full‑text index files. It is triggered when the cache exceeds memory limits, when the background fts_optimize_thread shuts down, during OPTIMIZE TABLE, or after a temporary table is built and read.

The sync process calls fts_sync_index to write index nodes ( fts_node_t) and then fts_sync_commit to update the CONFIG table, move deleted IDs to DELETED_CACHE, and reset the cache.

OPTIMIZE TABLE

When the parameter innodb_optimize_fulltext_only is enabled, OPTIMIZE TABLE rebuilds only the full‑text index. The operation first calls fts_sync_table to flush caches, then fts_optimize_table, which snapshots deleted IDs, reads them, and optimizes each index. After optimization, snapshots are purged and auxiliary tables are cleaned.

Background Thread

InnoDB starts a background thread fts_optimize_thread with a work queue fts_optimize_wq. It receives messages of type FTS_MSG_ADD_TABLE (when a new full‑text table is created) and FTS_MSG_DEL_TABLE (when a table is dropped). When a table accumulates more than ten million deleted rows, the thread triggers an automatic optimize, respecting a minimum interval of 300 seconds.

Monitoring

Full‑text index status can be monitored via INFORMATION_SCHEMA tables. The variable innodb_ft_aux_table must be set to "dbname/tablename" before querying.

Stopwords

Stopwords are ignored during tokenization. The built‑in stopword list is stored in INNODB_FT_DEFAULT_STOPWORD. Users can provide a custom stopword table by creating a table with the same structure and setting innodb_ft_server_stopword_table (or the session variable innodb_ft_user_stopword_table) to its name.

Token length limits are controlled by innodb_ft_min_token_size and innodb_ft_max_token_size; when using the n‑gram parser, ngram_token_size determines the effective token length. The flag innodb_ft_enable_stopword can be disabled to ignore stopwords entirely.

Full‑Text Index Plugins

Since MySQL 5.7.3, InnoDB supports full‑text index plugins. A plugin can replace the built‑in parser or act as a collaborator that preprocesses the document before the built‑in parser runs. Existing MyISAM plugins can be adapted for InnoDB.

InnoDB N‑gram Parser

MySQL 5.7.6 introduced a built‑in n‑gram parser to better handle CJK (Chinese, Korean, Japanese) character sets, where words are not delimited by spaces. The parser is activated with WITH PARSER ngram. For a token size of 2 (default), the word "abcd" is tokenized into "ab", "bc", and "cd". Unlike the standard parser, any token containing a stopword is excluded, and spaces are always treated as stopwords.

During queries, the search terms are also tokenized by the n‑gram parser before matching. MySQL also supports a MeCab parser plugin for Japanese segmentation.

Source: Yunqi Academy Blog (original URL: https://yq.aliyun.com/articles/224#). © 2026 Yunqi Academy.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

optimizationindexingdatabaseInnoDBmysqlFull‑Text Search
Architect
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.