InnoDB Index Types, Physical Structure, Sorted Index Build, and Full‑Text Indexing
This article explains InnoDB clustered and secondary indexes, their physical B‑tree storage, page‑size and fill‑factor settings, the three‑phase sorted index build process, and the design, tables, cache, document‑ID handling, transaction semantics, and monitoring of InnoDB full‑text indexes.
InnoDB tables always have a special clustered index that stores the row data; by default the primary key is used as the clustered index, and if no explicit PRIMARY KEY exists InnoDB creates a hidden clustered index using a synthetic column (GEN_CLUST_INDEX) containing a monotonically increasing 6‑byte row ID.
Secondary (auxiliary) indexes store the primary‑key columns together with the indexed columns. Because each secondary index entry includes the primary key, a short primary key improves space efficiency and lookup speed.
All InnoDB indexes (except spatial R‑tree indexes) are implemented as B‑trees with a default page size of 16 KB, configurable via innodb_page_size . When inserting into a clustered index, InnoDB reserves roughly 1/16 of each page for future growth; page fill levels depend on insertion order.
During index creation or rebuild, InnoDB performs a bulk‑load called sorted index build. The process has three stages: (1) scanning the clustered index and writing sorted entries to a temporary buffer, (2) merging the temporary files, and (3) inserting the merged entries into the B‑tree using a multithreaded algorithm that always appends to the right‑most leaf page.
The innodb_fill_factor variable controls how much free space is left on each B‑tree page during sorted index build (default 100 % leaves 1/16 free). If a page’s fill level drops below MERGE_THRESHOLD (default 50 %), InnoDB attempts to merge pages to reclaim space.
InnoDB full‑text indexes use an inverted‑index design. Creating a full‑text index on CHAR, VARCHAR, or TEXT columns generates a set of auxiliary tables named fts_…_index_# that store word lists, document IDs (DOC_ID), and position offsets. The tables are partitioned into six auxiliary index tables for parallel indexing, controlled by innodb_ft_sort_pll_degree .
A hidden FTS_DOC_ID column (BIGINT UNSIGNED NOT NULL) is added to the indexed table if not supplied, and a unique index FTS_DOC_ID_INDEX is created automatically. This column maps each document to its entries in the inverted index.
During inserts, InnoDB caches recent tokenized rows in the full‑text index cache ( INNODB_FT_INDEX_CACHE ) until the cache fills, then flushes the data in bulk to the auxiliary tables. Cache size is configurable with innodb_ft_cache_size (per‑table) and innodb_ft_total_cache_size (global).
Deletion of rows writes the corresponding DOC_ID values to special tables ( fts_*_deleted , fts_*_being_deleted ) rather than immediately removing entries from the auxiliary indexes, allowing fast deletes. To reclaim space, run OPTIMIZE TABLE … innodb_optimize_fulltext_only=ON .
Full‑text index updates are transaction‑aware: changes become visible to full‑text search only after the transaction commits. This behavior is demonstrated by inserting rows inside a transaction, querying with MATCH() ... AGAINST() before commit (returns 0), and after commit (returns the expected count).
Monitoring of InnoDB full‑text indexes can be performed via INFORMATION_SCHEMA tables such as INNODB_FT_CONFIG , INNODB_FT_INDEX_TABLE , INNODB_FT_INDEX_CACHE , INNODB_FT_DELETED , and INNODB_FT_BEING_DELETED , as well as INNODB_INDEXES and INNODB_TABLES for general index metadata.
Cognitive Technology Team
Cognitive Technology Team regularly delivers the latest IT news, original content, programming tutorials and experience sharing, with daily perks awaiting you.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.