Databases 8 min read

In-Depth Exploration of InnoDB Bulk Index Build

This article provides a comprehensive technical overview of InnoDB's bulk index build feature introduced in MySQL 5.7, detailing its design, fast index build phases, bottom‑up construction, dirty‑page flushing, compressed‑table handling, fill‑factor configuration, and potential future optimizations.

Tencent Database Technology
Tencent Database Technology
Tencent Database Technology
In-Depth Exploration of InnoDB Bulk Index Build

InnoDB introduced a bulk index build capability in MySQL 5.7, designed and implemented by the author of WL#7277. The article explains the motivation behind the feature, contrasting it with the traditional per‑row index insertion that incurs heavy locking, redo/undo logging, and slow recovery.

Fast Index Build splits the indexing process into three stages: (1) Read phase , where the clustered index is scanned once and records for the new index are written to a sort buffer (default 1 GB) and sorted into temporary files; (2) Sort phase , which performs external two‑way merge sort on the temporary files, detecting duplicate keys for unique indexes; (3) Build phase , where sorted records are inserted into the B‑tree without node locking or undo logging, always appending to the rightmost leaf, resulting in much higher throughput.

The implementation includes a detail that prevents unnecessary node splits: when a child node cannot accept a new record at the same position as the previous insertion and the next record is the maximum marker, the node is not split but a new node is allocated and the record becomes its first entry, keeping leaf nodes mostly full.

Bottom‑Up Index Build inserts records first into leaf pages; when a leaf fills, the record propagates upward. This approach avoids repeated root‑to‑leaf traversals by keeping the current insert page and position per level, batches inserts per page within a mini‑transaction, and omits redo logging for data pages while still logging page allocation for crash recovery.

For dirty‑page flushing, the design creates a flush observer object that tracks dirty pages related to the index build. The observer increments a counter when a page is marked dirty and decrements it after the page is written, allowing the index creation to wait only for its own pages rather than a full checkpoint.

Compressed tables are handled by inserting records only into uncompressed pages during bulk build; once a page fills, it is compressed, and if the resulting size is below the target, the page is split.

A configurable global parameter innodb_fill_factor (range 10‑100) controls the fill percentage of data pages, preserving free space to reduce future page splits. Compressed pages retain their adaptive padding mechanism.

Additional design considerations included a direct‑page‑allocation path (later abandoned) and the decision to omit redo logging for bulk builds to avoid long crash‑recovery times.

The article concludes with suggestions for further improvements, such as per‑table fill‑factor settings, online DDL log handling without redo, leveraging bulk index build for empty‑table LOAD DATA, logical read‑ahead for the first phase, and multi‑way merge or multithreaded sorting for the second phase.

InnoDBMySQLDatabase PerformanceBulk Index BuildFast Index BuildFill Factor
Tencent Database Technology
Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.