Understanding InnoDB Storage Engine: Page Size, Row Format, and Varchar Limits
This article explains how InnoDB stores data on disk, the role of 16 KB pages in read/write operations, the dynamic row format’s length and NULL lists, overflow handling, and why varchar columns are limited to 16 383 characters under utf8mb4, illustrated with SQL examples.
InnoDB is the storage engine that writes table data to disk. When a query accesses data, InnoDB first loads the required pages (default 16 KB) from disk into memory; subsequent reads can be served from memory, which explains the slower first‑page latency often observed in pagination tests.
The engine divides data into pages because disk I/O is orders of magnitude slower than memory access. Each page is the basic unit of transfer between disk and buffer pool, and the engine reads or writes whole pages even if only a few rows are needed.
Rows are stored using a dynamic row format by default in MySQL 5+ (the article also mentions compact and redundant formats). In this format, each variable‑length column (e.g., VARCHAR , TEXT , BLOB ) has a length field stored in a variable‑length field list . The length field occupies 1 byte if the maximum possible byte length (M × W) ≤ 255, otherwise 2 bytes. The actual length L of the stored value is recorded, allowing InnoDB to know how many bytes to read.
For VARCHAR(M) , M denotes the maximum number of characters, not bytes. With the utf8mb4 charset (W = 4 bytes per character), the maximum byte length a VARCHAR can record is 65 535 bytes (2^16‑1). Dividing by 4 gives a practical character limit of 16 383. Therefore VARCHAR(20000) is invalid; the engine caps it at 16 383 characters.
In addition to the length list, InnoDB stores a NULL‑value list for columns that allow NULL. Each nullable column gets one bit; the bits are packed into whole bytes, with unused high bits set to 0. If a row has no nullable columns, the NULL list is omitted.
When a column’s data exceeds the space that can be stored directly in the row, InnoDB creates an overflow column . Only a 20‑byte pointer is kept in the row, while the actual data is stored in separate overflow pages linked as a singly‑linked list.
Example definitions used in the article:
SHOW VARIABLES LIKE "innodb_default_row_format"; CREATE TABLE test (
c1 VARCHAR(10),
c2 VARCHAR(10) NOT NULL,
c3 CHAR(10),
c4 VARCHAR(10)
) CHARSET=utf8mb4; INSERT INTO test (c1, c2, c3, c4)
VALUES ('aaaa', '你好啊', 'cc', 'd'),
('eeee', 'fff', NULL, NULL); SELECT LENGTH(c2) FROM test WHERE c1='aaaa'; CREATE TABLE t2 (name VARCHAR(16383)) CHARSET=utf8mb4;The article also provides practical rules: if M × W ≤ 255, the length list uses 1 byte; otherwise, 2 bytes are used, and the actual stored length L determines whether 1 or 2 bytes are needed (≤ 127 bytes → 1 byte, > 127 bytes → 2 bytes). These rules explain why short VARCHAR values consume less space.
Finally, the author notes that the dynamic format is the most relevant for everyday development, and suggests reading "How MySQL Works" for deeper details on other row formats.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.