Inside MySQL Record Buffer: How Server and Storage Engine Exchange Data
This article explains MySQL's two‑layer storage engine architecture, details how the server layer prepares record metadata, and walks through a concrete InnoDB table example to show the exact memory layout and data‑exchange process between the server and storage engine.
MySQL storage engines are implemented as plugins and consist of two layers: a server layer and a storage‑engine layer. The server layer parses SQL, selects execution plans, and performs logical operations such as filtering, sorting, and grouping, while the engine layer is responsible solely for reading and writing data.
1. Principle Explanation
In the source code each table is represented by a TABLE object that contains a record array with two elements. When the server calls an engine method to read a row, it passes the address of record[0] to the engine. The engine reads the raw bytes from disk or memory, converts them to the server’s internal format, and writes the result into the memory area pointed to by record[0], enabling the server to apply WHERE filtering, grouping, sorting, etc.
The interaction looks simple because the server has performed extensive preparation work beforehand.
2. Preparation Phase
When a table is created MySQL computes, for each column, the Offset inside a record and the maximum record length (including space for variable‑length columns). On the first query the server reads the .frm file, obtains the column offsets and the maximum record length, and allocates the two record pointers so that each occupies the maximum length.
Each column is represented by a Field subclass that holds a ptr pointing to the location of that column inside record[0]. For variable‑length columns the Field also stores the length of the content.
When the engine reads a column it checks the column type. Fixed‑length columns are converted and written directly to ptr. Variable‑length columns first write the length, then the actual data immediately after the length field.
3. Example Analysis
The following table is used as an example:
CREATE TABLE `t_recbuf` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`i1` int(10) unsigned DEFAULT '0',
`str1` varchar(32) DEFAULT '',
`str2` varchar(255) DEFAULT '',
`c1` char(11) DEFAULT '',
`e1` enum('北京','上海','广州','深圳') DEFAULT '北京',
`s1` set('吃','喝','玩','乐') DEFAULT '',
`bit1` bit(8) DEFAULT b'0',
`bit2` bit(17) DEFAULT b'0',
`blob1` blob,
`d1` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;The memory layout of record[0] is illustrated below:
NULL‑value bitmap occupies the first bytes of the record. Each column that is not declared NOT NULL gets one bit; in the example ten columns need 10 bits, rounded up to 2 bytes.
id – 4‑byte int, fixed length. Offset = 2 (size of NULL bitmap). The engine converts the big‑endian value stored by InnoDB to the little‑endian format expected by the server before writing it to ptr.
i1 – 4‑byte int, Offset = 6 (2 + 4). Same conversion applies.
str1 – varchar(32). Maximum 96 bytes (utf8, up to 3 bytes per character). Length field occupies 1 byte, so the data offset is 11 (10 + 1). The engine writes the length byte followed by the string bytes.
str2 – varchar(255). Maximum 765 bytes, length field occupies 2 bytes. Data offset = 109 (107 + 2). The engine writes a 2‑byte length then the string.
c1 – char(11). Fixed 33 bytes (11 × 3). Offset = 874. If the stored string is shorter than 33 bytes, the engine pads the remaining space with spaces.
e1 – enum with four values, stored as 1 byte. Offset = 907. The engine performs endian conversion before writing.
s1 – set with up to 4 values, stored as 1 byte (up to 8 bytes for 64 possible options). Offset = 908.
bit1 – bit(8) = 1 byte. Offset = 909. Stored as a C‑style char byte.
bit2 – bit(17) occupies 3 bytes. Offset = 910. When the bit count is not a multiple of 8, InnoDB pads the most‑significant bits with zeros.
blob1 – blob (up to 64 KB). Offset = 913. The engine stores a 2‑byte length, then an 8‑byte pointer to the actual blob data; the blob content itself is not copied into record[0].
d1 – decimal(10,2), fixed length calculated at table creation. Offset = 923. The value is stored in binary form and written directly to ptr.
In InnoDB the data is stored in big‑endian order, while the server reads it in little‑endian order, so numeric fields require endian conversion during the transfer.
Bit fields are stored as char bytes; when the bit length is not a multiple of 8, the extra bits still occupy a full byte.
Enum and set columns are implemented as integers; their storage length depends on the number of defined options (1–8 bytes).
Text and JSON types are internally implemented using the corresponding blob types (tinyblob, blob, mediumblob, longblob).
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
