How MySQL Implements File Sorting: Internals, Modes, and Optimizations
This article explains MySQL's file‑sort mechanism in depth, covering the sort buffer, handling of long sort keys, three sort modes, priority‑queue and read‑rnd‑buffer optimizations, internal vs. external sorting, descending sort implementation, and how to inspect details with optimizer trace.
Overall Overview
MySQL performs file sorting when an ORDER BY cannot be satisfied by an index. The process reads rows that match the WHERE clause into a memory area called the sort buffer, sorts them, writes sorted chunks to disk, and finally merges the chunks into a globally ordered result.
Sort Buffer (sort_buffer)
The sort buffer resides in memory and its size is controlled by the system variable sort_buffer_size (default 256 KB, range 32 KB–4 GB). When the buffer fills, its contents are sorted and written as a data block.
Handling Overly Long Sort Keys
If a sort column is defined as VARCHAR(21845) (up to 65 KB under utf8), the buffer may fill after only a few rows, causing many data blocks and heavy disk I/O. The variable max_sort_length (default 1024 bytes, range 4 bytes–8 MB) limits the number of bytes of each sort column that participate in sorting; excess bytes are ignored, which can lead to incorrect ordering when the truncated prefix is identical.
Sort Modes
MySQL supports three sort modes, each describing what is written to the sort buffer or disk files: <sort_key, additional_fields>: writes the sort key plus all fields returned by the storage engine. Every field occupies its maximum length, causing space waste. <sort_key, packed_additional_fields>: similar to the previous mode but stores only the actual length of CHAR / VARCHAR values and omits space for NULL fields (except a 1‑bit flag), reducing waste. <sort_key, rowid>: writes only the sort key and the primary‑key ID. The server must later fetch the remaining columns from the storage engine, which may double I/O but can avoid disk‑based sorting.
Mode Details
<sort_key, additional_fields> stores the sort key, a NULL‑bitmap, length bytes for CHAR / VARCHAR, and the full maximum space for each field. It is fast because the final result already contains all needed columns, but it may overflow the buffer and force disk I/O.
<sort_key, packed_additional_fields> eliminates the space waste by storing only the actual content length of variable‑length fields and no space for NULL values beyond the bitmap.
<sort_key, rowid> stores only the sort key and the row ID. After sorting, the server reads the required columns using the row IDs, turning random reads into sequential reads when combined with the read‑rnd buffer (see below).
Efficiency Boosters
Priority Queue
If the query contains a LIMIT, MySQL may replace full‑buffer sorting with a priority‑queue (a max‑heap) that keeps only the LIMIT + 1 best rows in memory. This avoids writing temporary files and reduces the amount of data that needs to be sorted.
Random‑to‑Sequential I/O (read_rnd_buffer)
When <sort_key, rowid> requires a second pass to fetch non‑key columns, MySQL buffers row IDs in read_rnd_buffer (controlled by read_rnd_buffer_size, default 256 KB). Once full, the IDs are sorted and then read sequentially from the storage engine, turning many random reads into fewer sequential reads.
Internal vs. External Sorting
Internal sorting occurs entirely in memory using one of three algorithms, chosen by data size:
Radix sort – used when total sort‑key length ≤ 20 bytes and row count is between 1 K and 100 K.
Quick sort – used for ≤ 100 rows when radix sort is not applicable.
Merge sort – fallback for all other cases.
External sorting handles data that does not fit in the buffer. Sorted chunks are written to temp_file and chunk_file. A multi‑way (typically 7‑way) merge repeatedly combines chunks until ≤ 15 chunks remain, after which a final merge produces the ordered result file ( out_file).
Merge Process Example
Assume 160 chunks. The first merge round groups them into 23 larger chunks; a second round reduces them to 4 chunks; a final merge writes the globally ordered rows to out_file. Throughout the process, two intermediate files ( temp_file and temp_file2) plus the chunk file are used.
Descending Sort
MySQL implements ORDER BY … DESC by bit‑wise inverting each byte of the sort key, then performing a normal ascending sort. After sorting, the inverted keys are discarded, yielding a descending order.
SELECT num FROM t ORDER BY num DESC;Inspecting Filesort Details with Optimizer Trace
Enable tracing:
SET optimizer_trace = "enabled=on";
SET optimizer_trace_max_mem_size = 1048576;The JSON output contains filesort_summary (rows examined, number of temporary files, chosen sort_mode) and filesort_priority_queue_optimization (whether a priority queue was used). A number_of_tmp_files of 0 means only the sort buffer was used; a positive value indicates how many times the buffer overflowed and caused a temporary file to be written.
Note: The trace must be retrieved in the same session immediately after the query; executing the query and the trace retrieval separately yields an empty result.
Summary
The article walks through MySQL's file‑sort pipeline: buffer allocation, handling of long columns, three sort modes, two major optimizations (priority queue and read‑rnd buffer), internal and external sorting algorithms, multi‑way merge mechanics, descending‑order handling, and how to expose the internal decisions via optimizer_trace. Understanding these details helps DBAs tune sort_buffer_size, max_sort_length, and related variables for optimal query performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
