Understanding MySQL Prefix Indexes and Their Optimization
This article explains the concept, creation, and performance benefits of MySQL prefix indexes, demonstrates how to choose an optimal prefix length using selectivity calculations, and presents practical techniques for handling both prefix and suffix search patterns.
MySQL prefix indexes are indexes built on the leading characters or bytes of a column, supported by most storage engines for character and binary types such as CHAR/VARCHAR, TEXT/BLOB, and BINARY/VARBINARY.
The article outlines three basic approaches to indexing repetitive string data: indexing the full column (which wastes space), splitting the column into a prefix and the remainder, or indexing a fixed-length prefix substring.
It then introduces a concrete example with table t1 that has a regular index on r1 and a prefix index on the first six characters ( r1(6) ), showing the SHOW CREATE TABLE output and comparing the tablespace sizes (26 MB vs 20 MB), demonstrating the space advantage of the prefix index.
Query performance is illustrated with several LIKE 'sample%' statements (SQL 1–SQL 6) whose execution plans all use the smaller prefix index, confirming its efficiency.
To determine the best prefix length, the article defines index selectivity and provides a MySQL stored function func_calc_prefix_length() that returns a JSON array of prefix lengths and their distinct‑value ratios. Running the function on the sample data shows that a six‑character prefix matches the overall column selectivity (0.0971), making it the optimal choice for the given queries.
The article also discusses handling suffix searches (e.g., LIKE '%sample' ), which cannot use a standard prefix index. Two optimization strategies are presented: (1) adding a separate column storing the suffix and indexing it, and (2) creating a mirrored table with reversed strings and applying a prefix index on the reversed column.
Both methods enable fast suffix queries, though the reversal approach requires additional processing at query time.
In summary, the article covers the definition, creation, space benefits, selectivity‑based prefix length determination, and practical solutions for both prefix and suffix search scenarios in MySQL.
<localhost|mysql>show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` varchar(300) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_r1` (`r1`),
KEY `idx_r1_p` (`r1`(6))
) ENGINE=InnoDB AUTO_INCREMENT=32755 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec) # idx_r1
root@debian-ytt1:/var/lib/mysql/3306/ytt# du -sh
26M .
# idx_r1_p
root@debian-ytt1:/var/lib/myzsql/3306/ytt# du -sh
20M . <localhost|mysql>select count(*) from t1 where r1 like 'sample%';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec) <localhost|mysql>explain select count(*) from t1 where r1 like 'sample%'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: range
possible_keys: idx_r1,idx_r1_p
key: idx_r1_p
key_len: 27
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index DELIMITER $$
USE `ytt`$$
DROP FUNCTION IF EXISTS `func_calc_prefix_length`$$
CREATE DEFINER=`ytt`@`%` FUNCTION `func_calc_prefix_length`() RETURNS JSON
BEGIN
DECLARE v_total_pct DECIMAL(20,4);
DECLARE v_prefix_pct DECIMAL(20,4);
DECLARE v_result JSON DEFAULT '[]';
DECLARE i TINYINT DEFAULT 1;
SELECT TRUNCATE(COUNT(DISTINCT r1) / COUNT(r1),4) INTO v_total_pct FROM t1;
label1:LOOP
SELECT TRUNCATE(COUNT(DISTINCT LEFT(r1,i)) / COUNT(r1),4) INTO v_prefix_pct FROM t1;
SET v_result = JSON_ARRAY_APPEND(v_result,'$',JSON_OBJECT(i,v_prefix_pct));
IF v_prefix_pct >= v_total_pct THEN
LEAVE label1;
END IF;
SET i = i + 1;
END LOOP;
RETURN v_result;
END$$
DELIMITER ; <localhost|mysql>SELECT func_calc_prefix_length() AS prefix_length\G
*************************** 1. row ***************************
prefix_length: [{"1": 0.0003}, {"2": 0.0005}, {"3": 0.0008}, {"4": 0.0013}, {"5": 0.0093}, {"6": 0.0971}]
1 row in set (0.32 sec) CREATE TABLE `t3` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`r1` varchar(300) DEFAULT NULL,
`suffix_r1` varchar(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_suffix_r1` (`suffix_r1`)
) ENGINE=InnoDB;
<localhost|mysql>insert into t3 select id,r1,right(r1,6) from t2; <localhost|mysql>select count(*) from t3 where suffix_r1 = 'sample';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec) <localhost|mysql>insert into t4 select id,reverse(r1) from t2; <localhost|mysql>select count(*) from t4 where r1 like 'elpmas%';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.