
Understanding MySQL Prefix Indexes and Their Optimization

This article explains the concept, creation, and performance benefits of MySQL prefix indexes, demonstrates how to choose an optimal prefix length using selectivity calculations, and presents practical techniques for handling both prefix and suffix search patterns.

Aikesheng Open Source Community

Jan 20, 2021


MySQL prefix indexes are indexes built on only the leading part of a column's values. The prefix length is counted in characters for nonbinary string types (CHAR, VARCHAR, TEXT) and in bytes for binary string types (BINARY, VARBINARY, BLOB). Most storage engines support them.
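As a minimal sketch of the syntax, the statements below declare prefix indexes on a hypothetical table. The table `articles`, its columns, the index names, and the prefix lengths are illustrative assumptions and do not come from the article's data set.

```sql
-- Hypothetical table, used only to illustrate the syntax.
CREATE TABLE articles (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(300) DEFAULT NULL,
  body  TEXT,
  PRIMARY KEY (id),
  -- Index only the first 10 characters of title.
  KEY idx_title_p (title(10))
) ENGINE=InnoDB;

-- A TEXT or BLOB column can only be indexed with an explicit prefix length.
ALTER TABLE articles ADD INDEX idx_body_p (body(20));

-- The same index could instead be written in CREATE INDEX form:
-- CREATE INDEX idx_body_p ON articles (body(20));
```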

The article outlines three basic approaches to indexing repetitive string data: indexing the full column (which wastes space), splitting the column into a prefix and the remainder, or indexing a fixed-length prefix substring.

It then introduces a concrete example: table t1 has a regular index on r1 (idx_r1) and a prefix index on its first six characters, r1(6) (idx_r1_p). The SHOW CREATE TABLE output and a comparison of on-disk sizes (26 MB vs 20 MB) demonstrate the space advantage of the prefix index.
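The article measures size with `du` on the data directory. As an alternative sketch (not the article's method), per-index sizes can be estimated from InnoDB's persistent statistics table. The schema name `ytt` is taken from the article's listings. The figures are estimates that are refreshed by `ANALYZE TABLE`, and reading the `mysql` schema requires suitable privileges.

```sql
-- Approximate size of each index on ytt.t1.
-- For stat_name = 'size', stat_value is the number of pages in the index.
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'ytt'
  AND table_name = 't1'
  AND stat_name = 'size';
```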

Query performance is illustrated with several LIKE 'sample%' statements (SQL 1–SQL 6) whose execution plans all use the smaller prefix index, confirming its efficiency.
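To see how the two indexes compare for the same predicate, one option is to restrict the optimizer to each index in turn with an index hint and read the resulting plans side by side. This is a sketch that assumes both indexes from the article's t1 definition exist; no particular plan output is implied.

```sql
-- Plan when the optimizer may only use the full-column index.
EXPLAIN SELECT COUNT(*) FROM t1 FORCE INDEX (idx_r1)   WHERE r1 LIKE 'sample%';

-- Plan when the optimizer may only use the six-character prefix index.
EXPLAIN SELECT COUNT(*) FROM t1 FORCE INDEX (idx_r1_p) WHERE r1 LIKE 'sample%';
```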

To determine the best prefix length, the article defines index selectivity and provides a MySQL stored function func_calc_prefix_length() that returns a JSON array of prefix lengths and their distinct‑value ratios. Running the function on the sample data shows that a six‑character prefix matches the overall column selectivity (0.0971), making it the optimal choice for the given queries.
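The same comparison can be made by hand with a single query that computes the distinct-value ratio of the full column next to that of a few candidate prefixes. This is a sketch: the candidate lengths 4, 5, and 6 are arbitrary picks for illustration, and the shortest prefix whose ratio reaches `sel_full` would be the one to choose.

```sql
-- Selectivity = number of distinct values / number of non-NULL rows.
SELECT
  COUNT(DISTINCT r1) / COUNT(r1)          AS sel_full,
  COUNT(DISTINCT LEFT(r1, 4)) / COUNT(r1) AS sel_4,
  COUNT(DISTINCT LEFT(r1, 5)) / COUNT(r1) AS sel_5,
  COUNT(DISTINCT LEFT(r1, 6)) / COUNT(r1) AS sel_6
FROM t1;
```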

The article also discusses handling suffix searches (e.g., LIKE '%sample'), which cannot use a standard prefix index. Two optimization strategies are presented: (1) adding a separate column storing the suffix and indexing it, and (2) creating a mirrored table with reversed strings and applying a prefix index on the reversed column.

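Both methods enable fast suffix queries. The reversal approach adds work at query time, because the search string must itself be reversed (for example, 'sample' becomes 'elpmas') before it is matched against the reversed column.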
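A sketch of the reversal approach follows. The article's DDL for the mirrored table t4 is not shown, so the definition here is an assumption based on its two-column insert. The query lets MySQL reverse the search term instead of hard-coding the reversed literal.

```sql
-- Assumed definition of the mirrored table (not taken from the article).
CREATE TABLE t4 (
  id BIGINT UNSIGNED NOT NULL,
  r1 VARCHAR(300) DEFAULT NULL,   -- holds REVERSE() of the original value
  PRIMARY KEY (id),
  KEY idx_r1_p (r1(6))            -- prefix index on the reversed string
) ENGINE=InnoDB;

-- Reversing the search term turns the suffix search into a prefix search;
-- reversing the stored value again displays it in its original order.
SELECT id, REVERSE(r1) AS original_r1
FROM t4
WHERE r1 LIKE CONCAT(REVERSE('sample'), '%');
```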

In summary, the article covers the definition, creation, space benefits, selectivity‑based prefix length determination, and practical solutions for both prefix and suffix search scenarios in MySQL.

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>show create table t1\G<br/>*************************** 1. row ***************************<br/>       Table: t1<br/>Create Table: CREATE TABLE `t1` (<br/>  `id` bigint unsigned NOT NULL AUTO_INCREMENT,<br/>  `r1` varchar(300) DEFAULT NULL,<br/>  PRIMARY KEY (`id`),<br/>  KEY `idx_r1` (`r1`),<br/>  KEY `idx_r1_p` (`r1`(6))<br/>) ENGINE=InnoDB AUTO_INCREMENT=32755 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci<br/>1 row in set (0.00 sec)</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"># idx_r1<br/>root@debian-ytt1:/var/lib/mysql/3306/ytt# du -sh<br/>26M .<br/># idx_r1_p<br/>root@debian-ytt1:/var/lib/mysql/3306/ytt# du -sh<br/>20M .</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>select count(*) from t1 where r1 like 'sample%';<br/>+----------+<br/>| count(*) |<br/>+----------+<br/>|        4 |<br/>+----------+<br/>1 row in set (0.00 sec)</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>explain select count(*) from t1 where r1 like 'sample%'\G<br/>*************************** 1. row ***************************<br/>           id: 1<br/>  select_type: SIMPLE<br/>        table: t1<br/>   partitions: NULL<br/>         type: range<br/>possible_keys: idx_r1,idx_r1_p<br/>          key: idx_r1_p<br/>      key_len: 27<br/>          ref: NULL<br/>         rows: 4<br/>     filtered: 100.00<br/>        Extra: Using where; Using index</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">DELIMITER $$<br/><br/>USE `ytt`$$<br/><br/>DROP FUNCTION IF EXISTS `func_calc_prefix_length`$$<br/><br/>CREATE DEFINER=`ytt`@`%` FUNCTION `func_calc_prefix_length`() RETURNS JSON<br/>BEGIN<br/>      DECLARE v_total_pct DECIMAL(20,4);<br/>      DECLARE v_prefix_pct DECIMAL(20,4);<br/>      DECLARE v_result JSON DEFAULT '[]';<br/>      DECLARE i TINYINT DEFAULT 1;<br/>  <br/>  SELECT TRUNCATE(COUNT(DISTINCT r1) / COUNT(r1),4) INTO v_total_pct FROM t1;<br/>  label1:LOOP<br/>    SELECT TRUNCATE(COUNT(DISTINCT LEFT(r1,i)) / COUNT(r1),4) INTO v_prefix_pct FROM t1; <br/>    SET v_result = JSON_ARRAY_APPEND(v_result,'$',JSON_OBJECT(i,v_prefix_pct));       <br/>    IF v_prefix_pct >= v_total_pct THEN<br/>      LEAVE label1;        <br/>    END IF;        <br/>    SET i = i + 1;<br/>  END LOOP;<br/>  RETURN v_result;<br/>END$$<br/>DELIMITER ;</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>SELECT func_calc_prefix_length() AS prefix_length\G<br/>*************************** 1. row ***************************<br/>prefix_length: [{"1": 0.0003}, {"2": 0.0005}, {"3": 0.0008}, {"4": 0.0013}, {"5": 0.0093}, {"6": 0.0971}]<br/>1 row in set (0.32 sec)</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">CREATE TABLE `t3` (<br/>  `id` bigint unsigned NOT NULL AUTO_INCREMENT,<br/>  `r1` varchar(300) DEFAULT NULL,<br/>  `suffix_r1` varchar(6) DEFAULT NULL,<br/>  PRIMARY KEY (`id`),<br/>  KEY `idx_suffix_r1` (`suffix_r1`)<br/>) ENGINE=InnoDB;<br/><br/><localhost|mysql>insert into t3 select id,r1,right(r1,6) from t2;</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>select count(*) from t3 where suffix_r1 = 'sample';<br/>+----------+<br/>| count(*) |<br/>+----------+<br/>|        4 |<br/>+----------+<br/>1 row in set (0.00 sec)</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>insert into t4 select id,reverse(r1) from t2;</code>

<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><localhost|mysql>select count(*) from t4 where r1 like 'elpmas%';<br/>+----------+<br/>| count(*) |<br/>+----------+<br/>|        4 |<br/>+----------+<br/>1 row in set (0.00 sec)</code>

