How to Choose the Best Index Order for MySQL: Boost Query Performance
This article explains how to determine the optimal order of columns in MySQL composite indexes by analyzing B+Tree structure, index selectivity, and real‑world query performance, providing practical examples and SQL demonstrations to help developers create more efficient indexes.
1 Background
When creating a composite index, we often wonder how to order its columns for the best performance. The right order depends on the queries that use the index, including any sorting and grouping they do. The sections below discuss this in detail.
2 Index Retrieval Principle
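We use InnoDB, MySQL's default storage engine, as the example. InnoDB stores both tables and indexes as B+Trees. A B+Tree refines the B-Tree by keeping only keys and child-page pointers in non-leaf nodes and storing entries only in leaf nodes. In the clustered (primary key) index the leaf entries are full rows; in a secondary index they are the indexed columns plus the primary key. Because non-leaf pages hold only keys, more of them fit per page, the tree stays shallow, and a lookup touches fewer pages.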
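An InnoDB page is 16 KB by default. With INT keys (4 bytes) and 4-byte pointers, each entry takes 8 bytes, so a page could hold 16 KB / 8 B = 2,048 entries if there were no overhead; allowing for record headers and page structures, roughly 1,000 entries per page is a safe estimate. If leaf pages also hold about 1,000 entries, a three-level B+Tree can address about 1,000³ ≈ 10⁹ rows.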
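Each level of the tree costs one page read when the page is not already in the buffer pool. A good index therefore keeps both the number of pages read to locate the desired rows and the number of entries scanned after that seek as small as possible.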
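The sizing arithmetic above can be reproduced in a few lines of Python. This is a rough sketch under the text's assumptions (16 KB pages, 4-byte keys and pointers, a uniform fan-out of about 1,000 at every level); the helper names are illustrative, and the model ignores InnoDB's real record and page overhead.

```python
# Back-of-the-envelope B+Tree sizing, using the article's assumptions.
# Simplifications: fixed-size entries, no per-record or per-page overhead
# in `raw_fanout`, and the same fan-out at every level (leaves included).

PAGE_SIZE = 16 * 1024      # bytes; InnoDB's default page size
ENTRY_SIZE = 4 + 4         # 4-byte INT key + 4-byte child-page pointer


def raw_fanout(page_size: int, entry_size: int) -> int:
    """Upper bound on entries per page if no space were lost to overhead."""
    return page_size // entry_size


def tree_capacity(fanout: int, height: int) -> int:
    """Entries addressable by a tree of `height` levels with uniform fan-out."""
    return fanout ** height


def levels_needed(rows: int, fanout: int) -> int:
    """Smallest tree height whose capacity covers `rows` entries.
    Each level is roughly one page read per lookup if nothing is cached."""
    height, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        height += 1
    return height


print(raw_fanout(PAGE_SIZE, ENTRY_SIZE))   # 2048 before any overhead
print(tree_capacity(1000, 3))              # 1000000000, i.e. about 10^9
print(levels_needed(5_000_000, 1000))      # 3 levels for a 5-million-row table
```

With a fan-out near 1,000, even a 5-million-row table needs only three levels, which is why the extra cost of a lookup is driven mostly by how many entries match, not by tree height.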
For more details see “MySQL Comprehensive Index Implementation and Usage”.
3 Analyzing Index Selectivity
3.1 Selectivity Measurement
Assume a table emp with 5 million rows, on which we want to create a composite index on empname and depno. Which column has higher selectivity?
Example arrays:
empname: [ali, brand, candy, david, ela, fin, gagn, halande, ivil, jay, kikol]
depno: [dep-a, dep-a, dep-a, dep-a, dep-a, dep-b, dep-b, dep-b, dep-b, dep-b]
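A binary search on the sorted empname array lands on a single entry, while a search on depno for dep-a still leaves five matching entries to examine. The many duplicates in depno give it lower selectivity, so a lookup on it filters out fewer rows.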
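The effect of duplicates on a binary search can be seen with Python's standard `bisect` module, which finds the block of entries equal to a key in a sorted list. This is a sketch: the sorted lists stand in for index keys, and `equal_range` is a helper name chosen here for illustration.

```python
from bisect import bisect_left, bisect_right

# The two example arrays from the text, already sorted
# (a B+Tree keeps its keys in order, so a lookup can binary-search them).
empname = ["ali", "brand", "candy", "david", "ela", "fin",
           "gagn", "halande", "ivil", "jay", "kikol"]
depno = ["dep-a"] * 5 + ["dep-b"] * 5


def equal_range(sorted_keys, target):
    """Return (first, last_exclusive) positions of `target` via binary search."""
    return bisect_left(sorted_keys, target), bisect_right(sorted_keys, target)


lo, hi = equal_range(empname, "fin")
print(hi - lo)   # 1: the search lands on exactly one entry

lo, hi = equal_range(depno, "dep-a")
print(hi - lo)   # 5: the search only narrows to a block of duplicates
```

Both searches take only a few comparisons; the difference is the size of the returned range, which is what the lookup must still process afterwards.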
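Selectivity is the ratio of distinct values to total rows:
<code>selectivity = count(distinct column_name) / count(*)</code>
It is at most 1, when every value is unique. The higher the selectivity, the fewer rows each lookup matches and the faster retrieval is; as selectivity approaches 0, an index lookup returns a large share of the table and approaches the cost of a full table scan.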
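The formula translates directly into code. The sketch below applies it to the two example arrays; like SQL's COUNT(DISTINCT ...), it ignores NULLs (modeled as None) in the numerator, while the denominator counts every row as COUNT(*) does. The function names are illustrative.

```python
def selectivity(values):
    """count(distinct col) / count(*): distinct non-NULL values over all rows."""
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    distinct = {v for v in values if v is not None}   # COUNT(DISTINCT) skips NULLs
    return len(distinct) / len(values)                # COUNT(*) counts every row


def avg_rows_per_value(values):
    """Average rows an equality lookup matches: count(*) / count(distinct col)."""
    distinct = {v for v in values if v is not None}
    if not distinct:
        raise ValueError("column has no non-NULL values")
    return len(values) / len(distinct)


empname = ["ali", "brand", "candy", "david", "ela", "fin",
           "gagn", "halande", "ivil", "jay", "kikol"]
depno = ["dep-a"] * 5 + ["dep-b"] * 5

print(selectivity(empname))         # 1.0: every name is unique
print(selectivity(depno))           # 0.2: only 2 distinct values in 10 rows
print(avg_rows_per_value(depno))    # 5.0 rows per department on average
```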
3.2 Real‑World Data Comparison
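In the emp table, empname is far more selective than depno. A selectivity of about 0.17 means each name appears roughly six times on average (1 / 0.1713 ≈ 5.8), while depno has so few distinct values that its ratio rounds to 0.0000 at four decimal places: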
<code>mysql> select count(distinct empname)/count(*), count(distinct depno)/count(*) from emp;
+----------------------------------+--------------------------------+
| count(distinct empname)/count(*) | count(distinct depno)/count(*) |
+----------------------------------+--------------------------------+
|                           0.1713 |                         0.0000 |
+----------------------------------+--------------------------------+
1 row in set</code>
In the author's test, creating the composite index in each column order and running the same two-column query gave very different timings:
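<code>mysql> create index idx_emp_empname_depno on emp(empname,depno);
Query OK, 0 rows affected
mysql> select * from emp where empname='LsHfFJA' and depno='106';
... (0.021 sec)
mysql> create index idx_emp_depno_empname on emp(depno,empname);
Query OK, 0 rows affected
mysql> select * from emp where depno='106' and empname='LsHfFJA';
... (0.393 sec)</code>
These timings need careful interpretation. When every index column is matched by an equality condition, as here, either column order lets InnoDB seek directly to the matching entries, and the order of the conditions in the WHERE clause does not affect the optimizer. The transcript also does not show the first index being dropped, so the second query may not have used idx_emp_depno_empname at all. Single-run differences can come from caching or the optimizer's index choice, so confirm which index is used with EXPLAIN. Column order has its clearest effect when a query filters on only the leading column(s) or uses a range condition.
4 Conclusion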
Selectivity and cardinality are a useful starting point for ordering composite-index columns, but real-world queries add other factors, such as ORDER BY, GROUP BY, and range conditions, that can change which order performs best; each case needs its own analysis.
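Much of this follows from how a composite index is sorted: by its first column, then by its second, so a seek can only use a leading run of columns. The sketch below models each column order as a sorted list of tuples and counts the entries a lookup returns. It is a simplification (no pages, no optimizer), `prefix_scan` is a hypothetical helper, and the data pairs the first ten names with the ten department values from the example arrays in section 3.1.

```python
from bisect import bisect_left

# Hypothetical model of a composite B+Tree index: entries sorted by
# (first column, second column). Pages and the optimizer are ignored.
rows = [
    ("ali", "dep-a"), ("brand", "dep-a"), ("candy", "dep-a"),
    ("david", "dep-a"), ("ela", "dep-a"), ("fin", "dep-b"),
    ("gagn", "dep-b"), ("halande", "dep-b"), ("ivil", "dep-b"),
    ("jay", "dep-b"),
]

idx_empname_depno = sorted(rows)                                # (empname, depno)
idx_depno_empname = sorted((dep, name) for name, dep in rows)   # (depno, empname)


def prefix_scan(index, prefix):
    """Seek to the first entry >= prefix, then read entries while their
    leading columns equal `prefix`. Only meaningful when `prefix` covers
    the index's leading columns (the leftmost-prefix rule)."""
    start = bisect_left(index, prefix)
    width = len(prefix)
    matches = []
    for entry in index[start:]:
        if entry[:width] != prefix:
            break
        matches.append(entry)
    return matches


# Equality on both columns: either order seeks straight to one entry.
print(len(prefix_scan(idx_empname_depno, ("fin", "dep-b"))))   # 1
print(len(prefix_scan(idx_depno_empname, ("dep-b", "fin"))))   # 1

# Equality on the leading column only: the selective column narrows
# the scan to 1 entry, the repetitive one to 5.
print(len(prefix_scan(idx_empname_depno, ("fin",))))           # 1
print(len(prefix_scan(idx_depno_empname, ("dep-a",))))         # 5
```

In this model, an empname-only lookup cannot use idx_depno_empname as a prefix seek at all, because empname is not its leading column.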
Architecture & Thinking
🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.