Databases 16 min read

Master MySQL Index Interview Questions: 9 Essential Concepts Explained

This article breaks down nine core MySQL index concepts—including index types, B+Tree advantages, search steps, covering indexes, index invalidation, left‑most prefix rules, index push‑down, and safe index creation on large tables—to help candidates answer interview questions about tree‑search operations and performance trade‑offs.

ITPUB

Feb 1, 2021

Master MySQL Index Interview Questions: 9 Essential Concepts Explained

Introduction

An interview question asked how many B+Tree searches a given SQL statement triggers. The answer requires understanding MySQL index fundamentals.

What Is an Index?

An index is a disk‑resident data structure that maps key values to row locations, similar to a book’s table of contents.

Indexes occupy physical storage; they speed up read queries but add overhead to INSERT/UPDATE operations.

Types of Indexes

Data‑Structure Dimension

B+Tree index : All rows are stored in leaf pages; internal nodes contain only keys. Search complexity O(log n). Ideal for range, ORDER BY, GROUP BY, and DISTINCT queries.

Hash index : Direct key‑value lookup, efficient for equality predicates but cannot serve range scans.

Full‑text index : Supported by MyISAM and InnoDB for CHAR/VARCHAR/TEXT columns; enables natural‑language search.

R‑Tree index : Specialized for GIS spatial data.

Physical Storage Dimension

Clustered (primary) index : Built on the primary key; leaf pages store the complete row data.

Non‑clustered (secondary) index : Built on non‑primary columns; leaf pages store the primary key value together with the indexed columns.

Logical Dimension

Primary key index – unique, NOT NULL.

Ordinary index – allows NULLs and duplicate values.

Composite (combined) index – created on multiple columns; usable according to the left‑most prefix rule.

Unique index – enforces uniqueness but permits NULLs.

Spatial index – available from MySQL 5.7 for geometric types.

Why Choose B+Tree?

Hash indexes cannot satisfy range predicates.

Binary‑tree structures can degenerate to a linked list for skewed data, causing full‑table scans.

Balanced binary trees require rotations on insert/update and have higher tree height, leading to more disk I/O.

B‑Tree reduces height compared with binary trees, but B+Tree further improves I/O by storing only keys in internal nodes, increasing fan‑out.

Non‑leaf B+Tree nodes hold only keys, allowing many more keys per 16 KB InnoDB page and thus fewer page reads. All row data resides in sorted leaf pages, making range scans, ordering, grouping, and distinct operations very efficient.

Single B+Tree Index Search Process

Table definition and sample data:

CREATE TABLE `employee` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `sex` int(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_age` (`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO employee VALUES (100,'小伦',43,'2021-01-20',0);
INSERT INTO employee VALUES (200,'俊杰',48,'2021-01-21',0);
INSERT INTO employee VALUES (300,'紫琪',36,'2020-01-21',1);
INSERT INTO employee VALUES (400,'立红',32,'2020-01-21',0);
INSERT INTO employee VALUES (500,'易迅',37,'2020-01-21',1);
INSERT INTO employee VALUES (600,'小军',49,'2021-01-21',0);
INSERT INTO employee VALUES (700,'小燕',28,'2021-01-21',1);

Query: SELECT * FROM employee WHERE age = 32; Search steps (each step may involve a disk‑page read):

Load the root of the secondary index idx_age (disk block 1). Because 32 < 37, follow the left child pointer to disk block 2.

Load disk block 2, locate the leaf entry with age = 32, retrieve the primary‑key value id = 400.

Switch to the clustered primary‑key index on id.

Search the primary‑key B+Tree: load its root (disk block 1), traverse to the leaf containing id = 400, then follow the right sibling to disk block 3.

Load disk block 3 and return the full row.

This sequence involves two index traversals (secondary then primary) and a “back‑table” (回表) lookup after the secondary index.

Covering Index

If the SELECT list contains only columns present in the leaf of the secondary index, e.g. SELECT id, age FROM employee WHERE age = 32; the query can be satisfied directly from idx_age without a back‑table lookup, saving one tree traversal.

Index Invalidations

Certain query patterns prevent MySQL from using an existing index, causing a full‑table scan. Common causes include:

OR conditions that combine indexed and non‑indexed predicates.

Missing quotes around string literals.

Leading wildcard in a LIKE pattern (e.g., LIKE '%text%').

Using a composite index without referencing its leftmost column.

Applying functions or arithmetic on indexed columns.

Operators such as !=, <>, NOT IN, or IS NULL/IS NOT NULL on indexed columns.

Join predicates with mismatched data types.

MySQL optimizer estimating that a full scan is cheaper than using the index.

Composite Index and the Left‑Most Prefix Rule

Given a composite index (name, age), the query

SELECT * FROM employee WHERE name LIKE '小%' ORDER BY age DESC;

uses the index because the leftmost column name satisfies the predicate. The index orders rows first by name, then by age, allowing the engine to retrieve matching rows in the required order without an additional sort.

Index Condition Push‑Down (Index Down)

From MySQL 5.6 onward, the storage engine can evaluate additional conditions while traversing a composite index. For the query

SELECT * FROM employee WHERE name LIKE '小%' AND age = 28 AND sex = 0;

the engine filters rows by age and sex during the index scan, reducing the number of rows that need a back‑table lookup. In MySQL 5.5 the engine would first fetch all primary‑key values matching name LIKE '小%' and then apply the remaining predicates, resulting in more back‑table accesses.

Adding an Index to a Massive Table

Creating an index on a table with tens of millions of rows locks the table and may impact production. A safer procedure is:

Create a new table B with the same schema as the original table A.

Add the desired index(es) to B.

Copy data from A to B (e.g., INSERT … SELECT or a bulk‑load tool).

Rename B to A and rename the original A to a temporary name.

Conclusion

The nine interview points covered are:

Definition and purpose of indexes.

Index categories (data‑structure, physical storage, logical).

Why MySQL uses B+Tree as the default index structure.

Step‑by‑step B+Tree search for a simple equality predicate.

Covering indexes that avoid back‑table lookups.

Typical scenarios that cause index loss.

Composite indexes and the left‑most prefix rule.

Index condition push‑down optimization introduced in MySQL 5.6.

Best‑practice workflow for adding indexes to very large tables.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL MySQL interview index B+Tree

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.