When Does MySQL IN Use an Index? Scenarios Where It Works and Fails
The article shows experimentally that a MySQL IN clause may be resolved as a const lookup, a range index scan, or a full‑table scan, depending on the number of values, the table size, and the range_optimizer_max_mem_size setting. It also gives concrete thresholds and configuration guidance.
Many developers still believe that the IN clause never uses an index, a myth from the pre‑5.5 era. In MySQL 5.5+ the optimizer can use an index, but it is not guaranteed.
The author searched for "in/or index" and found mixed advice: IN can use an index, yet a large number of values may cause the optimizer to abandon the index. The question is how many values constitute "many".
To reproduce the behavior, a test table is created:
CREATE TABLE `t_person` (
`id` int(11) NOT NULL,
`name` varchar(10) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Running EXPLAIN SELECT id, name FROM t_person WHERE id IN (1) on an empty table yields "no matching row in const table".
After inserting a single row, the same query shows type=const, meaning the index is used and is the most efficient plan.
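Expanding the IN list to two values, (1,2), changes the plan to type=range, a range index scan.
Adding a third value, (1,2,3), makes the optimizer choose type=ALL, i.e., a full‑table scan: the index is no longer used, because the table still holds a single row and scanning it is estimated to be cheaper than three index probes.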
After inserting two more rows (total three rows) and re‑running the same query, the plan reverts to type=range, showing that the optimizer can switch back to using the index as data grows.
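The small-table steps above can be replayed with a script along the following lines. This is a minimal sketch that assumes t_person exists and starts empty; the name values 'a', 'b' and 'c' are placeholders that do not appear in the original. The trailing comments repeat the plans reported in the article, and the result may differ on other MySQL versions or with different table statistics.

```sql
-- Sketch: assumes an empty t_person table as defined above.
-- The name values are hypothetical placeholders.
INSERT INTO t_person (id, name) VALUES (1, 'a');

EXPLAIN SELECT id, name FROM t_person WHERE id IN (1);        -- article reports type=const
EXPLAIN SELECT id, name FROM t_person WHERE id IN (1, 2);     -- article reports type=range
EXPLAIN SELECT id, name FROM t_person WHERE id IN (1, 2, 3);  -- article reports type=ALL

-- Grow the table to three rows and check the three-value list again.
INSERT INTO t_person (id, name) VALUES (2, 'b'), (3, 'c');

EXPLAIN SELECT id, name FROM t_person WHERE id IN (1, 2, 3);  -- article reports type=range
```

Comparing the type column across the four EXPLAIN results is enough to see the switch; the key and rows columns show which index was considered and how many rows the optimizer expects to read.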
Further experiments insert one million rows and test with 900 and 1,100 values. Both cases still use a range index, as shown by the execution plans.
When the IN list is increased to 100,000 values, the optimizer finally falls back to a full‑table scan.
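The article does not show how the million rows or the long IN lists were produced. One possible way to set up a comparable test is sketched below, assuming MySQL 8.0 or later (for recursive CTEs) and a t_person table that already holds ids 1 to 3. The generated names, the user variables @ids and @sql, and the statement name stmt are illustrative only.

```sql
-- Sketch: assumes MySQL 8.0+ and that t_person already contains ids 1..3.
-- Raise the recursion limit (default 1000) so the CTE can produce ~1M rows.
SET SESSION cte_max_recursion_depth = 1000001;

INSERT INTO t_person (id, name)
WITH RECURSIVE seq (n) AS (
  SELECT 4
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT n, CONCAT('p', n) FROM seq;   -- 'p1000000' still fits in varchar(10)

-- Build an IN list from existing ids instead of typing it by hand.
-- GROUP_CONCAT truncates at group_concat_max_len (default 1024), so raise it.
SET SESSION group_concat_max_len = 1000000;

SELECT GROUP_CONCAT(id ORDER BY id) INTO @ids
FROM t_person
WHERE id <= 1100;                    -- change the bound to 900 or 100000 for the other cases

SET @sql = CONCAT('EXPLAIN SELECT id, name FROM t_person WHERE id IN (', @ids, ')');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

Building the list on the server keeps the experiment repeatable and avoids pasting very long statements into a client.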
Searching the MySQL documentation reveals the system variable range_optimizer_max_mem_size, which limits the memory the optimizer may use for range access methods. A value of 0 means no limit; a positive value causes the optimizer to abandon the range method and consider alternatives, including a full‑table scan, once the estimated memory exceeds the limit.
The default value is 8 MiB (8388608 bytes).
Stack Overflow answers point to the same variable. Experiments with a fixed 19,900‑value IN list, run once with range_optimizer_max_mem_size set to 8M and once with it set to 8 (bytes), show that the larger limit keeps the range scan, while the tiny limit forces type=ALL.
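A session-level version of this comparison might look like the sketch below. It assumes t_person contains at least ids 1 to 19900; the user variables and the statement name are illustrative. The 8 MiB limit is written out in bytes because SET at runtime does not accept the M suffix, which is only available in option files and on the command line.

```sql
-- Sketch: assumes t_person holds at least ids 1..19900.
SET SESSION group_concat_max_len = 1000000;

SELECT GROUP_CONCAT(id ORDER BY id) INTO @ids
FROM t_person
WHERE id <= 19900;

SET @sql = CONCAT('EXPLAIN SELECT id, name FROM t_person WHERE id IN (', @ids, ')');
PREPARE stmt FROM @sql;

-- Run 1: the default limit of 8 MiB, given in bytes.
SET SESSION range_optimizer_max_mem_size = 8388608;
EXECUTE stmt;                        -- article reports type=range

-- Run 2: a limit of 8 bytes, far too small for any range analysis.
SET SESSION range_optimizer_max_mem_size = 8;
EXECUTE stmt;                        -- article reports type=ALL
SHOW WARNINGS;                       -- should report that the memory limit was exceeded

DEALLOCATE PREPARE stmt;
SET SESSION range_optimizer_max_mem_size = DEFAULT;  -- back to the global value
```

Changing the variable at session scope keeps the experiment from affecting other connections on the same server.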
Conclusion: The IN clause triggers a full‑table scan in two situations:
1. The estimated memory needed to analyze the IN list as a set of ranges exceeds range_optimizer_max_mem_size.
2. The number of values in the IN list approaches or equals the number of rows in the table, making a full scan cheaper.
The same reasoning applies to OR, comparison operators (>, >=, <, <=) and BETWEEN … AND, because they are all range queries. In practice, keeping the IN list as short as possible yields the best performance, especially on large tables.
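To check this on your own data, the equivalent predicate forms can be run through EXPLAIN side by side. The sketch below assumes the t_person table from the experiments; no particular plan is claimed here, because the outcome depends on the row count and the MySQL version.

```sql
-- Sketch: four ways to ask for ids 1..3 on t_person.
-- Compare the type, key and rows columns of each plan.
EXPLAIN SELECT id, name FROM t_person WHERE id IN (1, 2, 3);
EXPLAIN SELECT id, name FROM t_person WHERE id = 1 OR id = 2 OR id = 3;
EXPLAIN SELECT id, name FROM t_person WHERE id BETWEEN 1 AND 3;
EXPLAIN SELECT id, name FROM t_person WHERE id >= 1 AND id <= 3;
```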
Shepherd Advanced Notes
