
Boost MySQL Queries Over Millions of Rows: Index Tricks and 30 Optimization Tips

This article explains why plain LIKE searches become slow on large MySQL tables, shows how prefix patterns let MySQL use an index, reviews the string functions LOCATE, POSITION, INSTR, and FIND_IN_SET as alternatives to LIKE, and lists thirty practical SQL optimization techniques for improving query performance.

dbaplus Community

Jun 5, 2024


When using MySQL for fuzzy searches, the common LIKE '%keyword%' pattern cannot use a B-tree index to narrow the search because of its leading wildcard, so it typically triggers a full table scan. That is acceptable for small datasets but becomes a serious bottleneck once the table reaches millions of rows.

Using a Prefix Pattern to Enable Indexes

Changing the pattern to LIKE 'keyword%' allows MySQL to use an index on the searched column, dramatically improving speed, as confirmed by EXPLAIN output.

SELECT `column` FROM `table` WHERE `field` LIKE 'keyword%';
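One way to check the difference is to compare the two plans with EXPLAIN. The sketch below assumes a hypothetical table `articles` with an index `idx_title` on `title`; these names are illustrative and do not come from the original article.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE articles (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  body  TEXT,
  KEY idx_title (title)
);

-- Leading wildcard: no range scan on idx_title is possible.
EXPLAIN SELECT id, body FROM articles WHERE title LIKE '%keyword%';

-- Prefix pattern: the optimizer can do a range scan on idx_title.
EXPLAIN SELECT id, body FROM articles WHERE title LIKE 'keyword%';
```

In the EXPLAIN output, a `type` of `ALL` indicates a full table scan, while `range` with `idx_title` in the `key` column indicates that the index is used. The actual plan depends on data volume, value distribution, and MySQL version, so it is worth running this against realistic data.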

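Alternative String Functions

MySQL provides several functions that express the same substring match as LIKE '%keyword%'. Because they wrap the column in a function call, the optimizer cannot use an ordinary index to narrow the search, so treat them as alternatives in syntax rather than as index tricks, and verify any performance claim with EXPLAIN: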

LOCATE(substr, str) – returns the 1-based position of the first occurrence of substr in str, or 0 if it is not found; an optional third argument sets the starting position. Example:

SELECT LOCATE('xbar', 'foobar');      -- returns 0
SELECT LOCATE('bar', 'foobarbar');    -- returns 4
SELECT LOCATE('bar', 'foobarbar', 5); -- returns 7
SELECT `column` FROM `table` WHERE LOCATE('keyword', `field`) > 0;

POSITION(substr IN str) – synonym for LOCATE(substr, str).

SELECT `column` FROM `table` WHERE POSITION('keyword' IN `field`) > 0;

INSTR(str, substr) – same as the two-argument form of LOCATE, with the arguments in reverse order.

SELECT `column` FROM `table` WHERE INSTR(`field`, 'keyword') > 0;

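FIND_IN_SET(str, strlist) – returns the 1-based position of str within strlist, a comma-separated list, or 0 if it is absent.

SELECT FIND_IN_SET('a', 'a,b,c'); -- returns 1

None of these functions makes a substring search index-friendly: because the column is passed through a function, MySQL evaluates the expression for every row it examines, whether or not the column is indexed. Only the prefix form LIKE 'keyword%' can use a range scan on an ordinary index.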

30 Practical MySQL Optimization Tips

Use != or <> in WHERE clauses sparingly; such conditions usually match most of the table, so the optimizer tends to skip the index and scan instead.

Create indexes on columns that are frequently used in WHERE and ORDER BY.

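Avoid testing for NULL in WHERE where practical; give the column a NOT NULL default value and query against that. MySQL can use an index for IS NULL, but a default value keeps predicates simpler and avoids NULL comparison pitfalls.

SELECT id FROM t WHERE num = 0;

Replace OR conditions with UNION ALL when possible to keep index usage.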

SELECT id FROM t WHERE num = 10
UNION ALL
SELECT id FROM t WHERE num = 20;

Do not place a leading % in a LIKE pattern; consider full‑text search for such cases.
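As a rough illustration of the full-text alternative, the sketch below assumes a hypothetical table `posts` with a FULLTEXT index over `title` and `body`; the table, column, and index names are made up for the example.

```sql
-- Hypothetical table; FULLTEXT indexes are available on InnoDB and MyISAM.
CREATE TABLE posts (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  body  TEXT NOT NULL,
  FULLTEXT KEY ft_title_body (title, body)
);

-- The column list in MATCH() must match the column list of the FULLTEXT index.
SELECT id, title
FROM posts
WHERE MATCH(title, body) AGAINST('keyword' IN NATURAL LANGUAGE MODE);
```

Full-text search matches indexed words rather than arbitrary substrings, and InnoDB ignores words shorter than `innodb_ft_min_token_size` (3 by default), so it is not a drop-in replacement for LIKE '%keyword%'. For languages without word delimiters, MySQL also offers an ngram parser.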

Use BETWEEN instead of IN for continuous numeric ranges.

SELECT id FROM t WHERE num BETWEEN 1 AND 3;

If a parameterized value in WHERE leads the optimizer to choose a full scan, force index usage with an index hint.

SELECT id FROM t FORCE INDEX (index_name) WHERE num = @num;

Never apply arithmetic or functions to indexed columns in WHERE; rewrite the expression so the column stands alone.

-- Bad
SELECT id FROM t WHERE num/2 = 100;
-- Good
SELECT id FROM t WHERE num = 200;

Avoid functions on columns (e.g., SUBSTRING, DATEDIFF) in predicates; use range conditions instead.

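-- Instead of: WHERE SUBSTRING(name, 1, 3) = 'abc'
SELECT id FROM t WHERE name LIKE 'abc%';
-- Instead of: WHERE DATEDIFF(createdate, '2005-11-30') = 0
SELECT id FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01';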

Do not wrap an indexed column in a function, arithmetic, or other expression on the left side of = in a predicate; otherwise the index may not be used.

When using a composite index, always filter on the leftmost column first.
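The leftmost-prefix rule can be sketched as follows, assuming a hypothetical `orders` table with a composite index on (customer_id, status, created_at); all names here are illustrative.

```sql
-- Hypothetical table with a composite index, for illustration only.
CREATE TABLE orders (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  status      TINYINT NOT NULL,
  created_at  DATETIME NOT NULL,
  KEY idx_cust_status_created (customer_id, status, created_at)
);

-- Can seek in the index: the leftmost column is constrained.
SELECT id FROM orders WHERE customer_id = 42;
SELECT id FROM orders WHERE customer_id = 42 AND status = 1;

-- Cannot seek in the index: the leftmost column is not constrained.
-- The optimizer may still read the whole index instead of the table.
SELECT id FROM orders WHERE status = 1;
```

Running EXPLAIN on each statement shows which access method is chosen; the outcome can vary with MySQL version and data distribution.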

Avoid creating an empty table through a query that returns no rows, such as CREATE TEMPORARY TABLE tmp SELECT col1, col2 FROM t WHERE 1=0; define the table directly with an explicit CREATE TABLE statement.

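Consider EXISTS instead of IN for sub-queries; recent MySQL versions may optimize both forms similarly, so compare the plans with EXPLAIN.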

SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num);

Indexes rarely help on columns with low cardinality (e.g., gender), because each value matches a large share of the rows.

Limit the total number of indexes per table (ideally ≤ 6) to balance read/write performance.

Minimize updates to clustered index columns (in InnoDB, the primary key); changing them forces rows to be physically moved, which is costly.

Store purely numeric data in numeric types, not character types.

Prefer VARCHAR / NVARCHAR over fixed‑length CHAR / NCHAR for space and speed.

Never use SELECT *; list only needed columns.

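Use table variables instead of temporary tables when possible, keeping in mind that they have limited indexing. Table variables are a SQL Server feature; in MySQL, a derived table or a common table expression (MySQL 8.0 and later) plays a similar role.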

Avoid frequent creation/deletion of temporary tables to reduce system‑table overhead.

Temporary tables are fine for repeated access to a large data set, but for one-off use a derived table is usually the better choice.

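When loading a large amount of data into a new temporary table, create and populate it in one statement with CREATE TEMPORARY TABLE ... SELECT (the MySQL counterpart of SQL Server's SELECT INTO) rather than creating it first and inserting row by row.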

Always drop or truncate temporary tables at the end of stored procedures.

Avoid cursors for large data sets; rewrite the logic as set-based statements instead.

Before using a cursor, first look for a set‑based solution, which is usually more efficient.

Cursors can be acceptable for small data sets; test both approaches to see which performs better.
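As a minimal sketch of a set-based rewrite, suppose a stored procedure loops over pending credits with a cursor and updates one account per iteration. The tables `accounts(id, balance)` and `pending_credits(account_id, amount)` are hypothetical and exist only for this example.

```sql
-- Set-based replacement for a row-by-row cursor loop:
-- aggregate the credits per account, then apply them in one statement.
UPDATE accounts AS a
JOIN (
  SELECT account_id, SUM(amount) AS total
  FROM pending_credits
  GROUP BY account_id
) AS p ON p.account_id = a.id
SET a.balance = a.balance + p.total;
```

A single statement lets the server choose the join strategy and avoids per-row statement overhead, though for a handful of rows the difference may be negligible.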

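In SQL Server, SET NOCOUNT ON at the start of stored procedures and triggers reduces network chatter; MySQL has no equivalent setting, so this tip applies only when the same code also targets SQL Server.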

Limit the amount of data returned to clients; evaluate whether large result sets are truly needed.

Keep transactions short to improve concurrency and reduce lock contention.

By applying these guidelines, namely choosing index-friendly patterns, understanding the limits of string functions, and following the thirty tips above, developers can substantially reduce query latency on tables with millions of rows and maintain overall database performance.

