Optimizing MySQL Queries on Tens of Millions of Rows: Indexes, Partitioning & Tips
This article walks through generating tens of millions of rows in MySQL, then demonstrates a series of optimizations—including simple and covering indexes, data reduction, small‑table‑driven joins, and forced indexes—showing how each technique impacts query performance and EXPLAIN output.
Background
MySQL tables with tens of millions of rows are challenging to query efficiently; full table scans become prohibitively slow at that scale. The article demonstrates generating 15 million rows with stored procedures and then walks through a series of optimization techniques.
Data Generation
Provides CREATE TABLE statements for orders and users, and stored procedures create_users() and generate_orders() that insert 1,000 users and 15 million orders via multiple sessions.
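A minimal sketch of the schema and generator, assuming InnoDB and simple column sets; aside from `user_id`, `total_amount`, and the procedure names, the column names, types, and batch sizes below are illustrative assumptions, not taken from the article:

```sql
-- Hypothetical schema; the article confirms only user_id and total_amount.
CREATE TABLE users (
    user_id   INT PRIMARY KEY AUTO_INCREMENT,
    user_name VARCHAR(64) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id      INT NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

DELIMITER //
CREATE PROCEDURE generate_orders(IN n INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        -- Spread orders across the 1,000 users with random amounts.
        INSERT INTO orders (user_id, total_amount)
        VALUES (FLOOR(1 + RAND() * 1000), ROUND(RAND() * 1000, 2));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

-- Reach 15 million rows by calling the procedure from several sessions,
-- e.g. CALL generate_orders(3000000); in each.
```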
Baseline Query and Performance
The initial query sums total_amount per user using a LEFT JOIN without indexes, taking about 191 seconds. EXPLAIN shows type=ALL for both tables, indicating full scans.
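The baseline aggregation plausibly looks like the following (table aliases and the GROUP BY shape are assumptions):

```sql
-- Per-user order totals. With no indexes, EXPLAIN reports type=ALL
-- (full scan) for both users and orders.
SELECT u.user_id, SUM(o.total_amount) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
GROUP BY u.user_id;

EXPLAIN
SELECT u.user_id, SUM(o.total_amount) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
GROUP BY u.user_id;
```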
First Optimization – Simple Indexes
Creates single-column indexes on orders(user_id), orders(total_amount), and users(user_id). The same query now uses index and ref access, yet execution time degrades to ~460 seconds: each index hit still requires a back-table (bookmark) lookup into the clustered index to fetch total_amount, adding random I/O per matched row.
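The first round of indexes might be created as follows; the source does not give names for these single-column indexes, so the names here are assumptions:

```sql
-- Single-column indexes from the first optimization round.
-- These change access type from ALL to index/ref, but total_amount
-- still has to be fetched via a back-table lookup.
CREATE INDEX idx_orders_user_id      ON orders (user_id);
CREATE INDEX idx_orders_total_amount ON orders (total_amount);
CREATE INDEX idx_users_user_id       ON users (user_id);
```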
Second Optimization – Covering Indexes
Adds composite indexes idx_orders_total_amount_user_id and idx_orders_user_id_total_amount. Index creation itself costs ~300 seconds, but query time then drops dramatically to ~10 seconds, showing that covering indexes eliminate back-table lookups.
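The two composite indexes named in the article can be created as follows; because each contains both user_id and total_amount, the aggregation can be answered entirely from the index:

```sql
-- Either ordering covers the query; idx_orders_user_id_total_amount
-- additionally matches the join/grouping key as its prefix.
-- EXPLAIN should now show "Using index" in the Extra column.
CREATE INDEX idx_orders_total_amount_user_id ON orders (total_amount, user_id);
CREATE INDEX idx_orders_user_id_total_amount ON orders (user_id, total_amount);
```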
Third Optimization – Reducing Data Volume
Applies a WHERE clause (a.user_id > 1033) to filter users, reducing the rows examined. Query time falls to ~7 seconds, and EXPLAIN shows type=range on the users table.
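The filtered query likely takes this shape (the alias `a` and the predicate come from the article; the rest is an assumption):

```sql
-- Restricting the driving table turns its access type into a range scan.
SELECT a.user_id, SUM(o.total_amount) AS total
FROM users a
LEFT JOIN orders o ON o.user_id = a.user_id
WHERE a.user_id > 1033
GROUP BY a.user_id;
```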
Fourth Optimization – Small‑Table Driving Large‑Table
Attempts to rewrite the join as a subquery so the smaller filtered set drives the larger table. EXPLAIN changes little and execution time remains similar, indicating limited benefit in this case.
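One plausible form of the rewrite, sketched here as a derived table so the filtered user set drives the join (the exact rewrite in the article is not shown, so this is an assumption):

```sql
-- The derived table materializes the small filtered set first;
-- the optimizer often produces the same plan as the plain WHERE,
-- which matches the article's finding of little change.
SELECT a.user_id, SUM(o.total_amount) AS total
FROM (SELECT user_id FROM users WHERE user_id > 1033) a
LEFT JOIN orders o ON o.user_id = a.user_id
GROUP BY a.user_id;
```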
Fifth Optimization – Forced Index
Uses FORCE INDEX (idx_orders_user_id_total_amount) for an IN clause on a few user IDs. Both forced and normal queries run in sub‑second time, showing that when indexes are already used, forcing them has little effect.
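A sketch of the forced-index query; the specific user IDs in the IN list are illustrative assumptions:

```sql
-- FORCE INDEX pins the covering composite index. With a small IN list
-- the optimizer would normally choose this index anyway, so the hint
-- changes little, as the article observes.
SELECT user_id, SUM(total_amount) AS total
FROM orders FORCE INDEX (idx_orders_user_id_total_amount)
WHERE user_id IN (1001, 1002, 1003)
GROUP BY user_id;
```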
Key Takeaways
Create appropriate indexes and prefer covering indexes to eliminate back‑table lookups.
When data approaches tens of millions, consider partitioning or sharding (splitting tables).
Filtering data early (WHERE, range scans) can cut execution time.
Forced indexes can help when the optimizer ignores a useful index.
Explain “type” Field and Back‑Table Mechanism
Describes MySQL EXPLAIN type values (system, const, eq_ref, ref, range, index, all) and their performance implications. Explains the back‑table (bookmark lookup) process when a non‑covering index is used and how it adds I/O.
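The covering-versus-back-table distinction can be seen directly in EXPLAIN output; this pair of probes assumes the composite index from earlier and an illustrative user ID:

```sql
-- Served entirely from idx_orders_user_id_total_amount:
-- Extra shows "Using index", i.e. no back-table lookup.
EXPLAIN SELECT user_id, total_amount FROM orders WHERE user_id = 1001;

-- Selecting columns not in the index forces a bookmark lookup
-- into the clustered index for each matching row.
EXPLAIN SELECT * FROM orders WHERE user_id = 1001;
```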
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
