Optimizing MySQL Queries on Tens of Millions of Rows: Indexes, Partitioning & Tips
This article walks through generating tens of millions of rows in MySQL, then demonstrates a series of optimizations—including simple and covering indexes, data reduction, small‑table‑driven joins, and forced indexes—showing how each technique impacts query performance and EXPLAIN output.
Background
MySQL tables with tens of millions of rows are challenging to query efficiently; full table scans become prohibitively slow at that scale. The article demonstrates generating 15 million rows with stored procedures and then walks through a series of optimization techniques.
Data Generation
Provides CREATE TABLE statements for orders and users, and stored procedures create_users() and generate_orders() that insert 1,000 users and 15 million orders via multiple sessions.
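A minimal sketch of the schema and generator, assuming InnoDB and simple column sets; aside from `user_id`, `total_amount`, and the procedure names, the column names, types, and batch sizes below are illustrative assumptions, not taken from the article:

```sql
-- Hypothetical schema; the article confirms only user_id and total_amount.
CREATE TABLE users (
    user_id   INT PRIMARY KEY AUTO_INCREMENT,
    user_name VARCHAR(64) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id      INT NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

DELIMITER //
CREATE PROCEDURE generate_orders(IN n INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        -- Spread orders across the 1,000 users with random amounts.
        INSERT INTO orders (user_id, total_amount)
        VALUES (FLOOR(1 + RAND() * 1000), ROUND(RAND() * 1000, 2));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

-- Reach 15 million rows by calling the procedure from several sessions,
-- e.g. CALL generate_orders(3000000); in each.
```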
Baseline Query and Performance
The initial query sums total_amount per user using a LEFT JOIN without indexes, taking about 191 seconds. EXPLAIN shows type=ALL for both tables, indicating full scans.
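The baseline aggregation plausibly looks like the following (table aliases and the GROUP BY shape are assumptions):

```sql
-- Per-user order totals. With no indexes, EXPLAIN reports type=ALL
-- (full scan) for both users and orders.
SELECT u.user_id, SUM(o.total_amount) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
GROUP BY u.user_id;

EXPLAIN
SELECT u.user_id, SUM(o.total_amount) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
GROUP BY u.user_id;
```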
First Optimization – Simple Indexes
Creates single-column indexes on orders(user_id), orders(total_amount), and users(user_id). The same query now uses index and ref access, yet execution time degrades to ~460 seconds: each index hit still requires a back-table (bookmark) lookup into the clustered index to fetch total_amount, adding random I/O per matched row.
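The first round of indexes might be created as follows; the source does not give names for these single-column indexes, so the names here are assumptions:

```sql
-- Single-column indexes from the first optimization round.
-- These change access type from ALL to index/ref, but total_amount
-- still has to be fetched via a back-table lookup.
CREATE INDEX idx_orders_user_id      ON orders (user_id);
CREATE INDEX idx_orders_total_amount ON orders (total_amount);
CREATE INDEX idx_users_user_id       ON users (user_id);
```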
Second Optimization – Covering Indexes
Adds composite indexes idx_orders_total_amount_user_id and idx_orders_user_id_total_amount. Index creation itself costs ~300 seconds, but query time then drops dramatically to ~10 seconds, showing that covering indexes eliminate back-table lookups.
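The two composite indexes named in the article can be created as follows; because each contains both user_id and total_amount, the aggregation can be answered entirely from the index:

```sql
-- Either ordering covers the query; idx_orders_user_id_total_amount
-- additionally matches the join/grouping key as its prefix.
-- EXPLAIN should now show "Using index" in the Extra column.
CREATE INDEX idx_orders_total_amount_user_id ON orders (total_amount, user_id);
CREATE INDEX idx_orders_user_id_total_amount ON orders (user_id, total_amount);
```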
Third Optimization – Reducing Data Volume
Applies a WHERE clause (a.user_id > 1033) to filter users, reducing the rows examined. Query time falls to ~7 seconds, and EXPLAIN shows type=range on the users table.
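The filtered query likely takes this shape (the alias `a` and the predicate come from the article; the rest is an assumption):

```sql
-- Restricting the driving table turns its access type into a range scan.
SELECT a.user_id, SUM(o.total_amount) AS total
FROM users a
LEFT JOIN orders o ON o.user_id = a.user_id
WHERE a.user_id > 1033
GROUP BY a.user_id;
```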
Fourth Optimization – Small‑Table Driving Large‑Table
Attempts to rewrite the join as a subquery so the smaller filtered set drives the larger table. EXPLAIN changes little and execution time remains similar, indicating limited benefit in this case.
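One plausible form of the rewrite, sketched here as a derived table so the filtered user set drives the join (the exact rewrite in the article is not shown, so this is an assumption):

```sql
-- The derived table materializes the small filtered set first;
-- the optimizer often produces the same plan as the plain WHERE,
-- which matches the article's finding of little change.
SELECT a.user_id, SUM(o.total_amount) AS total
FROM (SELECT user_id FROM users WHERE user_id > 1033) a
LEFT JOIN orders o ON o.user_id = a.user_id
GROUP BY a.user_id;
```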
Fifth Optimization – Forced Index
Uses FORCE INDEX (idx_orders_user_id_total_amount) for an IN clause on a few user IDs. Both forced and normal queries run in sub‑second time, showing that when indexes are already used, forcing them has little effect.
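A sketch of the forced-index query; the specific user IDs in the IN list are illustrative assumptions:

```sql
-- FORCE INDEX pins the covering composite index. With a small IN list
-- the optimizer would normally choose this index anyway, so the hint
-- changes little, as the article observes.
SELECT user_id, SUM(total_amount) AS total
FROM orders FORCE INDEX (idx_orders_user_id_total_amount)
WHERE user_id IN (1001, 1002, 1003)
GROUP BY user_id;
```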
Key Takeaways
Create appropriate indexes and prefer covering indexes to eliminate back‑table lookups.
When data approaches tens of millions, consider partitioning or sharding (splitting tables).
Filtering data early (WHERE, range scans) can cut execution time.
Forced indexes can help when the optimizer ignores a useful index.
Explain “type” Field and Back‑Table Mechanism
Describes MySQL EXPLAIN type values (system, const, eq_ref, ref, range, index, all) and their performance implications. Explains the back‑table (bookmark lookup) process when a non‑covering index is used and how it adds I/O.
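The covering-versus-back-table distinction can be seen directly in EXPLAIN output; this pair of probes assumes the composite index from earlier and an illustrative user ID:

```sql
-- Served entirely from idx_orders_user_id_total_amount:
-- Extra shows "Using index", i.e. no back-table lookup.
EXPLAIN SELECT user_id, total_amount FROM orders WHERE user_id = 1001;

-- Selecting columns not in the index forces a bookmark lookup
-- into the clustered index for each matching row.
EXPLAIN SELECT * FROM orders WHERE user_id = 1001;
```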
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
