Master SQL Optimization: Index Design, EXPLAIN, and Safe Online DDL for Interview Success
This guide walks you through the fundamentals of MySQL performance tuning—covering the optimization pyramid, how to read EXPLAIN plans, practical index design, safe online DDL for large tables, and interview‑ready case studies that turn SQL expertise into a powerful hiring advantage.
1. Core Points of Database Performance Optimization
Before tweaking a specific SQL statement, you need a global view of where SQL optimization fits in the overall database performance hierarchy. Think of the optimization process as a pyramid:
Hardware layer : upgrade CPU, add memory, replace spinning disks with NVMe SSDs.
System layer : tune OS kernel parameters such as network buffers and file‑handle limits.
Database software layer : adjust DBMS settings (e.g., MySQL innodb_flush_log_at_trx_commit to balance durability and speed).
Application‑layer SQL optimization : the top of the pyramid where developers can directly rewrite queries.
Lower layers affect a broader range of queries but require deeper expertise and higher privileges; the top layer offers the highest return on effort for most developers.
1.1 Understanding EXPLAIN Execution Plans
The EXPLAIN command is the primary diagnostic tool. It returns columns such as type, possible_keys, key, rows, and filtered. The type column ranks access methods from best to worst: system > const > eq_ref > ref > range > index > ALL. ALL indicates a full‑table scan and is the first target for optimization.
-- Example: Analyze a sorting pagination query
EXPLAIN SELECT * FROM tx_user.user ORDER BY uid DESC LIMIT 5;Key columns to focus on: type: access type, with system and const being ideal. possible_keys: candidate indexes the optimizer considered. key: the index actually used (NULL means no index). rows: estimated number of rows examined; smaller is better. filtered: percentage of rows that satisfy the WHERE clause after the scan.
Optimization follows an iterative "analyze‑hypothesize‑validate" cycle, repeatedly using EXPLAIN to verify improvements.
1.2 Index Design
Create indexes on columns frequently used in WHERE predicates.
Index columns involved in ORDER BY to avoid costly filesort.
Index foreign‑key and join columns to prevent nested‑loop join blow‑ups.
Prefer high‑cardinality columns; when building composite indexes, place the most selective column first.
1.3 Risks of DDL on Large Tables and Mitigation Strategies
Running ALTER TABLE on a massive table acquires an exclusive metadata lock (MDL), blocking all reads and writes until the operation finishes. For tables with tens of millions of rows, this can take minutes or hours, which is unacceptable in production.
Three common mitigation approaches:
Planned downtime : pause the service during a maintenance window and run the DDL.
Low‑traffic window : execute the change during off‑peak hours, accepting the risk of overrunning into peak time.
Online DDL via a shadow table ("golden‑silkworm" technique):
Create a new table with the desired schema and indexes.
Copy the full data set from the original table to the shadow table.
Synchronize incremental changes (e.g., via triggers or binlog listeners) while copying.
Atomically rename the original table to a backup name and rename the shadow table to the original name, then switch traffic.
2. Interview‑Ready Scenarios
Catalog project experience : list core business tables, evaluate existing indexes, and run EXPLAIN on the most frequent queries.
Retrospect real cases : describe problem discovery, analysis, solution, and quantitative results (e.g., response time, CPU usage).
Anticipate deep follow‑up questions : prepare answers for why the optimizer chose a certain plan, concurrency implications, and data‑consistency concerns.
2.1 Using Covering Indexes to Eliminate Table Lookups
Problem : a high‑frequency query used SELECT * but only needed columns A, B, C, causing many “Table Access by Index Rowid” operations.
Solution : create a composite index that includes A, B, C and rewrite the query to SELECT A, B, C. The index becomes a covering index, allowing MySQL to satisfy the query entirely from the index leaf nodes. Execution time dropped from hundreds of milliseconds to under 1 ms.
2.2 Leveraging Index Order to Optimize Sorting
Problem : a query with WHERE uid = ? ORDER BY create_time DESC triggered a costly filesort as the user’s data grew.
Solution : create a composite index on (uid, create_time). Because the index stores rows ordered by create_time within each uid, MySQL can read rows in the required order without an extra sort, reducing latency from seconds to tens of milliseconds.
2.3 Optimizing COUNT(*)
In InnoDB, SELECT COUNT(*) forces a full‑table scan. Two strategies:
Approximate count : run EXPLAIN on a similar query and use the rows estimate as an approximation.
Exact count via external store : maintain a counter in Redis (or another fast store) that is updated alongside INSERT and DELETE operations. Ensure consistency with techniques such as binlog‑driven asynchronous updates.
2.4 Cautious Use of Index Hints
When the optimizer picks a sub‑optimal index and rewriting the query cannot help, FORCE INDEX can force the desired index. Use this as a last resort because it reduces maintainability—future data distribution changes may render the hint harmful.
2.5 Correct Division of WHERE and HAVING
Filters on raw table columns belong in WHERE, which executes before GROUP BY. Moving such conditions from HAVING to WHERE can cut the processed row set dramatically; an example showed a 40 % performance gain.
2.6 Breaking the Deep‑Pagination Performance Trap
Offset‑based pagination ( LIMIT offset, count) scans offset + count rows, making large offsets prohibitively slow. Replace it with keyset pagination: use the last row’s primary key (or a unique ordered column) as a cursor, e.g., WHERE id > last_id ORDER BY id ASC LIMIT page_size. This keeps the offset at zero and maintains millisecond‑level latency even on deep pages.
3. Summary
SQL optimization revolves around three pillars: mastering EXPLAIN to diagnose bottlenecks, designing high‑selectivity covering indexes, and applying safe online DDL techniques for large tables. Armed with concrete interview case studies, you can demonstrate deep database expertise and turn performance improvements into compelling interview narratives.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
