
Mastering MySQL: Normal Forms, Storage Engines, Indexes, Transactions, and Optimization

This comprehensive guide covers MySQL fundamentals such as the three normal forms, differences between InnoDB and MyISAM, auto‑increment primary key behavior, index types and design principles, transaction isolation levels, MVCC, logging mechanisms, two‑phase commit, query execution, replication, high‑availability architectures, and practical performance tuning techniques.

JavaEdge

Oct 29, 2021


This article is part of the "Eat Through MySQL" series and serves as a detailed interview‑ready reference for MySQL concepts.

1. Three Normal Forms

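First Normal Form (1NF): Every column holds atomic (indivisible) values. There are no repeating groups or multi-valued columns.

Second Normal Form (2NF): Satisfies 1NF, the table has a primary key, and every non-key column depends on the whole primary key. No column may depend on only part of a composite key.

Third Normal Form (3NF): Satisfies 2NF, and no non-key column depends on another non-key column. In other words, there are no transitive dependencies on the key.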

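Advantages: Less redundant data, faster updates (each fact is stored once), and fewer queries that need DISTINCT or GROUP BY to remove duplicates.

Disadvantages: Queries need more joins. Some indexing strategies also stop working, because columns that are filtered together may now live in different tables and cannot share one composite index.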
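To make the dependency language concrete, here is a small Python sketch that checks whether a functional dependency holds in a sample of rows. It uses a hypothetical students table. A sample can only refute a dependency, never prove it, so treat this as an illustration of the 3NF rule rather than a design tool.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical table keyed by student_id; dept_name really depends on dept_id
students = [
    {"student_id": 1, "dept_id": 10, "dept_name": "Math"},
    {"student_id": 2, "dept_id": 10, "dept_name": "Math"},
    {"student_id": 3, "dept_id": 20, "dept_name": "Physics"},
]

# student_id -> dept_id and dept_id -> dept_name: a transitive dependency, so not 3NF
print(fd_holds(students, ("student_id",), ("dept_id",)))  # True
print(fd_holds(students, ("dept_id",), ("dept_name",)))   # True

# 3NF decomposition: move the non-key dependency into its own table
departments = {r["dept_id"]: r["dept_name"] for r in students}
students_3nf = [{"student_id": r["student_id"], "dept_id": r["dept_id"]} for r in students]
print(departments)  # {10: 'Math', 20: 'Physics'}
```

After the split, renaming a department touches one row in departments instead of every matching student row. This is the update advantage listed above.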

2. InnoDB vs. MyISAM

2.1 Differences

InnoDB uses clustered indexes; MyISAM uses non‑clustered indexes.

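InnoDB stores data and indexes together in .ibd files (one per table when innodb_file_per_table is enabled, the default). MyISAM stores indexes in .MYI files and data in .MYD files. In MySQL 5.7 and earlier, both engines keep the table definition in a .frm file; MySQL 8.0 replaces .frm files with the data dictionary.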

InnoDB supports transactions, foreign keys, and row-level locking. MyISAM supports none of these and locks whole tables.

InnoDB handles concurrent writes well. MyISAM can be faster for read-mostly workloads with little write concurrency.

Both use B+Tree indexes. MyISAM has long supported full-text indexes; InnoDB has supported them since MySQL 5.6.

2.2 MyISAM Characteristics

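No transaction support: a statement cannot be rolled back. A multi-row statement interrupted by an error or crash can leave partial changes.

Uses table-level locks.

Stores the total row count, so SELECT COUNT(*) without a WHERE clause returns immediately.

Consists of three files: .frm (table definition), .MYI (indexes), and .MYD (data).

All indexes, including the primary key, are non-clustered and store pointers to the data rows.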

2.3 InnoDB Characteristics

Supports ACID transactions and four isolation levels.

Row‑level locking and foreign‑key constraints enable high write concurrency.

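Does not store the total row count, so SELECT COUNT(*) must scan an index.

The primary key is the clustered index. Secondary indexes store the primary key value, so a secondary-index lookup may need a second lookup in the clustered index.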

2.4 Usage Scenarios

InnoDB has been the default storage engine since MySQL 5.5 and is the right choice for most workloads. MyISAM may be chosen for read-intensive scenarios where transactions and crash recovery are not required.

3. Auto‑Increment Primary Keys

In InnoDB, the auto-increment counter is kept in memory. Whether it survives a restart depends on the version:

MySQL 5.7 and earlier: the counter is lost on restart, and the next value is recalculated as MAX(id)+1.

MySQL 8.0: every change to the counter is written to the redo log and saved to the data dictionary at each checkpoint, so the value survives a restart.

Insertion rules:

If the id column is omitted, set to NULL, or set to 0, MySQL uses the current auto-increment value. The 0 case applies only when the NO_AUTO_VALUE_ON_ZERO SQL mode is not enabled.

If a specific value is supplied, MySQL stores that value directly.

If the supplied value is greater than or equal to the current counter, the counter becomes value+1 (with the default auto_increment_increment of 1). Otherwise it remains unchanged.
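The insertion rules and the restart behavior can be modeled in a few lines of Python. This is a toy model, not InnoDB code. It assumes auto_increment_increment = 1, NO_AUTO_VALUE_ON_ZERO disabled, and no concurrency. The class and method names are hypothetical.

```python
class AutoIncTable:
    """Toy model of InnoDB auto-increment allocation (step 1, offset 1)."""

    def __init__(self):
        self.counter = 1   # next value to hand out
        self.ids = []

    def insert(self, id_value=None):
        # Omitted, NULL (None) or 0: take the current counter value
        if id_value in (None, 0):
            id_value = self.counter
        # A value >= the counter pushes the counter to value + 1; smaller values leave it alone
        if id_value >= self.counter:
            self.counter = id_value + 1
        self.ids.append(id_value)
        return id_value

    def delete(self, id_value):
        self.ids.remove(id_value)

    def restart(self, persisted):
        # MySQL 8.0 keeps the counter (persisted=True);
        # 5.7 and earlier recompute it as MAX(id) + 1
        if not persisted:
            self.counter = max(self.ids, default=0) + 1


t57, t80 = AutoIncTable(), AutoIncTable()
for t in (t57, t80):
    t.insert()     # id 1
    t.insert(10)   # id 10, counter -> 11
    t.insert(5)    # id 5, counter stays 11
    t.insert()     # id 11, counter -> 12
    t.delete(11)
t57.restart(persisted=False)
t80.restart(persisted=True)
print(t57.counter, t80.counter)  # 11 12
```

Under 5.7 semantics, the deleted id 11 is handed out again after the restart. Under 8.0 semantics, the next insert gets 12.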

4. Why Auto‑Increment IDs May Not Be Consecutive

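Before MySQL 8.0, the counter lives only in memory and is reset to MAX(id)+1 on restart. IDs freed by deletions at the end of the table can therefore be handed out again.

A rolled-back transaction does not return the IDs it allocated, because the counter is never decremented.

A unique-key conflict discards an ID that has already been allocated.

For example, suppose an INSERT is assigned id = 2 and then fails because its value for a unique column already exists. The counter has already advanced, so id 2 is never used, and the next successful insert gets id = 3, leaving a gap.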
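Both gap-producing cases come from one ordering detail: the ID is allocated before the unique check and before commit, and it is never returned. A minimal Python model shows the effect. It uses a hypothetical class, a single session, and one unique column c.

```python
class Table:
    """Toy table with an auto-increment id and a unique column c."""

    def __init__(self):
        self.next_id = 1
        self.rows = {}          # id -> c
        self.unique_c = set()   # values already present in the unique column

    def insert(self, c):
        new_id = self.next_id   # the ID is allocated first...
        self.next_id += 1       # ...and the counter is never decremented
        if c in self.unique_c:
            raise ValueError(f"Duplicate entry {c!r} for key 'c'")
        self.rows[new_id] = c
        self.unique_c.add(c)
        return new_id

    def insert_then_rollback(self, c):
        new_id = self.insert(c)
        del self.rows[new_id]     # rollback undoes the row...
        self.unique_c.discard(c)  # ...but not the counter increment


t = Table()
t.insert(1)                 # id 1
try:
    t.insert(1)             # id 2 is consumed, then the unique check fails
except ValueError as e:
    print(e)
t.insert(2)                 # id 3
t.insert_then_rollback(4)   # id 4 is consumed and rolled back
t.insert(5)                 # id 5
print(sorted(t.rows))       # [1, 3, 5]
```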

5. Benefits of Auto‑Increment IDs

Sequential primary keys improve page fill factor, reduce page splits, and avoid random I/O associated with UUIDs.
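A toy Python model of leaf pages illustrates why sequential keys fill pages better than random ones. The capacity of 4 rows per page is arbitrary. The "append a new page instead of splitting" rule for ascending inserts is a simplification of what InnoDB does, not its exact algorithm.

```python
import bisect
import random


def fill_pages(keys, capacity=4):
    """Insert keys into an ordered chain of leaf pages; return (pages, split_count).

    Toy model: when a key is larger than every existing key and the last page
    is full, a new page is appended (no split). Any other overflow splits the
    page in half.
    """
    pages = [[]]
    splits = 0
    for k in keys:
        # Target page: the last page whose first key is <= k (or the first page)
        firsts = [p[0] if p else float("-inf") for p in pages]
        i = max(bisect.bisect_right(firsts, k) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, k)
            continue
        if i == len(pages) - 1 and k > page[-1]:
            pages.append([k])                      # ascending insert: start a new page
            continue
        bisect.insort(page, k)                     # overflow in the middle: split in half
        mid = len(page) // 2
        pages[i:i + 1] = [page[:mid], page[mid:]]
        splits += 1
    return pages, splits


seq_pages, seq_splits = fill_pages(range(1, 17))
print(seq_splits, [len(p) for p in seq_pages])   # 0 [4, 4, 4, 4]

rnd_keys = random.Random(42).sample(range(1000), 200)
rnd_pages, rnd_splits = fill_pages(rnd_keys)
print(rnd_splits, sum(len(p) for p in rnd_pages) / (4 * len(rnd_pages)))
```

The sequential run leaves every page full with no splits. The random run (UUID-like keys) splits pages and leaves many of them partly empty, which wastes space and causes extra I/O.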

6. Index Basics

Indexes are ordered data structures that accelerate lookups. They improve read performance but consume storage and slow write operations.

7. Index Types

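Normal index: Allows duplicate values.

Unique index: Enforces uniqueness but allows NULL values; multiple NULLs are permitted.

Primary key: Unique, non-null, and the clustered index in InnoDB.

Full-text index: Supports text search.

Covering index: Not a separate index type, but the case where an index contains every column a query needs, so no table lookup is required.

Index condition pushdown (ICP, MySQL 5.6+): Not an index type, but an optimization. The storage engine evaluates WHERE conditions on index columns during the index scan, which reduces table lookups and the rows returned to the server.

Composite (multi-column) index: Follows the left-most prefix rule.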
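The effect of index condition pushdown on a composite index can be sketched by counting back-to-table lookups. The Python model below assumes a hypothetical index on (name, age) and the query WHERE name LIKE 'Zh%' AND age = 10. For brevity it scans all index entries instead of seeking to the 'Zh' range.

```python
# Hypothetical secondary index on (name, age): sorted (name, age, primary_key) entries
index = sorted([
    ("Zhang", 10, 1), ("Zhang", 20, 2), ("Zhao", 10, 3), ("Zhou", 30, 4), ("Li", 10, 5),
])
# Clustered index: primary key -> full row
table = {pk: {"id": pk, "name": name, "age": age} for name, age, pk in index}


def query(prefix, age, icp):
    """Return (matching ids, back-to-table lookups) for name LIKE 'prefix%' AND age = ?."""
    result, lookups = [], 0
    for name, idx_age, pk in index:
        if not name.startswith(prefix):
            continue                  # outside the name range
        if icp and idx_age != age:
            continue                  # ICP: rejected inside the engine, no lookup
        lookups += 1                  # fetch the full row from the clustered index
        row = table[pk]
        if row["age"] == age:         # without ICP, age is only checked here
            result.append(row["id"])
    return result, lookups


print(query("Zh", 10, icp=False))  # ([1, 3], 4)
print(query("Zh", 10, icp=True))   # ([1, 3], 2)
```

Both runs return the same rows. ICP halves the lookups here because age is already stored in the index entry.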

8. Index Underlying Structures

MySQL uses B+Trees for most indexes. Hash indexes (for example, in the MEMORY engine) serve exact-match lookups quickly. They cannot support range scans or ORDER BY, because hashing discards key order.
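The difference can be illustrated in Python, with a dict standing in for a hash index and a sorted list standing in for B+Tree leaf order. This is an analogy, not MySQL's data structures. Each range function returns the matches and the number of entries it had to examine.

```python
import bisect

keys = [42, 7, 19, 88, 3, 56]
hash_index = {k: f"row{k}" for k in keys}   # hash lookup: no key order
sorted_keys = sorted(keys)                   # stands in for B+Tree leaf order


def hash_range(lo, hi):
    # No order to exploit: every entry must be examined
    matches = sorted(k for k in hash_index if lo <= k <= hi)
    return matches, len(hash_index)


def btree_range(lo, hi):
    # Seek to the first key >= lo, then read forward until hi
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end], end - start


print(hash_index[42])          # row42: exact match is a single probe
print(hash_range(10, 60))      # ([19, 42, 56], 6)
print(btree_range(10, 60))     # ([19, 42, 56], 3)
```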

9. B‑Tree vs. B+Tree

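A B-Tree stores keys and data in every node. A B+Tree stores only keys and child pointers in internal nodes and keeps the data in leaf nodes, which are linked in key order. Internal nodes can therefore hold more keys, so the tree has higher fan-out and fewer levels. A range query can also walk the linked leaves instead of revisiting upper levels. In InnoDB, clustered-index leaves hold full rows, and secondary-index leaves hold the primary key.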
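A back-of-the-envelope calculation shows why the higher fan-out matters. The 16 KB page is InnoDB's default innodb_page_size. The 6-byte child pointer and the 1 KB average row are assumptions, and page headers are ignored, so treat the results as orders of magnitude.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default innodb_page_size
KEY_BYTES = 8           # BIGINT primary key
POINTER_BYTES = 6       # assumed size of a child-page pointer
ROW_BYTES = 1024        # assumed average row size

# B+Tree: internal pages hold only (key, pointer) pairs; rows live in the leaves
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES


def bplus_rows(height):
    return fanout ** (height - 1) * rows_per_leaf


# B-Tree: every node stores whole rows next to its keys, so far fewer entries fit
btree_entries = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES + ROW_BYTES)


def btree_rows(height):
    # each node holds btree_entries rows and has btree_entries + 1 children
    return btree_entries * sum((btree_entries + 1) ** level for level in range(height))


print(fanout, rows_per_leaf, bplus_rows(3))   # 1170 16 21902400
print(btree_entries, btree_rows(3))           # 15 4095
```

With three levels, a B+Tree can address roughly 22 million rows versus a few thousand for a B-Tree under the same assumptions. This is why a primary-key lookup on a large InnoDB table typically touches only three or four pages.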

10. Index Design Principles

Index columns used in WHERE or join conditions.

Avoid indexing low-cardinality columns (for example, a gender flag). Skip indexes on small tables, where a full scan is already cheap.

Prefer short indexes; use prefix indexes for long strings.

Index foreign‑key columns.

Do not over‑index; avoid indexes on frequently updated columns.

Use composite indexes when queries filter on multiple columns, placing the most selective columns first.

11. Index Invalidations

Leading wildcard LIKE '%abc' disables index use.

Functions or calculations on indexed columns prevent index usage.

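OR conditions where not every operand is indexed.

Implicit type conversion, such as comparing a string column to an unquoted number (phone = 13800000000 instead of phone = '13800000000').

MySQL may choose a full table scan if it estimates lower cost.

Violating the left-most prefix rule on composite indexes.

Using NOT, <>, or != on indexed columns often leads to a full scan.

IS NULL and IS NOT NULL conditions may also cause the optimizer to skip the index, depending on data distribution.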
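Implicit conversion defeats the index because of ordering, as a short Python illustration shows. When a string column is compared with a number, MySQL compares the values as floating-point numbers, so every index entry must be converted. The sorted list below stands in for a hypothetical VARCHAR index on phone.

```python
import bisect

# Index order for a VARCHAR column is string order
phones = sorted(["9", "10", "100", "0100", "11"])
print(phones)  # ['0100', '10', '100', '11', '9']

# WHERE phone = '100': a seek on the string key finds one contiguous range
lo = bisect.bisect_left(phones, "100")
hi = bisect.bisect_right(phones, "100")
print(phones[lo:hi])  # ['100']

# WHERE phone = 100: each entry is converted to a number before comparing
numeric_matches = [p for p in phones if float(p) == 100]
print(numeric_matches)                             # ['0100', '100']
print([phones.index(p) for p in numeric_matches])  # [0, 2]: not adjacent in index order
```

The numeric matches are scattered through the string ordering, so there is no range to seek to and the engine has to examine every entry.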

12. Creating Indexes

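-- Add an index to an existing table (two equivalent forms)
ALTER TABLE table_name ADD INDEX index_name (column_list);
CREATE INDEX index_name ON table_name (column_list);
-- Or declare it when creating the table
CREATE TABLE table_name (column_definitions, INDEX index_name (column_list));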

13. Does a Non‑Clustered Index Always Require a Table Lookup?

If the query can be satisfied entirely by the index (covering index), the engine reads data directly from the index without a back‑table lookup.
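A small Python model with hypothetical data makes the distinction concrete. InnoDB secondary-index entries carry the indexed column plus the primary key, so a query that needs only those columns never touches the clustered index.

```python
# Clustered index: primary key -> full row (hypothetical data)
clustered = {
    1: {"id": 1, "k": 300, "name": "a"},
    2: {"id": 2, "k": 500, "name": "b"},
    3: {"id": 3, "k": 600, "name": "c"},
}
# Secondary index on k: InnoDB leaf entries hold (k, primary key)
index_k = sorted((row["k"], pk) for pk, row in clustered.items())


def select(columns, lo, hi):
    """Return (rows, back-to-table lookups) for SELECT columns ... WHERE k BETWEEN lo AND hi."""
    covered = set(columns) <= {"k", "id"}   # everything needed is in the index entry
    rows, lookups = [], 0
    for k, pk in index_k:
        if not lo <= k <= hi:
            continue
        if covered:
            entry = {"k": k, "id": pk}
        else:
            entry = clustered[pk]           # back-to-table lookup
            lookups += 1
        rows.append(tuple(entry[c] for c in columns))
    return rows, lookups


print(select(["id"], 300, 500))           # ([(1,), (2,)], 0)
print(select(["id", "name"], 300, 500))   # ([(1, 'a'), (2, 'b')], 2)
```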

14. Composite Index Rules

Place frequently queried or highly selective columns first.

Reuse indexes when possible (e.g., index (a,b) can also serve queries on a alone).

If queries also filter on b alone, an index on (a,b) cannot serve them; add a separate index on b.

15. Left‑most Prefix Principle

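Matching starts at the leftmost column of a composite index and continues through columns tested for equality. It stops at the first column with no condition, or just after the first range condition (>, <, BETWEEN, LIKE 'abc%'). Columns beyond that point cannot be used to narrow the index seek.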
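The rule can be written as a small Python function. The "eq"/"range" encoding of WHERE conditions is hypothetical. The sketch ignores refinements such as MySQL 8.0's index skip scan and the fact that ICP can still filter on later columns.

```python
def usable_prefix(index_cols, predicates):
    """Return the composite-index columns that can narrow the index seek.

    predicates maps a column name to "eq" or "range" (a hypothetical encoding
    of WHERE conditions).
    """
    used = []
    for col in index_cols:
        kind = predicates.get(col)
        if kind is None:
            break              # a gap in the prefix ends the match
        used.append(col)
        if kind == "range":
            break              # columns after a range condition cannot be used for the seek
    return used


idx = ("a", "b", "c")
print(usable_prefix(idx, {"a": "eq", "b": "eq", "c": "eq"}))     # ['a', 'b', 'c']
print(usable_prefix(idx, {"a": "eq", "b": "range", "c": "eq"}))  # ['a', 'b']
print(usable_prefix(idx, {"a": "eq", "c": "eq"}))                # ['a']
print(usable_prefix(idx, {"b": "eq", "c": "eq"}))                # []
```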

16. Prefix Indexes

For long string columns, create an index on the first N characters, e.g., INDEX(col(10)).
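A common way to pick N is to compare the selectivity of the prefix, COUNT(DISTINCT LEFT(col, N)) / COUNT(*), with that of the full column. The same calculation is shown here in Python on hypothetical sample values, with a helper that finds the shortest prefix matching the full column's selectivity.

```python
def prefix_selectivity(values, n):
    """Distinct n-character prefixes divided by the number of values."""
    return len({v[:n] for v in values}) / len(values)


def shortest_full_prefix(values):
    """Smallest n whose prefix is as selective as the whole column."""
    full = prefix_selectivity(values, max(map(len, values)))
    for n in range(1, max(map(len, values)) + 1):
        if prefix_selectivity(values, n) >= full:
            return n


emails = ["zhangsan@example.com", "zhangsi@example.com", "zhaoliu@example.com",
          "lisi@example.com", "wangwu@example.com"]
for n in (2, 4, 7):
    print(n, prefix_selectivity(emails, n))   # 2 0.6 / 4 0.8 / 7 1.0
print(shortest_full_prefix(emails))           # 7
```

In practice you may accept somewhat lower selectivity in exchange for a shorter, cheaper index. Note also that a prefix index cannot act as a covering index, because the full value is not stored in it.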

17. Deleting Massive Data

For bulk deletions, drop the secondary indexes first, delete the data, and then recreate the indexes. This avoids maintaining every index row by row during the delete.

18. Normal vs. Unique Index Choice

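Both perform about the same for reads. On writes, a unique index must read the target page to check for duplicates, so it cannot use InnoDB's change buffer. A normal index can record the change in the change buffer when the page is not in memory and merge it later.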
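The difference shows up on the write path when the target index page is not in the buffer pool. A toy Python model counts the synchronous reads each index type needs. The function is hypothetical, and the integers stand in for secondary-index pages.

```python
def reads_on_write(target_pages, buffer_pool, unique):
    """Return (synchronous page reads, changes deferred to the change buffer)."""
    pool = set(buffer_pool)   # copy so the caller's set is untouched
    reads, change_buffer = 0, []
    for page in target_pages:
        if page in pool:
            continue                     # page in memory: update it directly
        if unique:
            reads += 1                   # must read the page to check for a duplicate
            pool.add(page)
        else:
            change_buffer.append(page)   # record the change; merge when the page is read later
    return reads, change_buffer


print(reads_on_write([1, 2, 3, 2, 4], {1}, unique=True))   # (3, [])
print(reads_on_write([1, 2, 3, 2, 4], {1}, unique=False))  # (0, [2, 3, 2, 4])
```

The deferred merges still cost I/O later. The benefit is therefore largest for write-heavy tables whose rows are not read back immediately.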

19. MySQL Architecture – Query Execution Flow

Connection layer authenticates and checks permissions.

Query cache (if enabled) is consulted. The query cache was removed entirely in MySQL 8.0.

Parser performs lexical and syntactic analysis.

Optimizer generates an execution plan and selects indexes.

Executor opens the table, invokes the storage engine, scans rows (using indexes if applicable), and returns results.

20. Two‑Phase Commit (2PC)

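For an UPDATE, the redo log and the binary log are kept consistent through an internal two-phase commit:

Engine locates the matching rows and returns them to the executor.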

Executor performs updates via the engine interface.

Engine writes changes to the redo log in the PREPARE state.

Executor writes the transaction to the binary log.

Engine commits the redo log, finalizing the transaction.
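The PREPARE step matters during crash recovery. The commonly described rule is:

- A transaction whose redo record is committed is kept.
- A transaction that is only prepared is committed if its binlog entry is complete, and rolled back otherwise.

Here is a minimal Python sketch of that decision. It is a hypothetical function, not InnoDB's recovery code.

```python
def recover(redo_state, binlog_complete):
    """Decide a transaction's fate after a crash.

    redo_state: "commit" or "prepare"; binlog_complete: whether the
    transaction's binlog entry was fully written.
    """
    if redo_state == "commit":
        return "commit"
    if redo_state == "prepare":
        # The binlog may already have been shipped to replicas, so keep the transaction
        # only if its binlog entry is complete; otherwise no replica can have it
        return "commit" if binlog_complete else "rollback"
    raise ValueError(f"unknown redo state: {redo_state}")


# Crash after the redo PREPARE but before the binlog write
print(recover("prepare", binlog_complete=False))  # rollback
# Crash after the binlog write but before the redo COMMIT
print(recover("prepare", binlog_complete=True))   # commit
```

In both crash windows, the primary's data (from the redo log) ends up agreeing with what replicas and point-in-time recovery will see (from the binlog).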

21‑24. Transactions, Isolation Levels, ACID, and MVCC

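Transactions provide atomicity, consistency, isolation, and durability (ACID). The four standard isolation levels are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ (the InnoDB default), and SERIALIZABLE. InnoDB implements these guarantees with:

Undo log for rollback (atomicity).

Redo log for crash-safe persistence (durability).

MVCC, which uses undo-log version chains and a Read View for consistent (snapshot) reads. Under REPEATABLE READ, the Read View is created at the transaction's first consistent read and reused. Under READ COMMITTED, a new Read View is created for each statement.

Locking (row locks plus gap and next-key locks) to prevent phantom reads for locking reads under REPEATABLE READ.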
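The visibility rule behind the Read View can be sketched in Python. The field names are hypothetical stand-ins for what a Read View records:

- the transactions active when it was created,
- the smallest of those IDs,
- the next transaction ID to be assigned,
- the creator's own ID.

Versions are walked from newest to oldest, as InnoDB does through the undo log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int     # transaction that built the view
    active_ids: frozenset   # transactions still active when the view was built
    min_trx_id: int         # smallest id in active_ids
    max_trx_id: int         # next transaction id to be assigned at that moment

    def visible(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True                       # a transaction sees its own changes
        if trx_id < self.min_trx_id:
            return True                       # committed before the view was built
        if trx_id >= self.max_trx_id:
            return False                      # started after the view was built
        return trx_id not in self.active_ids  # in between: visible only if already committed


def read(version_chain, view):
    """version_chain lists (trx_id, value) from newest to oldest."""
    for trx_id, value in version_chain:
        if view.visible(trx_id):
            return value
    return None


view = ReadView(creator_trx_id=100, active_ids=frozenset({100, 102}), min_trx_id=100, max_trx_id=104)
chain = [(104, "v4"), (102, "v2"), (90, "v1")]
print(read(chain, view))  # v1
```

Under REPEATABLE READ, the same view object would be reused for every consistent read in the transaction. Under READ COMMITTED, a fresh view with an updated active set would be built per statement, which is why changes committed by other transactions become visible.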

25. Logging Mechanisms

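Undo log: Stores old row versions for rollback and MVCC.

Redo log: Records page modifications for crash recovery. Flushing is controlled by innodb_flush_log_at_trx_commit:
- 1 (default): writes and flushes at every commit.
- 2: writes to the OS cache at commit and flushes about once per second.
- 0: writes and flushes about once per second.

Binary log: Server-level log used for replication and point-in-time recovery. Formats: ROW, STATEMENT, MIXED.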
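The trade-off between the three innodb_flush_log_at_trx_commit settings can be summarized as a small Python lookup. The "~1 s" window is approximate, because flushing once per second is not guaranteed exactly. The model also ignores the binary log and sync_binlog.

```python
def redo_loss_window(setting, failure):
    """Approximate committed work at risk for innodb_flush_log_at_trx_commit.

    failure: "mysqld_crash" (process dies, OS survives) or "os_crash" (OS crash or power loss).
    """
    if setting == 1:
        return "none"          # log written and flushed to disk at every commit
    if setting == 2:
        # written to the OS cache at commit, flushed about once per second
        return "none" if failure == "mysqld_crash" else "up to ~1 s"
    if setting == 0:
        return "up to ~1 s"    # written and flushed about once per second, not at commit
    raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1 or 2")


for s in (0, 1, 2):
    print(s, redo_loss_window(s, "mysqld_crash"), redo_loss_window(s, "os_crash"))
```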

26. EXPLAIN Output

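Key fields:
- type (access method; from best to worst, roughly system, const, eq_ref, ref, range, index, ALL)
- possible_keys
- key
- key_len
- rows
- Extra (e.g., Using index, Using where, Using index condition, Using filesort)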
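A quick way to apply these fields is to scan EXPLAIN rows for the usual red flags. The Python helper below works on rows already fetched as dictionaries; how you fetch them depends on your driver. The sample plan and the helper's names are hypothetical.

```python
SCAN_TYPES = {"ALL": "full table scan", "index": "full index scan"}
EXTRA_HINTS = {"Using filesort": "extra sort pass", "Using temporary": "temporary table"}


def review_plan(rows):
    """Return human-readable warnings for EXPLAIN rows given as dicts."""
    warnings = []
    for r in rows:
        if r["type"] in SCAN_TYPES:
            warnings.append(f"{r['table']}: {SCAN_TYPES[r['type']]}")
        extra = r.get("Extra") or ""
        for marker, hint in EXTRA_HINTS.items():
            if marker in extra:
                warnings.append(f"{r['table']}: {hint}")
    return warnings


plan = [
    {"table": "orders", "type": "ALL", "key": None, "rows": 100000, "Extra": "Using where; Using filesort"},
    {"table": "users", "type": "eq_ref", "key": "PRIMARY", "rows": 1, "Extra": None},
]
print(review_plan(plan))  # ['orders: full table scan', 'orders: extra sort pass']
```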

27. Dirty Pages

Dirty pages are buffer-pool pages whose contents differ from the copy on disk. InnoDB flushes them in four cases:
- the redo log is full and the checkpoint must advance;
- the buffer pool needs free pages;
- the server is idle;
- the server is shutting down cleanly.

28. MySQL Performance Tuning

Enable slow‑query logging and analyze logs.

Use EXPLAIN to identify missing or ineffective indexes.

Prefer covering indexes and index push‑down.

Apply the left‑most prefix rule for composite indexes.

Avoid functions on indexed columns.

For write‑heavy workloads, consider ordinary indexes to leverage the change buffer.

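Optimize INSERTs: batch multiple rows per statement, group them in one transaction, and insert in primary-key order.

Split tables with many columns into narrower tables, and create intermediate tables for data that is frequently joined.

Enable Multi-Range Read (MRR). It sorts the primary keys found in a secondary-index range before reading the table, turning random I/O into mostly sequential reads.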

Adopt read/write splitting and master‑slave replication for scalability.
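The batch-insert tip can be sketched as a Python helper that builds multi-row INSERT statements. The helper is hypothetical. It assumes a DB-API driver that uses %s placeholders, such as PyMySQL or mysql-connector-python. Table and column names are assumed to be trusted identifiers, since they are not escaped.

```python
def batched_inserts(table, columns, rows, batch_size=500):
    """Yield (sql, params) pairs that insert rows as multi-row INSERT statements."""
    col_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = f"INSERT INTO {table} ({col_list}) VALUES " + ", ".join([row_placeholder] * len(batch))
        params = [value for row in batch for value in row]   # flattened, row by row
        yield sql, params


for sql, params in batched_inserts("t", ["a", "b"], [(1, "x"), (2, "y"), (3, "z")], batch_size=2):
    print(sql, params)
```

Each pair would be passed to cursor.execute(sql, params), with all batches wrapped in one transaction and the rows pre-sorted by primary key. Keep batch_size small enough that each statement stays under max_allowed_packet.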

29. Master‑Slave Replication

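The master writes to the binary log, and its binlog dump thread streams the events to the slaves. Each slave's I/O thread writes the events to a relay log, and its SQL thread replays them.

Replication modes:

Asynchronous (default): the master commits without waiting for any slave.

Full sync: the master waits until all slaves have acknowledged.

Semi-sync: the master waits until at least one slave acknowledges that it has received the events and written them to its relay log, not that it has applied them.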
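The thread pipeline can be mimicked with a toy Python class. It is hypothetical, and events are simple (key, value) writes. Lag is counted in events here, whereas MySQL reports it in seconds.

```python
class Replica:
    """Toy slave: an I/O thread copies binlog events into a relay log,
    and an SQL thread replays them."""

    def __init__(self):
        self.relay_log = []
        self.applied = 0   # number of relay-log events already replayed
        self.data = {}

    def io_thread_fetch(self, master_binlog):
        # receive only the events not yet copied (sent by the master's binlog dump thread)
        self.relay_log.extend(master_binlog[len(self.relay_log):])

    def sql_thread_apply(self, max_events=None):
        pending = self.relay_log[self.applied:]
        if max_events is not None:
            pending = pending[:max_events]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag_events(self):
        # events received but not yet applied
        return len(self.relay_log) - self.applied


binlog = [("a", 1), ("b", 2), ("a", 3)]
r = Replica()
r.io_thread_fetch(binlog)
r.sql_thread_apply(max_events=1)
print(r.data, r.lag_events())   # {'a': 1} 2
r.sql_thread_apply()
print(r.data, r.lag_events())   # {'a': 3, 'b': 2} 0
```

In this picture, semi-sync makes the master wait until io_thread_fetch has stored the event, not until sql_thread_apply has run. That is why semi-sync alone does not eliminate replication lag.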

30. High‑Availability Architecture

A typical setup is one master and one slave (M-S), with the slave in read-only mode. The main concern is replication lag (seconds_behind_master), caused by weaker slave hardware, heavy query load on the slave, or large transactions.

Mitigations include:
- adding slaves to spread read load;
- parallel replication, where a coordinator thread dispatches transactions to worker threads;
- tuning slave_parallel_workers (renamed replica_parallel_workers in MySQL 8.0.26).

Failover strategies range from reliability-first to availability-first. Reliability-first switches make the old master read-only, wait until the slave's lag reaches zero, and only then promote it. Availability-first switches are faster but can leave data inconsistent.
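One commonly described reliability-first sequence is sketched below in Python. The server objects, their set_read_only and seconds_behind_master methods, and the 5-second starting threshold are hypothetical stand-ins for your own tooling. The point is the order of steps: the cluster accepts no writes between making the old master read-only and promoting the slave.

```python
import time


def reliability_first_switch(old_master, replica, max_start_lag=5, poll_interval=0.1):
    """Promote replica only after it has fully caught up (hypothetical server API)."""
    if replica.seconds_behind_master() > max_start_lag:
        return False                      # too far behind: the write outage would be long
    old_master.set_read_only(True)        # writes stop here
    while replica.seconds_behind_master() > 0:
        time.sleep(poll_interval)         # wait for zero lag
    replica.set_read_only(False)          # writes resume on the new master
    return True                           # caller then redirects application traffic


class FakeServer:
    """Illustrative stand-in for a server connection, not a real client."""

    def __init__(self, read_only, lags=(0,)):
        self.read_only = read_only
        self._lags = list(lags)           # successive lag readings; the last one repeats

    def set_read_only(self, flag):
        self.read_only = flag

    def seconds_behind_master(self):
        return self._lags.pop(0) if len(self._lags) > 1 else self._lags[0]


master = FakeServer(read_only=False)
slave = FakeServer(read_only=True, lags=(3, 2, 1, 0))
print(reliability_first_switch(master, slave, poll_interval=0))  # True
print(master.read_only, slave.read_only)                         # True False
```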

Overall, the article provides a deep dive into MySQL theory and practical tips for interview preparation and real‑world performance tuning.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

JavaEdge

First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
