Master MySQL: 3 Normal Forms, Engine Differences, Indexing, Transactions & More

This comprehensive guide covers MySQL fundamentals such as the three normal forms, differences between MyISAM and InnoDB, redo log versus binlog, when queries bypass indexes, join types, various index structures, covering indexes, back‑row lookups, transaction properties and isolation levels, common performance bottlenecks, replication mechanics, lag mitigation, storage behavior after deletes, VARCHAR limits, lock types, systematic SQL tuning steps, the purpose and trade‑offs of indexes, B+‑tree advantages, MVCC internals, and the Snowflake distributed ID algorithm.

ITFLY8 Architecture Home

1. Three Normal Forms

First Normal Form: every column holds a single atomic value; there are no repeating groups or multi‑valued fields.

Second Normal Form: built on the first; every non‑key column must depend on the whole primary key, not on just part of a composite key.

Third Normal Form: built on the first and second; every non‑key column must depend directly on the primary key, not transitively through another non‑key column.

In practice, not every table needs to satisfy all three forms; sometimes denormalizing a few fields can reduce joins and improve query performance.

2. MyISAM vs InnoDB

InnoDB supports transactions; MyISAM does not.

InnoDB supports foreign keys; MyISAM does not.

InnoDB uses clustered indexes (B+Tree) with data stored together; MyISAM uses non‑clustered indexes with separate data files.

InnoDB does not store an exact row count, so COUNT(*) has to scan an index; MyISAM keeps the count in table metadata, so an unfiltered COUNT(*) returns immediately.

InnoDB has a redo log; MyISAM has none.

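InnoDB storage files: .ibd, plus .frm before MySQL 8.0 (which moved table definitions into the data dictionary); MyISAM storage files: .MYD (data) and .MYI (indexes), plus .frm before MySQL 8.0 and .sdi from 8.0 on.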

InnoDB supports row‑level locking; MyISAM only table‑level locking.

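InnoDB always organizes a table around a clustered index: the primary key if one is defined, otherwise the first UNIQUE index whose columns are all NOT NULL, otherwise a hidden 6‑byte row ID. MyISAM can work without a primary key.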

3. Redo Log and Binlog

MySQL consists of a Server layer (which writes the binary log, or binlog) and an engine layer (where InnoDB writes the redo log).

Redo log is a physical log that records changes to data pages; binlog is a logical log that records either the original SQL statements or the row changes they produced, depending on binlog_format (STATEMENT, ROW, or MIXED).

Redo log is written in a circular fashion and has a fixed size; binlog is appended and rolls over to a new file when it reaches a configured size.

4. When Queries Skip Indexes (SQL Optimization)

LIKE patterns with a leading % do not use indexes; trailing % can use indexes.

Expressions on indexed columns (e.g., age + 8 = 18) prevent index usage.

Functions on indexed columns (e.g., concat(name,'a')) prevent index usage.

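Conditions such as !=, NOT IN, and OR on indexed columns often lead the optimizer to choose a full scan, either because they match too many rows or, for OR, because one side has no usable index; they do not disable the index outright.

NULL values are stored in InnoDB indexes and IS NULL can use an index; IS NOT NULL tends to fall back to a full scan when most rows match.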

Implicit type conversion can prevent index usage.

Composite indexes must follow the left‑most prefix rule.

When only one row is needed, use LIMIT 1.
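The left‑most prefix rule above follows from how a composite index is ordered. The sketch below is a toy model in plain Python, not MySQL code: a sorted list of tuples stands in for a hypothetical composite index on (city, age), and the names `index`, `seek_by_city`, and `scan_by_age` are invented for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy composite index on (city, age): entries are sorted by city first,
# then by age, and each entry carries a primary key as its last element.
index = sorted([
    ("Berlin", 25, 3),
    ("Berlin", 31, 7),
    ("Lima", 25, 1),
    ("Lima", 40, 5),
    ("Oslo", 25, 9),
])


def seek_by_city(city):
    """Filter on the leading column: matching entries are contiguous,
    so two binary searches find the whole range."""
    lo = bisect_left(index, (city,))
    hi = bisect_right(index, (city, float("inf")))
    return [pk for _, _, pk in index[lo:hi]]


def scan_by_age(age):
    """Filter on the second column only: matches are scattered across
    the index, so every entry has to be examined."""
    return [pk for _, a, pk in index if a == age]


print(seek_by_city("Lima"))  # primary keys found by range seek
print(scan_by_age(25))       # primary keys found by full scan
```

The same ordering argument explains why a filter on (city) or (city, age) can seek, while a filter on age alone cannot.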

5. Join Types

LEFT JOIN returns all rows from the left table and matching rows from the right table.

RIGHT JOIN returns all rows from the right table and matching rows from the left table.

INNER JOIN returns only the rows that satisfy the join condition in both tables (the intersection).
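The three join types can be compared on a tiny data set. This sketch uses Python's built‑in sqlite3 module instead of MySQL so that it runs without a server; the join semantics shown are standard SQL. The `student` and `score` tables and their contents are made up, and the RIGHT JOIN is written as a LEFT JOIN with the tables swapped because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE score (student_id INTEGER, points INTEGER);
    INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO score VALUES (1, 90), (3, 75);
""")

# LEFT JOIN: every student, with NULL points when no score matches.
left_rows = conn.execute("""
    SELECT s.name, c.points
    FROM student s LEFT JOIN score c ON c.student_id = s.id
    ORDER BY s.id
""").fetchall()

# INNER JOIN: only students that have a matching score.
inner_rows = conn.execute("""
    SELECT s.name, c.points
    FROM student s INNER JOIN score c ON c.student_id = s.id
    ORDER BY s.id
""").fetchall()

# Equivalent of "student RIGHT JOIN score": every score, with a NULL
# name when no student matches.
right_rows = conn.execute("""
    SELECT s.name, c.points
    FROM score c LEFT JOIN student s ON c.student_id = s.id
    ORDER BY c.student_id
""").fetchall()

print(left_rows)
print(inner_rows)
print(right_rows)
```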

6. Types of Indexes

Clustered index (primary key), non‑clustered index, and composite index (multiple columns).

7. Covering Index

A covering index satisfies a query using only the index data, eliminating the need to read the full table rows.

Example: SELECT name FROM student WHERE name='Alice' can use a covering index on name without a table lookup.

8. What Is a “Back‑Row” Lookup (Back to the Table)

Back‑row means the engine first uses the secondary index to locate the primary key, then fetches the full row from the primary index.
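The difference between a covering‑index read and a back‑row lookup can be modeled with two dictionaries. This is a deliberately simplified sketch, not how InnoDB is implemented: `clustered` stands in for the primary‑key index that holds full rows, `secondary_name` for a secondary index on `name`, and the `lookups` counter is a hypothetical instrument added only to make the extra work visible.

```python
# Clustered (primary key) index: primary key -> full row.
clustered = {
    1: {"id": 1, "name": "Alice", "age": 20},
    2: {"id": 2, "name": "Bob", "age": 22},
}

# Secondary index on name: indexed value -> list of primary keys.
secondary_name = {"Alice": [1], "Bob": [2]}

# Counts how often the clustered index had to be visited.
lookups = {"clustered": 0}


def select_name(name):
    """SELECT name FROM student WHERE name = ?
    The requested column is in the secondary index itself, so the
    query is answered without touching the clustered index."""
    return [name for _ in secondary_name.get(name, [])]


def select_age(name):
    """SELECT age FROM student WHERE name = ?
    age is not in the secondary index, so each primary key found
    there triggers a second lookup in the clustered index."""
    ages = []
    for pk in secondary_name.get(name, []):
        lookups["clustered"] += 1
        ages.append(clustered[pk]["age"])
    return ages


print(select_name("Alice"), lookups["clustered"])
print(select_age("Alice"), lookups["clustered"])
```

Adding `age` to the secondary index (making it an index on name and age) would let the second query be covered as well, at the cost of a larger index.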

9. Transactions and Their Properties

A transaction is a set of operations that must all succeed; if any fail, the whole transaction is rolled back.

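Atomicity: all operations in the transaction take effect, or none do.

Consistency: the transaction moves the database from one valid state to another, with all constraints still satisfied.

Isolation: the intermediate state of one transaction is hidden from others, to the degree set by the isolation level.

Durability: once a transaction commits, its changes survive a crash.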

10. Transaction Isolation Levels

Read Uncommitted – can read uncommitted changes (dirty reads).

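Read Committed – can only read committed changes, so dirty reads are avoided, but the same query may return different results within one transaction (non‑repeatable reads).

Repeatable Read – the same query within a transaction sees the same data; this is the InnoDB default.

Serializable – the highest isolation; the outcome is as if transactions ran one after another, which InnoDB enforces with locking reads rather than by literally running them serially.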

11. Reasons a Query Is Consistently Slow

No index used (functions on columns, missing indexes).

Large table size – consider sharding.

Optimizer chooses a suboptimal index – use FORCE INDEX.

12. Reasons a Query Is Occasionally Slow

Dirty page flushing (redo log full, memory pressure).

Lock contention.

13. MySQL Master‑Slave Replication Process

The slave runs an I/O thread that fetches binlog events from the master and writes them to a relay log; a SQL thread then reads the relay log and applies the events.

Typical use cases: high availability (HA) and read/write splitting.

14. Reducing Replication Lag

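Enable parallel replication (MySQL 5.6+ applies different schemas in parallel; 5.7+ can also parallelize within a schema with slave_parallel_type=LOGICAL_CLOCK).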

Upgrade hardware.

Design appropriate sharding to avoid oversized tables.

Avoid long‑running transactions.

For latency‑sensitive reads, query the master directly.

15. Why Table Size Doesn’t Shrink After DELETE

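In InnoDB, DELETE only marks rows as deleted so that their space can be reused by later inserts; the pages stay allocated and the physical file size remains unchanged. Rebuilding the table (OPTIMIZE TABLE, or ALTER TABLE ... ENGINE=InnoDB) reclaims the space.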

16. Why VARCHAR Length Should Not Exceed 255

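The length prefix takes one byte when a value can need at most 255 bytes and two bytes otherwise. The limit is counted in bytes, not characters, so with a multi‑byte character set such as utf8mb4 the two‑byte prefix starts well below 255 characters; longer columns also produce larger index entries, which can affect index efficiency.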
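The byte arithmetic behind the length prefix is easy to check. The helper below is a hypothetical illustration of the documented rule (one prefix byte if a value can need at most 255 bytes, otherwise two); it ignores row‑format details, and the per‑character maxima of 1 byte for latin1 and 4 bytes for utf8mb4 are the usual figures for those character sets.

```python
def length_prefix_bytes(max_chars, max_bytes_per_char):
    """Return the size of the VARCHAR length prefix for a column
    declared as VARCHAR(max_chars) in a character set whose characters
    take up to max_bytes_per_char bytes each (simplified rule)."""
    max_bytes = max_chars * max_bytes_per_char
    return 1 if max_bytes <= 255 else 2


# latin1: one byte per character.
print(length_prefix_bytes(255, 1))
print(length_prefix_bytes(256, 1))

# utf8mb4: up to four bytes per character, so the threshold moves
# from 255 characters down to 63.
print(length_prefix_bytes(63, 4))
print(length_prefix_bytes(64, 4))
print(length_prefix_bytes(255, 4))
```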

17. Types of Locks in MySQL

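Lock modes: shared (S), exclusive (X).

Granularity: table locks and row locks; InnoDB row locks come as record locks, gap locks, and next‑key locks (a record plus the gap before it).

Intention locks: intention shared (IS) and intention exclusive (IX) are table‑level locks announcing that a transaction intends to take row locks. Deadlock is not a lock type but a condition in which transactions wait on each other in a cycle; InnoDB detects it and rolls one of them back.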

18. SQL Tuning Approach

Table structure optimization: split fields, choose appropriate data types, set defaults, add useful redundant fields.

Index optimization: select proper index columns, use index‑condition pushdown, covering indexes, decide between unique and regular indexes.

Query optimization: avoid conditions that defeat indexes, make filters as selective as possible, let the smaller table drive the join against the larger one, use FORCE INDEX when needed.

Consider sharding.

19. What Is an Index

An index is like a dictionary’s table of contents, allowing the database to locate rows quickly instead of scanning the entire table.

20. Advantages, Disadvantages, and Principles of Indexes

Advantages: faster data retrieval.

Disadvantages: consumes storage and adds overhead on writes.

Principle: create indexes on frequently queried columns (e.g., primary keys).

Do not index columns with low cardinality (many duplicate values).

21. Why B+ Tree Instead of B‑Tree

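In a B+ tree, internal nodes hold only keys and child pointers while row data lives in the leaf nodes. Each internal page therefore fits far more keys, which raises fan‑out, reduces the height of the tree, and lowers I/O per lookup; the leaves are also linked in key order, which makes range queries efficient.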
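A back‑of‑the‑envelope calculation shows how much the height differs. The figures below are assumptions chosen for illustration, not measurements: a 16 KB page (the InnoDB default), an 8‑byte key, a 6‑byte child pointer, and 1 KB rows, with page headers and fill factor ignored.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size, in bytes
KEY_BYTES = 8           # assumed: BIGINT primary key
POINTER_BYTES = 6       # assumed: size of a child-page pointer
ROW_BYTES = 1024        # assumed: average size of a full row

# B+ tree: internal pages hold only (key, pointer) pairs.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf = PAGE_SIZE // ROW_BYTES


def bplus_max_rows(height):
    """Rows addressable by a B+ tree with the given number of levels."""
    return fanout ** (height - 1) * rows_per_leaf


def btree_height(rows):
    """Levels a plain B-tree needs when every node stores full rows
    next to its child pointers."""
    keys_per_node = PAGE_SIZE // (ROW_BYTES + POINTER_BYTES)
    children = keys_per_node + 1
    height, nodes_at_level, capacity = 1, 1, keys_per_node
    while capacity < rows:
        nodes_at_level *= children
        capacity += nodes_at_level * keys_per_node
        height += 1
    return height


rows = bplus_max_rows(3)
print(fanout, rows_per_leaf, rows)   # capacity of a 3-level B+ tree
print(btree_height(rows))            # levels a B-tree needs for the same rows
```

Under these assumptions a three‑level B+ tree addresses roughly 21.9 million rows, while a B‑tree storing rows in every node needs about twice as many levels for the same data.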

22. MySQL MVCC Mechanism

MVCC uses undo logs, roll pointers, and transaction IDs to maintain multiple versions of a row, allowing consistent reads without locking.

ReadView captures the state of active transactions; a query reads the latest version whose transaction ID is visible according to ReadView.

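Read Committed generates a new ReadView for each query; Repeatable Read creates a ReadView at the first consistent read and reuses it throughout the transaction, preventing non‑repeatable reads and, for plain (snapshot) SELECTs, phantom reads. Locking reads and writes see the latest committed data and rely on next‑key locks to block phantoms.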
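The visibility check described above can be sketched in a few lines. This is a simplified model of the commonly described rules, not InnoDB source code: the class and field names (`Version`, `ReadView`, `active_ids`, `min_trx_id`, `max_trx_id`) are chosen for illustration, and the `prev` link plays the role of the roll pointer into the undo log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    trx_id: int                        # transaction that wrote this version
    prev: Optional["Version"] = None   # older version (roll pointer)


@dataclass
class ReadView:
    creator_trx_id: int    # transaction that owns this view
    active_ids: frozenset  # transactions uncommitted when the view was made
    min_trx_id: int        # smallest id in active_ids
    max_trx_id: int        # next id to be assigned at creation time

    def is_visible(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True                    # own changes are always visible
        if trx_id < self.min_trx_id:
            return True                    # committed before the view existed
        if trx_id >= self.max_trx_id:
            return False                   # started after the view was made
        return trx_id not in self.active_ids   # visible only if committed


def read(latest, view):
    """Walk the version chain until a visible version is found."""
    version = latest
    while version is not None and not view.is_visible(version.trx_id):
        version = version.prev
    return version.value if version else None


# Version chain for one row: trx 10 wrote "a", trx 20 "b", trx 30 "c".
v1 = Version("a", 10)
v2 = Version("b", 20, v1)
v3 = Version("c", 30, v2)

# View taken by trx 25 while trx 20 was still active and 30 not yet started.
old_view = ReadView(25, frozenset({20, 25}), 20, 26)
# View taken later by the same transaction, after 20 and 30 committed.
new_view = ReadView(25, frozenset({25}), 25, 31)

print(read(v3, old_view))
print(read(v3, new_view))
```

In this model, Repeatable Read corresponds to reusing `old_view` for every query in the transaction, and Read Committed to building a fresh view such as `new_view` per query.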

23. Snowflake ID Generation Algorithm

Snowflake is a 64‑bit distributed ID scheme consisting of:

1 unused bit.

41 bits for millisecond timestamp (≈69 years).

10 bits for node identity: 5 for the data center ID and 5 for the machine ID (32 values each).

12 bits for a per‑millisecond sequence (up to 4096 IDs per node per millisecond).

Advantages: IDs are roughly time‑ordered (strictly increasing on a single node), no external dependencies, high performance.

Disadvantages: relies on accurate system clocks; clock rollback can cause duplicate IDs.
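The bit layout above translates into a short generator. This is a minimal single‑threaded sketch under stated assumptions, not a production implementation: the custom epoch `EPOCH_MS` is an assumed value (the one Twitter's original implementation used), the `clock` parameter is a hypothetical hook added so the behavior can be tested, and rejecting a backwards clock with an exception is only one of several possible policies.

```python
import time

EPOCH_MS = 1288834974657   # assumed custom epoch, in milliseconds
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095

WORKER_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS


class Snowflake:
    """Single-threaded generator; add a lock before sharing across threads."""

    def __init__(self, datacenter_id, worker_id, clock=None):
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        # clock returns the current time in milliseconds.
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        now = self.clock()
        if now < self.last_ms:
            # Clock rollback: refuse rather than risk a duplicate ID.
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & MAX_SEQUENCE
            if self.sequence == 0:
                # Sequence exhausted: wait for the next millisecond.
                while now <= self.last_ms:
                    now = self.clock()
        else:
            self.sequence = 0
        self.last_ms = now
        return (
            ((now - EPOCH_MS) << TIMESTAMP_SHIFT)
            | (self.datacenter_id << DATACENTER_SHIFT)
            | (self.worker_id << WORKER_SHIFT)
            | self.sequence
        )


generator = Snowflake(datacenter_id=1, worker_id=2)
print(generator.next_id())
```

The 41‑bit timestamp holds 2^41 milliseconds, which is where the figure of about 69 years comes from.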

Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.