Mastering MySQL: Keys, Indexes, Transactions, Storage Engines, and Optimization Explained
This guide covers MySQL fundamentals: primary, foreign, candidate, and super keys; auto-increment primary keys; triggers, stored procedures, views, and cursors; index types and design; transaction properties and isolation levels; storage engine differences; query execution order and EXPLAIN analysis; lock mechanisms; replication strategies; high-concurrency solutions; and crash recovery via REDO and UNDO logs, with practical examples for each concept.
Basic Concepts
Primary, Foreign, Super, and Candidate Keys
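Super key: Any set of attributes that uniquely identifies a tuple in a relation. Every candidate key and every primary key is also a super key.
Candidate key: A minimal super key, that is, one from which no attribute can be removed without losing uniqueness.
Primary key: The candidate key chosen to identify each row. It may be a single column or a combination of columns, and none of its columns can be NULL.
Foreign key: A column (or set of columns) that references the primary key, or another unique key, of a parent table, which is usually a different table.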
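The four kinds of key can be seen side by side in a small schema. This is a minimal sketch; the table, column, and constraint names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical schema: all names are illustrative.
CREATE TABLE demo_departments (
    dept_id   INT          NOT NULL,
    dept_code CHAR(4)      NOT NULL,
    dept_name VARCHAR(100) NOT NULL,
    PRIMARY KEY (dept_id),               -- the candidate key chosen as primary key
    UNIQUE KEY uk_dept_code (dept_code)  -- a second candidate key
) ENGINE = InnoDB;

CREATE TABLE demo_employees (
    emp_id  INT          NOT NULL,
    email   VARCHAR(255) NOT NULL,
    dept_id INT          NOT NULL,
    PRIMARY KEY (emp_id),
    UNIQUE KEY uk_email (email),
    -- Foreign key: every dept_id here must exist in demo_departments.
    CONSTRAINT fk_emp_dept FOREIGN KEY (dept_id)
        REFERENCES demo_departments (dept_id)
) ENGINE = InnoDB;

-- (emp_id, email) also identifies a row uniquely, so it is a super key,
-- but it is not minimal and therefore not a candidate key.
```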
Why Use Auto‑Increment Columns as Primary Keys
When a PRIMARY KEY is defined, InnoDB uses it as the clustered index. If no explicit primary key exists, InnoDB uses the first UNIQUE index whose columns are all NOT NULL; if there is none, it builds a hidden clustered index on a generated 6-byte row ID.
Data rows are stored in the leaf nodes of a B+Tree ordered by the primary key. A new row with an auto-increment key always has the largest key so far, so it is appended to the last leaf page, which minimizes page splits and fragmentation. Keys that do not increase monotonically (for example, random UUIDs) land in the middle of the tree, causing page splits, data movement, and fragmentation that may eventually call for OPTIMIZE TABLE to rebuild the table.
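A sketch of the two designs, assuming hypothetical table names. The first table receives keys in increasing order; the second receives UUID strings, which arrive in effectively random order relative to the clustered index.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE demo_events (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    payload    VARCHAR(255)    NOT NULL,
    created_at DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)                 -- clustered index, always appended at the end
) ENGINE = InnoDB;

INSERT INTO demo_events (payload) VALUES ('first'), ('second');

-- ID generated for the first row of the most recent multi-row INSERT.
SELECT LAST_INSERT_ID();

CREATE TABLE demo_events_uuid (
    id      CHAR(36)     NOT NULL,
    payload VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)                 -- clustered index, inserts land on arbitrary pages
) ENGINE = InnoDB;

INSERT INTO demo_events_uuid (id, payload) VALUES (UUID(), 'first');
```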
Triggers
Triggers are stored programs that MySQL runs automatically before or after an INSERT, UPDATE, or DELETE on a table. They can enforce constraints, maintain data integrity, audit changes, and cascade actions across tables.
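As an illustration of tracking operations, the following sketch records every balance change in an audit table. The table and trigger names are hypothetical.

```sql
-- Hypothetical tables and trigger, for illustration only.
CREATE TABLE demo_accounts (
    id      INT           NOT NULL AUTO_INCREMENT PRIMARY KEY,
    balance DECIMAL(12,2) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE demo_account_audit (
    audit_id    INT           NOT NULL AUTO_INCREMENT PRIMARY KEY,
    account_id  INT           NOT NULL,
    old_balance DECIMAL(12,2) NOT NULL,
    new_balance DECIMAL(12,2) NOT NULL,
    changed_at  DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE = InnoDB;

-- Fires once per updated row; OLD and NEW expose the row before and after.
CREATE TRIGGER trg_demo_accounts_audit
AFTER UPDATE ON demo_accounts
FOR EACH ROW
    INSERT INTO demo_account_audit (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
```

Because the trigger body is a single statement, no BEGIN ... END block or DELIMITER change is needed.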
Stored Procedures
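A stored procedure is a named set of SQL statements kept on the server that can be invoked multiple times, improving modularity and often performance. MySQL parses a procedure on first use and caches the parsed form per session rather than compiling it ahead of time.
Calling methods:
Use the client library's command or statement object to execute a CALL statement.
Invoke it from external programs, for example through JDBC's CallableStatement in Java.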
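A minimal sketch of defining and calling a procedure from the mysql command-line client. The table name, procedure name, and the customer ID 42 are hypothetical.

```sql
-- Hypothetical table and procedure, for illustration only.
CREATE TABLE demo_proc_orders (
    id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL
) ENGINE = InnoDB;

-- DELIMITER is a command of the mysql client, not SQL; it lets the
-- semicolons inside the procedure body pass through unchanged.
DELIMITER //
CREATE PROCEDURE demo_count_orders(IN p_customer_id INT, OUT p_total INT)
BEGIN
    SELECT COUNT(*) INTO p_total
    FROM demo_proc_orders
    WHERE customer_id = p_customer_id;
END //
DELIMITER ;

-- The OUT parameter is returned through a user variable.
CALL demo_count_orders(42, @total);
SELECT @total;
```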
Procedure Advantages and Disadvantages
Advantages:
Cached in parsed form per session, so repeated calls skip re-parsing the SQL.
Reduces network traffic by executing on the server.
Provides security through permission control.
Reusable, reducing development effort.
Disadvantages:
Poor portability across different DBMS.
Views and Cursors
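View: A virtual table defined by a query over one or more base tables. Updates made through a view change the underlying tables, but only if the view is updatable (for example, it contains no aggregates, DISTINCT, GROUP BY, or UNION).
Cursor: Allows row-by-row processing of a result set, useful when set-based operations are insufficient. In MySQL, cursors are available only inside stored programs and are read-only and forward-only.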
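The sketch below shows an updatable single-table view and a cursor that walks its rows. All object names are hypothetical. The cursor loop is shown only to illustrate the mechanics; a plain SUM(price) would compute the same total more efficiently.

```sql
-- Hypothetical objects, for illustration only.
CREATE TABLE demo_products (
    id     INT           NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name   VARCHAR(100)  NOT NULL,
    price  DECIMAL(10,2) NOT NULL,
    active TINYINT       NOT NULL DEFAULT 1
) ENGINE = InnoDB;

CREATE VIEW demo_active_products AS
    SELECT id, name, price FROM demo_products WHERE active = 1;

-- A simple single-table view is updatable: this changes demo_products.
UPDATE demo_active_products SET price = price * 1.10 WHERE id = 1;

DELIMITER //
CREATE PROCEDURE demo_sum_prices(OUT p_total DECIMAL(12,2))
BEGIN
    -- Declarations must appear in this order: variables, cursors, handlers.
    DECLARE v_price DECIMAL(10,2);
    DECLARE v_done  INT DEFAULT 0;
    DECLARE cur CURSOR FOR SELECT price FROM demo_active_products;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

    SET p_total = 0;
    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_price;
        IF v_done = 1 THEN
            LEAVE read_loop;         -- no more rows
        END IF;
        SET p_total = p_total + v_price;
    END LOOP;
    CLOSE cur;
END //
DELIMITER ;
```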
Indexes
What Is an Index?
An index is a sorted data structure, typically a B‑Tree or B+Tree, that speeds up data retrieval by providing a fast lookup path.
Benefits and Drawbacks
Benefits: Faster queries, unique constraints, quicker joins, efficient grouping and sorting, and optimizer assistance.
Drawbacks: Additional storage space, slower inserts/updates/deletes, and maintenance overhead.
When to Create Indexes
Columns frequently used in search conditions.
Primary key columns.
Foreign key columns used in joins.
Columns used in range queries, sorting, or GROUP BY.
Columns appearing often in WHERE clauses.
Do not index: Rarely used columns, low-cardinality columns (e.g., gender or BIT flags), large TEXT/BLOB columns, or tables where write performance matters more than read performance.
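A sketch of these guidelines on a hypothetical orders table: one composite index serves both the filter and the sort of a common query, while the low-cardinality status column is left unindexed.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE demo_orders_idx (
    id          INT         NOT NULL AUTO_INCREMENT,
    customer_id INT         NOT NULL,
    status      VARCHAR(20) NOT NULL,   -- few distinct values: not indexed
    created_at  DATETIME    NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Equality column first, then the column used for range scans and sorting.
CREATE INDEX idx_customer_created
    ON demo_orders_idx (customer_id, created_at);

-- This query can use the index for both the WHERE and the ORDER BY.
SELECT id, created_at
FROM demo_orders_idx
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- List the indexes on the table and their cardinality estimates.
SHOW INDEX FROM demo_orders_idx;
```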
Index Types
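B-Tree vs. B+Tree: A B+Tree keeps all records in its leaf nodes (internal nodes hold only keys for routing) and links the leaves for sequential access, making range scans more efficient.
Clustered vs. Non-Clustered Indexes: A clustered index stores the rows themselves in primary key order. A non-clustered (secondary) index stores a pointer to the row; in InnoDB that pointer is the primary key value, so a secondary-index lookup may need a second lookup in the clustered index.
Hash vs. B+Tree: Hash indexes provide O(1) equality lookups on average but cannot support range queries, ordering, or prefix scans. B+Tree indexes support all these operations and are generally preferred. InnoDB does not offer user-defined hash indexes (it maintains an internal adaptive hash index automatically); explicit hash indexes are available in engines such as MEMORY.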
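The difference between the two index types can be sketched with a MEMORY table, which accepts both. The table and index names are hypothetical.

```sql
-- Hypothetical MEMORY table, for illustration only.
CREATE TABLE demo_sessions (
    session_id CHAR(32)     NOT NULL,
    user_id    INT          NOT NULL,
    last_seen  INT UNSIGNED NOT NULL,
    PRIMARY KEY (session_id) USING HASH,          -- equality lookups only
    KEY idx_last_seen (last_seen) USING BTREE     -- supports ranges and ordering
) ENGINE = MEMORY;

-- Served by the hash index: exact match on the full key.
SELECT user_id FROM demo_sessions
WHERE session_id = '0123456789abcdef0123456789abcdef';

-- Needs the BTREE index: a hash index cannot answer a range condition.
SELECT session_id FROM demo_sessions
WHERE last_seen < 1700000000
ORDER BY last_seen;
```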
Transactions
Definition and ACID Properties
A transaction groups multiple operations into a single unit that can be committed or rolled back, ensuring Atomicity, Consistency, Isolation, and Durability.
Isolation Levels and Concurrency Issues
Read Uncommitted: Allows dirty reads.
Read Committed: Prevents dirty reads but allows non‑repeatable reads.
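Repeatable Read (InnoDB default): Prevents non-repeatable reads. The SQL standard still permits phantom reads at this level, although InnoDB avoids most of them through MVCC snapshots and next-key locking.
Serializable: Highest isolation; eliminates all of these anomalies at the cost of concurrency.
Common concurrency problems include dirty reads (seeing another transaction's uncommitted changes), non-repeatable reads (the same row returns different values within one transaction), and phantom reads (the same query returns a different set of rows within one transaction).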
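The effect of the isolation level can be observed with two connections. This is a sketch with a hypothetical table; the statements marked Session A and Session B must be run in separate connections, in the order shown.

```sql
-- Setup (hypothetical table).
CREATE TABLE demo_iso_accounts (
    id      INT           NOT NULL PRIMARY KEY,
    balance DECIMAL(12,2) NOT NULL
) ENGINE = InnoDB;
INSERT INTO demo_iso_accounts VALUES (1, 100.00);

-- Inspect the current level (the variable is tx_isolation before MySQL 5.7.20).
SELECT @@transaction_isolation;

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT balance FROM demo_iso_accounts WHERE id = 1;   -- reads 100.00

-- Session B (separate connection, autocommit on)
UPDATE demo_iso_accounts SET balance = 50.00 WHERE id = 1;

-- Session A again
SELECT balance FROM demo_iso_accounts WHERE id = 1;   -- still 100.00 (snapshot read)
COMMIT;
SELECT balance FROM demo_iso_accounts WHERE id = 1;   -- now 50.00
```

Repeating the experiment with READ COMMITTED in Session A makes the second read return 50.00, which is the non-repeatable read described above.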
Transaction Propagation Behaviors (Spring‑style)
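PROPAGATION_REQUIRED: Join the current transaction; create a new one if none exists (the default).
PROPAGATION_SUPPORTS: Join the current transaction if one exists; otherwise run non-transactionally.
PROPAGATION_MANDATORY: Join the current transaction; throw an exception if none exists.
PROPAGATION_REQUIRES_NEW: Always start a new transaction, suspending the current one if it exists.
PROPAGATION_NOT_SUPPORTED: Run non-transactionally, suspending the current transaction if it exists.
PROPAGATION_NEVER: Run non-transactionally; throw an exception if a transaction exists.
PROPAGATION_NESTED: Run in a nested transaction (a savepoint) if a transaction exists; otherwise behave like PROPAGATION_REQUIRED.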
Nested Transactions
Nested transactions use savepoints; a child rollback restores to the savepoint without affecting the outer transaction, while an outer rollback aborts the entire transaction.
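At the SQL level, the savepoint behavior looks like the following sketch. The table and savepoint names are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_savepoint_notes (
    id   INT          NOT NULL PRIMARY KEY,
    note VARCHAR(100) NOT NULL
) ENGINE = InnoDB;

START TRANSACTION;
INSERT INTO demo_savepoint_notes VALUES (1, 'outer work');

SAVEPOINT before_inner;                       -- start of the "nested" unit
INSERT INTO demo_savepoint_notes VALUES (2, 'inner work');
ROLLBACK TO SAVEPOINT before_inner;           -- undoes only the inner insert

COMMIT;                                       -- outer work is kept

SELECT id, note FROM demo_savepoint_notes;    -- returns only row 1
```

Issuing a plain ROLLBACK instead of COMMIT would discard row 1 as well, which corresponds to the outer rollback aborting the entire transaction.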
Storage Engines
InnoDB, MyISAM, and MEMORY
InnoDB: Supports transactions, row‑level locking, foreign keys, and crash recovery; default engine since MySQL 5.5.
MyISAM: Table-level locking, no transaction or foreign key support; can be fast for read-mostly workloads; supports full-text indexes (as does InnoDB since MySQL 5.6).
MEMORY: Stores data in RAM for ultra‑fast access; data is lost on restart; uses hash indexes by default.
Choosing Between InnoDB and MyISAM
Use InnoDB for write-intensive, high-integrity applications, and as the default choice in general; consider MyISAM only for read-heavy, simple queries where transactions and crash recovery are unnecessary.
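The engine is chosen per table and can be changed later. A short sketch, using a hypothetical table name:

```sql
-- List the engines this server supports and which one is the default.
SHOW ENGINES;

-- Hypothetical table created with an explicit engine.
CREATE TABLE demo_engine_log (
    id      INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(255) NOT NULL
) ENGINE = MyISAM;

-- Check which engine a table uses.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'demo_engine_log';

-- Convert to InnoDB; this rebuilds the table, so it can be slow on large tables.
ALTER TABLE demo_engine_log ENGINE = InnoDB;
```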
SQL Optimization
Query Execution Order
The logical order is FROM (including JOIN ... ON) → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT, which differs from the written order.
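The numbered comments in this sketch mark the logical evaluation order of each clause. The table and column names are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_q_orders (
    id          INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id INT         NOT NULL,
    status      VARCHAR(20) NOT NULL
) ENGINE = InnoDB;

SELECT   customer_id, COUNT(*) AS order_count   -- 5. compute output columns
FROM     demo_q_orders                          -- 1. choose the source rows
WHERE    status = 'paid'                        -- 2. filter individual rows
GROUP BY customer_id                            -- 3. form groups
HAVING   COUNT(*) >= 3                          -- 4. filter groups
ORDER BY order_count DESC                       -- 6. sort the result
LIMIT    10;                                    -- 7. keep the first rows
```

One consequence of this order: the alias order_count can be used in ORDER BY, which runs after SELECT, but not in WHERE, which runs before it.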
Using EXPLAIN
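EXPLAIN reports, for each table in the query, the access type (type), the candidate indexes (possible_keys), the index actually chosen (key), the length of the key used (key_len), the estimated number of rows examined (rows), and extra information (Extra) such as Using index, Using filesort, or Using temporary.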
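A sketch of comparing two plans on a hypothetical table. The exact output depends on the MySQL version and on table statistics, so the comments describe what to look for rather than guaranteed values.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_explain_users (
    id    INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    city  VARCHAR(100) NOT NULL,
    KEY idx_email (email)
) ENGINE = InnoDB;

-- Indexed predicate: expect key = idx_email and an access type such as ref.
-- Extra may show "Using index", since id is stored in the secondary index.
EXPLAIN SELECT id FROM demo_explain_users WHERE email = 'a@example.com';

-- Unindexed predicate: expect key = NULL and type = ALL (a full table scan).
EXPLAIN SELECT id, email FROM demo_explain_users WHERE city = 'Paris';
```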
Slow Query Diagnosis
Enable slow_query_log, set slow_query_log_file to the log's location, and set long_query_time (in seconds) so that statements running longer than that threshold are recorded.
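These variables can be set at runtime by an account with sufficient privileges. In this sketch the file path and the one-second threshold are assumptions to adapt to the environment.

```sql
-- Turn the slow query log on for the running server.
SET GLOBAL slow_query_log = 'ON';

-- Hypothetical path; the server process must be able to write to it.
SET GLOBAL slow_query_log_file = '/var/lib/mysql/demo-slow.log';

-- Log statements that take longer than 1 second.
-- The new global value applies to connections opened after this point.
SET GLOBAL long_query_time = 1;

-- Verify the settings.
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';
```

Settings changed with SET GLOBAL are lost on restart; to keep them, the same variables also need to be placed in the server configuration file.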
Locking Mechanisms
Lock Granularity
Table‑level lock: low overhead, no deadlocks, but poor concurrency.
Row‑level lock: higher overhead, possible deadlocks, best concurrency.
Page‑level lock: intermediate overhead and concurrency.
Deadlock Detection and Resolution
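Deadlocks occur when sessions acquire the same locks in different orders. InnoDB detects deadlocks automatically and rolls back one of the transactions, which the application should then retry. For sessions stuck in long lock waits, find the blocking session (e.g., SELECT trx_mysql_thread_id FROM information_schema.innodb_trx;) and terminate it with KILL, or bound the wait with a lock timeout (innodb_lock_wait_timeout).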
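A sketch of the diagnostic steps. The thread ID 12345 is a placeholder for a value returned by the query, and the 10-second timeout is an arbitrary example.

```sql
-- The "LATEST DETECTED DEADLOCK" section shows the statements and locks involved.
SHOW ENGINE INNODB STATUS;

-- List open InnoDB transactions, oldest first, with their connection IDs.
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;

-- Terminate the connection holding the locks (placeholder ID).
KILL 12345;

-- Give up waiting for a row lock after 10 seconds in this session.
SET SESSION innodb_lock_wait_timeout = 10;
```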
Pessimistic vs. Optimistic Locks
Pessimistic lock: Use SELECT ... FOR UPDATE within a transaction to acquire row locks before updating.
Optimistic lock: Add a version or timestamp column; update only if the version matches, otherwise retry.
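Both approaches can be sketched on a hypothetical stock table. The version value 0 in the optimistic UPDATE stands for whatever value the preceding SELECT returned.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_stock (
    item_id  INT NOT NULL PRIMARY KEY,
    quantity INT NOT NULL,
    version  INT NOT NULL DEFAULT 0
) ENGINE = InnoDB;
INSERT INTO demo_stock (item_id, quantity) VALUES (1, 10);

-- Pessimistic: lock the row first, so other writers wait until COMMIT.
START TRANSACTION;
SELECT quantity FROM demo_stock WHERE item_id = 1 FOR UPDATE;
UPDATE demo_stock SET quantity = quantity - 1 WHERE item_id = 1;
COMMIT;

-- Optimistic: read without locking, then update only if nothing changed.
SELECT quantity, version FROM demo_stock WHERE item_id = 1;

UPDATE demo_stock
SET quantity = quantity - 1,
    version  = version + 1
WHERE item_id = 1
  AND version = 0;          -- the version read above

-- 1 means the update succeeded; 0 means another session changed the row
-- in the meantime, so the application should re-read and retry.
SELECT ROW_COUNT();
```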
Replication and High Availability
Master‑Slave Replication Modes
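Asynchronous: The master commits and returns to the client without waiting for any slave; this is the default.
Semi-synchronous: The master waits for at least one slave to acknowledge receipt of the transaction's events before returning to the client.
Synchronous: The master waits for all slaves to apply the change (rarely used because of the latency cost).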
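Semi-synchronous replication is provided by plugins. The sketch below uses the plugin and variable names from MySQL 5.x and early 8.0 on Linux; newer 8.0 releases also provide equivalents named with source and replica instead. It assumes replication is already configured, and the 1000 ms timeout is an arbitrary example.

```sql
-- On the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- Milliseconds to wait for an acknowledgement before falling back
-- to asynchronous replication.
SET GLOBAL rpl_semi_sync_master_timeout = 1000;

-- On each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
-- Restart the I/O thread so the slave registers as semi-synchronous.
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- Back on the master: ON means semi-synchronous replication is active.
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';
```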
Read‑Write Splitting
Writes go to the master; reads are distributed among slaves using a proxy or load balancer (e.g., HAProxy).
Scaling Strategies
Vertical scaling: upgrade hardware.
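Horizontal scaling: sharding (horizontal partitioning of rows across servers), vertical partitioning (splitting tables or columns across databases by function), and table partitioning within a single server.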
Use caching layers (e.g., Memcached) to reduce database load.
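Table partitioning within one server can be sketched as follows. The table name and year boundaries are hypothetical. MySQL requires the partitioning column to be part of every unique key, which is why viewed_at is included in the primary key.

```sql
-- Hypothetical partitioned table, for illustration only.
CREATE TABLE demo_page_views (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    viewed_at DATE            NOT NULL,
    url       VARCHAR(255)    NOT NULL,
    PRIMARY KEY (id, viewed_at)
) ENGINE = InnoDB
PARTITION BY RANGE (YEAR(viewed_at)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- The partitions column of the plan lists the partitions the query reads
-- (shown by default since MySQL 5.7; older versions need EXPLAIN PARTITIONS).
EXPLAIN SELECT COUNT(*)
FROM demo_page_views
WHERE viewed_at BETWEEN '2024-01-01' AND '2024-12-31';
```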
Crash Recovery
UNDO Log
Records the before-image of modified rows, which enables rollback and supports MVCC. It is written before the corresponding data changes are persisted.
REDO Log
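Records the changes made to data pages. It is flushed to disk by the time the transaction commits (with the default innodb_flush_log_at_trx_commit = 1), so committed changes can be replayed after a crash even if the modified pages had not yet been written to the data files.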
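Both logs can be observed indirectly from SQL. In this sketch the table name is hypothetical: the first part inspects the redo flush policy, and the second shows a rollback, which InnoDB performs by applying undo records.

```sql
-- Redo durability setting:
--   1 (default): redo is written and flushed to disk at every commit.
--   2: written at commit, flushed about once per second (an OS crash can lose commits).
--   0: written and flushed about once per second (a server crash can lose commits).
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Undo in action (hypothetical table).
CREATE TABLE demo_undo_accounts (
    id      INT           NOT NULL PRIMARY KEY,
    balance DECIMAL(12,2) NOT NULL
) ENGINE = InnoDB;
INSERT INTO demo_undo_accounts VALUES (1, 100.00);

START TRANSACTION;
UPDATE demo_undo_accounts SET balance = 0 WHERE id = 1;
ROLLBACK;   -- the before-image from the undo log restores the row

SELECT balance FROM demo_undo_accounts WHERE id = 1;   -- 100.00
```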
ITPUB
