Why Auto-Increment Primary Keys Matter and How Index Types Impact MySQL Performance
This article explains why auto‑increment primary keys are preferred, compares B+‑tree and hash indexes, discusses MySQL partitioning, isolation levels, MVCC, and provides practical tips for designing efficient database tables and indexes.
Why Use Auto‑Increment Columns as Primary Keys
If a table defines a PRIMARY KEY, InnoDB uses it as the clustered index. Otherwise it uses the first UNIQUE index whose columns are all NOT NULL, and if none exists it generates a hidden clustered index on an internal 6‑byte row ID. The rows themselves are stored in the leaf nodes of the clustered index (a B+‑tree), so each new row must be placed in key order within the appropriate page.
With an auto‑increment primary key, each new row has the largest key so far, so it is appended to the last leaf page, and a new page is allocated only when that page fills. Keys that do not increase monotonically (for example, national ID numbers) arrive in effectively random order, so an insert often lands in the middle of a page that is already full. That forces page splits and row movement, leaves pages partly empty, and fragments the index, which may eventually need to be rebuilt with OPTIMIZE TABLE.
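A minimal sketch of the layout described above, assuming a hypothetical person table (the table name, column names, and sizes are illustrative and do not come from the original text). The natural identifier stays unique through a secondary index while the clustered index is keyed on the auto‑increment column:

```sql
-- Hypothetical table: names and column sizes are illustrative assumptions.
CREATE TABLE person (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- monotonically increasing clustered key
    id_card CHAR(18)        NOT NULL,                 -- natural identifier, arrives in random order
    name    VARCHAR(64)     NOT NULL,
    PRIMARY KEY (id),                -- rows are clustered by id, so inserts go to the last leaf page
    UNIQUE KEY uk_id_card (id_card)  -- uniqueness is still enforced, via a secondary index
) ENGINE = InnoDB;

-- id is omitted, so InnoDB assigns the next value.
INSERT INTO person (id_card, name) VALUES ('110101199001011234', '李明');

-- Returns the id generated by the INSERT above for this connection.
SELECT LAST_INSERT_ID();
```

The secondary index on id_card still receives inserts in random order, but its entries hold only the indexed column plus the primary key, so splits there move far less data than splits in the clustered index would.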
Why Indexes Improve Efficiency
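Index entries are kept in sorted key order.
Sorted order lets a lookup narrow down to the matching entries without scanning every row.
A lookup costs O(log N) comparisons, as in binary search; in a B+‑tree with fan‑out m, that is roughly log_m(N) page reads.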
Differences Between B+‑Tree and Hash Indexes
A B+‑tree is a balanced multi‑way tree. Internal nodes hold only separator keys that direct the search; the leaf nodes hold every key and are linked together in key order.
A hash index applies a hash function to the key and looks up the resulting hash value directly, typically in constant time when collisions are rare. Entries are not kept in key order.
Advantages of Hash Indexes
Excellent for equality queries, provided the key has few duplicate values (many duplicates land in the same bucket and slow lookups down).
When Hash Indexes Are Not Suitable
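Range queries (<, >, BETWEEN) cannot use the index.
ORDER BY cannot use the index, because entries are not stored in key order.
A composite hash index is hashed over all of its columns together, so it cannot serve queries on a left‑most prefix of those columns.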
In most cases B+‑tree indexes are sufficient. Hash indexes pay off for equality‑only workloads on high‑cardinality columns, such as lookups on a MEMORY (formerly HEAP) table where the column values are nearly unique. A typical query of this kind, with person as a placeholder table name:
SELECT id, name FROM person WHERE name = '李明';
InnoDB does not support user‑defined hash indexes, but it automatically builds an adaptive hash index in memory when it detects an access pattern that would benefit from one. Under high concurrency the adaptive hash index can itself become a source of contention.
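The contrast between the two index types can be sketched on a MEMORY table, which accepts both. The table and column names below are hypothetical; USING HASH, USING BTREE, and the innodb_adaptive_hash_index system variable are standard MySQL:

```sql
-- Hypothetical in-memory lookup table; MEMORY indexes default to HASH.
CREATE TABLE session_lookup (
    token   CHAR(32)     NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (token) USING HASH,      -- equality lookups only
    KEY idx_user (user_id) USING BTREE   -- MEMORY also supports B-tree when ranges are needed
) ENGINE = MEMORY;

-- Equality lookup: can be served by the hash index.
SELECT user_id FROM session_lookup WHERE token = 'abc';

-- Range lookup: a hash index could not serve this; the B-tree index on user_id can.
SELECT token FROM session_lookup WHERE user_id BETWEEN 100 AND 200;

-- Check whether the InnoDB adaptive hash index is enabled.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- If monitoring shows contention on it, it can be switched off at runtime
-- (requires the privilege to set global variables).
SET GLOBAL innodb_adaptive_hash_index = OFF;
```

Whether disabling the adaptive hash index helps depends on the workload, so it is worth measuring before and after rather than changing it by default.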
Differences Between B‑Tree and B+‑Tree
In a B‑tree, every node stores both keys and data, and the leaf nodes are not linked to one another. In a B+‑tree, internal nodes store only keys; all data resides in the leaf nodes, which are linked in key order so that a range scan can walk from leaf to leaf.
Why B+‑Tree Is Better for Filesystem and Database Indexes
Internal nodes are smaller because they contain only keys, allowing more keys per disk page and reducing I/O.
Every lookup descends to a leaf, and all leaves are at the same depth, so search paths have uniform length and query performance is stable.
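As a rough worked estimate of why key‑only internal nodes matter, assume InnoDB's default 16 KB page, an 8‑byte BIGINT key plus a 6‑byte child pointer per internal entry, and rows of about 1 KB in the leaves. These sizes are illustrative assumptions and ignore page headers and fill factor:

```latex
\begin{align*}
\text{entries per internal page} &\approx \left\lfloor \frac{16384}{8 + 6} \right\rfloor = 1170 \\
\text{rows per leaf page} &\approx \frac{16384}{1024} = 16 \\
\text{rows reachable at height 3} &\approx 1170 \times 1170 \times 16 = 21{,}902{,}400
\end{align*}
```

Under these assumptions, roughly 22 million rows are reachable in three page reads. If internal nodes also carried the 1 KB rows, as in a B‑tree, each page would hold only about 16 entries and the tree would need to be considerably deeper for the same row count.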
MySQL Composite Indexes
A composite index on columns (a, b, c) follows the left‑most prefix rule: it can serve queries that filter on a, on a and b, or on a, b, and c, but not on b and c alone, because the entries are sorted by a first.
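The rule can be checked with EXPLAIN. This is a sketch on a hypothetical table (orders_demo, idx_a_b_c, and payload are made‑up names); the exact plan depends on the MySQL version, table size, and statistics, so no output is shown:

```sql
-- Hypothetical table with a composite index on (a, b, c).
CREATE TABLE orders_demo (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a       INT NOT NULL,
    b       INT NOT NULL,
    c       INT NOT NULL,
    payload VARCHAR(100),
    PRIMARY KEY (id),
    KEY idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- These three filter on a left-most prefix, so idx_a_b_c is a candidate;
-- look at the key and key_len columns to see how much of the index is used.
EXPLAIN SELECT * FROM orders_demo WHERE a = 1;
EXPLAIN SELECT * FROM orders_demo WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM orders_demo WHERE a = 1 AND b = 2 AND c = 3;

-- No condition on the leading column a: the index cannot be used for a
-- direct lookup, and a full scan is the typical outcome.
EXPLAIN SELECT * FROM orders_demo WHERE b = 2 AND c = 3;
```

If queries on b and c alone are common, a second index that starts with b is the usual remedy.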
When Not to Build or to Reduce Indexes
Very small tables.
Tables with very frequent INSERT/UPDATE/DELETE operations, because every write must also maintain each index.
Columns with low cardinality (e.g., boolean flags) where an index provides little benefit.
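Columns that are always queried together with a high‑cardinality column; index the selective column (or build one composite index) rather than indexing each column separately.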
Table Partitioning
Partitioning splits one logical table into multiple physical partitions according to a rule. It improves manageability, and it improves query performance when the optimizer can prune partitions that cannot contain matching rows.
Types of Partitioning
RANGE – partitions by value ranges.
LIST – partitions by explicit list of values.
HASH – partitions by a user‑supplied integer expression, taken modulo the number of partitions.
KEY – similar to HASH, but takes one or more columns and uses a hashing function supplied by the MySQL server.
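The four partitioning types can be sketched as follows. All table and partition names are hypothetical, and the year boundaries are arbitrary example values:

```sql
-- RANGE: one partition per year. Every unique key, including the primary
-- key, must contain the partitioning column.
CREATE TABLE access_log (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    logged_on DATE            NOT NULL,
    message   VARCHAR(255),
    PRIMARY KEY (id, logged_on)
) ENGINE = InnoDB
PARTITION BY RANGE (YEAR(logged_on)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- The partitions column of EXPLAIN lists which partitions will be read.
EXPLAIN SELECT * FROM access_log
WHERE logged_on BETWEEN '2023-03-01' AND '2023-03-31';

-- LIST: explicit value sets per partition.
CREATE TABLE store_by_region (
    store_id  INT NOT NULL,
    region_id INT NOT NULL,
    PRIMARY KEY (store_id, region_id)
) ENGINE = InnoDB
PARTITION BY LIST (region_id) (
    PARTITION p_north VALUES IN (1, 2),
    PARTITION p_south VALUES IN (3, 4)
);

-- HASH: integer expression modulo the partition count.
CREATE TABLE user_by_hash (
    user_id INT NOT NULL,
    name    VARCHAR(64),
    PRIMARY KEY (user_id)
) ENGINE = InnoDB
PARTITION BY HASH (user_id) PARTITIONS 4;

-- KEY: column list, hashed with the server's own function.
CREATE TABLE user_by_key (
    user_id INT NOT NULL,
    name    VARCHAR(64),
    PRIMARY KEY (user_id)
) ENGINE = InnoDB
PARTITION BY KEY (user_id) PARTITIONS 4;
```

Pruning only helps when the WHERE clause constrains the partitioning column; queries that do not mention it have to visit every partition.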
Isolation Levels
Serializable – prevents dirty reads, non‑repeatable reads, and phantom reads.
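Repeatable Read – prevents dirty reads and non‑repeatable reads; the SQL standard still permits phantom reads at this level. This is InnoDB's default.
Read Committed – prevents dirty reads only.
Read Uncommitted – permits dirty reads, non‑repeatable reads, and phantom reads.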
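A short sketch of inspecting and changing the isolation level in MySQL. The account table is a hypothetical placeholder; the statements themselves are standard MySQL:

```sql
-- Current level for this session. The variable is named transaction_isolation
-- in MySQL 8.0; older 5.x servers expose it as tx_isolation.
SELECT @@transaction_isolation;

-- Change the level for all later transactions in this session.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Without SESSION or GLOBAL, the setting applies only to the next transaction.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COUNT(*) FROM account WHERE balance > 0;  -- account is a hypothetical table
COMMIT;
```

Stricter levels trade concurrency for fewer anomalies, so raising the level for a single critical transaction is often preferable to raising it for the whole session.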
MVCC (Multi‑Version Concurrency Control)
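InnoDB implements MVCC by keeping older row versions in the undo log, so consistent reads need no locks and readers and writers rarely block each other in OLTP workloads. It distinguishes two kinds of read:
Snapshot read – a plain SELECT reads the row version visible to the transaction's snapshot, without acquiring locks.
Current read – SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, and DELETE read the latest committed version and lock it.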
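The two kinds of read look like this inside one transaction. The account table and its columns are hypothetical placeholders:

```sql
START TRANSACTION;

-- Snapshot read: no row locks; returns the version visible to this
-- transaction's snapshot, even if another session has since committed a change.
SELECT balance FROM account WHERE id = 1;

-- Current read with a shared lock: returns the latest committed version and
-- blocks writers to this row. (MySQL 8.0 also accepts FOR SHARE.)
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;

-- Current read with an exclusive lock: also blocks other locking reads.
SELECT balance FROM account WHERE id = 1 FOR UPDATE;

-- Writes are current reads too: they operate on the latest committed version.
UPDATE account SET balance = balance - 100 WHERE id = 1;

COMMIT;
```

Under Repeatable Read, the snapshot read and the locking reads in the same transaction can therefore return different values for the same row if another session committed in between.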
Row‑Level Locking
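Advantages: fewer lock conflicts when sessions access different rows, fewer changes to undo on rollback, and the ability to hold a lock on a single row for a long time.
Disadvantages: it needs more memory than page‑ or table‑level locking, and it is slower when a statement touches a large part of the table (for example, frequent GROUP BY over most rows or full‑table scans), because many more locks must be acquired.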
MySQL Optimization Tips
Enable the query cache only on MySQL 5.7 and earlier; it was deprecated in 5.7.20 and removed in MySQL 8.0.
Use EXPLAIN to analyze query plans.
Use LIMIT 1 when only one row is needed.
Index the columns used in WHERE, JOIN, and ORDER BY clauses.
Prefer ENUM over VARCHAR for fixed‑set fields.
Use prepared statements to improve performance and security.
Consider vertical partitioning: move rarely used or very wide columns into a separate table.
Choose the appropriate storage engine.
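Several of these tips can be sketched on one hypothetical table (customer, its columns, and the sample values are made up for illustration):

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE customer (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    email  VARCHAR(128)    NOT NULL,
    name   VARCHAR(64)     NOT NULL,
    gender ENUM('unknown', 'female', 'male') NOT NULL DEFAULT 'unknown',  -- fixed set of values
    PRIMARY KEY (id),
    KEY idx_name (name)   -- index a column that appears in WHERE clauses
) ENGINE = InnoDB;

-- Inspect the plan: the key column shows which index, if any, is chosen.
EXPLAIN SELECT id, name FROM customer WHERE name = '李明';

-- email has no index here, so LIMIT 1 lets the scan stop at the first match.
SELECT id FROM customer WHERE email = 'li.ming@example.com' LIMIT 1;

-- Server-side prepared statement: the value is bound as a parameter,
-- not concatenated into the SQL text.
PREPARE find_customer FROM 'SELECT id, name FROM customer WHERE name = ?';
SET @name = '李明';
EXECUTE find_customer USING @name;
DEALLOCATE PREPARE find_customer;
```

In application code the same effect usually comes from the driver's parameterized‑query API rather than from PREPARE and EXECUTE statements written by hand.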
Key vs. Index
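A key defines a constraint (primary, unique, foreign) and is usually backed by an index. An index is a physical structure that exists to accelerate queries. InnoDB stores a table's indexes in the same tablespace as its data, whereas MyISAM keeps them in a separate .MYI file. In MySQL DDL, KEY and INDEX are synonyms.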
MyISAM vs. InnoDB
InnoDB supports transactions and foreign keys; MyISAM does not.
InnoDB uses clustered indexes; MyISAM uses non‑clustered indexes.
InnoDB does not store an exact row count, so COUNT(*) without a WHERE clause must scan an index; MyISAM stores the count and returns it immediately.
MyISAM supports full‑text indexes; InnoDB has supported them only since MySQL 5.6.
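A brief sketch of checking which engine each table uses, creating a full‑text index on InnoDB, and converting a table between engines. The names note and legacy_table are hypothetical:

```sql
-- List the storage engine of each table in the current schema.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE();

-- Hypothetical InnoDB table with a FULLTEXT index (requires MySQL 5.6 or later).
CREATE TABLE note (
    id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200)    NOT NULL,
    body  TEXT            NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE = InnoDB;

-- Full-text search over the indexed columns.
SELECT id, title
FROM note
WHERE MATCH (title, body) AGAINST ('index' IN NATURAL LANGUAGE MODE);

-- Convert an existing MyISAM table to InnoDB; this rebuilds the table.
ALTER TABLE legacy_table ENGINE = InnoDB;
```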
Best Practices for Table Design
Avoid unrelated fields; use clear, consistent naming conventions.
Use lowercase names with underscores between words rather than mixed case.
Avoid reserved words as column names.
Choose appropriate numeric types and allocate sufficient length for text fields.
Add soft‑delete markers and versioning fields.
Normalize multi‑value fields into separate tables.
Store large text/blobs in separate tables to improve performance.
Prefer VARCHAR over CHAR for variable‑length data.
Define a primary key on every table, and index columns that are unique and NOT NULL.
Set default values to avoid NULLs where possible.
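One way these guidelines might combine in practice, shown on a hypothetical article schema (all names, types, and sizes are illustrative assumptions):

```sql
-- Main table: narrow rows, lowercase underscore names, defaults instead of NULLs.
CREATE TABLE article (
    id         BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
    title      VARCHAR(200)     NOT NULL DEFAULT '',
    status     TINYINT UNSIGNED NOT NULL DEFAULT 0,
    is_deleted TINYINT(1)       NOT NULL DEFAULT 0,   -- soft-delete marker
    version    INT UNSIGNED     NOT NULL DEFAULT 0,   -- versioning field for optimistic locking
    created_at DATETIME         NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Large text lives in its own table so scans of article stay small.
CREATE TABLE article_body (
    article_id BIGINT UNSIGNED NOT NULL,
    body       MEDIUMTEXT      NOT NULL,
    PRIMARY KEY (article_id)
) ENGINE = InnoDB;

-- A multi-value field (tags) normalized into one row per value.
CREATE TABLE article_tag (
    article_id BIGINT UNSIGNED NOT NULL,
    tag        VARCHAR(50)     NOT NULL,
    PRIMARY KEY (article_id, tag)
) ENGINE = InnoDB;

-- Optimistic update: succeeds only if no one else changed the row since
-- version 0 was read; the application checks the affected-row count.
UPDATE article
SET title = 'New title', version = version + 1
WHERE id = 1 AND version = 0;

-- Soft delete: mark the row instead of removing it.
UPDATE article SET is_deleted = 1 WHERE id = 1;
```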
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
