Mastering MySQL: History, Pitfalls, Best Practices, and Advanced Ops
This comprehensive guide covers MySQL's evolution, its strengths and limitations, detailed development and operational standards, backup and recovery strategies, replication techniques, performance tuning, and practical Q&A for database administrators seeking to optimize large‑scale MySQL deployments.
Preface
MySQL is widely known, and after Alibaba’s "去IOE" movement, MySQL has attracted increasing attention.
MySQL History
1979: Monty Widenius wrote the first version, released 1.0 in 1996.
1995‑2000: MySQL AB founded, introduced BDB.
April 2000: Integrated MyISAM and replication.
2001: Heikki Tuuri suggested integrating InnoDB.
2003: Released 5.0 with views, stored procedures, etc.
2008: MySQL AB acquired by Sun, 5.1 released in 2009.
April 2009: Oracle acquired Sun; 5.5 released Dec 2010.
Feb 2013: 5.6 GA released, 5.7 in development.
Advantages of MySQL
Easy to use
Open‑source and free
Good scalability at certain stages
Active community
Performance meets most internet storage and latency requirements when supported by proper hardware
These factors explain why many companies choose MySQL, although it also has numerous limitations.
Problems of MySQL
Optimizer struggles with complex SQL
Poor SQL‑standard compliance
Large‑scale clustering solutions are immature, especially middleware
Global auto‑increment ID generation issues
Asynchronous logical replication poses data‑safety concerns
Online DDL support is limited
HA solutions are incomplete
Backup and recovery are complex and rely on external tools
Insufficient diagnostic information for users
Proliferation of forks makes selection difficult
DBAs must address many of these challenges during operations.
Database Development Standards
Development standards are a set of recommendations or rules defined by DBAs for internal projects.
Purpose of Standards
Ensure online schema consistency
Reduce the probability of errors
Facilitate automation
Provide a win‑win for developers and DBAs
Without standards, developers may write full‑table scans or bizarre SQL, leading to instability and constant firefighting for DBAs.
Basic Naming and Constraint Standards
Use UTF8 charset; for emojis use UTF8mb4 (MySQL 5.5.3+)
Storage engine: InnoDB
Prefer VARCHAR/VARBINARY for variable‑length strings
Do not store images or files in the database
Keep single‑table row count below 100 million
Avoid reserved words in database, table, column, and index names
Use lowercase letters and underscores for identifiers, making them self‑descriptive
Keep object names concise
Field Standards
Define all columns as NOT NULL unless NULL is truly needed
Choose the smallest suitable data type; use UNSIGNED for non‑negative integers
Store timestamps with TIMESTAMP type
Use VARCHAR for variable strings (note that the length parameter is character count, not bytes); store IPv4 addresses as UNSIGNED INT instead of CHAR(15)
Prefer DECIMAL for precise floating‑point numbers
Avoid BLOB/TEXT when possible
Why avoid NULL
It wastes storage because InnoDB needs an extra byte per NULL column.
Excessive NULL defaults can mislead the optimizer’s execution‑plan choice.
Since MySQL 5.6.4, the storage difference between DATETIME and TIMESTAMP has become negligible, and TIMESTAMP’s range is limited to 2038.
Index Standards
Limit a single index to no more than five columns; limit total indexes per table to five; follow B‑Tree left‑most prefix rule.
Choose high‑cardinality columns for indexes.
Design indexes to cover ~80% of main queries.
Index columns used in DML, ORDER BY, and GROUP BY.
Avoid implicit type conversion in indexes.
Avoid redundant indexes.
Indexes accelerate reads but add write overhead and lock contention, so excessive indexes should be avoided.
Redundant index example
idx_abc(a,b,c)
idx_a(a) – redundant
idx_ab(a,b) – redundant
Implicit conversion example
Column: remark VARCHAR(50) NOT NULL Query with integer literal causes full‑table scan because the value type does not match the column type.
SQL Class Standards
Avoid stored procedures, triggers, and functions when possible.
Avoid large‑table JOINs; MySQL’s optimizer handles them poorly.
Do not perform heavy calculations inside the database.
Combine multiple VALUES in DML to reduce round‑trips.
Implement efficient pagination, especially for large offsets.
Never use LIMIT with UPDATE/DELETE to prevent master‑slave inconsistency.
Database Operations Standards
Operation Content
SQL, DDL review and timing, especially Online DDL.
High‑risk operation checks; backup before DROP.
Permission control and auditing.
Log analysis (slow‑query and error logs).
High‑availability solutions.
Backup strategies.
Version Selection
MySQL Community Edition – largest user base.
MySQL Enterprise Edition – commercial.
Percona Server – many new features.
MariaDB – less popular domestically.
Recommended priority: Community > Percona > MariaDB > Enterprise.
When using RDS, the community edition is usually the default.
Online DDL Issues
Native MySQL DDL locks tables, blocking writes; large tables make DDL painful. Solutions include Facebook OSC and Percona’s pt‑online‑schema‑change.
Facebook OSC and Percona’s pt‑online‑schema‑change provide non‑blocking schema changes.
No write blockage.
Comprehensive condition checks and load‑delay controls.
Limitations of pt‑online‑schema‑change: longer alteration time, requires a unique or primary key, and limited concurrent modifications per port.
Availability
Seamless master switch using slave start until and a dual‑master setup, handling auto‑increment concerns.
MySQL Cluster Solutions
Organize MySQL instances into clusters.
Most deployments still rely on native MySQL replication.
Native master‑slave sync has performance and safety drawbacks.
MySQL Semi‑Synchronous Replication
Introduced in 5.5, it improves data safety by requiring ACK from at least one replica.
From MySQL 5.7 onward, multi‑replica semi‑sync is configurable.
Master‑Slave Delay Issues
Read‑write splitting often makes the replica busier than the master, causing lag.
Solutions
Identify the bottleneck.
If I/O bound, upgrade hardware (e.g., SSD).
If not I/O/CPU, consider parallel replication.
If still problematic, evaluate sharding.
InnoDB Optimization
InnoDB is a mature ACID‑compliant engine with MVCC, row‑level locking, and O_DIRECT support.
innodb_file_per_table=1
innodb_buffer_pool_size set according to data size and memory
innodb_flush_log_at_trx_commit=0/1/2
innodb_log_file_size larger values improve performance
innodb_page_size configurable
innodb_flush_method=O_DIRECT
innodb_io_capacity tuned for SSD
innodb_buffer_pool_dump and at_shutdown dump
Key InnoDB features: buffer‑pool warm‑up, dynamic resizing (5.7+), custom page size, compression, transportable tablespaces, full‑text, GIS, etc.
InnoDB on SSD
Increase innodb_write_io_threads and innodb_read_io_threads (5.5+).
Raise innodb_io_capacity.
Place redo logs on HDD, undo on SSD (optional).
Enable atomic writes, skip doublewrite buffer.
Leverage InnoDB compression.
Run multiple instances per host.
TokuDB
Transactional, MVCC‑enabled storage engine.
Fractal Tree Index – ideal for write‑heavy workloads.
High compression ratios and native Online DDL.
Supported by major forks; comparable to InnoDB.
Issues: limited official support, enterprise‑only hot‑backup, occasional bugs, and slower recovery for very large tables.
MySQL Optimization Cases
Query cache is largely obsolete due to design flaws.
InnoDB default isolation is REPEATABLE READ; switching to READ COMMITTED can improve concurrency.
Classic deadlock example illustrated.
About SSD
Flash storage has been the biggest performance driver for databases in the past decade; RAID5/10 configurations on SATA or PCIe SSDs are recommended.
Large Table Challenges
Example: a 60 billion‑row (≈1.2 TB) table with many indexes. Historical neglect of sharding caused this size; performance was still acceptable, but future growth demands partitioning.
DBA "Lazy" Experiments
Testing InnoDB limits (e.g., adding a column/index to a 1.2 TB table) is risky; using TokuDB with LZMA compression reduced a 1 TB+ table to ~80 GB, enabling online DDL.
Q&A
Q1: Does using USE schema; SELECT ... differ from SELECT ... FROM schema.table for replication? Impact on replication is minimal, but replication filters require careful configuration (e.g., ReplicateWildIgnoreTable).
Q2: SSD configuration recommendations? Use SATA SSDs in RAID5/50/10; see the accompanying diagram.
Q3: How to enforce database standards? Provide regular training and embed standards into automated review tools.
Q4: How to maximize InnoDB hit rate? Ensure sufficient memory for hot data, avoid full‑table scans, and keep queries cache‑friendly.
Q5: Should CAS operations read from the master? Yes, to avoid stale reads caused by replication lag.
Q6: Is a national standard necessary for DB development? National standards are academic; practical engineering may not benefit directly.
Q7: Can master‑slave clusters be further refined? Typically use one‑master‑multiple‑slaves per sub‑cluster, with monitoring, backup, and HA per sub‑cluster.
Q8: How to track changes to a specific column? Analyze row‑based binlogs with custom tools.
Q9: Advice on sharding large tables with middleware? Experience is limited; 360’s Atlas is a viable open‑source option.
Q10: Why does MySQL proxy not load‑balance under low load? Low traffic may not trigger noticeable load distribution.
Q11: Why recommend ROW binlog format over MIXED? ROW provides stronger data consistency; MIXED can be unpredictable.
Q12: Should read‑write splitting be done in code or via proxy? Implementing in application code offers better decoupling; choose proxy carefully.
Q13: When to use MySQL thread pool and best‑practice settings? Short‑lived connections benefit from thread pools; specific parameters depend on workload.
Q14: Data recovery steps after accidental deletion (including from replica)? Identify deletion time, restore from backup to a new instance, replay binlogs up to the desired point, then apply the recovered data.
Q15: Recommended backup method for large data sets? Physical hot backup with XtraBackup; logical backup is useful for single‑table restores.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
