Comprehensive Guide to MySQL: History, Advantages, Issues, Development Standards, Operations, and Performance Optimization
This article provides an extensive overview of MySQL, covering its history, strengths and limitations, recommended development and operational standards, replication and high‑availability strategies, backup and recovery methods, as well as detailed performance‑tuning techniques for InnoDB and alternative storage engines.
MySQL Overview
MySQL is a widely used open‑source relational database that has attracted increasing attention after Alibaba’s recent de‑IOE initiative.
History
1979 – Monty Widenius created the first version; 1996 – MySQL 1.0 released.
1995‑2000 – MySQL AB founded and introduced the BDB storage engine.
April 2000 – Integration of MyISAM and replication support.
2001 – Heikki Tuuri suggested integrating InnoDB.
2005 – MySQL 5.0 released with views, stored procedures, etc.
2008 – Sun acquired MySQL AB; 2009 – MySQL 5.1 released.
April 2009 – Oracle acquired Sun; December 2010 – MySQL 5.5 released.
February 2013 – MySQL 5.6 GA released; 5.7 was under development.
Advantages
Easy to use.
Open‑source and free.
Good scalability at certain stages.
Active community support.
Performance can meet most internet‑scale storage and throughput requirements when backed by appropriate hardware.
Known Issues
Poor optimizer support for complex SQL.
Limited compliance with the SQL standard.
Immature large‑scale clustering solutions (especially middleware).
Global auto‑increment ID generation challenges.
Asynchronous logical replication can cause data‑safety concerns.
Online DDL limitations.
Incomplete HA solutions.
Backup and recovery procedures are complex and often rely on external tools.
Insufficient diagnostic information for troubleshooting.
Proliferation of forks makes selection difficult.
These drawbacks do not outweigh the benefits, but they make careful DBA‑level optimization essential. Community contributions (e.g., Google’s semi‑sync patch, Facebook/Twitter forks) continue to improve MySQL.
Database Development Standards
Development standards are a set of recommendations or rules defined by DBAs for internal projects.
Key areas include naming conventions, field design, indexing, and usage guidelines.
Purpose of Standards
Ensure consistent online schema definitions.
Reduce the probability of errors.
Facilitate automation.
Provide a win‑win situation for developers and DBAs when consistently applied.
Without standards, developers may write inefficient full‑table scans or obscure SQL, leading to unstable services and constant firefighting by DBAs.
Basic Naming and Constraint Rules
Use the utf8 character set; for emoji support, use utf8mb4 (supported since MySQL 5.5.3).
Prefer InnoDB as the storage engine.
Prefer VARCHAR or VARBINARY for variable‑length strings.
Avoid storing images or files directly in the database.
Keep single‑table row count below 100 million.
Never use reserved words for database, table, column, or index names.
Use lowercase letters and underscores for identifiers to make them self‑descriptive.
Keep object names concise while conveying purpose.
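As a sketch, a table definition following these conventions might look like the one below (the table and column names are illustrative, not taken from the original talk):

```sql
-- Hypothetical table illustrating the rules above: lowercase
-- names with underscores, InnoDB, utf8mb4, VARCHAR-free fixed
-- layout here for brevity, and no reserved words as identifiers.
CREATE TABLE user_login_log (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL,
    login_ip   INT UNSIGNED    NOT NULL,
    user_agent VARCHAR(255)    NOT NULL DEFAULT '',
    created_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```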
Field Guidelines
Define all columns as NOT NULL unless a NULL value is truly required.
Choose the smallest appropriate data type; use UNSIGNED for non‑negative integers.
Store timestamps with TIMESTAMP (or DATETIME after MySQL 5.6.4).
Store IPv4 addresses as UNSIGNED INT instead of CHAR(15).
Prefer DECIMAL for exact numeric values such as money; avoid FLOAT and DOUBLE where accuracy matters.
Minimize use of BLOB and TEXT.
Reasons for avoiding NULL:
Nullable columns cost extra space (InnoDB tracks them in a per‑row NULL bitmap).
Excessive NULLs can hinder optimizer plan selection.
Regarding DATETIME vs. TIMESTAMP: the storage gap has narrowed since MySQL 5.6.4, and TIMESTAMP is limited to the year 2038.
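The IPv4 rule above can be applied with MySQL's built‑in conversion functions, which map between the dotted‑quad string and its 4‑byte unsigned integer form (versus up to 15 bytes for CHAR(15)):

```sql
-- Convert a dotted-quad string to its UNSIGNED INT form and back.
SELECT INET_ATON('192.168.1.10');   -- 3232235786
SELECT INET_NTOA(3232235786);       -- '192.168.1.10'
```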
Index Guidelines
Limit the number of columns per index to 5 and total indexes per table to 5; follow the B‑Tree left‑most prefix rule.
Choose high‑cardinality columns for indexing.
Design indexes to cover roughly 80 % of typical queries.
Index columns that appear in WHERE predicates, ORDER BY, and GROUP BY clauses.
Avoid implicit type conversion in indexed columns.
Avoid redundant indexes.
Indexes are a double‑edged sword: they accelerate reads but add write overhead and locking.
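The left‑most prefix rule can be sketched as follows, using a hypothetical orders table (table and index names are illustrative):

```sql
-- Composite index (user_id, status, created_at): under the left-most
-- prefix rule it can serve predicates on (user_id), (user_id, status),
-- or all three -- but not on (status) or (status, created_at) alone.
ALTER TABLE orders
    ADD INDEX idx_user_status_ctime (user_id, status, created_at);

-- Uses the index (left-most prefix matched):
EXPLAIN SELECT * FROM orders WHERE user_id = 42 AND status = 1;

-- Cannot use the index efficiently (prefix column user_id missing):
EXPLAIN SELECT * FROM orders WHERE status = 1;
```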
SQL Coding Guidelines
Avoid stored procedures, triggers, and functions when possible.
Steer clear of JOINs across large tables; MySQL’s optimizer handles them poorly.
Do not perform heavy calculations inside the database.
Batch multiple INSERT values to reduce round‑trips.
Implement efficient pagination, especially for large offsets.
Avoid LIMIT in UPDATE or DELETE statements: under statement‑based replication the affected rows can differ between master and slave.
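The batching and pagination points can be sketched as follows (the table names are illustrative placeholders):

```sql
-- Batch several rows into one INSERT to cut client round-trips:
INSERT INTO user_login_log (user_id, login_ip)
VALUES (1, INET_ATON('10.0.0.1')),
       (2, INET_ATON('10.0.0.2')),
       (3, INET_ATON('10.0.0.3'));

-- Inefficient deep pagination: the server scans and then
-- discards one million rows before returning 20.
SELECT * FROM orders ORDER BY id LIMIT 1000000, 20;

-- Keyset pagination: remember the last id of the previous page
-- (here assumed to be 1000000) and seek past it instead.
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 20;
```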
Database Operations Standards
Main Operational Contents
SQL and DDL review, especially for online DDL.
High‑risk operation checks; back up before DROP.
Permission control and auditing.
Log analysis (slow‑query and error logs).
High‑availability design.
Backup strategies.
Version Choices
MySQL Community Edition (largest user base).
MySQL Enterprise Edition (commercial).
Percona Server (more features).
MariaDB (less common in China).
Recommended priority: Community > Percona > MariaDB > Enterprise.
Online DDL Issues
Native MySQL DDL locks the table and blocks writes, which makes large‑scale schema changes painful for DBAs. Solutions include Facebook’s OSC, Percona’s pt‑online‑schema‑change, and custom tweaks to the InnoDB file format.
Facebook OSC provides non‑blocking writes.
pt‑online‑schema‑change offers thorough condition checks and load‑throttling.
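For smaller changes on MySQL 5.6 and later, native online DDL can also be requested explicitly; this sketch assumes the same hypothetical orders table used earlier:

```sql
-- MySQL 5.6+ native online DDL: request an in-place, non-blocking
-- change, and fail fast if the server would have to copy the table.
ALTER TABLE orders
    ADD INDEX idx_status (status),
    ALGORITHM = INPLACE, LOCK = NONE;
```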
Sharding
Only shard when the need is obvious.
Sharding redistributes data based on rules to alleviate write pressure and capacity limits.
Vertical and horizontal sharding are the two main approaches; avoid over‑sharding.
Prefer a middle‑layer to manage sharding logic.
Backup and Recovery
Data safety is paramount; without reliable backups, performance improvements are meaningless.
Full vs. incremental backups.
Hot (online) vs. cold (offline) backups.
Physical vs. logical backups.
Delayed backups and binlog archiving.
Recommended approach:
Hot + physical backup (e.g., xtrabackup).
For core business, combine delayed backup with logical backup.
Full binlog backup.
Large‑scale implementations show millions of backup operations per year with >99.9 % success.
Performance Optimization
Replication Optimization
Replication is the most common scaling technique but suffers from single‑thread bottlenecks and lag.
Solutions: MySQL 5.6+ multi‑threaded replication, third‑party tools like Tungsten, or sharding.
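A minimal sketch of enabling MySQL 5.6’s schema‑level parallel replication on a slave (the worker count is a placeholder to be tuned per workload):

```sql
-- MySQL 5.6+: one applier thread per database (schema-level
-- parallelism); restart the SQL thread for the change to apply.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;
```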
InnoDB Tuning
InnoDB is a mature ACID‑compliant engine with MVCC, row‑level locking, and high concurrency.
Key parameters: innodb_file_per_table, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, innodb_log_file_size, innodb_page_size, innodb_flush_method=O_DIRECT, innodb_undo_directory, innodb_buffer_pool_dump_at_shutdown / innodb_buffer_pool_load_at_startup, etc.
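An illustrative my.cnf fragment covering several of these parameters; the values are placeholders to be tuned against the actual hardware and durability requirements, not recommendations from the original talk:

```ini
[mysqld]
innodb_file_per_table          = 1
innodb_buffer_pool_size        = 8G        # often 50-80% of RAM on a dedicated host
innodb_flush_log_at_trx_commit = 1         # full durability; 2 trades safety for speed
innodb_log_file_size           = 1G
innodb_flush_method            = O_DIRECT  # bypass the OS page cache
innodb_buffer_pool_dump_at_shutdown = 1    # 5.6+: persist the hot-page list
innodb_buffer_pool_load_at_startup  = 1    # ...and reload it to warm up faster
```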
SSD‑specific tweaks: increase I/O threads, separate redo logs on HDD, undo tablespaces on SSD, enable atomic writes.
Alternative Engines
TokuDB offers ACID, MVCC, fractal‑tree indexes, high compression, and online DDL, making it suitable for write‑intensive workloads, though it lacks mature hot‑backup support and may have stability issues.
Common Pitfalls
The query cache is largely obsolete in high‑concurrency environments: its coarse‑grained locking causes contention, and MySQL 5.6 disables it by default.
InnoDB default isolation level is REPEATABLE‑READ; READ‑COMMITTED often yields better concurrency.
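Switching a session to READ COMMITTED is straightforward; note that with binary logging enabled, READ COMMITTED requires row‑based logging:

```sql
-- Check the current isolation level, then switch this session
-- to READ COMMITTED (requires ROW binlog format if binlogging).
SELECT @@tx_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```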
SSD Recommendations
Flash storage has been the most impactful performance factor in the past decade. Use multiple SATA SSDs in RAID 5/10 or PCIe SSDs for critical workloads.
Large‑Table Case Study
A table with >6 billion rows (~1.2 TB) was compressed with TokuDB’s LZMA to ~80 GB, enabling online DDL and dramatically reducing recovery time.
Q&A
Q1: Using USE schema; SELECT * FROM table; vs. SELECT * FROM schema.table; has little performance impact, but mixing the two can break replication filters such as replicate-wild-ignore-table.
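As an illustrative slave‑side configuration fragment showing the kind of filter involved (the test.% pattern is a placeholder):

```ini
# Slave-side filter: under statement-based replication, unqualified
# table names are matched after being qualified with the session's
# default database. Mixing "USE db1" with "db2.table" references can
# therefore make a statement match (or miss) the filter unexpectedly.
replicate-wild-ignore-table = test.%
```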
Q2: For SSDs, RAID 5/50/10 is recommended; configure the scheduler for optimal performance.
Q3: Enforce standards through regular developer training and automated schema‑review tools.
Q4: Increase InnoDB hit rate by ensuring sufficient buffer pool size, focusing on hotspot data, and avoiding full‑table scans.
Q5: For CAS operations, read‑write from the master to avoid stale data caused by replication lag.
Q6: National standards are rarely practical for day‑to‑day engineering.
Q7: Typical clusters use one master with multiple slaves; each sub‑cluster serves a specific business domain.
Q8: Track column changes by parsing row‑based binlogs.
Q9: Horizontal sharding can be achieved with middleware such as 360 Atlas.
Q10: Low traffic may not trigger noticeable load balancing.
Q11: ROW binlog format is preferred for reliability; MIXED can hide inconsistencies.
Q12: Implement read/write splitting in application code; if using a proxy, choose a well‑maintained open‑source solution.
Q13: Thread pools benefit short‑lived connections; configure based on workload.
Q14: Recover accidental deletions by restoring from the nearest backup and replaying binlogs up to the desired point.
Q15: Physical hot backup with xtrabackup is recommended; logical backups are useful for single‑table restores.
This article is compiled from Yang Shang‑gang’s experience shared in the QCON High‑Availability Architecture group.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.