Master MySQL Performance: From Single‑Table Tuning to Horizontal Sharding
This comprehensive guide covers MySQL single‑table optimization, field and index design, query best practices, engine choices, system parameters, hardware scaling, read/write splitting, caching strategies, partitioning, vertical and horizontal sharding, client vs. proxy architectures, and alternative databases, providing actionable steps for high‑performance data handling.
Single‑Table Optimization
When a MySQL table grows beyond a few million rows, CRUD performance can degrade noticeably. Avoid premature sharding unless the data volume is expected to keep increasing; as a rule of thumb, tables with integer primary keys under ten million rows and string-heavy tables under five million rows usually perform adequately.
Field Design
Prefer the smallest integer type that fits the value range (TINYINT, SMALLINT, MEDIUMINT) over INT; add UNSIGNED for non-negative values.
Allocate only the required length for VARCHAR columns.
Replace enumerated strings with ENUM or integer codes.
Use TIMESTAMP instead of DATETIME when possible; it takes less space, but its range ends in January 2038.
Keep the column count below 20.
Declare columns NOT NULL where possible: nullable columns complicate indexing and comparisons and need extra storage.
Store IPv4 addresses as 32‑bit integers.
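As an illustration of the IPv4 tip, the sketch below converts between dotted-quad text and the 32-bit integer that would be stored, for example in an INT UNSIGNED column. The function names are illustrative; MySQL's built-in INET_ATON() and INET_NTOA() perform the same conversion on the server side.

```python
import ipaddress


def ipv4_to_int(address: str) -> int:
    """Convert dotted-quad IPv4 text to the unsigned 32-bit integer to store."""
    return int(ipaddress.IPv4Address(address))


def int_to_ipv4(value: int) -> str:
    """Convert the stored integer back to dotted-quad text for display."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    stored = ipv4_to_int("192.168.1.1")
    print(stored, int_to_ipv4(stored))
```

Storing the integer keeps the column at four bytes and lets range queries over address blocks use an ordinary numeric index.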
Index Strategies
Create indexes only for columns used in WHERE or ORDER BY clauses.
Validate index usage with EXPLAIN; eliminate full‑table scans.
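Be careful with NULL tests in WHERE: MySQL can use an index for IS NULL, but the optimizer may still choose a table scan, so confirm the plan with EXPLAIN.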
Skip low‑cardinality columns (e.g., gender) from indexing.
Use prefix indexes for long CHAR/VARCHAR columns.
Avoid character columns as primary keys.
Prefer application‑level constraints over foreign keys and UNIQUE constraints.
When using composite indexes, order columns to match query predicates and drop redundant single‑column indexes.
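One way to choose a prefix length for a long CHAR/VARCHAR column is to compare the selectivity of each candidate prefix with that of the full column. The sketch below does this on a sample of values in application code; the function names and the 0.95 threshold are assumptions for illustration, not values from this guide.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes of the given length among all values."""
    if not values:
        return 0.0
    return len({v[:length] for v in values}) / len(values)


def shortest_useful_prefix(values, target_ratio=0.95):
    """Smallest prefix length whose selectivity reaches target_ratio of the
    full-column selectivity (target_ratio is an illustrative threshold)."""
    if not values:
        return 0
    full = len(set(values)) / len(values)
    longest = max(len(v) for v in values)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target_ratio * full:
            return length
    return longest


if __name__ == "__main__":
    sample = ["alice@example.com", "alina@example.com",
              "bob@example.com", "bobby@example.org"]
    print(shortest_useful_prefix(sample))
```

The same comparison can be run inside MySQL with COUNT(DISTINCT LEFT(col, n)) / COUNT(*) on a representative sample of rows.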
SQL Query Tips
Enable the slow‑query log to locate expensive statements.
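Avoid calculations on columns (e.g., SELECT id FROM t WHERE age+1=10): they disable index usage. Move the arithmetic to the constant side instead (WHERE age = 9).
Keep statements simple: a single statement generally runs on one CPU core, so split large statements to reduce lock time and keep one heavy statement from stalling others.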
Never use SELECT *; list only needed columns.
Replace OR with IN; keep the IN list under 200 items.
Implement functions and triggers in the application layer.
Avoid leading-wildcard patterns (LIKE '%xxx'); they cannot use a B-tree index.
Minimize JOIN usage; when required, ensure join columns are indexed.
Compare values using identical data types.
Avoid != or <> in WHERE where possible: they usually match most rows, so the optimizer tends to fall back to a full scan.
Prefer BETWEEN for continuous numeric ranges.
Use LIMIT for pagination and keep page size reasonable.
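The advice to keep IN lists under 200 items can be applied in the application by splitting a long ID list into batches. The sketch below is a minimal version; the table t comes from the example above, while the name column and the helper names are hypothetical. The %s placeholders follow the parameter style used by common Python MySQL drivers.

```python
from itertools import islice


def chunked(items, size=200):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_in_queries(ids, size=200):
    """Yield (sql, params) pairs, one per batch, with a bounded IN list."""
    for batch in chunked(ids, size):
        placeholders = ", ".join(["%s"] * len(batch))
        # Table `t` and column `name` are placeholders for illustration.
        yield f"SELECT id, name FROM t WHERE id IN ({placeholders})", batch


if __name__ == "__main__":
    for sql, params in build_in_queries(range(450)):
        print(len(params), sql[:40])
```

Each pair can be passed to a driver's cursor.execute(sql, params); the results of the batches are then concatenated in the application.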
Engine Comparison
MyISAM
Table‑level read/write locks; no row‑level locking.
No transaction, foreign‑key, or crash‑safe recovery support.
Allows concurrent inserts while reading.
Supports prefix indexing for BLOB/TEXT and full‑text indexes.
Delayed index updates improve write throughput.
Compressible tables reduce disk usage for read‑only data.
InnoDB
Row‑level locking with MVCC for high concurrency.
Full transaction and foreign‑key support.
Crash‑safe recovery.
Full‑text indexes available from MySQL 5.6.4.
Overall, MyISAM suits SELECT-heavy workloads; InnoDB excels for INSERT/UPDATE-intensive tables.
System Tuning Parameters
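Benchmark tools such as sysbench, iibench-mysql, and tpcc-mysql can be used to evaluate settings. Key variables:
back_log: increase from 50 to 500 to allow more pending connections.
wait_timeout: reduce idle connection timeout from 8 h to 30 min.
max_user_connections: cap the number of simultaneous connections per user account; the default of 0 means no limit.
thread_concurrency: set to 2 × CPU core count. This applies to older releases only; the variable was deprecated in MySQL 5.6 and removed in 5.7.
skip_name_resolve: enable it to skip DNS lookups for incoming connections; accounts must then be defined by IP address rather than host name.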
key_buffer_size (MyISAM): 256–384 MiB on a 4 GiB server; monitor SHOW STATUS LIKE 'Key_read%' and keep Key_reads/Key_read_requests < 0.1 %.
innodb_buffer_pool_size: largest impact for InnoDB; monitor SHOW STATUS LIKE 'Innodb_buffer_pool_read%' and keep the hit ratio as high as possible.
innodb_additional_mem_pool_size: increase when many schema objects exist (removed in MySQL 5.7).
innodb_log_buffer_size: keep ≤ 32 MiB.
query_cache_size: adjust based on the hit rate, Qcache_hits / (Qcache_hits + Qcache_inserts) × 100; typical value ≈ 256 MiB. The query cache was removed in MySQL 8.0.
read_buffer_size, sort_buffer_size, read_rnd_buffer_size, record_buffer (the name read_buffer_size had in very old releases): raise for sequential scans, large sorts, or random reads, but watch per-connection memory usage.
thread_cache_size: keep idle threads ready for reuse.
table_cache: caches table file descriptors (mainly benefits MyISAM); renamed table_open_cache in MySQL 5.1.
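The ratios mentioned for key_buffer_size, innodb_buffer_pool_size, and query_cache_size can be computed from SHOW STATUS counters. The sketch below assumes the counters have already been fetched into a dictionary keyed by variable name; the function names are illustrative, and the buffer pool formula is the commonly used one rather than something this guide specifies.

```python
def key_cache_miss_ratio(status):
    """Key_reads / Key_read_requests; the guide suggests keeping it below 0.1 %."""
    requests = int(status["Key_read_requests"])
    return int(status["Key_reads"]) / requests if requests else 0.0


def buffer_pool_hit_ratio(status):
    """Share of logical reads served from the buffer pool (assumed formula:
    1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)."""
    requests = int(status["Innodb_buffer_pool_read_requests"])
    if not requests:
        return 0.0  # no read requests recorded yet
    return 1 - int(status["Innodb_buffer_pool_reads"]) / requests


def query_cache_hit_rate(status):
    """Qcache_hits / (Qcache_hits + Qcache_inserts) * 100, as given above."""
    hits = int(status["Qcache_hits"])
    total = hits + int(status["Qcache_inserts"])
    return 100 * hits / total if total else 0.0


if __name__ == "__main__":
    # Made-up counter values, shaped like rows returned by SHOW STATUS.
    sample = {
        "Key_reads": "50", "Key_read_requests": "100000",
        "Innodb_buffer_pool_reads": "1000",
        "Innodb_buffer_pool_read_requests": "1000000",
        "Qcache_hits": "300", "Qcache_inserts": "100",
    }
    print(key_cache_miss_ratio(sample) < 0.001)
    print(buffer_pool_hit_ratio(sample), query_cache_hit_rate(sample))
```

Because the counters are cumulative since server start, sampling them twice and comparing the deltas gives a better picture of current behavior than a single reading.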
Hardware Scaling
Scale‑up by adding CPU cores, memory, and SSDs. Identify whether MySQL is CPU‑bound or I/O‑bound to prioritize upgrades.
Read/Write Splitting
Typical pattern: write to a master, read from replicas. Avoid dual‑master setups due to added complexity; first exhaust single‑table optimizations and caching.
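A minimal sketch of the master/replica pattern follows. It assumes routing is decided in the application from the statement text, which is a simplification; the class name, the connection labels, and the keyword check are illustrative.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the master and spread plain reads across replicas."""

    READ_PREFIXES = ("select", "show")

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql, in_transaction=False):
        lowered = sql.lstrip().lower()
        is_read = lowered.startswith(self.READ_PREFIXES)
        locks_rows = "for update" in lowered
        if in_transaction or not is_read or locks_rows or self._replicas is None:
            return self.master
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["replica1", "replica2"])
    print(router.route("UPDATE t SET age = 10 WHERE id = 1"))
    print(router.route("SELECT id FROM t WHERE age = 9"))
```

Replicas apply changes asynchronously, so reads that must see a just-committed write are usually sent to the master as well; the in_transaction flag is one simple way to express that.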
Caching Layers
MySQL internal: configuration covered in system-tuning parameters.
Data-access layer: MyBatis or Hibernate second-level caches.
Application service layer: cache DTOs for fine-grained control.
Web layer: page-level caching.
Browser client: client-side cache.
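For the application service layer, a cache-aside helper with a time-to-live is one simple way to cache DTOs. The sketch below is an in-process illustration only; the class name, the loader callback, and the TTL policy are assumptions, and a production setup would more likely use a shared cache.

```python
import time


class TTLCache:
    """Cache-aside helper: serve from memory, fall back to a loader on a miss."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested
        self._store = {}      # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]   # fresh hit: no database round trip
        value = loader(key)   # miss or expired: query the database
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._store.pop(key, None)
```

Invalidating on write keeps stale DTOs from outliving the TTL; without it, readers may see old data for up to ttl_seconds.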
Table Partitioning
Introduced in MySQL 5.1, partitioning splits a logical table into multiple physical sub‑tables without code changes. Indexes are per‑partition; there is no global index.
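A query's WHERE clause must filter on the partitioning column for MySQL to prune partitions; otherwise every partition is scanned. Use EXPLAIN PARTITIONS to see which partitions a query touches (from MySQL 5.7, plain EXPLAIN shows a partitions column).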
Benefits
Store more data in a single logical table.
Maintenance is easier – drop or add whole partitions, repair individually.
Queries that target specific partitions run faster.
Partitions can be placed on different physical devices.
Mitigates bottlenecks such as InnoDB index mutexes or ext3 inode lock contention.
Backup/restore can be performed per partition.
Limitations
Maximum 1024 partitions per table before MySQL 5.6.7 (8192 from 5.6.7 onward).
Every primary key and unique key on the table must include all columns used in the partitioning expression.
No foreign-key support on partitioned tables.
NULL values disable partition pruning.
All partitions must use the same storage engine.
Partition Types
RANGE: based on continuous intervals.
LIST: based on discrete value sets.
HASH: uses a user-defined expression that returns a non-negative integer.
KEY: similar to HASH, but takes one or more columns (not necessarily integers) and uses MySQL's internal hash function.
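To make the RANGE and HASH rules concrete, the sketch below computes which partition a value would land in. It models only the placement arithmetic: RANGE bounds behave like exclusive VALUES LESS THAN limits, and HASH takes the expression value modulo the partition count. The function names and sample bounds are illustrative.

```python
from bisect import bisect_right


def range_partition(value, upper_bounds):
    """Index of the RANGE partition holding `value`.

    `upper_bounds` is the sorted list of exclusive VALUES LESS THAN limits.
    """
    index = bisect_right(upper_bounds, value)
    if index == len(upper_bounds):
        raise ValueError("value is not below any partition bound")
    return index


def hash_partition(expr_value, partition_count):
    """Index of the HASH partition for a non-negative integer expression value."""
    return expr_value % partition_count


if __name__ == "__main__":
    bounds = [2010, 2015, 2020]  # p0 < 2010, p1 < 2015, p2 < 2020
    print(range_partition(2012, bounds), hash_partition(2012, 4))
```

A value equal to a bound falls into the next partition, which is why range_partition(2010, bounds) yields 1 rather than 0.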
Typical Scenarios
Time‑series data benefits from RANGE partitioning by date. Hot‑spot data can be isolated in its own partition to keep it in memory. Historical data can be archived by dropping old partitions.
Vertical Splitting
Vertical sharding separates tables by logical relevance (e.g., user vs. order data) or splits a wide table into a frequently accessed subset and a rarely accessed subset, each with its own primary key.
Advantages
Smaller rows reduce I/O per block.
Better cache utilization by grouping stable columns.
Simpler data maintenance.
Disadvantages
Redundant primary keys must be managed.
Additional JOIN operations increase CPU load.
Does not eliminate the large‑single‑table problem; horizontal sharding may still be required.
Transaction handling becomes more complex.
Horizontal Splitting (Sharding)
Overview
Horizontal sharding distributes rows across multiple tables or databases based on a sharding key, achieving true distributed storage. Table partitioning is a special case of intra‑database sharding.
Pros
No single‑node data or concurrency bottleneck.
Minimal changes to the application layer.
Improved system stability and load capacity.
Cons
Distributed transaction consistency is hard.
Cross‑node JOIN performance suffers and adds complexity.
Operational overhead grows with the number of shards.
Sharding Principles
Shard only when necessary; start with single‑table optimization.
Keep shard count low and distribute evenly across nodes.
Choose sharding rules based on data growth and access patterns (range, enum, consistent hash).
Avoid cross‑shard transactions; design for single‑shard operations.
Never use SELECT * or return massive result sets; index frequent queries.
Reduce cross‑database JOIN by data duplication or partitioning.
For time‑based data (e.g., orders), use short‑range active partitions and longer‑range historical partitions.
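Of the sharding rules listed above, consistent hashing is the least obvious to implement. The sketch below is a minimal hash ring with virtual nodes; the class name, the node labels, the use of SHA-256, and the count of 100 virtual nodes are all assumptions for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to nodes so that adding a node relocates only some keys."""

    def __init__(self, nodes=(), virtual_nodes=100):
        self._virtual_nodes = virtual_nodes
        self._points = []   # sorted hash positions on the ring
        self._owners = {}   # hash position -> node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self._virtual_nodes):
            point = self._hash(f"{node}#{i}")
            if point in self._owners:
                continue  # keep the existing owner on a hash collision
            bisect.insort(self._points, point)
            self._owners[point] = node

    def get_node(self, key):
        if not self._points:
            raise ValueError("ring has no nodes")
        # First position clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._points, self._hash(str(key))) % len(self._points)
        return self._owners[self._points[index]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["db0", "db1", "db2", "db3"])
    print(ring.get_node("user:42"))
```

Compared with a plain modulo rule, adding a shard to the ring moves only the keys that now belong to the new node, which keeps data migration bounded as the cluster grows.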
Sharding Architectures
Client‑Side Sharding
Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources directly in the application. Typically packaged as a JAR.
Pros: direct DB connection reduces external failure points; low integration cost; no extra middleware.
Cons: limited to the data‑access layer; less extensible for complex scenarios; adds load to application servers.
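In a client-side setup, the data-access layer decides which database and table a row belongs to before issuing the query. The sketch below shows one possible modulo rule; the function name, the order_db_N and t_order_N naming scheme, and the shard counts are hypothetical and not taken from any particular library.

```python
def locate_shard(user_id, db_count=4, tables_per_db=8):
    """Return the (database, table) pair for a sharding key.

    The key is spread over db_count * tables_per_db slots; consecutive slots
    fill one database's tables before moving to the next database.
    """
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db
    table_index = slot % tables_per_db
    return f"order_db_{db_index}", f"t_order_{table_index}"


if __name__ == "__main__":
    database, table = locate_shard(13)
    print(f"SELECT id FROM {database}.{table} WHERE user_id = %s")
```

Because the rule depends only on the sharding key, every query that carries the key touches a single shard, which matches the principle of designing for single-shard operations.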
Proxy‑Side Sharding
Deploy an independent middleware that abstracts multiple data sources and performs sharding logic transparently to the application.
Pros: handles complex requirements; strong extensibility; transparent to applications.
Cons: requires separate deployment and operation; adds an extra network hop and potential latency.
Solution Comparison
Choosing between client-side (e.g., Sharding-JDBC) and proxy-side (e.g., MyCat, Atlas) depends on scale, complexity, and operational resources.
MySQL‑Compatible Horizontally Scalable Databases
TiDB: https://github.com/pingcap/tidb
CUBRID: http://www.cubrid.org/
These open‑source projects lack the industrial polish of MySQL and require more operational effort. Managed cloud offerings such as Alibaba Cloud PolarDB, OceanBase, and Tencent Cloud TDSQL provide horizontally scalable MySQL‑compatible services.
NoSQL Alternatives
When ACID guarantees are not required, consider moving large, weakly structured tables to NoSQL solutions, for example:
Log, monitoring, and statistical data.
Unstructured or semi‑structured data.
Data with low transactional requirements and few joins.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
