Master MySQL Single-Table Optimization and Scaling Strategies
Learn how to boost MySQL performance for massive tables by optimizing schema design, indexing, query patterns, engine selection, system parameters, hardware upgrades, read/write splitting, caching layers, partitioning, vertical and horizontal sharding, and choosing appropriate client or proxy sharding solutions.
Single Table Optimization
When a MySQL table reaches millions of rows, CRUD performance degrades sharply. Before splitting the table, assess future growth; splitting adds complexity. Tables with fewer than ten million integer‑based rows or five million string‑based rows can usually be optimized without sharding.
Field Design
Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED for non‑negative values.
Allocate only the required length for VARCHAR columns.
Replace frequently used string columns with ENUM or integer codes.
Prefer TIMESTAMP to DATETIME for automatic range queries.
Keep the total number of columns below 20 to reduce row size.
Avoid NULL columns; they hinder index usage and waste space.
Store IPv4 addresses as 32‑bit integers.
Index Strategy
Create indexes only on columns used in WHERE or ORDER BY clauses; verify with EXPLAIN.
Do not index columns that are frequently checked for NULL values.
Avoid indexing low‑cardinality columns (e.g., gender).
Use prefix indexes for long CHAR/VARCHAR columns.
Avoid character columns as primary keys.
Prefer application‑level constraints over foreign keys and UNIQUE constraints.
When using composite indexes, order columns to match query predicates and drop redundant single‑column indexes.
SQL Query Tuning
Enable the slow‑query log to locate expensive statements.
Do not perform column calculations in WHERE (e.g., age + 1 = 10); move calculations to the right side.
Keep statements simple; split large statements to reduce lock time.
Never use SELECT *; list only required columns.
Replace OR with IN when possible; keep the IN list under 200 items.
Implement functions and triggers in application code rather than in SQL.
Avoid leading‑wildcard patterns ( %xxx).
Minimize JOIN usage.
Compare values using the same data type.
Avoid != or <> in WHERE clauses as they bypass indexes.
Use BETWEEN for continuous numeric ranges instead of IN.
Paginate results with LIMIT and keep page size reasonable.
Engine Choice
MySQL primarily offers MyISAM (default before 5.1) and InnoDB (default since 5.5).
MyISAM : No row‑level locking, no transactions, no foreign keys; fast reads, supports BLOB/TEXT prefix indexing, delayed index updates, and table compression. Suitable for read‑heavy workloads.
InnoDB : Row‑level locking with MVCC, full transaction support, foreign keys, crash‑safe recovery; better for write‑intensive workloads.
System Tuning Parameters
Benchmark tools such as sysbench , iibench‑mysql , and tpcc‑mysql can measure performance. Key MySQL variables to tune include: back_log – increase from the default 50 to up to 500 to queue more pending connections. wait_timeout – reduce idle connection time (e.g., from 8 hours to 30 minutes). max_user_connections – set a reasonable upper limit. thread_concurrency – set to roughly twice the CPU core count. skip_name_resolve – disable DNS lookups for remote hosts. key_buffer_size – for MyISAM, 256‑384 MB on a 4 GB server; keep key_reads/key_read_requests < 0.1 %. innodb_buffer_pool_size – largest impact on InnoDB; aim for a high hit rate on Innodb_buffer_pool_read_requests. innodb_additional_mem_pool_size – adjust when many metadata objects exist. innodb_log_buffer_size – typically ≤ 32 MB. query_cache_size – tune based on hit rate; 256 MB is a common starting point. read_buffer_size, sort_buffer_size, read_rnd_buffer_size, record_buffer, thread_cache_size, table_cache – adjust according to workload and available memory.
Hardware Upgrade
Scale up by adding faster CPUs, more RAM, or SSD storage, depending on whether MySQL is CPU‑bound or I/O‑bound.
Read/Write Splitting
Deploy a master‑slave topology: writes go to the master, reads to replicas. Avoid multi‑master setups unless required.
Caching Layers
Caching can be applied at multiple levels:
MySQL internal caches (tuned via system variables).
Data‑access layer (e.g., MyBatis second‑level cache, Hibernate entity cache).
Application service layer (caching DTOs).
Web layer (HTTP response caching).
Browser client cache.
Two common write policies:
Write‑Through : Update cache and DB simultaneously; simple but moderate performance.
Write‑Back : Update cache first and flush to DB asynchronously; higher performance but more complex and risk of inconsistency.
Table Partitioning
MySQL 5.1 introduced native partitioning, a transparent horizontal sharding mechanism. A partitioned table is a logical table backed by multiple physical sub‑tables, each with its own indexes (no global index). Queries must include the partitioning column to enable partition pruning.
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | user_partition | p1,p4 | range | PRIMARY | PRIMARY | 8 | NULL | 5 | Using where; Using index |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)Benefits:
Logical table can store more data.
Bulk delete or add whole partitions.
Query speed improves when conditions target few partitions.
Partitions can be placed on different physical devices.
Can alleviate specific bottlenecks (e.g., InnoDB index mutex, inode lock contention).
Limitations:
Maximum 1024 partitions per table.
All primary‑key and unique‑key columns must be part of the partition key.
No foreign‑key support on partitioned tables.
NULL values prevent partition pruning.
All partitions must use the same storage engine.
Partition types: RANGE , LIST , HASH , KEY . RANGE is ideal for time‑series data.
Vertical Splitting
Vertical splitting separates a wide table into multiple tables based on column usage, reducing row size and improving cache utilization.
Advantages:
Smaller row size reduces I/O.
Better cache utilization by grouping rarely‑changed fields.
Simpler data maintenance.
Disadvantages:
Redundant primary keys must be managed.
Introduces JOIN overhead; can be mitigated by application‑side joins.
Does not eliminate the underlying large‑table problem; horizontal sharding may still be needed.
Transaction handling becomes more complex.
Horizontal Sharding
Horizontal sharding distributes rows across multiple tables or databases based on a sharding key. It can be implemented via in‑database partitioning (covered above) or by combining vertical and horizontal splits.
Benefits:
Eliminates single‑node performance bottlenecks.
Minimal changes to application code.
Improves system stability and load capacity.
Drawbacks:
Distributed transaction consistency is hard.
Cross‑node JOINs are slow and complex.
Data expansion and maintenance effort increase.
Sharding Principles
Do not shard unless necessary; start with single‑table optimization.
Keep shard count low and distribute evenly across nodes.
Choose sharding rules (range, list, hash, consistent‑hash) based on growth pattern, access pattern, and future scalability.
Avoid cross‑shard transactions; distributed transactions are difficult.
Optimize queries to avoid SELECT * and large result sets; index frequent queries.
Use data redundancy and partitioning to reduce cross‑shard JOINs.
Sharding Solutions
Client‑Side Sharding
Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources and perform sharding logic within the application, typically packaged as a JAR.
Pros: Direct DB connections reduce external failure risk; low integration cost.
Cons: Limited to data‑access layer; scalability may be insufficient for complex systems; adds load to application servers.
Proxy‑Side Sharding
Deploy an independent middleware that abstracts multiple data sources and handles sharding transparently to the application. Often requires a cluster and coordination services such as Zookeeper.
Pros: Handles complex requirements, strong extensibility, transparent to applications.
Cons: Requires separate deployment and operation; adds a network hop and potential latency.
Tool Recommendations
Client‑side: ShardingJDBC
Proxy‑side: MyCat or Atlas
MySQL‑Compatible Horizontally Scalable Databases
TiDB – https://github.com/pingcap/tidb Cubrid – http://www.cubrid.org These open‑source solutions may require more operational effort than native MySQL.
NoSQL Alternatives
For workloads that do not require strict ACID guarantees or heavy relational joins, consider moving large tables to NoSQL stores (e.g., logs, monitoring data, unstructured data) to achieve true horizontal scalability.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
