Comprehensive Guide to MySQL Single-Table Optimization, Partitioning, Sharding, and Scaling Strategies
This article outlines practical techniques for optimizing large MySQL tables, including column and index tuning, query improvements, engine selection, system parameters, hardware upgrades, caching, partitioning, and vertical and horizontal sharding. It also covers client‑side and proxy‑based sharding solutions and MySQL‑compatible databases.
Single Table Optimization
When a MySQL table grows very large, CRUD performance degrades sharply; the following steps can be used to optimize it.
General Guidance
Do not split a table unless data will continuously increase; splitting adds logical, deployment, and operational complexity.
For integer‑based tables, keep row counts below ten million; for string‑heavy tables, stay under five million rows.
Column Recommendations
Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED if non‑negative.
Allocate only the necessary length for VARCHAR columns.
Use enums or integers instead of strings.
Prefer TIMESTAMP to DATETIME when the values fall within its range (1970‑01‑01 to 2038‑01‑19 UTC).
Keep the number of columns under 20.
Avoid NULL columns when possible.
Store IP addresses as integers.
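As an illustration of the last recommendation, the sketch below converts an IPv4 address to the unsigned 32‑bit integer that would fit an INT UNSIGNED column, and back again. The function names are illustrative assumptions; MySQL's built‑in INET_ATON() and INET_NTOA() perform the same conversion on the server side.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert dotted-quad IPv4 text to its unsigned 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert an unsigned 32-bit integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(value))


print(ip_to_int("192.168.1.1"))   # 3232235777
print(int_to_ip(3232235777))      # 192.168.1.1
```

Storing the integer keeps the column at 4 bytes and lets range conditions on address blocks use an ordinary index.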
Index Recommendations
Create indexes only on columns used in WHERE or ORDER BY clauses; verify usage with EXPLAIN.
Avoid indexing columns that are frequently checked for NULL.
Do not index low‑cardinality fields (e.g., gender).
Use prefix indexes for character columns and avoid making them primary keys.
Do not use foreign keys; enforce constraints in application code.
Avoid UNIQUE indexes when possible; enforce uniqueness in the application.
When using multi‑column indexes, keep column order consistent with query conditions and drop unnecessary single‑column indexes.
SQL Query Tips
Enable the slow‑query log to identify expensive statements.
Avoid applying operations to columns in the WHERE clause (e.g., age + 1 = 10), because they prevent index use; move the calculation to the right side (age = 10 - 1).
Keep statements simple; split large statements to reduce lock time.
Never use SELECT *.
Replace OR with IN (prefer fewer than 200 items).
Implement functions and triggers in application code instead of the database.
Avoid leading‑wildcard patterns such as LIKE '%xxx'; they cannot use a B‑tree index.
Minimize JOIN usage.
Compare values using the same data type.
Avoid != or <> in WHERE clauses.
Use BETWEEN instead of IN for continuous ranges.
Paginate large result sets with LIMIT and keep page size reasonable.
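One way to apply the IN‑list tip from application code is to split a long list of ids into batches before building the statement. The following is a minimal sketch: the table name `user`, the columns `id` and `name`, and the helper names are hypothetical, and the `%s` placeholders assume a driver that uses the "format" parameter style (as PyMySQL and mysqlclient do).

```python
from typing import Iterator, List, Sequence, Tuple

MAX_IN_ITEMS = 200  # batch size taken from the tip above


def chunked(ids: Sequence[int], size: int = MAX_IN_ITEMS) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_in_queries(ids: Sequence[int]) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (sql, params) pairs, one per batch of ids.

    Only the needed columns are selected (no SELECT *), and values are
    passed as parameters rather than interpolated into the SQL text.
    """
    queries = []
    for batch in chunked(ids):
        placeholders = ", ".join(["%s"] * len(batch))
        sql = f"SELECT id, name FROM user WHERE id IN ({placeholders})"
        queries.append((sql, tuple(batch)))
    return queries


for sql, params in build_in_queries(list(range(1, 451))):
    print(len(params), sql[:45] + "...")
```

Each pair can then be passed to the driver's `cursor.execute(sql, params)`; shorter statements also hold locks for less time, in line with the advice to keep statements simple.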
Engine Choices
MyISAM : No row locking, no transactions, no foreign keys, fast for read‑heavy tables, supports full‑text indexing, suitable for tables that are rarely updated.
InnoDB : Row locking with MVCC, supports transactions and foreign keys, crash‑safe, better for insert/update‑heavy workloads.
System Tuning Parameters
back_log: Increase from the default of 50 to 500 to allow more pending connections.
wait_timeout: Reduce the idle‑connection timeout from 8 hours to 30 minutes.
max_user_connections: Set a reasonable upper limit.
thread_concurrency: Set to twice the number of CPU cores.
skip_name_resolve: Disable DNS lookups for client connections.
key_buffer_size: For MyISAM, increase to 256 MB–384 MB on a 4 GB server; keep key_reads / key_read_requests below 0.1 %.
innodb_buffer_pool_size: Largest impact on InnoDB; keep the buffer‑pool hit ratio (read requests served from memory rather than from disk) high.
innodb_additional_mem_pool_size, innodb_log_buffer_size, query_cache_size, read_buffer_size, sort_buffer_size, read_rnd_buffer_size, record_buffer, thread_cache_size, table_cache: Adjust according to workload and memory availability.
Several of these names are version‑dependent: thread_concurrency and innodb_additional_mem_pool_size were removed in MySQL 5.7, the query cache was removed in MySQL 8.0, record_buffer is the old name of read_buffer_size, and table_cache was renamed table_open_cache.
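The two cache ratios mentioned above can be computed from the counters reported by SHOW GLOBAL STATUS. The sketch below assumes the counters have already been fetched into a dictionary; the function names and the sample numbers are made up for illustration, while the counter names (Key_reads, Key_read_requests, Innodb_buffer_pool_reads, Innodb_buffer_pool_read_requests) are the ones MySQL exposes.

```python
from typing import Mapping


def key_cache_miss_ratio(status: Mapping[str, str]) -> float:
    """Fraction of MyISAM key read requests that had to go to disk."""
    requests = int(status["Key_read_requests"])
    if requests == 0:
        return 0.0
    return int(status["Key_reads"]) / requests


def buffer_pool_hit_ratio(status: Mapping[str, str]) -> float:
    """Fraction of InnoDB logical reads served from the buffer pool."""
    requests = int(status["Innodb_buffer_pool_read_requests"])
    if requests == 0:
        return 1.0
    return 1.0 - int(status["Innodb_buffer_pool_reads"]) / requests


# Hypothetical counter values, as strings, the way a driver returns them.
sample = {
    "Key_reads": "50",
    "Key_read_requests": "100000",
    "Innodb_buffer_pool_reads": "1000",
    "Innodb_buffer_pool_read_requests": "1000000",
}
print(f"key cache miss ratio: {key_cache_miss_ratio(sample):.4%}")
print(f"buffer pool hit ratio: {buffer_pool_hit_ratio(sample):.4%}")
```

Because the counters accumulate from server start, comparing two samples taken some minutes apart usually gives a more representative ratio than a single reading.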
Hardware Upgrade
Scale up by adding CPU, memory, or SSDs depending on whether MySQL is CPU‑ or I/O‑bound.
Read‑Write Separation
Use a master‑slave setup (single master for writes, slaves for reads) and avoid multi‑master complexity.
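A rough sketch of how an application might route statements in such a setup follows. The class name, the connection labels, and the verb‑based rule are assumptions made for illustration; real connection objects would take the place of the strings, and replication lag means a read issued right after a write may still need to go to the master.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the single master and spread reads over the slaves."""

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def route(self, sql: str) -> str:
        tokens = sql.split()
        verb = tokens[0].upper() if tokens else ""
        # Locking reads must see current data, so they stay on the master.
        is_plain_select = verb == "SELECT" and "FOR UPDATE" not in sql.upper()
        return next(self._slaves) if is_plain_select else self.master


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
print(router.route("SELECT id FROM user WHERE id = 1"))    # slave-1
print(router.route("UPDATE user SET name = 'x' WHERE id = 1"))  # master
```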
Caching Layers
MySQL internal cache (tuned via system parameters).
Data‑access layer cache (e.g., MyBatis, Hibernate).
Application‑service layer cache (e.g., Spring Cache).
Web‑layer cache.
Browser cache.
Two common service‑layer caching strategies:
Write‑Through : Update cache and database simultaneously; simple but moderate performance.
Write‑Back : Update cache first, asynchronously flush to database; higher performance but more complex.
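The difference between the two strategies can be sketched with a dictionary standing in for the database. The class names are illustrative, and a real write‑back cache would also need a flush schedule and a plan for data still in memory when the process dies.

```python
from typing import Any, Dict, Set


class WriteThroughCache:
    """Every write goes to the database and the cache in the same call."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value      # database first, so the cache never leads it
        self.cache[key] = value

    def get(self, key: str) -> Any:
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]


class WriteBackCache:
    """Writes land in the cache and reach the database only on flush()."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}
        self.dirty: Set[str] = set()  # keys changed since the last flush

    def put(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.dirty.add(key)

    def get(self, key: str) -> Any:
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def flush(self) -> None:
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()


database: Dict[str, Any] = {}
cache = WriteBackCache(database)
cache.put("user:1", {"name": "alice"})
print("before flush:", database)
cache.flush()
print("after flush:", database)
```

In the write‑back variant, repeated updates to the same key between flushes cost one database write instead of many, which is where the performance gain comes from.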
Table Partitioning
MySQL 5.1 introduced horizontal partitioning, which is transparent to applications.
A partitioned table is one logical table backed by multiple physical sub‑tables; there are no global indexes.
Queries must include the partition key to prune partitions.
Example of checking partition usage:
mysql> explain partitions select count(1) from user_partition where id in (1,2,3,4,5);
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table          | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | user_partition | p1,p4      | range | PRIMARY       | PRIMARY | 8       | NULL |    5 | Using where; Using index |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
Benefits of partitioning:
Allows a single logical table to store more data.
Facilitates bulk deletion, addition of new partitions, and per‑partition maintenance.
Improves query speed when conditions limit the scan to few partitions.
Enables distribution of partitions across different physical devices.
Helps avoid specific bottlenecks such as InnoDB index mutexes.
Supports backup/restore of individual partitions.
Limitations:
Maximum of 1024 partitions per table (8192 from MySQL 5.6.7 onward).
Every primary key and unique index must include all columns used in the partitioning expression.
No foreign‑key support on partitioned tables.
NULL values break partition pruning.
All partitions must use the same storage engine.
Partition types:
RANGE : Based on continuous intervals.
LIST : Based on discrete value sets.
HASH : Based on a user‑defined expression.
KEY : Similar to HASH, but MySQL supplies the hashing function and the key columns do not have to be integers.
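To make the RANGE and HASH rules concrete, the sketch below mimics in application code how a value maps to a partition and why an IN list only touches some partitions. The boundaries, partition names, and function names are hypothetical; MySQL does this internally, and the HASH case assumes a non‑negative integer expression assigned by modulus.

```python
import bisect
from typing import Iterable, List

# Hypothetical RANGE layout: p0 VALUES LESS THAN (1000), p1 LESS THAN (2000), ...
RANGE_BOUNDS = [1000, 2000, 3000]
RANGE_NAMES = ["p0", "p1", "p2"]


def range_partition(value: int) -> str:
    """Return the RANGE partition whose upper bound is the first one above value."""
    idx = bisect.bisect_right(RANGE_BOUNDS, value)
    if idx >= len(RANGE_NAMES):
        # No MAXVALUE partition is defined in this sketch.
        raise ValueError(f"no partition for value {value}")
    return RANGE_NAMES[idx]


def hash_partition(value: int, partitions: int) -> str:
    """Return the HASH partition for a non-negative integer expression value."""
    return f"p{value % partitions}"


def pruned_partitions(values: Iterable[int]) -> List[str]:
    """Partitions a query with `id IN (values)` would need to scan."""
    return sorted({range_partition(v) for v in values})


print(pruned_partitions([5, 1500, 1700]))  # ['p0', 'p1']
print(hash_partition(10, 4))               # p2
```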
Vertical Splitting
Separate tables by logical groups (e.g., user data vs. order data) or split a wide table into frequently‑used and rarely‑used columns, keeping a primary‑key relationship.
Advantages: smaller rows, better cache utilization, simpler maintenance.
Disadvantages: redundant primary keys, increased JOIN cost, still limited by single‑table size, more complex transaction handling.
Horizontal Sharding
Distribute rows across multiple tables or databases based on a sharding key.
Pros: eliminates single‑node bottlenecks, minimal application changes, improves stability and load capacity.
Cons: distributed transaction consistency is hard, cross‑node JOIN performance suffers, operational complexity grows.
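A minimal sketch of key‑based routing follows, assuming 4 databases with 8 tables each. The counts, the naming scheme (`user_db_N`, `user_N`), and the function names are hypothetical; integer keys are used directly and other keys are hashed with CRC32 so the result is stable across processes.

```python
import zlib
from typing import Tuple, Union

DB_COUNT = 4        # assumed number of databases
TABLES_PER_DB = 8   # assumed number of tables in each database


def shard_slot(key: Union[int, str]) -> int:
    """Map a sharding key to a slot in [0, DB_COUNT * TABLES_PER_DB)."""
    if isinstance(key, int):
        number = key
    else:
        number = zlib.crc32(str(key).encode("utf-8"))
    return number % (DB_COUNT * TABLES_PER_DB)


def route(key: Union[int, str]) -> Tuple[str, str]:
    """Return the (database, table) pair that holds rows for this key."""
    slot = shard_slot(key)
    return f"user_db_{slot // TABLES_PER_DB}", f"user_{slot % TABLES_PER_DB}"


print(route(9))        # ('user_db_1', 'user_1')
print(route("alice"))
```

Plain modulus routing is simple, but changing the number of slots later relocates most rows, which is one reason the choice of sharding rule deserves care.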
Sharding Principles
Do not shard unless necessary; start with single‑table optimization.
Keep the number of shards low and distribute them evenly.
Choose sharding rules carefully (range, enum, consistent‑hash) based on growth and access patterns.
Avoid cross‑shard transactions.
Write selective queries; avoid SELECT * and large result sets.
Use data redundancy and partitioning to reduce cross‑shard joins.
When data has a strong time dimension, time‑range sharding is often ideal.
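Of the sharding rules listed above, consistent hashing is the least obvious, so here is a small sketch of a hash ring with virtual nodes. The class name, node labels, the choice of MD5, and the virtual‑node count are assumptions for illustration rather than the scheme of any particular middleware.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Map keys to nodes so that adding a node relocates only some keys."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._points: List[int] = []       # sorted positions on the ring
        self._owners: Dict[int, str] = {}  # position -> node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text: str) -> int:
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each node is placed on the ring many times to even out the load.
        for i in range(self._vnodes):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key: object) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(str(key)))
        return self._owners[self._points[idx % len(self._points)]]


ring = ConsistentHashRing(["db0", "db1", "db2"])
before = {k: ring.get_node(k) for k in range(1000)}
ring.add_node("db3")
moved = sum(1 for k in before if ring.get_node(k) != before[k])
print(f"{moved} of {len(before)} keys moved after adding db3")
```

Every key that changes owner moves to the newly added node, so growing the cluster only requires copying data onto the new shard rather than reshuffling all of them.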
Sharding Solutions
Two main architectures:
Client‑Side Sharding : Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources directly in the application.
Proxy‑Side Sharding : Deploy an independent middleware that abstracts multiple data sources; the application connects to the proxy.
Examples:
Client‑side: ShardingJDBC
Proxy‑side: MyCat or Atlas
MySQL‑Compatible Horizontally Scalable Databases
TiDB (https://github.com/pingcap/tidb)
Cubrid (http://www.cubrid.org/)
Cloud offerings:
Alibaba Cloud PetaData
Alibaba Cloud OceanBase
Tencent Cloud DCDB
NoSQL Alternatives
For workloads that do not require strict ACID guarantees, consider moving large, write‑heavy tables to NoSQL stores (log data, monitoring, statistics, unstructured data).
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
