MySQL Single‑Table Optimization, Indexing, Partitioning, and Sharding Strategies
This article presents comprehensive techniques for improving MySQL performance on large single tables, covering column choices, index design, query writing, engine selection, system parameters, hardware upgrades, read‑write splitting, caching, table partitioning, vertical and horizontal sharding, and guidance on selecting mature sharding solutions.
Single‑Table Optimization
When a MySQL table grows very large, CRUD performance degrades sharply. Before considering table splitting, evaluate the data growth pattern; splitting adds logical, deployment, and operational complexity.
Typical thresholds: integer‑based tables up to 10 000 000 rows are usually fine, while string‑heavy tables up to 5 000 000 rows can still perform well. Even larger tables can often be tuned to handle tens of millions of rows.
Column Design
Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED when values are non‑negative.
Allocate only the necessary length for VARCHAR columns.
Replace string columns with enums or integer codes where possible.
Use TIMESTAMP instead of DATETIME for date‑time values.
Keep the total number of columns below 20.
Avoid NULL columns because they hinder index optimization and consume extra index space.
Store IP addresses as integers.
Index Design
Create indexes only on columns used in WHERE or ORDER BY clauses; verify usage with EXPLAIN.
Do not place NULL checks in indexed columns.
Skip indexes on low‑cardinality columns (e.g., gender with only a few distinct values).
For character columns, use prefix indexes and avoid making them primary keys.
Avoid foreign keys and UNIQUE constraints; enforce uniqueness in application code.
When using multi‑column indexes, keep the column order consistent with query predicates and drop unnecessary single‑column indexes.
SQL Query Practices
Enable the slow‑query log to identify expensive statements.
Avoid column operations (e.g., SELECT id WHERE age + 1 = 10); move calculations to the right‑hand side.
Keep statements simple; one SQL per CPU core, split large statements, and avoid long‑running transactions.
Never use SELECT *; specify needed columns.
Replace OR with IN for better index utilization.
Avoid functions, triggers, and pattern matches like %xxx in WHERE clauses.
Minimize JOIN usage; when required, ensure proper indexing.
Compare values of the same type (e.g., string vs string, number vs number).
Prefer BETWEEN over IN for continuous numeric ranges.
Use LIMIT for pagination and keep page sizes reasonable.
Storage Engines
MySQL mainly uses MyISAM and InnoDB:
MyISAM : table‑level locks, no transactions, no foreign keys, no crash recovery, supports BLOB/TEXT prefix indexes, and is suited for read‑intensive workloads.
InnoDB : row‑level locks with MVCC, supports transactions, foreign keys, crash recovery, but does not support full‑text indexes; ideal for write‑intensive workloads.
System Tuning Parameters
Benchmark tools: sysbench, iibench‑mysql, tpcc‑mysql. Important variables include: back_log: increase from default 50 to 500 to allow more pending connections. wait_timeout: reduce idle connection timeout from 8 hours to 30 minutes. max_user_connection: set a reasonable upper bound. thread_concurrency: set to twice the CPU core count. skip_name_resolve: disable DNS lookups for client connections. key_buffer_size (MyISAM): 256–384 MiB on a 4 GiB server; keep key_reads/key_read_requests < 0.1 %. innodb_buffer_pool_size: largest impact on InnoDB; monitor Innodb_buffer_pool_read_requests vs Innodb_buffer_pool_reads. innodb_additional_mem_pool_size, innodb_log_buffer_size (≤ 32 MiB), query_cache_size (usually ≤ 256 MiB), read_buffer_size, sort_buffer_size, read_rnd_buffer_size, record_buffer, thread_cache_size, table_cache.
Hardware Scaling
Scale‑up by increasing CPU, memory, and using SSDs, which benefits both CPU‑bound and I/O‑bound workloads.
Read‑Write Splitting
Commonly used to direct reads to replicas and writes to the primary; avoid dual‑master setups due to added complexity.
Caching Layers
Caching can be applied at multiple layers:
MySQL internal (see system parameters above).
Data‑access layer (e.g., MyBatis cache, Hibernate persistence‑object cache).
Application‑service layer (cache Data Transfer Objects).
Web layer (page caching).
Browser client.
Two main write‑through strategies:
Write‑Through : update cache and DB simultaneously; simple but moderate performance.
Write‑Back : update cache first, flush to DB asynchronously; higher performance but more complex and risk of inconsistency.
Table Partitioning
Introduced in MySQL 5.1 as a transparent horizontal split. A partitioned table appears as a single logical table but consists of multiple physical sub‑tables; indexes are per‑partition, no global index.
Benefits
Allows a single table to store more data.
Facilitates maintenance (e.g., drop or truncate whole partitions).
Queries that target few partitions run faster.
Partitions can be placed on different physical devices.
Helps avoid certain bottlenecks (e.g., InnoDB index mutex, inode lock contention).
Enables backup/restore of individual partitions.
Limitations
Maximum 1024 partitions per table.
Primary key or unique key columns must be included in the partitioning expression.
Partitions cannot have foreign‑key constraints.
NULL values break partition pruning.
All partitions must use the same storage engine.
Partition Types
RANGE : based on continuous numeric intervals.
LIST : based on discrete value sets.
HASH : uses a user‑defined expression to compute a hash.
KEY : similar to HASH but only integer columns and MySQL’s built‑in hash function.
Example – EXPLAIN PARTITIONS
mysql> explain partitions select count(1) from user_partition where id in (1,2,3,4,5);
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | user_partition | p1,p4 | range | PRIMARY | PRIMARY | 8 | NULL | 5 | Using where; Using index |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)Use Cases
Time‑series data – partition by year/month.
Hot‑spot data – isolate frequently accessed rows in a dedicated partition.
Vertical Splitting
Separate tables by column groups (e.g., frequently accessed vs rarely accessed fields) while keeping a common primary key. Benefits: smaller row size, better cache utilization, simpler maintenance. Drawbacks: redundant primary key, increased JOINs, still limited by single‑table size, more complex transaction handling.
Horizontal Sharding
Overview
Distribute rows across multiple databases or tables based on a sharding key. Internal table partitioning is a special case of intra‑database sharding.
Advantages
No single‑node data‑size or concurrency bottleneck.
Minimal changes to application code.
Improved system stability and load capacity.
Disadvantages
Distributed transaction consistency is hard.
Cross‑node JOIN performance degrades.
Operational complexity grows with data expansion.
Sharding Principles
Shard only when necessary; start with single‑table optimization.
Keep shard count low and distribute evenly across nodes.
Select sharding rules (range, enum, consistent hash) that suit growth and query patterns.
Avoid cross‑shard transactions.
Optimize queries to target specific shards; avoid SELECT * and large result sets.
Use data redundancy and partitioning to reduce cross‑shard JOINs.
Solution Landscape
Numerous open‑source and commercial sharding frameworks exist. The table below summarizes key attributes (client vs proxy architecture, supported databases, open‑source status, language, recent activity, GitHub stars, etc.).
Name
Provider
Architecture
DB Support
Sharding
Table Splitting
Read‑Write Split
External Dependency
Open‑Source
Language
Last Update
Stars
MySQL Fabric
MySQL
Proxy
MySQL
Yes
Yes
Yes
None
Yes
Python
4 months ago
35
Cobar
Alibaba
Proxy
MySQL
Yes
No
No
None
Yes
Java
2 years ago
1287
ShardingJDBC
Dangdang
Client
MySQL
Yes
Yes
Yes
None
Yes
Java
Today
1144
MyCat
Community
Proxy
MySQL
Yes
Yes
Yes
None
Yes
Java
2 days ago
1270
Vitess
YouTube
Proxy
MySQL
Yes
Yes
Yes
None
Yes
Go
Today
3636
Choosing a Solution
Decide between client‑side or proxy architecture (client for small/medium, proxy for large/complex).
Verify required features (e.g., ORDER BY support across shards).
Prefer actively maintained projects (avoid those with no updates in the past year).
Prioritize vendors with strong reputation (large companies → community → small firms → individuals).
Consider community feedback and GitHub star count.
Open‑source is preferred for custom extensions.
Recommended choices:
Client architecture: ShardingJDBC .
Proxy architecture: MyCat or Atlas .
MySQL‑Compatible Horizontally Scalable Databases
TiDB
Cubrid
These open‑source solutions still lag behind MySQL in production robustness and require significant operational effort. Cloud offerings such as Alibaba Cloud PetaData, Alibaba Cloud OceanBase, and Tencent Cloud DCDB provide managed alternatives.
NoSQL Alternatives
When ACID guarantees are not required, moving large, write‑heavy, or weakly structured datasets to NoSQL stores can eliminate horizontal scaling challenges. Typical candidates include log, monitoring, analytics data, unstructured or semi‑structured data, and workloads with low transactional requirements.
-END-
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
