MySQL Single‑Table Optimization, Partitioning, Sharding and Scaling Strategies
When a MySQL table grows large, CRUD performance degrades, so this article presents comprehensive optimization techniques—including field design, indexing, query rewriting, engine selection, system parameters, hardware upgrades, read‑write separation, caching, partitioning, vertical and horizontal splitting, sharding architectures, and recommendations for compatible scalable databases—to help maintain high performance and scalability.
When MySQL single‑table record counts become large, CRUD performance sharply declines; the article outlines steps to optimize.
Single Table Optimization
Do not split a table unless data will continuously grow; splitting adds logical, deployment and operational complexity. Integer‑based tables are generally fine up to 10 million rows, while string‑based tables are acceptable up to 5 million rows, though many tables can still perform well beyond these thresholds.
Fields
Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED when values are non‑negative.
Allocate only the necessary length for VARCHAR.
Use enums or integers instead of strings.
Prefer TIMESTAMP rather than DATETIME.
Keep the number of columns under 20.
Avoid NULL columns as they hinder index optimization and consume extra space.
Store IP addresses as integers.
Indexes
Create indexes only on columns used in WHERE or ORDER BY; verify usage with EXPLAIN.
Do not test NULL values in WHERE clauses, as this forces full table scans.
Do not index low‑cardinality columns (e.g., gender).
Use prefix indexes for character columns and avoid making them primary keys.
Avoid foreign keys; enforce constraints in application code.
Minimize use of UNIQUE constraints; enforce uniqueness in code.
When using composite indexes, keep column order consistent with query conditions and drop unnecessary single‑column indexes.
SQL Queries
Enable slow‑query log to identify expensive statements.
Avoid column operations (e.g., SELECT id WHERE age + 1 = 10) because they cause table scans.
Keep SQL simple; one CPU core per statement, split large statements, and avoid SELECT *.
Replace OR with IN for better performance (log‑scale vs linear).
Do not use functions or triggers; implement logic in the application.
Avoid %xxx pattern matching.
Minimize JOIN usage.
Compare values of the same type (e.g., string vs string, number vs number).
Do not use != or <> in WHERE clauses; they prevent index usage.
For continuous ranges, use BETWEEN instead of IN.
Paginate results with LIMIT and keep page size reasonable.
Engine
Two widely used storage engines are MyISAM and InnoDB.
MyISAM
No row locking; table‑level read lock and exclusive write lock.
No transaction support, foreign keys, or crash‑safe recovery.
Supports BLOB/TEXT prefix indexes, full‑text indexes, delayed index updates, and table compression.
InnoDB
Row locking with MVCC for high concurrency.
Supports transactions, foreign keys, and crash‑safe recovery.
No full‑text index support.
Overall, MyISAM suits SELECT ‑heavy workloads, while InnoDB is better for INSERT and UPDATE intensive tables.
System Tuning Parameters
Benchmark tools: sysbench, iibench‑mysql, tpcc‑mysql. back_log: increase from default 50 to 500 to allow more pending connections. wait_timeout: reduce idle connection time from 8 hours to 30 minutes. max_user_connection: set a reasonable upper limit. thread_concurrency: set to twice the CPU core count. skip_name_resolve: disable DNS lookup for client connections. key_buffer_size: enlarge for MyISAM index caching (e.g., 256 MB‑384 MB on a 4 GB server) and keep key_reads / key_read_requests below 0.1 %. innodb_buffer_pool_size: increase to improve InnoDB data and index caching; aim for a high
(Innodb_buffer_pool_read_requests – Innodb_buffer_pool_reads) / Innodb_buffer_pool_read_requestsratio. innodb_additional_mem_pool_size, innodb_log_buffer_size, query_cache_size, read_buffer_size, sort_buffer_size, read_rnd_buffer_size, record_buffer, thread_cache_size, table_cache: adjust according to workload and monitor related status variables.
Hardware Upgrade
Scale‑up by adding CPU, memory, or SSDs, depending on whether MySQL is CPU‑ or I/O‑bound.
Read‑Write Separation
Use a master for writes and a slave for reads; avoid dual‑master setups to keep complexity low.
Cache
Caching can be applied at several layers:
MySQL internal caches (see system parameters above).
Data‑access layer (e.g., MyBatis cache, Hibernate Persistence Object cache).
Application service layer (caching Data Transfer Object).
Web layer (page caching).
Browser/client side.
Table Partition
MySQL 5.1 introduced partitioning as a simple form of horizontal sharding; it is transparent to applications but splits a logical table into multiple physical sub‑tables.
Benefits include larger logical tables, easier maintenance, faster queries that hit few partitions, and the ability to place partitions on different storage devices. Limitations: max 1024 partitions, primary/unique keys must include the partition column, no foreign keys, NULL values break partition pruning, and all partitions must use the same engine.
Partition Types
RANGE – based on continuous intervals.
LIST – based on discrete value sets.
HASH – based on a user‑defined expression.
KEY – similar to HASH but uses MySQL’s internal hash function on integer columns.
Suitable Scenarios
Time‑series data is a common use case; partition by year, month, etc.
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
) PARTITION BY RANGE ( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);Vertical Splitting
Separate tables by logical groups (e.g., user data vs. order data). Advantages: smaller rows per block, better cache utilization, simpler maintenance. Disadvantages: redundant primary keys, increased JOINs, still limited by single‑table size, and more complex transaction handling.
Horizontal Splitting
Distribute rows across multiple tables or databases (sharding). Advantages: eliminates single‑node bottlenecks, minimal application changes, higher stability and load capacity. Disadvantages: cross‑node transaction consistency, slower cross‑node JOINs, and higher operational complexity.
Sharding Principles
Split only when necessary; follow single‑table optimization first.
Keep the number of shards low and distribute them evenly.
Choose shard rules based on growth and access patterns; range, enum, and consistent‑hash are common.
Avoid cross‑shard transactions.
Write queries should be selective; avoid SELECT * on large result sets.
Use data redundancy and partitioning to reduce cross‑shard JOINs.
Solutions
Two architectural models: client‑side and proxy‑side.
Client Architecture
Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources directly in the application. Example diagram (image omitted).
Pros: direct DB connection, lower external dependency risk, low integration cost.
Cons: limited to the data‑access layer, added load on application servers.
Proxy Architecture
Deploy an independent middleware that aggregates data sources; the application connects to the proxy as if it were a single DB. Example diagram (image omitted).
Pros: handles complex requirements, transparent to applications, strong extensibility.
Cons: requires separate deployment and maintenance, adds an extra network hop.
Comparison Table
Product
Vendor
Model
Supported DB
Sharding
Read/Write Separation
Open‑Source
Language
MySQL Fabric
MySQL
Proxy
MySQL
Yes
Yes
Yes
Python
Cobar
Alibaba
Proxy
MySQL
Yes
No
Yes
Java
ShardingJDBC
Dangdang
Client
MySQL
Yes
Yes
Yes
Java
Choosing a Solution
Decide between proxy and client architecture (client for small/medium, proxy for large/complex).
Check required features (e.g., ORDER BY support).
Avoid products with no updates in the past year.
Prefer solutions from large companies → community → small firms → individuals.
Consider popularity (GitHub stars, user feedback).
Prefer open‑source for custom modifications.
Client architecture recommendation: ShardingJDBC .
Proxy architecture recommendation: MyCat or Atlas .
MySQL‑Compatible Horizontally Scalable Databases
TiDB
Cubrid
Cloud services: Alibaba PetaData, Alibaba OceanBase, Tencent DCDB.
NoSQL
For workloads that do not require strict ACID guarantees, consider moving large, weakly‑structured tables to NoSQL solutions (e.g., logs, monitoring data, unstructured data).
PS: If you find this sharing useful, feel free to like and watch.
END
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
