MySQL Single‑Table Optimization, Sharding, and Scaling Strategies
This article explains why MySQL tables with massive row counts suffer performance degradation and provides practical guidance on single‑table optimization, field and index design, query tuning, engine selection, system parameters, hardware upgrades, read‑write splitting, caching layers, partitioning, vertical and horizontal sharding, as well as client‑side and proxy‑side sharding solutions.
When a MySQL single table contains a huge number of rows, CRUD performance drops sharply.
Single‑Table Optimization
Do not split a table unless its data will continuously grow; splitting adds logical, deployment, and operational complexity. Generally, tables with integer primary keys below ten million rows and string‑heavy tables below five million rows are fine.
Fields
Prefer TINYINT, SMALLINT, MEDIUMINT over INT; use UNSIGNED for non‑negative values.
Allocate only the necessary length for VARCHAR.
Replace string columns with enums or integers.
Prefer TIMESTAMP to DATETIME.
Keep the number of columns under 20.
Avoid NULL columns because they hinder index optimization and waste space.
Store IP addresses as integers.
Indexes
Create indexes only where they help queries, especially for WHERE and ORDER BY clauses.
Use EXPLAIN to verify that an index is used instead of a full table scan.
Avoid WHERE column IS NULL checks; they disable index usage.
Do not index low‑cardinality columns such as gender.
For character columns, create prefix indexes only.
Avoid using character columns as primary keys.
Do not rely on foreign keys; enforce constraints in application code.
Avoid UNIQUE constraints; enforce uniqueness in code.
When using multi‑column indexes, keep column order consistent with query conditions and drop unnecessary single‑column indexes.
Query SQL Tips
Enable slow‑query log to locate expensive statements.
Never perform column calculations in the WHERE clause (e.g., SELECT id WHERE age+1=10); move calculations to the right side.
Keep SQL statements simple; a single statement should run on one CPU core.
Avoid SELECT *.
Replace OR with IN (IN is logarithmic, OR is linear; keep IN list under 200 items).
Implement functions and triggers in the application layer instead of the database.
Avoid LIKE '%xxx' patterns.
Minimize use of JOIN.
Compare values of the same type (e.g., string‑to‑string, number‑to‑number).
Avoid != or <> in WHERE clauses; they prevent index usage.
Use BETWEEN for continuous numeric ranges instead of IN.
Paginate large result sets with LIMIT and keep page size reasonable.
Engine
MySQL mainly uses two storage engines:
MyISAM
No row‑level locking; table‑level read and exclusive write locks.
No transaction support.
No foreign‑key support.
No crash‑safe recovery.
Allows concurrent reads and inserts.
Supports indexing of the first 500 characters of BLOB/TEXT and full‑text indexes.
Delayed index updates improve write performance.
Supports compressed tables for read‑only data.
InnoDB
Row‑level locking with MVCC for high concurrency.
Supports transactions.
Supports foreign keys.
Crash‑safe recovery.
Full‑text indexes are supported from MySQL 5.6.4 onward.
In general, MyISAM suits read‑intensive workloads, while InnoDB fits insert‑/update‑heavy workloads.
System Tuning Parameters
Benchmark tools:
sysbench : modular, cross‑platform, multithreaded performance testing.
iibench‑mysql : Java‑based MySQL/Percona/MariaDB index insertion benchmark.
tpcc‑mysql : Percona’s TPC‑C benchmark.
Important MySQL variables (examples):
backlog : number of pending connections before MySQL rejects new ones; can be raised from 50 to 500.
wait_timeout : idle connection timeout; can be reduced from 8 hours to 30 minutes.
max_user_connections : maximum connections per user; default 0 (unlimited) – set a reasonable limit.
thread_concurrency : set to twice the CPU core count.
skip_name_resolve : disable DNS lookup for external connections.
key_buffer_size : MyISAM index cache; increase to improve index speed (e.g., 256 MiB‑384 MiB on a 4 GiB server). Verify with show status like 'key_read%' that key_reads / key_read_requests stays below 0.1 %.
innodb_buffer_pool_size : InnoDB data and index cache; monitor with show status like 'Innodb_buffer_pool_read%' and keep
(Innodb_buffer_pool_read_requests - Innodb_buffer_pool_reads) / Innodb_buffer_pool_read_requestshigh.
innodb_additional_mem_pool_size : memory for InnoDB data dictionary; increase when many objects cause warnings.
innodb_log_buffer_size : transaction log buffer; usually not larger than 32 MiB.
query_cache_size : caches result sets of SELECT statements; adjust based on hit rate (Qcache_hits / (Qcache_hits + Qcache_inserts) * 100). Large caches may be counter‑productive for frequent writes.
read_buffer_size , sort_buffer_size , read_rnd_buffer_size , record_buffer , thread_cache_size , table_cache : various buffers that can be tuned according to workload.
Hardware Upgrade
Scale‑up by increasing CPU, memory, or switching to SSDs, depending on whether MySQL is CPU‑ or I/O‑bound.
Read‑Write Splitting
Use a replica for reads and the primary for writes; avoid dual‑master setups to reduce complexity.
Caching Layers
MySQL internal : parameters described above.
Data access layer : e.g., MyBatis caches SQL results; Hibernate can cache individual entities.
Application service layer : cache Data Transfer Objects (DTOs) programmatically.
Web layer : page‑level caching.
Browser client : client‑side cache.
Cache can be applied at one or multiple layers depending on the scenario.
Table Partitioning
MySQL 5.1 introduced partitioning, a simple horizontal split that is transparent to applications.
Partitions appear as a single logical table but consist of multiple physical sub‑tables; indexes are per‑partition, no global index.
Benefits
Allows a single table to store more data.
Facilitates maintenance (bulk delete, add new partitions, optimize/check/repair a single partition).
Queries that target a few partitions run much faster.
Partitions can be placed on different physical devices.
Helps avoid bottlenecks such as InnoDB index mutexes or ext3 inode lock contention.
Individual partitions can be backed up and restored.
Drawbacks
Maximum of 1024 partitions per table.
If a partition column is part of a primary key or unique index, all primary‑key and unique columns must be included.
Partitions cannot use foreign keys.
NULL values invalidate partition pruning.
All partitions must use the same storage engine.
Partition Types
RANGE : based on a continuous interval of column values.
LIST : based on discrete column values.
HASH : uses a user‑defined expression that returns a non‑negative integer.
KEY : similar to HASH but only supports integer columns and uses MySQL’s built‑in hash function.
Suitable Scenarios
Time‑series data is ideal for RANGE partitioning; queries with a time range become very efficient and old data can be purged in bulk.
Vertical Splitting
Vertical splitting separates tables based on column relevance. Example: user data in one database, order data in another.
Vertical table splitting divides a wide table into two tables with the same primary key but different column sets.
Advantages
Reduces row size, allowing more rows per data block and fewer I/O operations.
Improves cache utilization by grouping frequently changed columns separately.
Simplifies data maintenance.
Disadvantages
Primary key duplication requires extra management.
Introduces JOINs, increasing CPU load (can be mitigated by performing joins in the application).
Does not solve the problem of a single table becoming too large; horizontal sharding may still be needed.
Transaction handling becomes more complex.
Horizontal Sharding
Overview
Horizontal sharding distributes rows across multiple tables or databases based on a sharding strategy, achieving true distribution and supporting massive data volumes.
In‑database partitioning is a special case of intra‑database sharding.
Advantages
No single‑node data or concurrency bottleneck.
Minimal changes required on the application side.
Improves system stability and load capacity.
Disadvantages
Ensuring transactional consistency across shards is difficult.
Cross‑node JOIN performance is poor and logic is complex.
Data expansion and maintenance become more demanding.
Sharding Principles
Prefer not to shard if single‑table optimization suffices.
Keep the number of shards low and distribute them evenly across nodes; add shards only when necessary.
Choose sharding rules carefully (range, list, consistent hash) to support future scaling.
Avoid cross‑shard transactions whenever possible.
Optimize query conditions, avoid SELECT *, and index frequently used queries.
Reduce cross‑database JOINs through data redundancy or partitioning.
Solution Approaches
Sharding solutions fall into two categories:
Client‑side architecture
Proxy (middleware) architecture
Client‑Side Architecture
Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources and perform sharding logic within the application, typically packaged as a JAR.
Example: ShardingJDBC.
Pros
Direct database connections reduce external failure risk.
Low integration cost; no extra operational components.
Cons
Limited to the data‑access layer; scalability may be insufficient for complex systems.
Sharding logic adds load to application servers.
Proxy Architecture
Deploy an independent middleware that transparently manages all data sources and sharding, keeping the backend cluster invisible to applications. Usually runs as a clustered service with coordination tools like Zookeeper.
Examples: MyCat, Atlas.
Pros
Handles complex requirements and offers strong extensibility.
Applications remain unchanged; no additional load on them.
Cons
Requires deployment and maintenance of extra middleware, increasing cost.
Introduces an extra network hop, causing some performance loss and additional risk.
Comparison of Solutions
Choose client‑side (ShardingJDBC) for small‑to‑medium projects; choose proxy (MyCat or Atlas) for large, complex environments.
MySQL‑Compatible Horizontally Scalable Databases
Open‑source options:
https://github.com/pingcap/tidb http://www.cubrid.org/
Cloud offerings:
https://cn.aliyun.com/product/petadata/ https://cn.aliyun.com/product/oceanbase https://www.qcloud.com/product/dcdbfortdsql.html
NoSQL Alternatives
For workloads that do not require strict ACID guarantees, consider moving large tables to NoSQL stores (log, monitoring, analytics, unstructured data, or low‑transactional data).
Author: Low‑to‑One‑Discount juejin.im/post/5b7d52afe51d453885030b91
Instead of endlessly searching for interview questions online, follow us now!
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
