Optimizing Large MySQL Tables: Indexes, Partitioning, Sharding, and System Tuning
When a MySQL table grows to millions of rows, query, insert, update and delete performance degrades sharply, but by applying proper field design, index strategies, SQL best practices, engine selection, system parameters, caching, partitioning, vertical and horizontal sharding, and choosing suitable client‑side or proxy‑side architectures, the database can efficiently handle tens of millions of records.
Single‑Table Optimization
Do not split a table unless its size will keep growing; splitting adds logical, deployment and operational complexity. Generally, tables with integer primary keys under ten million rows and string‑heavy tables under five million rows perform well.
Nevertheless, many MySQL single tables can still be tuned to support tens of millions of rows.
Field Design
Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED if non‑negative.
Allocate only the necessary length for VARCHAR columns.
Use enums or integers instead of strings.
Prefer TIMESTAMP to DATETIME.
Keep the number of columns under 20.
Avoid NULL columns because they hinder index optimization and waste space.
Store IP addresses as integers.
Index Strategy
Create indexes only where they help; focus on columns used in WHERE and ORDER BY.
Use EXPLAIN to verify index usage.
Avoid WHERE col IS NULL checks; they force full scans.
Do not index low‑cardinality columns (e.g., gender).
For character columns, create prefix indexes only.
Avoid using character columns as primary keys.
Prefer application‑level constraints over foreign keys and UNIQUE constraints.
When using composite indexes, keep column order consistent with query predicates and drop unnecessary single‑column indexes.
SQL Query Best Practices
Enable the slow‑query log to locate expensive statements.
Never perform column calculations in the WHERE clause (e.g., age+1=10); move calculations to the right‑hand side.
Keep SQL statements simple; split large statements, reduce lock time.
Avoid SELECT *.
Replace OR with IN (log‑scale performance, keep IN list under 200 items).
Implement functions and triggers in application code, not in the database.
Avoid LIKE '%xxx' patterns.
Minimize JOIN usage.
Compare values of the same type (e.g., string‑to‑string, integer‑to‑integer).
Avoid != or <> in WHERE clauses; they disable index usage.
Use BETWEEN for continuous ranges instead of IN.
Paginate with LIMIT and keep page size reasonable.
Engine Choice
MySQL mainly uses MyISAM and InnoDB :
MyISAM
No row‑level locking; table‑level read and write locks.
No transaction support.
No foreign‑key support.
No crash‑safe recovery.
Supports concurrent inserts.
Allows indexing of the first 500 characters of BLOB/TEXT and full‑text indexes.
Supports delayed index updates for high write throughput.
Compressible for read‑only tables.
InnoDB
Row‑level locking with MVCC for high concurrency.
Full transaction support.
Foreign‑key constraints.
Crash‑safe recovery.
Full‑text indexes supported from MySQL 5.6.4 onward.
In practice, MyISAM suits read‑heavy workloads, while InnoDB excels at INSERT/UPDATE‑heavy workloads.
System Tuning Parameters
Benchmark tools: sysbench , iibench‑mysql , tpcc‑mysql .
backlog : number of pending connections before MySQL refuses new ones; raise from 50 to 500.
wait_timeout : idle connection timeout; reduce from 8 hours to 30 minutes.
max_user_connections : limit per‑user connections; set a reasonable ceiling.
thread_concurrency : set to twice the CPU core count.
skip_name_resolve : disable DNS lookup for client hosts.
key_buffer_size : MyISAM index cache; increase to 256 MiB‑384 MiB on a 4 GiB server; monitor show status like 'Key_read%' to keep key_reads/key_read_requests < 0.1%.
innodb_buffer_pool_size : InnoDB data and index cache; monitor show status like 'Innodb_buffer_pool_read%' and keep read‑request ratio high.
innodb_additional_mem_pool_size : memory for InnoDB internal structures; increase if many objects cause warnings.
innodb_log_buffer_size : transaction log buffer; usually keep ≤ 32 MiB.
query_cache_size : cache of result sets; adjust based on hit rate (Qcache_hits/(Qcache_hits+Qcache_inserts))*100, typical size ~256 MiB.
read_buffer_size , sort_buffer_size , read_rnd_buffer_size , record_buffer , thread_cache_size , table_cache : various per‑connection buffers; tune according to workload.
Hardware Scaling
Scale‑up by adding CPU, memory, or SSDs depending on whether the workload is CPU‑ or I/O‑bound.
Read‑Write Splitting
Deploy a replica for reads and a primary for writes; avoid multi‑master setups to reduce complexity.
Caching Layers
MySQL internal caches (tuned via the parameters above).
Data‑access layer caches (e.g., MyBatis, Hibernate).
Application‑service caches (caching DTOs).
Web‑layer page caches.
Browser‑side caches.
Table Partitioning
MySQL 5.1 introduced horizontal partitioning; the table appears as a single logical entity while physically stored in multiple sub‑tables.
Partitions have their own indexes; there is no global index.
Benefits
Store more data in a single logical table.
Easy bulk deletion or addition of whole partitions.
Queries that target a few partitions run faster.
Partitions can be placed on different physical devices.
Mitigate certain bottlenecks (e.g., InnoDB index mutex, ext3 inode lock).
Backup/restore at partition level.
Limitations
Maximum 1024 partitions per table.
Primary key or unique key columns must be part of the partition key.
No foreign‑key support on partitioned tables.
NULL values break partition pruning.
All partitions must use the same storage engine.
Partition Types
RANGE : based on continuous intervals.
LIST : based on discrete value sets.
HASH : based on a user‑defined expression.
KEY : similar to HASH but uses MySQL’s internal hash function on integer columns.
Typical Use Cases
Time‑series data (e.g., logs, orders) are often partitioned by date, allowing fast queries on recent data and easy archival of old partitions.
Vertical Sharding
Separate unrelated groups of columns into different tables or databases (e.g., user info vs. order info).
Advantages: smaller rows, better cache utilization, simpler maintenance.
Disadvantages: redundant primary keys, increased JOINs, still limited by single‑table size, more complex transaction handling.
Horizontal Sharding
Overview
Distribute rows across multiple tables or databases based on a sharding key, achieving true distribution and supporting massive data volumes.
Advantages
Eliminates single‑node bottlenecks.
Minimal changes to application code.
Improves stability and load capacity.
Disadvantages
Distributed transaction consistency is hard.
Cross‑node JOIN performance suffers.
Maintenance and scaling become more complex.
Sharding Principles
Shard only when necessary; start with single‑table optimization.
Keep shard count low and distribute evenly across nodes.
Choose sharding rules (range, enum, consistent hash) that support future expansion.
Avoid cross‑shard transactions.
Write efficient queries; avoid SELECT * on large result sets.
Reduce cross‑shard JOINs via data duplication or partitioning.
Solution Architectures
Client‑Side Sharding
Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources directly in the application.
Pros: lower operational risk, no extra middleware.
Cons: limited extensibility, sharding logic runs on application servers.
Proxy‑Side Sharding
Deploy an independent middleware that abstracts multiple data sources; the application connects to the proxy as if it were a single database.
Pros: handles complex scenarios, transparent to applications, strong extensibility.
Cons: additional deployment/maintenance cost, extra network hop.
Tool Recommendations
Client‑side: ShardingJDBC
Proxy‑side: MyCat or Atlas
MySQL‑Compatible Horizontally Scalable Databases
Open‑source alternatives: TiDB , CUBRID . Cloud offerings: Alibaba Cloud PolarDB, OceanBase, Tencent Cloud DCDB.
NoSQL Alternatives
For data that does not require strict ACID guarantees—such as logs, monitoring metrics, unstructured data—consider moving to NoSQL stores to achieve effortless horizontal scaling.
END
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
