Databases 24 min read

Optimizing Large MySQL Tables: Indexes, Partitioning, Sharding, and System Tuning

When a MySQL table grows to millions of rows, query, insert, update and delete performance degrades sharply, but by applying proper field design, index strategies, SQL best practices, engine selection, system parameters, caching, partitioning, vertical and horizontal sharding, and choosing suitable client‑side or proxy‑side architectures, the database can efficiently handle tens of millions of records.

Java Captain

Mar 31, 2020

Optimizing Large MySQL Tables: Indexes, Partitioning, Sharding, and System Tuning

Single‑Table Optimization

Do not split a table unless its size will keep growing; splitting adds logical, deployment and operational complexity. Generally, tables with integer primary keys under ten million rows and string‑heavy tables under five million rows perform well.

Nevertheless, many MySQL single tables can still be tuned to support tens of millions of rows.

Field Design

Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED if non‑negative.

Allocate only the necessary length for VARCHAR columns.

Use enums or integers instead of strings.

Prefer TIMESTAMP to DATETIME.

Keep the number of columns under 20.

Avoid NULL columns because they hinder index optimization and waste space.

Store IP addresses as integers.

Index Strategy

Create indexes only where they help; focus on columns used in WHERE and ORDER BY.

Use EXPLAIN to verify index usage.

Avoid WHERE col IS NULL checks; they force full scans.

Do not index low‑cardinality columns (e.g., gender).

For character columns, create prefix indexes only.

Avoid using character columns as primary keys.

Prefer application‑level constraints over foreign keys and UNIQUE constraints.

When using composite indexes, keep column order consistent with query predicates and drop unnecessary single‑column indexes.

SQL Query Best Practices

Enable the slow‑query log to locate expensive statements.

Never perform column calculations in the WHERE clause (e.g., age+1=10); move calculations to the right‑hand side.

Keep SQL statements simple; split large statements, reduce lock time.

Avoid SELECT *.

Replace OR with IN (log‑scale performance, keep IN list under 200 items).

Implement functions and triggers in application code, not in the database.

Avoid LIKE '%xxx' patterns.

Minimize JOIN usage.

Compare values of the same type (e.g., string‑to‑string, integer‑to‑integer).

Avoid != or <> in WHERE clauses; they disable index usage.

Use BETWEEN for continuous ranges instead of IN.

Paginate with LIMIT and keep page size reasonable.

Engine Choice

MySQL mainly uses MyISAM and InnoDB :

MyISAM

No row‑level locking; table‑level read and write locks.

No transaction support.

No foreign‑key support.

No crash‑safe recovery.

Supports concurrent inserts.

Allows indexing of the first 500 characters of BLOB/TEXT and full‑text indexes.

Supports delayed index updates for high write throughput.

Compressible for read‑only tables.

InnoDB

Row‑level locking with MVCC for high concurrency.

Full transaction support.

Foreign‑key constraints.

Crash‑safe recovery.

Full‑text indexes supported from MySQL 5.6.4 onward.

In practice, MyISAM suits read‑heavy workloads, while InnoDB excels at INSERT/UPDATE‑heavy workloads.

System Tuning Parameters

Benchmark tools: sysbench , iibench‑mysql , tpcc‑mysql .

backlog : number of pending connections before MySQL refuses new ones; raise from 50 to 500.

wait_timeout : idle connection timeout; reduce from 8 hours to 30 minutes.

max_user_connections : limit per‑user connections; set a reasonable ceiling.

thread_concurrency : set to twice the CPU core count.

skip_name_resolve : disable DNS lookup for client hosts.

key_buffer_size : MyISAM index cache; increase to 256 MiB‑384 MiB on a 4 GiB server; monitor show status like 'Key_read%' to keep key_reads/key_read_requests < 0.1%.

innodb_buffer_pool_size : InnoDB data and index cache; monitor show status like 'Innodb_buffer_pool_read%' and keep read‑request ratio high.

innodb_additional_mem_pool_size : memory for InnoDB internal structures; increase if many objects cause warnings.

innodb_log_buffer_size : transaction log buffer; usually keep ≤ 32 MiB.

query_cache_size : cache of result sets; adjust based on hit rate (Qcache_hits/(Qcache_hits+Qcache_inserts))*100, typical size ~256 MiB.

read_buffer_size , sort_buffer_size , read_rnd_buffer_size , record_buffer , thread_cache_size , table_cache : various per‑connection buffers; tune according to workload.

Hardware Scaling

Scale‑up by adding CPU, memory, or SSDs depending on whether the workload is CPU‑ or I/O‑bound.

Read‑Write Splitting

Deploy a replica for reads and a primary for writes; avoid multi‑master setups to reduce complexity.

Caching Layers

MySQL internal caches (tuned via the parameters above).

Data‑access layer caches (e.g., MyBatis, Hibernate).

Application‑service caches (caching DTOs).

Web‑layer page caches.

Browser‑side caches.

Table Partitioning

MySQL 5.1 introduced horizontal partitioning; the table appears as a single logical entity while physically stored in multiple sub‑tables.

Partitions have their own indexes; there is no global index.

Benefits

Store more data in a single logical table.

Easy bulk deletion or addition of whole partitions.

Queries that target a few partitions run faster.

Partitions can be placed on different physical devices.

Mitigate certain bottlenecks (e.g., InnoDB index mutex, ext3 inode lock).

Backup/restore at partition level.

Limitations

Maximum 1024 partitions per table.

Primary key or unique key columns must be part of the partition key.

No foreign‑key support on partitioned tables.

NULL values break partition pruning.

All partitions must use the same storage engine.

Partition Types

RANGE : based on continuous intervals.

LIST : based on discrete value sets.

HASH : based on a user‑defined expression.

KEY : similar to HASH but uses MySQL’s internal hash function on integer columns.

Typical Use Cases

Time‑series data (e.g., logs, orders) are often partitioned by date, allowing fast queries on recent data and easy archival of old partitions.

Vertical Sharding

Separate unrelated groups of columns into different tables or databases (e.g., user info vs. order info).

Advantages: smaller rows, better cache utilization, simpler maintenance.

Disadvantages: redundant primary keys, increased JOINs, still limited by single‑table size, more complex transaction handling.

Horizontal Sharding

Overview

Distribute rows across multiple tables or databases based on a sharding key, achieving true distribution and supporting massive data volumes.

Advantages

Eliminates single‑node bottlenecks.

Minimal changes to application code.

Improves stability and load capacity.

Disadvantages

Distributed transaction consistency is hard.

Cross‑node JOIN performance suffers.

Maintenance and scaling become more complex.

Sharding Principles

Shard only when necessary; start with single‑table optimization.

Keep shard count low and distribute evenly across nodes.

Choose sharding rules (range, enum, consistent hash) that support future expansion.

Avoid cross‑shard transactions.

Write efficient queries; avoid SELECT * on large result sets.

Reduce cross‑shard JOINs via data duplication or partitioning.

Solution Architectures

Client‑Side Sharding

Modify the data‑access layer (JDBC, DataSource, MyBatis) to manage multiple data sources directly in the application.

Pros: lower operational risk, no extra middleware.

Cons: limited extensibility, sharding logic runs on application servers.

Proxy‑Side Sharding

Deploy an independent middleware that abstracts multiple data sources; the application connects to the proxy as if it were a single database.

Pros: handles complex scenarios, transparent to applications, strong extensibility.

Cons: additional deployment/maintenance cost, extra network hop.

Tool Recommendations

Client‑side: ShardingJDBC

Proxy‑side: MyCat or Atlas

MySQL‑Compatible Horizontally Scalable Databases

Open‑source alternatives: TiDB , CUBRID . Cloud offerings: Alibaba Cloud PolarDB, OceanBase, Tencent Cloud DCDB.

NoSQL Alternatives

For data that does not require strict ACID guarantees—such as logs, monitoring metrics, unstructured data—consider moving to NoSQL stores to achieve effortless horizontal scaling.

END

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Optimization Sharding MySQL partitioning Database Tuning

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.