Databases 27 min read

MySQL Single‑Table Optimization, Indexing, Partitioning, and Sharding Strategies

This article presents comprehensive techniques for improving MySQL performance on large single tables, covering column choices, index design, query writing, engine selection, system parameters, hardware upgrades, read‑write splitting, caching, table partitioning, vertical and horizontal sharding, and guidance on selecting mature sharding solutions.

Java Captain

Jun 14, 2020

Single‑Table Optimization

When a MySQL table grows very large, CRUD performance degrades sharply. Before considering table splitting, evaluate the data growth pattern; splitting adds logical, deployment, and operational complexity.

Typical thresholds: integer‑based tables up to 10 000 000 rows are usually fine, while string‑heavy tables up to 5 000 000 rows can still perform well. Even larger tables can often be tuned to handle tens of millions of rows.

Column Design

Prefer TINYINT, SMALLINT, MEDIUMINT over INT; add UNSIGNED when values are non‑negative.

Allocate only the necessary length for VARCHAR columns.

Replace string columns with enums or integer codes where possible.

Use TIMESTAMP instead of DATETIME for date‑time values.

Keep the total number of columns below 20.

Avoid NULL columns because they hinder index optimization and consume extra index space.

Store IP addresses as integers.

Index Design

Create indexes only on columns used in WHERE or ORDER BY clauses; verify usage with EXPLAIN.

Do not place NULL checks in indexed columns.

Skip indexes on low‑cardinality columns (e.g., gender with only a few distinct values).

For character columns, use prefix indexes and avoid making them primary keys.

Avoid foreign keys and UNIQUE constraints; enforce uniqueness in application code.

When using multi‑column indexes, keep the column order consistent with query predicates and drop unnecessary single‑column indexes.

SQL Query Practices

Enable the slow‑query log to identify expensive statements.

Avoid column operations (e.g., SELECT id WHERE age + 1 = 10); move calculations to the right‑hand side.

Keep statements simple; one SQL per CPU core, split large statements, and avoid long‑running transactions.

Never use SELECT *; specify needed columns.

Replace OR with IN for better index utilization.

Avoid functions, triggers, and pattern matches like %xxx in WHERE clauses.

Minimize JOIN usage; when required, ensure proper indexing.

Compare values of the same type (e.g., string vs string, number vs number).

Prefer BETWEEN over IN for continuous numeric ranges.

Use LIMIT for pagination and keep page sizes reasonable.

Storage Engines

MySQL mainly uses MyISAM and InnoDB:

MyISAM : table‑level locks, no transactions, no foreign keys, no crash recovery, supports BLOB/TEXT prefix indexes, and is suited for read‑intensive workloads.

InnoDB : row‑level locks with MVCC, supports transactions, foreign keys, crash recovery, but does not support full‑text indexes; ideal for write‑intensive workloads.

System Tuning Parameters

Benchmark tools: sysbench, iibench‑mysql, tpcc‑mysql. Important variables include: back_log: increase from default 50 to 500 to allow more pending connections. wait_timeout: reduce idle connection timeout from 8 hours to 30 minutes. max_user_connection: set a reasonable upper bound. thread_concurrency: set to twice the CPU core count. skip_name_resolve: disable DNS lookups for client connections. key_buffer_size (MyISAM): 256–384 MiB on a 4 GiB server; keep key_reads/key_read_requests < 0.1 %. innodb_buffer_pool_size: largest impact on InnoDB; monitor Innodb_buffer_pool_read_requests vs Innodb_buffer_pool_reads. innodb_additional_mem_pool_size, innodb_log_buffer_size (≤ 32 MiB), query_cache_size (usually ≤ 256 MiB), read_buffer_size, sort_buffer_size, read_rnd_buffer_size, record_buffer, thread_cache_size, table_cache.

Hardware Scaling

Scale‑up by increasing CPU, memory, and using SSDs, which benefits both CPU‑bound and I/O‑bound workloads.

Read‑Write Splitting

Commonly used to direct reads to replicas and writes to the primary; avoid dual‑master setups due to added complexity.

Caching Layers

Caching can be applied at multiple layers:

MySQL internal (see system parameters above).

Data‑access layer (e.g., MyBatis cache, Hibernate persistence‑object cache).

Application‑service layer (cache Data Transfer Objects).

Web layer (page caching).

Browser client.

Two main write‑through strategies:

Write‑Through : update cache and DB simultaneously; simple but moderate performance.

Write‑Back : update cache first, flush to DB asynchronously; higher performance but more complex and risk of inconsistency.

Table Partitioning

Introduced in MySQL 5.1 as a transparent horizontal split. A partitioned table appears as a single logical table but consists of multiple physical sub‑tables; indexes are per‑partition, no global index.

Benefits

Allows a single table to store more data.

Facilitates maintenance (e.g., drop or truncate whole partitions).

Queries that target few partitions run faster.

Partitions can be placed on different physical devices.

Helps avoid certain bottlenecks (e.g., InnoDB index mutex, inode lock contention).

Enables backup/restore of individual partitions.

Limitations

Maximum 1024 partitions per table.

Primary key or unique key columns must be included in the partitioning expression.

Partitions cannot have foreign‑key constraints.

NULL values break partition pruning.

All partitions must use the same storage engine.

Partition Types

RANGE : based on continuous numeric intervals.

LIST : based on discrete value sets.

HASH : uses a user‑defined expression to compute a hash.

KEY : similar to HASH but only integer columns and MySQL’s built‑in hash function.

Example – EXPLAIN PARTITIONS

mysql> explain partitions select count(1) from user_partition where id in (1,2,3,4,5);
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table          | partitions | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1  | SIMPLE      | user_partition | p1,p4      | range | PRIMARY       | PRIMARY | 8       | NULL | 5    | Using where; Using index |
+----+-------------+----------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

Use Cases

Time‑series data – partition by year/month.

Hot‑spot data – isolate frequently accessed rows in a dedicated partition.

Vertical Splitting

Separate tables by column groups (e.g., frequently accessed vs rarely accessed fields) while keeping a common primary key. Benefits: smaller row size, better cache utilization, simpler maintenance. Drawbacks: redundant primary key, increased JOINs, still limited by single‑table size, more complex transaction handling.

Horizontal Sharding

Overview

Distribute rows across multiple databases or tables based on a sharding key. Internal table partitioning is a special case of intra‑database sharding.

Advantages

No single‑node data‑size or concurrency bottleneck.

Minimal changes to application code.

Improved system stability and load capacity.

Disadvantages

Distributed transaction consistency is hard.

Cross‑node JOIN performance degrades.

Operational complexity grows with data expansion.

Sharding Principles

Shard only when necessary; start with single‑table optimization.

Keep shard count low and distribute evenly across nodes.

Select sharding rules (range, enum, consistent hash) that suit growth and query patterns.

Avoid cross‑shard transactions.

Optimize queries to target specific shards; avoid SELECT * and large result sets.

Use data redundancy and partitioning to reduce cross‑shard JOINs.

Solution Landscape

Numerous open‑source and commercial sharding frameworks exist. The table below summarizes key attributes (client vs proxy architecture, supported databases, open‑source status, language, recent activity, GitHub stars, etc.).

Name

Provider

Architecture

DB Support

Sharding

Table Splitting

Read‑Write Split

External Dependency

Open‑Source

Language

Last Update

Stars

MySQL Fabric

MySQL

Proxy

MySQL

Yes

None

Yes

Python

4 months ago

Cobar

Alibaba

Proxy

MySQL

Yes

None

Yes

Java

2 years ago

1287

ShardingJDBC

Dangdang

Client

MySQL

Yes

None

Yes

Java

Today

1144

MyCat

Community

Proxy

MySQL

Yes

None

Yes

Java

2 days ago

1270

Vitess

YouTube

Proxy

MySQL

Yes

None

Yes

Today

3636

Choosing a Solution

Decide between client‑side or proxy architecture (client for small/medium, proxy for large/complex).

Verify required features (e.g., ORDER BY support across shards).

Prefer actively maintained projects (avoid those with no updates in the past year).

Prioritize vendors with strong reputation (large companies → community → small firms → individuals).

Consider community feedback and GitHub star count.

Open‑source is preferred for custom extensions.

Recommended choices:

Client architecture: ShardingJDBC .

Proxy architecture: MyCat or Atlas .

MySQL‑Compatible Horizontally Scalable Databases

TiDB

Cubrid

These open‑source solutions still lag behind MySQL in production robustness and require significant operational effort. Cloud offerings such as Alibaba Cloud PetaData, Alibaba Cloud OceanBase, and Tencent Cloud DCDB provide managed alternatives.

NoSQL Alternatives

When ACID guarantees are not required, moving large, write‑heavy, or weakly structured datasets to NoSQL stores can eliminate horizontal scaling challenges. Typical candidates include log, monitoring, analytics data, unstructured or semi‑structured data, and workloads with low transactional requirements.

-END-

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance tuning mysql Database Optimization Partitioning

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.