Databases 11 min read

Master MySQL Partitioning and Sharding: Theory, Practice, and Performance

This article explains the programming mindset of splitting and merging, details MySQL partition implementation and internal file types, demonstrates how partitioning improves query performance, and explores sharding strategies, middleware choices, and common issues like distributed transactions and cross‑shard joins.

Java Backend Technology

Apr 22, 2020

Master MySQL Partitioning and Sharding: Theory, Practice, and Performance

1. Split and Merge

Programming mindset matters more than any single technology; the key is to understand the concepts of splitting (fine‑grained granularity) and merging (coarse‑grained aggregation).

1.1 Split

Centralized services evolve to distributed services.

From Collections.synchronizedMap to ConcurrentHashMap (JDK 1.7/1.8) – finer lock granularity while keeping thread safety.

From AtomicInteger to LongAdder – reduce CAS operations for high‑concurrency accumulation.

JVM G1 GC divides the heap into many regions for memory management.

HBase RegionServer splits data into multiple regions.

Thread‑pool resource isolation in typical development.

1.2 Merge

TLAB (Thread‑Local Allocation Buffers) – avoid contention and speed up object allocation.

Escape analysis – allocate objects on the stack when possible, reducing heap allocations.

CMS GC can compact memory after Full GC (e.g., -XX:UseCMS-CompactAtFullCollection).

Lock coarsening – JIT expands lock scope when consecutive operations lock the same object.

Kafka batch settings ( batch.size, linger.ms) reduce network overhead.

Batch‑fetch APIs in typical development.

2. Partition

All examples are based on MySQL InnoDB.

2.1 Implementation

If a table has a primary key or unique index, the partition column must be part of that unique index. Partitioning is transparent to the application; no code changes are required.

2.2 Internal Files

Data directory (example):

File types:

.frm : table definition file.

.ibd : InnoDB stores both data and indexes in this file (or .MYD/.MYI for MyISAM).

.par : partition definition file (removed after MySQL 5.7.6; definitions are stored in the internal data dictionary).

2.3 Data Processing

Partitioned tables improve MySQL performance by splitting a large B+‑tree into several smaller trees, reducing I/O and cache pressure. When the query uses the partition key, it first accesses the partition’s secondary index B+‑tree, then the primary index B+‑tree. Without the partition key, the query scans all partitions, causing multiple logical I/Os.

To view partition usage, run EXPLAIN PARTITIONS SELECT … and observe the number of partitions accessed.

mysql> explain partitions select * from TxnList where startTime>'2016-08-25 00:00:00' and startTime<'2016-08-25 23:59:00';
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table             | partitions | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | ClientActionTrack | p20160825 | ALL  | NULL          | NULL | NULL    | NULL | 33868 | Using where |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+

3. Sharding (Database and Table Splitting)

When a table grows with time and business, its size and operation volume increase, eventually exceeding the capacity of a single machine.

Sharding (splitting databases and tables) is used to handle ultra‑large tables that cannot fit on one server.

3.1 Standards

Storage > 100 GB.

Daily data growth > 2 million rows.

Single table row count > 100 million.

3.2 Sharding Fields

The sharding field should be a frequently queried numeric column, such as userId, to ensure even data distribution and efficient lookups.

3.3 Distributed Database Middleware

Middleware can be proxy‑based (e.g., MyCat, DBProxy) or client‑side (e.g., TDDL, Sharding‑JDBC). Proxy mode routes all SQL through a high‑availability proxy; client‑side mode embeds routing logic in the application’s connection pool.

Advantages and disadvantages are illustrated in the following diagram:

3.4 Issues

3.4.1 Distributed Transactions

Sharding introduces distributed transactions. Traditional XA is heavy; modern solutions use flexible transaction models such as TCC, Saga, or FMT.

3.4.2 Cross‑Shard Joins

Middleware like TDDL and MyCAT support cross‑shard joins, but they should be avoided when possible by denormalizing data or using redundant fields.

4. Summary

Partitioning and sharding address different problems: partitioning is an internal MySQL feature, usually time‑range based for archiving, while sharding (splitting tables/databases) handles ultra‑large tables that cannot fit on a single node. Both improve performance, but partitioning incurs less data movement because it is managed by MySQL itself.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance optimization mysql Database design Distributed Transactions

Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.