Master MySQL Partitioning and Sharding: Theory, Practice, and Performance
This article explains the programming mindset of splitting and merging, details MySQL partition implementation and internal file types, demonstrates how partitioning improves query performance, and explores sharding strategies, middleware choices, and common issues like distributed transactions and cross‑shard joins.
1. Split and Merge
Programming mindset matters more than any single technology; the key is to understand the concepts of splitting (fine‑grained granularity) and merging (coarse‑grained aggregation).
1.1 Split
Centralized services evolve to distributed services.
From Collections.synchronizedMap to ConcurrentHashMap (JDK 1.7/1.8) – finer lock granularity while keeping thread safety.
From AtomicInteger to LongAdder – spread updates across several internal cells so that threads contend less on a single CAS target during high‑concurrency accumulation.
JVM G1 GC divides the heap into many regions for memory management.
HBase RegionServer splits data into multiple regions.
Thread‑pool resource isolation in typical development.
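The AtomicInteger versus LongAdder item can be sketched in a few lines. This is a minimal illustration, not a benchmark; the class name SplitCounterDemo, its method names, and the thread and iteration counts are assumptions made for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class; names and sizes are illustrative only.
class SplitCounterDemo {

    // Runs `threads` workers that each invoke `increment` `perThread` times,
    // then waits for all of them to finish.
    static void runConcurrently(int threads, int perThread, Runnable increment) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    // Coarse: every thread performs its CAS on the same single value.
    static long countWithAtomicInteger(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        runConcurrently(threads, perThread, counter::incrementAndGet);
        return counter.get();
    }

    // Split: contended threads update separate cells; sum() adds the cells up.
    static long countWithLongAdder(int threads, int perThread) {
        LongAdder counter = new LongAdder();
        runConcurrently(threads, perThread, counter::increment);
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomicInteger(4, 100_000));
        System.out.println(countWithLongAdder(4, 100_000));
    }
}
```

Both methods return the same total; the difference is where the contention lands. LongAdder trades a slightly more expensive read (sum()) for cheaper concurrent writes.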
1.2 Merge
TLAB (Thread‑Local Allocation Buffers) – avoid contention and speed up object allocation.
Escape analysis – objects that never escape a method can be kept off the heap (HotSpot achieves this through scalar replacement), reducing heap allocations.
CMS GC can compact memory during a Full GC (e.g., -XX:+UseCMSCompactAtFullCollection).
Lock coarsening – JIT expands lock scope when consecutive operations lock the same object.
Kafka batch settings (batch.size, linger.ms) reduce network overhead.
Batch‑fetch APIs in typical development.
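The batching idea behind the last two items can be sketched as a small buffer that merges many single items into fewer, larger calls. BatchBuffer is a hypothetical helper written for this illustration; it is loosely modelled on the idea of a size threshold such as batch.size, it is not Kafka's implementation, and it has no time-based flush comparable to linger.ms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not thread-safe; intended only to show the "merge" idea.
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // invoked once per batch, e.g. one network call
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and flushes automatically once the batch is full.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered as a single batch; does nothing when empty.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand over a copy, then reuse the buffer
        pending.clear();
    }
}
```

With a batch size of 3, adding seven items and then calling flush() results in three calls to the sink instead of seven.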
2. Partition
All examples are based on MySQL InnoDB.
2.1 Implementation
If a table has a primary key or unique indexes, every one of them must include all columns used in the partitioning expression. Partitioning is transparent to the application; no code changes are required.
2.2 Internal Files
Data directory (example):
File types:
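.frm : table definition file (removed in MySQL 8.0, which keeps table metadata in the data dictionary).
.ibd : InnoDB stores both data and indexes in this file, one per partition when file‑per‑table tablespaces are in use (MyISAM uses .MYD/.MYI instead).
.par : partition definition file (no longer created as of MySQL 5.7.6; partition definitions are stored in the internal data dictionary instead).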
2.3 Data Processing
Partitioning improves MySQL performance by splitting one large B+‑tree into several smaller per‑partition trees, which reduces I/O and cache pressure. When the WHERE clause includes the partition key, MySQL prunes the search to the matching partitions; inside each one, a secondary‑index lookup walks that partition's secondary index B+‑tree and then its primary (clustered) index B+‑tree. Without the partition key, the query must probe every partition, which multiplies the logical I/O.
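Partition pruning on a RANGE‑partitioned table can be pictured with a toy model. The sketch below is not how MySQL is implemented; RangePartitionPruner, its method names, and the daily partitions are assumptions chosen to mirror a table partitioned by day with VALUES LESS THAN bounds.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of RANGE partitioning; hypothetical names, for illustration only.
class RangePartitionPruner {
    // Exclusive upper bound -> partition name,
    // like PARTITION name VALUES LESS THAN (bound).
    private final TreeMap<LocalDate, String> partitions = new TreeMap<>();

    void addPartition(String name, LocalDate lessThan) {
        partitions.put(lessThan, name);
    }

    // Partitions that may contain rows with from <= key <= to.
    List<String> prune(LocalDate from, LocalDate to) {
        List<String> hit = new ArrayList<>();
        // Start at the first partition whose upper bound is greater than `from`.
        for (Map.Entry<LocalDate, String> e : partitions.tailMap(from, false).entrySet()) {
            hit.add(e.getValue());
            if (e.getKey().isAfter(to)) {
                break; // every later partition starts beyond `to`
            }
        }
        return hit;
    }

    // No predicate on the partition key: every partition has to be examined.
    List<String> allPartitions() {
        return new ArrayList<>(partitions.values());
    }
}
```

With three daily partitions, a query restricted to a single day touches one partition, while a query without the partition key touches all three.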
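To view partition usage, run EXPLAIN PARTITIONS SELECT … and check the partitions column, which lists the partitions the query will access. EXPLAIN PARTITIONS is deprecated in MySQL 5.7 and removed in 8.0; from 5.7 onward, plain EXPLAIN shows the same column.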
mysql> explain partitions select * from ClientActionTrack where startTime>'2016-08-25 00:00:00' and startTime<'2016-08-25 23:59:00';
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table             | partitions | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | ClientActionTrack | p20160825  | ALL  | NULL          | NULL | NULL    | NULL | 33868 | Using where |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+
3. Sharding (Database and Table Splitting)
When a table grows with time and business, its size and operation volume increase, eventually exceeding the capacity of a single machine.
Sharding (splitting databases and tables) is used to handle ultra‑large tables that cannot fit on one server.
3.1 Standards
Storage > 100 GB.
Daily data growth > 2 million rows.
Single table row count > 100 million.
3.2 Sharding Fields
The sharding field should be a frequently queried numeric column, such as userId, to ensure even data distribution and efficient lookups.
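Routing by a numeric sharding field can be sketched as a simple modulo mapping. This is one possible scheme under stated assumptions, not the algorithm of any particular middleware; ShardRouter, the db_N naming, and the table prefix are hypothetical.

```java
// Hypothetical router: maps a userId to "db_<n>.<prefix>_<m>" by modulo.
final class ShardRouter {
    private final int dbCount;
    private final int tablesPerDb;
    private final String tablePrefix;

    ShardRouter(int dbCount, int tablesPerDb, String tablePrefix) {
        if (dbCount <= 0 || tablesPerDb <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
        this.tablePrefix = tablePrefix;
    }

    // The same userId always lands on the same database and table.
    String route(long userId) {
        long totalSlots = (long) dbCount * tablesPerDb;
        // floorMod keeps the slot non-negative even for negative ids.
        int slot = (int) Math.floorMod(userId, totalSlots);
        int db = slot / tablesPerDb;
        int table = slot % tablesPerDb;
        return "db_" + db + "." + tablePrefix + "_" + table;
    }
}
```

For example, with 2 databases and 4 tables per database, new ShardRouter(2, 4, "t_order").route(13L) yields "db_1.t_order_1". Plain modulo spreads sequential ids evenly, but changing the shard count later remaps most keys, so the number of slots is usually chosen with future growth in mind.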
3.3 Distributed Database Middleware
Middleware can be proxy‑based (e.g., MyCat, DBProxy) or client‑side (e.g., TDDL, Sharding‑JDBC). Proxy mode routes all SQL through a high‑availability proxy; client‑side mode embeds routing logic in the application’s connection pool.
[Diagram: advantages and disadvantages of proxy‑based versus client‑side middleware]
3.4 Issues
3.4.1 Distributed Transactions
Sharding introduces distributed transactions. Traditional XA is heavy; modern solutions use flexible transaction models such as TCC, Saga, or FMT.
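The Saga model mentioned above can be sketched as a sequence of local transactions, each paired with a compensating action that undoes it. This is a bare in-memory illustration under simplifying assumptions (no persistence, no retries, compensations assumed not to fail); SagaSketch and Step are hypothetical names, not the API of any framework.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory Saga coordinator, for illustration only.
class SagaSketch {

    // One local transaction on one shard, plus the action that undoes it.
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Runs the steps in order. If one fails, the steps that already completed
    // are compensated in reverse order. Returns true when every step succeeded.
    static boolean run(List<Step> steps) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                done.push(step);
            } catch (RuntimeException e) {
                while (!done.isEmpty()) {
                    done.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

Unlike XA, nothing is locked across shards while the saga runs; the trade-off is that intermediate states are visible to other readers until the saga either completes or compensates.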
3.4.2 Cross‑Shard Joins
Middleware such as TDDL and MyCat supports cross‑shard joins, but such joins should be avoided where possible by denormalizing the data, that is, by storing redundant fields alongside the rows that need them.
4. Summary
Partitioning and sharding address different problems: partitioning is an internal MySQL feature, usually time‑range based for archiving, while sharding (splitting tables/databases) handles ultra‑large tables that cannot fit on a single node. Both improve performance, but partitioning incurs less data movement because it is managed by MySQL itself.
Java Backend Technology
