Database Sharding (分库分表): Concepts, Strategies, and Implementation with ShardingSphere
This article explains database sharding concepts, reasons for using it, various sharding strategies, and demonstrates implementation with ShardingSphere, including configuration and Java code examples, while also offering additional resources for advanced architecture and interview preparation.
Database sharding (分库分表) is presented as a crucial technique for large‑scale architectures to improve performance and scalability by distributing data across multiple databases or tables.
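The sketch below shows how one row key can be mapped to both a database and a table. It assumes a hypothetical layout of 2 databases, each with 4 `orders_N` tables, and uses plain modulo arithmetic. The names `db_N`, `orders_N`, `TwoLevelRouter`, and `route` are for illustration only and are not part of ShardingSphere's API.

```java
public class TwoLevelRouter {
    // Hypothetical layout: 2 databases, each holding 4 physical tables
    static final int DB_COUNT = 2;
    static final int TABLES_PER_DB = 4;

    // Maps a user ID to "db_<i>.orders_<j>"
    static String route(long userId) {
        // Pick one of the 8 physical tables overall; floorMod keeps negative IDs in range
        long slot = Math.floorMod(userId, (long) (DB_COUNT * TABLES_PER_DB));
        long db = slot / TABLES_PER_DB;    // which database
        long table = slot % TABLES_PER_DB; // which table inside that database
        return "db_" + db + ".orders_" + table;
    }

    public static void main(String[] args) {
        System.out.println(route(1));  // db_0.orders_1
        System.out.println(route(6));  // db_1.orders_2
        System.out.println(route(13)); // db_1.orders_1
    }
}
```

With modulo routing like this, changing either count later remaps most existing keys. For that reason the shard layout is usually fixed up front or planned together with a data-migration scheme.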
The article outlines why sharding is needed when data volume and concurrency grow, especially when single tables reach tens of millions or billions of rows, and how it enhances query speed, write throughput, and fault tolerance.
Implementation steps are described: analyzing the data and identifying bottlenecks, choosing a sharding type (hash, range, list, or composite), and using middleware such as ShardingSphere or MyCat to implement the split.
Various sharding strategies are detailed:
Hash Sharding: distributes data evenly based on a hash of a chosen field (e.g., user ID).
Range Sharding: partitions data by value ranges such as dates or numeric intervals.
List Sharding: partitions by specific field values like region or category.
Composite Sharding: combines multiple strategies, e.g., region then hash.
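The four strategies can be compared side by side in a small hand-rolled routing sketch. It is not ShardingSphere's sharding-algorithm SPI. The shard count of 4, the per-year `orders_YYYY` tables, the region names, and the `users_N` suffix are all hypothetical.

```java
import java.time.LocalDate;
import java.util.Map;

public class ShardingStrategies {
    static final int SHARD_COUNT = 4; // assumed number of hash shards

    // Hash sharding: hash the key, then take it modulo the shard count
    static int hashShard(long userId) {
        return Math.floorMod(Long.hashCode(userId), SHARD_COUNT);
    }

    // Range sharding: one table per calendar year (assumed layout)
    static String rangeShard(LocalDate orderDate) {
        return "orders_" + orderDate.getYear();
    }

    // List sharding: explicit value-to-shard mapping (hypothetical regions)
    static final Map<String, String> REGION_TO_DB = Map.of(
            "north", "db_north",
            "south", "db_south",
            "east", "db_east");

    static String listShard(String region) {
        String db = REGION_TO_DB.get(region);
        if (db == null) {
            throw new IllegalArgumentException("No shard configured for region: " + region);
        }
        return db;
    }

    // Composite sharding: choose the region database first, then hash within it
    static String compositeShard(String region, long userId) {
        return listShard(region) + ".users_" + hashShard(userId);
    }

    public static void main(String[] args) {
        System.out.println(hashShard(42L));                           // 2
        System.out.println(rangeShard(LocalDate.of(2024, 3, 15)));    // orders_2024
        System.out.println(listShard("south"));                       // db_south
        System.out.println(compositeShard("east", 42L));              // db_east.users_2
    }
}
```

For small non-negative IDs, `Long.hashCode` returns the value itself, so `hashShard` behaves like plain modulo. It spreads load evenly only when the IDs themselves are evenly spread. Range and list sharding make per-range or per-region queries cheap, but they can produce hot shards when recent dates or a single region dominate the traffic.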
An example of vertical sharding for an e‑commerce system is given, showing how user and order tables can reside in separate databases. The corresponding ShardingSphere YAML configuration is provided:
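```yaml
# ShardingSphere 5.x JDBC format, the format read by YamlShardingSphereDataSourceFactory
databaseName: ecommerce   # older 5.x releases used the key schemaName
dataSources:
  user_db:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/user_db
    username: root
    password: password
  order_db:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/order_db
    username: root
    password: password
rules:
  - !SHARDING
    tables:
      users:
        actualDataNodes: user_db.users     # every users row lives in user_db
      orders:
        actualDataNodes: order_db.orders   # every orders row lives in order_db
```
Java code for loading the configuration and querying a user record is also shown: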
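```java
import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;
import org.apache.shardingsphere.driver.api.yaml.YamlShardingSphereDataSourceFactory;

public class VerticalShardingDemo {
    public static void main(String[] args) throws Exception {
        // Load vertical sharding configuration file
        DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(new File("vertical-sharding.yml"));
        // Query user information; ShardingSphere routes the logical "users" table to user_db.users
        String userSql = "SELECT * FROM users WHERE user_id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement pstmt = conn.prepareStatement(userSql)) {
            pstmt.setLong(1, 1L);
            try (ResultSet rs = pstmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println("User ID: " + rs.getLong("user_id") + ", Username: " + rs.getString("username"));
                }
            }
        }
    }
}
```
Key practical considerations include ensuring even data distribution, handling cross‑shard transactions (e.g., with Seata), and generating globally unique IDs (e.g., using Snowflake).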
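For globally unique IDs, the sketch below is a minimal single-process version of the Snowflake layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The class name is hypothetical. The epoch constant is the one Twitter's original Snowflake used, and any fixed past instant would work. The sketch assumes that each application node is assigned a distinct worker ID elsewhere, for example through configuration.

```java
public class SnowflakeIdGenerator {
    // Bit layout: [1 unused sign bit][41-bit timestamp][10-bit worker ID][12-bit sequence]
    private static final long EPOCH = 1288834974657L; // Twitter Snowflake's custom epoch (ms)
    private static final long WORKER_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_ID_BITS) - 1; // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refuse to issue IDs rather than risk duplicates after a clock rollback
            throw new IllegalStateException("Clock moved backwards; refusing to generate ID");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 IDs already issued this millisecond: spin until the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(1L);
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.nextId());
        }
    }
}
```

The timestamp occupies the high bits, so IDs from one generator strictly increase, and IDs from different nodes are roughly time-ordered. The worker ID bits keep nodes from colliding without any coordination at insert time.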
The article concludes that a well‑designed sharding strategy combined with appropriate middleware can significantly boost system performance and scalability.
Finally, the author promotes additional resources such as a 300k‑word architecture collection and a comprehensive Java interview Q&A set, inviting readers to obtain them via a WeChat public account.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!