
Database Sharding (分库分表): Concepts, Strategies, and Implementation with ShardingSphere

This article explains what database sharding is and why it is used, surveys the main sharding strategies, and demonstrates an implementation with ShardingSphere, including YAML configuration and Java code. It also points to additional resources on advanced architecture and interview preparation.

Mike Chen's Internet Architecture

Database sharding (分库分表, literally "splitting databases and splitting tables") is presented as a key technique for large‑scale architectures: it improves performance and scalability by distributing data across multiple databases or tables.

The article outlines why sharding is needed when data volume and concurrency grow, especially when single tables reach tens of millions or billions of rows, and how it enhances query speed, write throughput, and fault tolerance.

Implementation steps are described: analyzing data and bottlenecks, choosing a sharding type (hash, range, list, or composite), and using middleware such as ShardingSphere or MyCat to realize the split.

Various sharding strategies are detailed:

Hash Sharding: distributes data roughly evenly across shards based on a hash of a chosen field (e.g., user ID).

Range Sharding: partitions data by value ranges such as dates or numeric intervals.

List Sharding: partitions by specific field values like region or category.

Composite Sharding: combines multiple strategies, e.g., region then hash.
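The four strategies above can be sketched as plain routing functions. This is only a minimal illustration under stated assumptions, not how ShardingSphere implements them: the class name ShardRouters, the shard names (order_db_north, orders_202405), the region values, and the use of Long.hashCode with a modulo are all hypothetical choices made for the example.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Map;

/** Illustrative routers for the four strategies; every name here is hypothetical. */
final class ShardRouters {

    /** List sharding lookup table: explicit value-to-database mapping (assumed regions). */
    private static final Map<String, String> REGION_TO_DB = Map.of(
            "north", "order_db_north",
            "south", "order_db_south");

    private ShardRouters() { }

    /** Hash sharding: map a key to one of shardCount shards by hashing it. */
    public static int hashShard(long userId, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even when the hash is negative.
        return Math.floorMod(Long.hashCode(userId), shardCount);
    }

    /** Range sharding: one table per calendar month, named orders_yyyyMM. */
    public static String rangeShard(LocalDate orderDate) {
        return String.format(Locale.ROOT, "orders_%04d%02d",
                orderDate.getYear(), orderDate.getMonthValue());
    }

    /** List sharding: route by an explicit field value such as region. */
    public static String listShard(String region) {
        String db = REGION_TO_DB.get(region);
        if (db == null) {
            throw new IllegalArgumentException("No shard configured for region: " + region);
        }
        return db;
    }

    /** Composite sharding: choose the database by region, then the table by hash. */
    public static String compositeShard(String region, long userId, int tablesPerDb) {
        return listShard(region) + ".orders_" + hashShard(userId, tablesPerDb);
    }

    public static void main(String[] args) {
        System.out.println(hashShard(42L, 4));
        System.out.println(rangeShard(LocalDate.of(2024, 5, 17)));
        System.out.println(listShard("south"));
        System.out.println(compositeShard("north", 42L, 4));
    }
}
```

In a middleware-based setup these decisions would normally be expressed as sharding rules in configuration rather than hand-written code; the sketch is only meant to make the routing logic of each strategy concrete.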

An example of vertical sharding for an e‑commerce system is given, showing how user and order tables can reside in separate databases. The corresponding ShardingSphere YAML configuration is provided:

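# Written in the ShardingSphere-JDBC 5.x YAML layout, which is what
# YamlShardingSphereDataSourceFactory (used in the Java code) expects.
# 'databaseName' applies to 5.1.2 and later; earlier 5.x releases use 'schemaName'.
databaseName: ecommerce

dataSources:
  user_db:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    jdbcUrl: jdbc:mysql://localhost:3306/user_db
    username: root
    password: password
  order_db:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    jdbcUrl: jdbc:mysql://localhost:3306/order_db
    username: root
    password: password

rules:
- !SHARDING
  tables:
    # Vertical split: each logical table lives in exactly one database.
    users:
      actualDataNodes: user_db.users
    orders:
      actualDataNodes: order_db.orders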

Java code for loading the configuration and querying a user record is also shown:

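import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;
import org.apache.shardingsphere.driver.api.yaml.YamlShardingSphereDataSourceFactory;

public class VerticalShardingDemo {
    public static void main(String[] args) throws Exception {
        // Load the vertical sharding configuration file
        DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(new File("vertical-sharding.yml"));

        // Query user information; the logical table "users" is routed to user_db
        String userSql = "SELECT * FROM users WHERE user_id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement pstmt = conn.prepareStatement(userSql)) {
            pstmt.setLong(1, 1L);
            try (ResultSet rs = pstmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println("User ID: " + rs.getLong("user_id") + ", Username: " + rs.getString("username"));
                }
            }
        }
    }
}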

Key practical considerations include ensuring even data distribution, handling cross‑shard transactions (e.g., with Seata), and generating globally unique IDs (e.g., using Snowflake).
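To make the globally unique ID point concrete, here is a minimal Snowflake-style generator sketch. It assumes the commonly described layout of a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence; the class name SnowflakeIdGenerator, the custom epoch, and the policy of failing when the clock moves backwards are assumptions for illustration, not something the article or any particular library prescribes.

```java
/** Snowflake-style ID sketch: 41-bit timestamp | 10-bit worker ID | 12-bit sequence. */
final class SnowflakeIdGenerator {
    // Assumed custom epoch: 2024-01-01T00:00:00Z. Any fixed past instant works.
    private static final long EPOCH_MS = 1704067200000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    /** Returns an ID that is unique per worker and increasing over time. */
    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Simplest possible policy for a sketch: refuse to issue IDs.
            throw new IllegalStateException("Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    /** Extracts the worker ID embedded in a generated ID. */
    public static long workerIdOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_WORKER_ID;
    }
}
```

Each application instance would be given a distinct worker ID (for example `new SnowflakeIdGenerator(5)`), so that IDs generated on different nodes never collide without any coordination between shards.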

The article concludes that a well‑designed sharding strategy combined with appropriate middleware can significantly boost system performance and scalability.

Finally, the author promotes additional resources such as a 300k‑word architecture collection and a comprehensive Java interview Q&A set, inviting readers to obtain them via a WeChat public account.

Tags: distributed systems, performance, scalability, Database, Sharding, ShardingSphere
