Databases 70 min read

Sharding-JDBC: Basic Usage and Core Principles

This article introduces Sharding-JDBC, explaining its basic usage, core concepts, sharding strategies, SQL processing workflow, configuration examples, and code snippets, while emphasizing the importance of understanding its underlying principles for effective database horizontal partitioning in Java applications.

Full-Stack Internet Architecture

Nov 13, 2020

Sharding-JDBC: Basic Usage and Core Principles

Introduction

This article aims to teach developers how to use Sharding-JDBC, a lightweight Java JDBC middleware that provides transparent horizontal sharding for relational databases. It stresses that mastering the underlying principles is more valuable than merely configuring the framework.

What Is Sharding-JDBC?

Sharding-JDBC (Apache ShardingSphere JDBC) is a Java library that sits on the JDBC layer and adds sharding, distributed transaction, and governance capabilities while remaining fully compatible with JDBC and popular ORM frameworks such as MyBatis, JPA, and Spring JDBC Template.

It works with any JDBC‑compatible database (MySQL, Oracle, SQLServer, PostgreSQL, etc.) and can be used with any connection pool (HikariCP, Druid, etc.). The project also includes two sibling components: ShardingSphere‑Proxy (a protocol‑level proxy) and ShardingSphere‑Sidecar (a Kubernetes‑native sidecar).

Sharding Concepts

Sharding splits a logical table into multiple physical tables (and possibly multiple databases). Two main dimensions exist:

Vertical sharding : separates tables by business domain (e.g., user table vs. order table).

Horizontal sharding : distributes rows of a single table across many physical tables based on a sharding key.

Key terms include logical table, physical table, sharding key, routing, sharding strategy, sharding algorithm, and binding tables.

Sharding Strategies

Sharding‑JDBC provides several built‑in strategies, each pairing a sharding key with an algorithm:

Standard strategy (precise and range algorithms)

Complex strategy (multiple keys)

Hint strategy (values supplied programmatically)

Inline strategy (Groovy expressions)

None strategy (no sharding)

Configuration Example (Java API)

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core</artifactId>
    <version>${latest.release.version}</version>
</dependency>

Below is a simplified Java configuration that defines two data sources (ds0 and ds1) and a sharding rule for the t_order table.

Map<String, DataSource> dataSourceMap = new HashMap<>();
BasicDataSource dataSource1 = new BasicDataSource();
dataSource1.setDriverClassName("com.mysql.jdbc.Driver");
dataSource1.setUrl("jdbc:mysql://localhost:3306/ds0");
// ... set username/password
dataSourceMap.put("ds0", dataSource1);

BasicDataSource dataSource2 = new BasicDataSource();
// ... similar configuration for ds1

device ShardingTableRuleConfiguration orderTableRuleConfig =
    new ShardingTableRuleConfiguration("t_order", "ds${0..1}.t_order${0..1}");
orderTableRuleConfig.setDatabaseShardingStrategy(
    new StandardShardingStrategyConfiguration("user_id", "dbShardingAlgorithm"));
orderTableRuleConfig.setTableShardingStrategy(
    new StandardShardingStrategyConfiguration("order_id", "tableShardingAlgorithm"));

ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
shardingRuleConfig.getTables().add(orderTableRuleConfig);
Properties dbProps = new Properties();
dbProps.setProperty("algorithm-expression", "ds${user_id % 2}");
shardingRuleConfig.getShardingAlgorithms().put("dbShardingAlgorithm",
    new ShardingSphereAlgorithmConfiguration("INLINE", dbProps));
// ... similar for table algorithm

Configuration Example (Properties File)

# Real data source names
spring.shardingsphere.datasource.names=ds0,ds1
# ds0 configuration
spring.shardingsphere.datasource.ds0.type=org.apache.commons.dbcp2.BasicDataSource
spring.shardingsphere.datasource.ds0.driver-class-name=com.mysql.jdbc.Driver
spring.shardingsphere.datasource.ds0.url=jdbc:mysql://localhost:3306/ds0
spring.shardingsphere.datasource.ds0.username=root
spring.shardingsphere.datasource.ds0.password=
# ds1 configuration (similar)
# Table rule
spring.shardingsphere.rules.sharding.tables.t_order.actual-data-nodes=ds$->{0..1}.t_order$->{0..1}
# Database sharding strategy
spring.shardingsphere.rules.sharding.tables.t_order.database-strategy.standard.sharding-column=user_id
spring.shardingsphere.rules.sharding.tables.t_order.database-strategy.standard.sharding-algorithm-name=database_inline
# Table sharding strategy
spring.shardingsphere.rules.sharding.tables.t_order.table-strategy.standard.sharding-column=order_id
spring.shardingsphere.rules.sharding.tables.t_order.table-strategy.standard.sharding-algorithm-name=table_inline
# Inline algorithms
spring.shardingsphere.rules.sharding.sharding-algorithms.database_inline.type=INLINE
spring.shardingsphere.rules.sharding.sharding-algorithms.database_inline.props.algorithm-expression=ds_${user_id % 2}
spring.shardingsphere.rules.sharding.sharding-algorithms.table_inline.type=INLINE
spring.shardingsphere.rules.sharding.sharding-algorithms.table_inline.props.algorithm-expression=t_order_${order_id % 2}

SQL Processing Pipeline

When an application submits a logical SQL statement, Sharding‑JDBC performs five steps:

SQL parsing : lexical analysis creates tokens; syntactic analysis builds an abstract syntax tree (AST) and extracts context such as tables, columns, conditions, pagination, etc.

Routing : the parsed context is matched against the configured sharding strategies to determine which physical tables/databases the query should hit. Routes can be single‑shard, multi‑shard, or full‑shard.

SQL rewriting : logical table names are replaced with physical table names, identifiers (indexes, schemas) are rewritten, and additional columns may be added for ORDER BY, GROUP BY, or auto‑increment handling.

Execution : an execution engine decides how many JDBC connections to open per query (controlled by maxConnectionSizePerQuery) and whether to use memory‑strict or connection‑strict mode. It then runs the physical SQLs, possibly in parallel.

Result merging : results from all shards are merged. Types of merging include traversal (simple concatenation), sorting (priority‑queue based), grouping (streaming or in‑memory), aggregation (SUM, COUNT, AVG, MIN, MAX), and pagination (decorator pattern).

Key Sub‑steps

Identifier rewriting ensures only table names are changed, not aliases or string literals. Index and schema names are also rewritten when required.

Column supplementation adds missing columns for ORDER BY or GROUP BY, and replaces auto‑increment primary keys with distributed IDs (e.g., Snowflake).

Pagination correction rewrites LIMIT offset, size to fetch enough rows from each shard before applying the final offset on the merged result set.

Batch splitting rewrites multi‑row INSERT statements so that each shard receives only the rows that belong to it, avoiding duplicate data.

Merging Strategies

Depending on the routing result and the configured maxConnectionSizePerQuery, Sharding‑JDBC chooses between:

Stream merging : keeps a cursor on each shard and merges rows on‑the‑fly (used when each connection handles a single shard).

Memory merging : loads all rows into memory before merging (used when a connection must handle multiple shards).

Decorator merging : applies pagination or aggregation on top of the basic merge.

Conclusion

Sharding‑JDBC abstracts the complexity of horizontal database partitioning, allowing developers to write simple logical SQL while the middleware handles parsing, routing, rewriting, execution, and result merging. Understanding its core principles—sharding keys, strategies, algorithms, and the five‑step processing pipeline—is essential for building scalable Java applications that operate on massive data sets.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java database sharding Horizontal Partitioning Sharding-JDBC SQL Routing

Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.