How Sharding-JDBC Transforms Relational Databases for Scalable Cloud‑Native Applications

This article explains the core functions of relational‑database middleware, dives into Sharding-JDBC’s architecture, performance characteristics, and implementation details such as sharding rules, SQL parsing, routing, rewrite, execution, result merging, and distributed primary‑key generation, and outlines its future roadmap.

dbaplus Community

Jul 27, 2017

Core Functions of Relational Database Middleware

Relational databases are preferred for their flexible SQL, stable transaction engines, and mature tooling. In large‑scale Internet scenarios a single instance cannot handle massive data volume or request concurrency. Middleware that transparently transforms a monolithic database into a distributed one addresses two core problems: data‑volume limits and high‑traffic load.

Horizontal sharding (splitting a logical table into multiple physical tables based on a sharding algorithm) breaks the data‑volume bottleneck, while read/write splitting diverts traffic to replicas to alleviate request pressure. Combining both yields a balanced solution that preserves SQL compatibility and existing application code.
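The two ideas above can be sketched in a few lines of Java. This is a minimal illustration, not Sharding-JDBC code: the class name `ShardingSketch`, the modulo algorithm, the `_N` table suffix, and the round-robin replica choice are all assumptions made for the example.

```java
import java.util.List;

/** Hypothetical illustration of the two techniques; not the Sharding-JDBC API. */
final class ShardingSketch {

    /** Horizontal sharding: map a sharding key to one physical table by modulo (assumed algorithm). */
    static String physicalTable(String logicalTable, long shardingKey, int tableCount) {
        return logicalTable + "_" + Math.floorMod(shardingKey, (long) tableCount);
    }

    /** Read/write splitting: writes go to the primary, reads rotate across replicas. */
    static String chooseDataSource(String sql, String primary, List<String> replicas, int requestCounter) {
        // Naive read detection by leading keyword; a real router would rely on the SQL parser.
        boolean isRead = sql.trim().toLowerCase().startsWith("select");
        if (!isRead || replicas.isEmpty()) {
            return primary;
        }
        return replicas.get(Math.floorMod(requestCounter, replicas.size()));
    }
}
```

With two tables, key 7 maps to `t_order_1`; a `SELECT` is sent to a replica while an `UPDATE` always reaches the primary.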

Sharding-JDBC Architecture and Kernel

Sharding-JDBC implements the JDBC interface, so migration cost for existing Java applications is near zero. It supports MySQL, PostgreSQL, Oracle and SQL Server out of the box and works with any JDBC-based ORM or data-access framework (JPA, Hibernate, MyBatis, Spring JDBC Template, etc.). The library runs inside the application process as a library-level component rather than as an external proxy.

The core logical flow consists of six modules:

Sharding rule configuration (programmatic, Inline expression, Spring namespace, YAML)

SQL parsing (custom parser since 1.5.x, previously Druid)

SQL routing (direct, simple, Cartesian‑product)

SQL rewrite (correctness and optimization)

SQL execution (multi‑threaded Statement/PreparedStatement handling)

Result merging (traversal, sorting, grouping, pagination; implemented as stream‑merge, memory‑merge, or decorator‑merge)

Performance tests under identical data volumes show:

Query TPS: Sharding‑JDBC ≈ 99.8% of native JDBC

Insert TPS: Sharding‑JDBC ≈ 90.2% of native JDBC

Update TPS: Sharding‑JDBC ≈ 93.1% of native JDBC

When a single table is split into two physical tables, horizontal scaling yields roughly 94% higher query TPS, 60% higher insert TPS, and 89% higher update TPS, demonstrating the benefit of sharding.

Detailed Feature Breakdown

Sharding Rule Configuration

Sharding rules support equality, BETWEEN and IN conditions on the sharding key, as well as multi-key sharding (e.g., user-id for the database, order-id for the table). Rules can be expressed via Inline expressions, Spring XML, or YAML for centralized management.
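To make the three condition types concrete, the sketch below resolves each one to a set of physical tables. It assumes a modulo rule over `order_id` with two tables named `t_order_0` and `t_order_1`; the class and method names are hypothetical and do not correspond to Sharding-JDBC's own algorithm interfaces.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of resolving sharding conditions to physical tables. */
final class ShardingRuleSketch {
    static final int TABLE_COUNT = 2; // assumed number of physical tables

    private static String table(long orderId) {
        return "t_order_" + Math.floorMod(orderId, (long) TABLE_COUNT);
    }

    /** WHERE order_id = ? resolves to exactly one table. */
    static Set<String> forEquals(long orderId) {
        Set<String> targets = new TreeSet<>();
        targets.add(table(orderId));
        return targets;
    }

    /** WHERE order_id IN (...) resolves to the union of each value's table. */
    static Set<String> forIn(Collection<Long> orderIds) {
        Set<String> targets = new TreeSet<>();
        for (long id : orderIds) {
            targets.add(table(id));
        }
        return targets;
    }

    /** WHERE order_id BETWEEN lo AND hi: walk the range until every table is covered. */
    static Set<String> forBetween(long lo, long hi) {
        Set<String> targets = new TreeSet<>();
        for (long v = lo; v <= hi && targets.size() < TABLE_COUNT; v++) {
            targets.add(table(v));
        }
        return targets;
    }
}
```

An equality condition narrows the query to one table, while a wide BETWEEN range under a modulo rule usually touches all of them, which is why range-heavy workloads often prefer a range-based rule instead.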

JDBC Specification Rewrite

Sharding-JDBC wraps DataSource, Connection, Statement, PreparedStatement and ResultSet, adding distributed primary-key handling while preserving most JDBC semantics. Features not yet implemented include cursors, stored procedures, savepoints, and updatable ResultSets.

SQL Parsing

The custom parser extracts a “sharding context” (selected items, table info, sharding conditions, primary-key info, order/group/limit) without building a full AST, which gives it high performance and tolerance of dialect differences.

SQL Routing

Three routing strategies:

Direct routing applies when sharding is specified through a Hint and only databases are split; it bypasses SQL parsing.

Simple routing handles non‑JOIN or Binding‑table JOIN queries.

Cartesian‑product routing deals with complex non‑Binding joins at the cost of higher connection usage.
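The difference between Binding-table routing and Cartesian-product routing can be sketched as follows. The example assumes that physical tables are named with a numeric suffix and that both tables have the same number of shards; `RoutingSketch` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting Binding-table and Cartesian-product JOIN routing. */
final class RoutingSketch {

    /** Binding tables share one sharding rule, so shard i only ever joins shard i. */
    static List<String> bindingRoute(String left, String right, int shards) {
        List<String> routes = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            routes.add(left + "_" + i + " JOIN " + right + "_" + i);
        }
        return routes;
    }

    /** Without a binding relation, every left shard must be joined with every right shard. */
    static List<String> cartesianRoute(String left, String right, int shards) {
        List<String> routes = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            for (int j = 0; j < shards; j++) {
                routes.add(left + "_" + i + " JOIN " + right + "_" + j);
            }
        }
        return routes;
    }
}
```

With n shards the binding route produces n statements and the Cartesian route n × n, which is where the higher connection usage comes from.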

SQL Rewrite

Correctness rewrite replaces logical table names with physical ones, adjusts pagination, and adds missing columns. Optimization rewrite moves pagination before merging, skips unnecessary network traffic for single‑route cases, and rewrites GROUP BY‑only queries to GROUP BY + ORDER BY for stream merging.
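A small sketch of two of these rewrites, table-name replacement and pagination adjustment, is shown below. It assumes MySQL-style `LIMIT offset, count` syntax and assumes that the pagination adjustment widens the window to `LIMIT 0, offset + count` on each shard so that the merger can skip the offset globally. The regex-based replacement is a deliberate simplification; a real rewriter would work from parsed token positions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of correctness and optimization rewrites for one routed statement. */
final class RewriteSketch {

    static String rewrite(String logicalSql, String logicalTable, String physicalTable,
                          int offset, int count, boolean singleRoute) {
        // Correctness: swap the logical table name for the physical one (whole-word match only).
        String sql = logicalSql.replaceAll(
                "\\b" + Pattern.quote(logicalTable) + "\\b",
                Matcher.quoteReplacement(physicalTable));
        // Optimization: with a single route the shard can paginate by itself, so the
        // original window is kept. Otherwise every shard must return its first
        // offset + count rows for the merger to produce the correct page.
        int shardOffset = singleRoute ? offset : 0;
        int shardCount = singleRoute ? count : offset + count;
        return sql + " LIMIT " + shardOffset + ", " + shardCount;
    }
}
```

For a page at offset 10 with 5 rows, a multi-shard query sends `LIMIT 0, 15` to each shard, while a single-route query keeps `LIMIT 10, 5`.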

SQL Execution

Uses three thread‑pools (Statement, PreparedStatement, Batch) managed by a ShardingContext whose lifecycle matches the ShardingDataSource.
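Multi-threaded execution of routed statements can be sketched with a standard `ExecutorService`. The example below is an assumption-laden simplification: it creates and shuts down a pool per call to stay self-contained, whereas the text above describes long-lived pools tied to the data source, and the `runOnShard` function stands in for real JDBC execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: run each routed SQL in parallel and collect results in route order. */
final class ExecutionSketch {

    static <R> List<R> executeAll(List<String> routedSqls, Function<String, R> runOnShard, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String sql : routedSqls) {
                tasks.add(() -> runOnShard.apply(sql));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task finishes and preserves task order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while executing routed SQL", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a routed SQL failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because results come back in the same order as the routed statements, the merging step can pair each result with the shard that produced it.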

Result Merging

Four merging types (traversal, sorting, grouping, pagination) can be combined. Implementations:

Stream‑merge: processes rows on the fly using cursors.

Memory‑merge: loads all rows into memory before merging.

Decorator‑merge: adds pagination or other cross‑cutting concerns on top of stream or memory merge.
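As an illustration of stream-merge combined with pagination, the sketch below performs a k-way merge over shard results that are each already sorted, using a priority queue of cursors, and applies the offset and count during the merge. Plain integer lists stand in for JDBC ResultSets, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a sorted stream-merge with pagination applied on top. */
final class StreamMergeSketch {

    /** One cursor per shard result; holds only the current row, not the whole result. */
    private static final class Cursor {
        final Iterator<Integer> rows;
        int current;

        Cursor(Iterator<Integer> rows) {
            this.rows = rows;
            this.current = rows.next();
        }
    }

    /** Each shard list must already be sorted ascending (the shard ran the ORDER BY). */
    static List<Integer> mergeSorted(List<List<Integer>> shardResults, int offset, int count) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> Integer.compare(a.current, b.current));
        for (List<Integer> shard : shardResults) {
            Iterator<Integer> it = shard.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }
        List<Integer> page = new ArrayList<>();
        int seen = 0;
        while (!heap.isEmpty() && page.size() < count) {
            Cursor smallest = heap.poll();
            if (seen++ >= offset) {
                page.add(smallest.current); // past the offset: row belongs to the page
            }
            if (smallest.rows.hasNext()) {
                smallest.current = smallest.rows.next();
                heap.add(smallest); // re-queue the cursor at its next row
            }
        }
        return page;
    }
}
```

Only one row per shard is held in the heap at any time, which is the memory advantage of stream-merge over memory-merge; the merge also stops as soon as the page is full.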

Distributed Primary Key

Configurable per‑table strategy; default is Snowflake, which generates globally unique, roughly ordered IDs. The generated key is exposed via Statement.getGeneratedKeys(), appearing as an auto‑increment column to the application.
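A Snowflake-style generator can be sketched as below. This is not Sharding-JDBC's own generator: the bit layout (timestamp, 10-bit worker id, 12-bit sequence) follows the commonly described Snowflake scheme, and the epoch constant and class name are assumptions chosen for the example.

```java
/** Hypothetical Snowflake-style ID generator: timestamp | worker id | per-millisecond sequence. */
final class SnowflakeSketch {
    private static final long EPOCH_MS = 1_480_000_000_000L; // assumed custom epoch (late 2016)
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the clock advances.
                while ((now = System.currentTimeMillis()) <= lastMs) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs from one worker are strictly increasing, and IDs across workers are ordered by time to within clock skew, which matches the "roughly ordered" property described above.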

Future Outlook

Version milestones:

1.0.x – basic sharding

1.1.x – simplified configuration

1.2.x – flexible (soft) transactions

1.3.x – read/write splitting

1.4.x – distributed primary key

1.5.x – custom parser and multi‑DB support

Planned 1.6.x features include dynamic configuration stored in a service registry, full database governance (discovery, traffic steering, failover, circuit‑breaker), and tighter integration with cloud‑native micro‑service ecosystems.

Source code repository: https://github.com/dangdangdotcom/sharding-jdbc
