How Sharding-JDBC Transforms Relational Databases for Scalable Cloud‑Native Applications
This article explains the core functions of relational-database middleware, then covers Sharding-JDBC's architecture, its performance characteristics, and its implementation details: sharding rules, SQL parsing, routing, rewrite, execution, result merging, and distributed primary-key generation. It closes with the project's roadmap.
Core Functions of Relational Database Middleware
Relational databases are preferred for their flexible SQL, stable transaction engines, and mature tooling. In large-scale Internet scenarios, however, a single instance cannot handle massive data volumes or high request concurrency. Middleware that transparently turns a monolithic database into a distributed one addresses these two core problems: data-volume limits and high-traffic load.
Horizontal sharding (splitting a logical table into multiple physical tables based on a sharding algorithm) breaks the data‑volume bottleneck, while read/write splitting diverts traffic to replicas to alleviate request pressure. Combining both yields a balanced solution that preserves SQL compatibility and existing application code.
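The two ideas above can be sketched in a few lines of Java. This is an illustration only, not Sharding-JDBC code: the class name ShardRouterSketch, the modulo algorithm, the round-robin replica choice, and the table and data-source names are all assumptions made for the example.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of Sharding-JDBC. */
final class ShardRouterSketch {
    private final String logicTable;
    private final int shardCount;
    private final String primary;
    private final List<String> replicas;
    private int nextReplica = 0;

    ShardRouterSketch(String logicTable, int shardCount, String primary, List<String> replicas) {
        this.logicTable = logicTable;
        this.shardCount = shardCount;
        this.primary = primary;
        this.replicas = replicas;
    }

    /** Horizontal sharding: map a sharding key to a physical table by modulo. */
    String physicalTable(long shardingKey) {
        return logicTable + "_" + Math.floorMod(shardingKey, (long) shardCount);
    }

    /** Read/write splitting: writes go to the primary, reads rotate over replicas. */
    synchronized String dataSourceFor(boolean isWrite) {
        if (isWrite || replicas.isEmpty()) {
            return primary;
        }
        String chosen = replicas.get(nextReplica);
        nextReplica = (nextReplica + 1) % replicas.size();
        return chosen;
    }
}
```

With two shards, a key of 11 maps to the table suffix 1, and consecutive reads alternate between the configured replicas while every write stays on the primary.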
Sharding-JDBC Architecture and Kernel
Sharding-JDBC implements the JDBC interface, so the migration cost for existing Java applications is low. It supports MySQL, PostgreSQL, Oracle and SQL Server out of the box and works with JDBC-based ORM frameworks and data-access layers (JPA, Hibernate, MyBatis, Spring JDBC Template, etc.). The library runs inside the application process as a lib-level component rather than as an external proxy.
The core logical flow consists of six modules:
Sharding rule configuration (programmatic, Inline expression, Spring namespace, YAML)
SQL parsing (a custom parser since 1.5.x; earlier versions used Druid)
SQL routing (direct, simple, Cartesian‑product)
SQL rewrite (correctness and optimization)
SQL execution (multi‑threaded Statement/PreparedStatement handling)
Result merging (traversal, sorting, grouping, pagination; implemented as stream‑merge, memory‑merge, or decorator‑merge)
Performance tests under identical data volumes show:
Query TPS: Sharding‑JDBC ≈ 99.8% of native JDBC
Insert TPS: Sharding‑JDBC ≈ 90.2% of native JDBC
Update TPS: Sharding‑JDBC ≈ 93.1% of native JDBC
When a single table is split into two physical tables, horizontal scaling yields roughly 94% higher query TPS, 60% higher insert TPS, and 89% higher update TPS, demonstrating the benefit of sharding.
Detailed Feature Breakdown
Sharding Rule Configuration
Sharding rules support equality, BETWEEN and IN conditions as well as multi-key sharding (for example, user ID selects the database and order ID selects the table). Rules can be expressed as Inline expressions, Spring XML, or YAML, which allows centralized management.
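To make the three condition types concrete, the sketch below computes which physical tables a condition can touch. It is not the library's rule API: the class ShardingConditionSketch, the modulo rule, the shard counts, and the names ds_N and t_order_N are assumptions chosen for the example.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration of sharding conditions; not the Sharding-JDBC API. */
final class ShardingConditionSketch {
    static final int DATABASES = 2; // assumed number of databases
    static final int TABLES = 4;    // assumed number of tables per database

    /** Multi-key sharding: the user ID picks the database. */
    static String database(long userId) {
        return "ds_" + Math.floorMod(userId, (long) DATABASES);
    }

    /** Multi-key sharding: the order ID picks the table. */
    static String table(long orderId) {
        return "t_order_" + Math.floorMod(orderId, (long) TABLES);
    }

    /** Equality (order_id = ?) always resolves to exactly one table. */
    static Set<String> tablesForEqual(long orderId) {
        Set<String> targets = new LinkedHashSet<>();
        targets.add(table(orderId));
        return targets;
    }

    /** IN (order_id IN (...)) resolves to the union of the per-value tables. */
    static Set<String> tablesForIn(Collection<Long> orderIds) {
        Set<String> targets = new LinkedHashSet<>();
        for (long id : orderIds) {
            targets.add(table(id));
        }
        return targets;
    }

    /** BETWEEN walks the range and stops once every table is already included. */
    static Set<String> tablesForBetween(long lower, long upper) {
        Set<String> targets = new LinkedHashSet<>();
        if (lower > upper) {
            return targets;
        }
        long value = lower;
        while (true) {
            targets.add(table(value));
            if (value == upper || targets.size() == TABLES) {
                break;
            }
            value++;
        }
        return targets;
    }
}
```

Under a modulo rule, a wide BETWEEN range quickly degenerates into a scan of every table, which is one reason range-based sharding algorithms are sometimes preferred for range-heavy workloads.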
JDBC Specification Rewrite
Sharding-JDBC wraps DataSource, Connection, Statement, PreparedStatement and ResultSet, adding distributed primary-key handling while preserving most JDBC semantics. Features not yet implemented include cursors, stored procedures, Savepoint, and updatable ResultSet.
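How an unimplemented feature might surface to the application can be illustrated with standard JDBC types. The sketch below uses a java.lang.reflect.Proxy around a Connection, which is a swapped-in technique chosen for brevity and not how the library builds its wrappers; the class name UnsupportedJdbcSketch and the exact list of rejected methods are assumptions.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical wrapper for illustration; Sharding-JDBC implements the interfaces directly. */
final class UnsupportedJdbcSketch {
    // Assumed mapping: Savepoint calls and prepareCall (stored procedures) are rejected.
    private static final Set<String> UNSUPPORTED =
            new HashSet<>(Arrays.asList("setSavepoint", "releaseSavepoint", "prepareCall"));

    /** Returns a Connection that rejects unsupported calls and delegates the rest. */
    static Connection wrap(Connection delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (UNSUPPORTED.contains(method.getName())) {
                throw new SQLFeatureNotSupportedException(method.getName() + " is not supported");
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the delegate's own exception
            }
        };
        return (Connection) Proxy.newProxyInstance(
                UnsupportedJdbcSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```

Application code that calls setSavepoint() on such a connection receives a SQLFeatureNotSupportedException, the standard JDBC signal for an optional feature that a driver does not provide.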
SQL Parsing
The custom parser extracts a “sharding context” (selected items, table info, sharding conditions, primary-key info, order/group/limit) without building a full AST, which keeps parsing fast and tolerant of dialect differences.
SQL Routing
Three routing strategies:
Direct routing applies when only databases are sharded and the target is supplied through a Hint; it bypasses SQL parsing.
Simple routing handles non‑JOIN or Binding‑table JOIN queries.
Cartesian‑product routing deals with complex non‑Binding joins at the cost of higher connection usage.
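The difference in cost between a Binding-table JOIN and a Cartesian-product JOIN is easiest to see by counting the physical table pairs each one produces. The sketch below is illustrative only; the class JoinRoutingSketch and the table names are assumptions, and real routing also has to account for data sources and sharding conditions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of JOIN routing; not the Sharding-JDBC router. */
final class JoinRoutingSketch {
    /** Binding tables share a sharding rule, so shard i only ever joins shard i. */
    static List<String> bindingRoute(String left, String right, int shards) {
        List<String> routes = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            routes.add(left + "_" + i + " JOIN " + right + "_" + i);
        }
        return routes;
    }

    /** Without a binding relationship every shard pair must be queried. */
    static List<String> cartesianRoute(String left, String right, int shards) {
        List<String> routes = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            for (int j = 0; j < shards; j++) {
                routes.add(left + "_" + i + " JOIN " + right + "_" + j);
            }
        }
        return routes;
    }
}
```

With n shards per table, the binding route issues n statements and the Cartesian route issues n × n, which is where the higher connection usage comes from.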
SQL Rewrite
Correctness rewrite replaces logical table names with physical ones, adjusts pagination so that each shard returns enough rows for the merged page, and adds columns that merging needs but the SELECT list omits. Optimization rewrite leaves pagination untouched for single-route queries, which avoids fetching rows that would only be discarded, and rewrites GROUP BY-only queries to GROUP BY + ORDER BY so that the results can be stream-merged.
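Two of these rewrites can be shown on plain SQL strings. The sketch below uses regular expressions purely for brevity, whereas the library rewrites from parsed tokens; the class SqlRewriteSketch, the MySQL-style "LIMIT offset, count" form, and the rule of widening the window to "LIMIT 0, offset + count" on multi-shard routes are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical string-level rewrite; the real rewrite works on parsed SQL tokens. */
final class SqlRewriteSketch {
    // Matches a trailing MySQL-style "LIMIT offset, count" clause.
    private static final Pattern LIMIT =
            Pattern.compile("(?i)\\bLIMIT\\s+(\\d+)\\s*,\\s*(\\d+)\\s*$");

    /** Correctness rewrite: swap the logical table name for a physical one. */
    static String replaceTable(String sql, String logicTable, String actualTable) {
        return sql.replaceAll(
                "\\b" + Pattern.quote(logicTable) + "\\b",
                Matcher.quoteReplacement(actualTable));
    }

    /**
     * Pagination rewrite: on a multi-shard route each shard must return its first
     * offset + count rows, because the global page can come from any shard.
     * On a single route the original LIMIT is already correct and is kept.
     */
    static String rewriteLimit(String sql, boolean singleRoute) {
        if (singleRoute) {
            return sql;
        }
        Matcher m = LIMIT.matcher(sql);
        if (!m.find()) {
            return sql;
        }
        long offset = Long.parseLong(m.group(1));
        long count = Long.parseLong(m.group(2));
        return sql.substring(0, m.start()) + "LIMIT 0, " + (offset + count);
    }
}
```

For example, "LIMIT 10, 5" becomes "LIMIT 0, 15" on each shard, and the merge step then skips the first 10 merged rows and returns the next 5.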
SQL Execution
Execution uses three thread pools (for Statement, PreparedStatement, and batch operations), managed by a ShardingContext whose lifecycle matches that of the ShardingDataSource.
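The fan-out pattern behind multi-threaded execution can be sketched with java.util.concurrent alone. This is a minimal sketch under stated assumptions, not the library's executor: the class ParallelExecutionSketch is hypothetical, it uses a single caller-supplied pool, and each Callable stands in for running one rewritten statement on one shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical fan-out helper; each task stands in for one per-shard statement. */
final class ParallelExecutionSketch {
    /**
     * Submits every per-shard task to the pool, waits for all of them, and
     * returns the results in the same order the tasks were supplied.
     */
    static <T> List<T> executeAll(ExecutorService pool, List<Callable<T>> shardTasks)
            throws Exception {
        List<Future<T>> futures = pool.invokeAll(shardTasks);
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            results.add(future.get()); // rethrows a task failure as ExecutionException
        }
        return results;
    }
}
```

Tying the pool's lifetime to a long-lived owner, as the text describes for ShardingContext, avoids creating and destroying threads on every query; in this sketch the caller owns the pool and is responsible for shutting it down.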
Result Merging
Four merging types (traversal, sorting, grouping, pagination) can be combined. Implementations:
Stream‑merge: processes rows on the fly using cursors.
Memory‑merge: loads all rows into memory before merging.
Decorator‑merge: adds pagination or other cross‑cutting concerns on top of stream or memory merge.
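A sorted stream-merge with pagination layered on top can be sketched as a k-way merge over per-shard iterators. The sketch is illustrative: the class StreamMergeSketch is hypothetical, rows are reduced to a single Integer sort key, and plain iterators stand in for JDBC ResultSet cursors that are each already sorted by the shard.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical k-way merge; iterators stand in for per-shard sorted ResultSets. */
final class StreamMergeSketch {
    /** A shard cursor together with the row it currently points at. */
    private static final class Cursor implements Comparable<Cursor> {
        final Iterator<Integer> rows;
        Integer current;

        Cursor(Iterator<Integer> rows) {
            this.rows = rows;
            this.current = rows.next();
        }

        boolean advance() {
            if (rows.hasNext()) {
                current = rows.next();
                return true;
            }
            return false;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /** Stream-merge: holds one row per shard and emits the smallest each time. */
    static Iterator<Integer> sortedMerge(List<Iterator<Integer>> shards) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (Iterator<Integer> shard : shards) {
            if (shard.hasNext()) {
                queue.add(new Cursor(shard));
            }
        }
        return new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return !queue.isEmpty();
            }

            @Override
            public Integer next() {
                Cursor smallest = queue.poll();
                if (smallest == null) {
                    throw new NoSuchElementException();
                }
                Integer row = smallest.current;
                if (smallest.advance()) {
                    queue.add(smallest); // re-queue the cursor at its next row
                }
                return row;
            }
        };
    }

    /** Decorator-style pagination: skip offset rows, then take up to count rows. */
    static List<Integer> paginate(Iterator<Integer> merged, int offset, int count) {
        for (int i = 0; i < offset && merged.hasNext(); i++) {
            merged.next();
        }
        List<Integer> page = new ArrayList<>();
        while (page.size() < count && merged.hasNext()) {
            page.add(merged.next());
        }
        return page;
    }
}
```

Memory use is proportional to the number of shards rather than the number of rows, which is the practical advantage of stream-merge over memory-merge when each shard can return its rows in sorted order.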
Distributed Primary Key
The key-generation strategy is configurable per table; the default is Snowflake, which generates globally unique, roughly time-ordered IDs. The generated key is exposed via Statement.getGeneratedKeys(), so to the application it looks like an auto-increment column.
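A Snowflake-style generator is small enough to sketch in full. The version below follows the classic layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; that layout, the custom epoch, and the class name SnowflakeSketch are assumptions for illustration, and the library's own generator may differ in its details.

```java
/** Hypothetical Snowflake-style generator; bit layout and epoch are assumptions. */
final class SnowflakeSketch {
    private static final long EPOCH = 1477958400000L; // 2016-11-01 UTC, arbitrary
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER + "]");
        }
        this.workerId = workerId;
    }

    /** Returns an ID that is unique per worker and increases over time. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs from one worker are strictly increasing and IDs across workers are ordered to within clock skew, which is what "roughly ordered" means in practice. Uniqueness across processes depends on every process being given a distinct worker ID.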
Future Outlook
Version milestones:
1.0.x – basic sharding
1.1.x – simplified configuration
1.2.x – flexible (soft) transactions
1.3.x – read/write splitting
1.4.x – distributed primary key
1.5.x – custom parser and multi‑DB support
Planned 1.6.x features include dynamic configuration stored in a service registry, full database governance (discovery, traffic steering, failover, circuit‑breaker), and tighter integration with cloud‑native micro‑service ecosystems.
Source code repository: https://github.com/dangdangdotcom/sharding-jdbc
dbaplus Community
