
Mastering Sharding-JDBC: A Deep Dive into Database Sharding Strategies

This article explains the motivations, scenarios, architecture, core components, performance characteristics, and future roadmap of Sharding-JDBC, a lightweight Java framework that enables transparent database sharding and scaling for high‑volume, high‑concurrency applications.

ITFLY8 Architecture Home

Jul 31, 2017

Database sharding has been a hot topic since the early days of the Internet, and relational databases remain the primary choice for many companies due to their stability, flexible queries, and compatibility. Properly applying sharding techniques is essential for handling massive data and high concurrency.

Sharding Scenarios

Sharding addresses two common Internet challenges, large data volume and high concurrency, through vertical or horizontal splitting. Vertical splitting divides a database or table along business lines, for example by moving frequently and infrequently accessed fields into different tables. Horizontal splitting distributes rows across multiple databases or tables according to a sharding algorithm (e.g., the ID modulo the number of shards).

When a single table grows beyond what a relational database handles efficiently, retrieval speed degrades sharply. Table sharding alone solves the data-size problem but not the concurrency problem, because all the tables still share the resources of one database instance. Horizontal sharding therefore usually combines database and table splitting.
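As a minimal sketch of combined database and table splitting by modulo, assume orders are spread over 2 databases by user ID and over 4 tables per database by order ID. The names `ds_` and `t_order_` and both shard counts are illustrative assumptions, not values taken from Sharding-JDBC.

```java
// Hypothetical example: locate the physical database and table for one order row.
final class ModuloShardingSketch {
    static final int DATABASE_COUNT = 2; // assumed number of databases
    static final int TABLE_COUNT = 4;    // assumed number of tables per database

    // The database is chosen by user ID, so all rows of one user share a database.
    static String databaseFor(long userId) {
        return "ds_" + Math.floorMod(userId, DATABASE_COUNT);
    }

    // The table inside that database is chosen by order ID.
    static String tableFor(long orderId) {
        return "t_order_" + Math.floorMod(orderId, TABLE_COUNT);
    }

    static String locate(long userId, long orderId) {
        return databaseFor(userId) + "." + tableFor(orderId);
    }
}
```

`Math.floorMod` is used instead of `%` so that a negative ID still maps to a valid, non-negative shard index.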

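Table sharding still has its place: tables that live in the same database can be updated in one local transaction, which avoids distributed transactions, and fewer databases are easier to operate. A combined "database + table" approach is therefore the usual recommendation.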

Sharding-JDBC Overview

Sharding-JDBC is a JDBC‑based sharding framework extracted from the ddframe application framework. It provides transparent access to sharded databases without requiring a proxy layer.

Key advantages:

Works with any Java ORM such as JPA, Hibernate, MyBatis, Spring JDBC Template, or plain JDBC.

Compatible with common connection pools (DBCP, C3P0, BoneCP, Druid, etc.).

Theoretically supports any JDBC‑compliant database; currently MySQL is supported with plans for Oracle and SQL Server.

Sharding-JDBC is a lightweight client‑side library delivered as a JAR, requiring no additional deployment, proxy, or DBA changes.

It offers flexible sharding strategies (equality, BETWEEN, IN) and multi‑key support, along with comprehensive SQL parsing for aggregation, grouping, ordering, LIMIT, OR, and binding tables.

Comparison with Other Open‑Source Projects

Unlike middle-layer solutions such as Cobar, Sharding-JDBC connects directly to the database via JDBC, avoiding an extra network hop and gaining a performance edge. A middle layer, in turn, makes features such as monitoring and connection management easier to provide.

Client‑side solutions (Cobar‑Client, TDDL, Sharding-JDBC) share benefits of lightness, compatibility, and minimal DBA impact; however, Cobar‑Client’s ORM‑based implementation is less extensible than the pure JDBC approach used by Sharding-JDBC.

Implementation Principles

The core workflow of Sharding-JDBC is built from the following modules: sharding rule configuration, SQL parsing, SQL rewriting, SQL routing, SQL execution, and result merging.

Sharding Rule Configuration

Sharding-JDBC supports custom sharding strategies, multiple sharding keys, and complex operators. Examples include database sharding by user ID and table sharding by order ID, or yearly database sharding with monthly‑plus‑region table sharding.

Both equality and IN/BETWEEN operators are supported, and a Spring namespace simplifies configuration.
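To make the three operator cases concrete, here is a minimal sketch of a table-sharding strategy that answers "which physical tables can hold this key?" for `=`, `IN`, and `BETWEEN`. The class and method names are hypothetical and are not the Sharding-JDBC API. Modulo sharding on a numeric key is assumed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the Sharding-JDBC API: pick target tables for =, IN and BETWEEN.
final class ModuloTableStrategySketch {
    private final List<String> tables; // physical tables; list index == shard number

    ModuloTableStrategySketch(List<String> tables) {
        this.tables = new ArrayList<>(tables);
    }

    private String tableOf(long key) {
        return tables.get((int) Math.floorMod(key, (long) tables.size()));
    }

    // WHERE order_id = ?
    Set<String> forEqual(long key) {
        Set<String> result = new LinkedHashSet<>();
        result.add(tableOf(key));
        return result;
    }

    // WHERE order_id IN (?, ?, ...)
    Set<String> forIn(Collection<Long> keys) {
        Set<String> result = new LinkedHashSet<>();
        for (long key : keys) {
            result.add(tableOf(key));
        }
        return result;
    }

    // WHERE order_id BETWEEN ? AND ? (inclusive); stops early once every table is hit.
    Set<String> forBetween(long low, long high) {
        Set<String> result = new LinkedHashSet<>();
        for (long key = low; key <= high && result.size() < tables.size(); key++) {
            result.add(tableOf(key));
        }
        return result;
    }
}
```

With modulo sharding, a wide `BETWEEN` range quickly touches every table, which is why range queries tend to benefit more from range-based or date-based sharding rules.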

JDBC Specification Rewrite

Sharding-JDBC wraps the five core JDBC interfaces (DataSource, Connection, Statement, PreparedStatement, ResultSet) and manages multiple underlying JDBC implementations.

It strives to implement the full JDBC API, including batch updates, but some less common features (cursors, stored procedures, savepoints, and scrolling backward through or updating a ResultSet) remain unimplemented, and JDBC 4.1 interfaces are omitted for compatibility.
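The wrapping idea can be sketched with the `DataSource` interface alone: one logical data source holds several physical ones and rejects operations it does not support with `SQLFeatureNotSupportedException`. This is a simplified illustration under assumed names (`RoutingDataSourceSketch`, `connectionFor`), not the framework's actual classes, which also wrap Connection, Statement, PreparedStatement, and ResultSet.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;
import javax.sql.DataSource;

// Hypothetical sketch: one logical DataSource in front of several physical ones.
final class RoutingDataSourceSketch implements DataSource {
    private final Map<String, DataSource> targets;

    RoutingDataSourceSketch(Map<String, DataSource> targets) {
        this.targets = new LinkedHashMap<>(targets);
    }

    // Called once routing has decided which physical database to use.
    Connection connectionFor(String name) throws SQLException {
        DataSource target = targets.get(name);
        if (target == null) {
            throw new SQLException("Unknown data source: " + name);
        }
        return target.getConnection();
    }

    // Without routing information, fall back to the first configured data source.
    @Override
    public Connection getConnection() throws SQLException {
        if (targets.isEmpty()) {
            throw new SQLException("No data source configured");
        }
        return targets.values().iterator().next().getConnection();
    }

    // Unsupported parts of the interface fail loudly instead of silently misbehaving.
    @Override
    public Connection getConnection(String user, String password) throws SQLException {
        throw new SQLFeatureNotSupportedException("getConnection(user, password)");
    }

    @Override public PrintWriter getLogWriter() { return null; }
    @Override public void setLogWriter(PrintWriter out) { }
    @Override public void setLoginTimeout(int seconds) { }
    @Override public int getLoginTimeout() { return 0; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("getParentLogger");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        throw new SQLFeatureNotSupportedException("unwrap");
    }

    @Override public boolean isWrapperFor(Class<?> iface) { return false; }
}
```

Because the class implements the standard interface, an ORM or connection-pool consumer can use it wherever it expects a `DataSource`, which is what makes the sharding transparent to application code.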

SQL Parsing

Sharding-JDBC uses Druid as its SQL parser, which is reported to parse SQL dozens of times faster than the alternatives considered. It supports joins, aggregations, ORDER BY, GROUP BY, LIMIT, and OR queries, but does not yet support UNION, certain sub-queries, or function-based sharding.

SQL Rewriting

Rewriting replaces logical table names with the actual physical table names and adjusts SQL constructs that would give wrong results on sharded data. For example, AVG is rewritten to SUM and COUNT so that the average can be recomputed correctly from all shards, and pagination is rewritten so that each shard returns enough rows for the merge step (LIMIT 20, 10 becomes LIMIT 0, 30).
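The two rewrites described above can be sketched with plain string handling. This is a deliberately naive illustration: the class and method names are hypothetical, it only recognizes the MySQL `LIMIT offset, count` form, and a real rewriter would operate on the parsed statement rather than on regular expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive, string-based sketch; a real rewriter works on the parsed SQL, not on regexes.
final class SqlRewriteSketch {
    private static final Pattern LIMIT =
            Pattern.compile("LIMIT\\s+(\\d+)\\s*,\\s*(\\d+)", Pattern.CASE_INSENSITIVE);

    // Replace whole-word occurrences of the logical table name with one physical table.
    static String rewriteTable(String sql, String logicTable, String actualTable) {
        return sql.replaceAll("\\b" + Pattern.quote(logicTable) + "\\b",
                Matcher.quoteReplacement(actualTable));
    }

    // LIMIT offset, count -> LIMIT 0, offset + count, so every shard returns
    // enough rows for the merge step to skip `offset` rows globally.
    static String rewriteLimit(String sql) {
        Matcher m = LIMIT.matcher(sql);
        if (!m.find()) {
            return sql; // no pagination clause, nothing to rewrite
        }
        long offset = Long.parseLong(m.group(1));
        long count = Long.parseLong(m.group(2));
        return m.replaceFirst("LIMIT 0, " + (offset + count));
    }
}
```

The cost of the pagination rewrite grows with the offset, since every shard must return `offset + count` rows, so deep pages are noticeably more expensive on sharded tables than on a single table.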

SQL Routing

Routing directs SQL to the appropriate data source based on sharding rules. It includes single‑table routing, binding‑table routing (identical sharding logic across related tables), and Cartesian‑product routing for non‑binding joins.
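The difference between binding-table routing and Cartesian-product routing shows up in how many physical joins one logical join expands to. The sketch below uses hypothetical names and assumes the physical table lists are already known and ordered by shard number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how many physical table pairs a two-table join touches.
final class JoinRoutingSketch {
    // Binding tables share the same sharding logic, so shard i only joins with shard i.
    static List<String> bindingPairs(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("binding tables need the same shard count");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            pairs.add(left.get(i) + " JOIN " + right.get(i));
        }
        return pairs;
    }

    // Without a binding relation, every combination of physical tables must be queried.
    static List<String> cartesianPairs(List<String> left, List<String> right) {
        List<String> pairs = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                pairs.add(l + " JOIN " + r);
            }
        }
        return pairs;
    }
}
```

For two tables with n shards each, binding routing issues n statements while Cartesian routing issues n × n, which is why declaring related tables as binding tables matters for join performance.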

SQL Execution

After routing, Sharding-JDBC executes SQL concurrently across shards and handles batch operations like addBatch.
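Concurrent execution across shards can be sketched with a standard `ExecutorService`. The example assumes each routed statement is already wrapped as a `Callable` and uses one thread per shard for simplicity. The class name is hypothetical, and the framework's own thread management may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run one task per routed shard in parallel, collect results in order.
final class ParallelExecutionSketch {
    static <T> List<T> executeAll(List<Callable<T>> shardTasks)
            throws InterruptedException, ExecutionException {
        if (shardTasks.isEmpty()) {
            return new ArrayList<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(shardTasks.size());
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task finishes and preserves the input order.
            for (Future<T> future : pool.invokeAll(shardTasks)) {
                results.add(future.get()); // rethrows a shard failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running application the pool would normally be created once and shared rather than built per statement. It is created inline here only to keep the sketch self-contained.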

Result Merging

Merging handles four result types: simple iteration, sorting, aggregation, and grouping. Simple results are concatenated, sorted results use merge‑sort, aggregations combine SUM/COUNT and recompute AVG, while grouping uses a map‑reduce approach and may require memory‑intensive processing.
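Two of these merge steps are easy to show in isolation: a k-way merge of per-shard results that are already sorted, and recomputing AVG from per-shard SUM and COUNT. The sketch below uses hypothetical names and plain integers in place of result-set rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of two merge steps: k-way merge for ORDER BY, and AVG from SUM/COUNT.
final class ResultMergeSketch {
    // One cursor per shard: the current value plus the rest of that shard's sorted rows.
    private static final class Cursor implements Comparable<Cursor> {
        final int head;
        final Iterator<Integer> rest;

        Cursor(int head, Iterator<Integer> rest) {
            this.head = head;
            this.rest = rest;
        }

        @Override
        public int compareTo(Cursor other) {
            return Integer.compare(head, other.head);
        }
    }

    // Each input list must already be sorted ascending by its own shard.
    static List<Integer> mergeSorted(List<List<Integer>> shardResults) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<Integer> rows : shardResults) {
            Iterator<Integer> it = rows.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor(it.next(), it));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                queue.add(new Cursor(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    // Averaging per-shard averages is wrong; divide the total SUM by the total COUNT.
    // Returns 0.0 for an empty input as a simplification (SQL would return NULL).
    static double mergeAvg(long[] sums, long[] counts) {
        long sum = 0;
        long count = 0;
        for (int i = 0; i < sums.length; i++) {
            sum += sums[i];
            count += counts[i];
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

The priority queue holds only one row per shard at a time, so sorted merging can stream results. Grouping, by contrast, may need to hold intermediate groups in memory, which matches the memory caveat above.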

Performance

Single‑database tests show Sharding-JDBC achieves 99.8% of JDBC TPS for queries, 90.2% for inserts, and 93.1% for updates, indicating minimal overhead.

Multi‑database tests demonstrate roughly 94% query, 60% insert, and 89% update TPS improvements over a single database, confirming the benefits of parallelism and distributed resources.

Roadmap

Read‑write separation

Flexible distributed transactions

Distributed primary‑key generation

SQL rewrite optimizations

SQL hints and small‑table broadcasting

High availability features

Traffic control

Database schema generation tools

Data migration utilities

Advanced SQL parsing (sub‑queries, stored procedures)

Support for Oracle and SQL Server

Configuration center

Open‑Source Philosophy

Many open‑source projects originate from internal use, leading to mature but sometimes incomplete releases. Challenges include limited post‑release support, tight coupling to original business scenarios, missing code, low community contribution, and fragmented forks.

Sharding-JDBC adopts a dual‑track strategy: simultaneous internal deployment and community release, ensuring continuous support, full source snapshots on GitHub, and an easy‑to‑understand codebase that encourages external contributions.

Current test coverage exceeds 90%; unsupported features are clearly documented, providing users with realistic expectations.
