Databases 16 min read

Mastering Sharding-JDBC: A Deep Dive into Database Sharding Strategies

This article explains the principles, architecture, and performance of Sharding-JDBC, covering sharding scenarios, rule configuration, JDBC rewriting, SQL parsing and routing, result merging, and future roadmap, while comparing it with other open‑source sharding solutions.

ITFLY8 Architecture Home

Nov 3, 2016

Mastering Sharding-JDBC: A Deep Dive into Database Sharding Strategies

Sharding Scenarios

Sharding is used to address two common internet scenarios: massive data volume and high concurrency. It can be performed vertically (splitting a database into multiple databases/tables based on business) or horizontally (splitting based on a sharding algorithm, such as routing rows to different databases by the remainder of an ID).

When relational databases exceed a certain data size, query performance degrades sharply. Pure table sharding solves the data‑size problem but not high‑concurrency access to a single database, so horizontal sharding usually combines database and table splitting. Table sharding remains useful for transaction isolation and reducing the number of instances for easier operations.

Introduction to Sharding-JDBC

Sharding-JDBC is a lightweight Java framework extracted from the ddframe application framework. It implements transparent horizontal sharding for relational databases by wrapping the JDBC API, requiring virtually no code migration.

Compatible with any Java ORM such as JPA, Hibernate, MyBatis, Spring JDBC Template, or direct JDBC.

Works with any third‑party connection pool (DBCP, C3P0, BoneCP, Druid, etc.).

Theoretically supports any JDBC‑compliant database; currently MySQL is supported with plans for Oracle and SQL Server.

It runs as a client‑side JAR without a proxy layer, needing no additional deployment or DBA changes.

Sharding strategies support equality, BETWEEN, IN, and multiple sharding keys. The SQL parser handles aggregation, grouping, ordering, LIMIT, OR, and binding tables.

Comparison with Other Open‑Source Products

Compared with middleware solutions like Cobar (proxy layer) and client‑side solutions such as Cobar‑Client, TDDL, and Sharding-JDBC, the client‑side approach offers lighter weight, better compatibility, and lower impact on DBAs, though middleware may provide richer monitoring and migration features.

Implementation Principles

The core workflow consists of sharding rule configuration, SQL parsing, SQL rewrite, SQL routing, SQL execution, and result merging.

Sharding Rule Configuration

Sharding-JDBC offers flexible rule definitions, supporting custom strategies, multiple sharding keys, and various operators. Examples include database sharding by user ID and table sharding by order ID, or year‑based database sharding with month‑and‑region table sharding.

JDBC Specification Rewrite

The framework wraps the five core JDBC interfaces (DataSource, Connection, Statement, PreparedStatement, ResultSet), managing multiple underlying JDBC implementations.

Most JDBC features are supported, but some less‑used APIs (cursors, stored procedures, savepoints, forward‑only ResultSet traversal) are not yet implemented, and JDBC 4.1 interfaces are omitted for compatibility.

SQL Parsing

Sharding-JDBC uses Druid as its SQL parser, offering significantly higher parsing speed than alternatives. It supports joins, aggregations, ORDER BY, GROUP BY, LIMIT, and OR queries, but not UNION, certain subqueries, or function‑level sharding.

SQL Rewrite

Rewrite replaces logical table names with physical ones and adjusts queries for sharding contexts. For example, AVG is rewritten to SUM/COUNT for correct aggregation across shards, and pagination limits are expanded to retrieve sufficient rows before final trimming.

SQL Routing

Based on sharding rules, SQL is routed to the appropriate data sources. Routing types include single‑table routing, binding‑table routing (identical sharding keys across related tables), and Cartesian‑product routing for non‑binding joins.

SQL Execution

After routing, Sharding-JDBC executes SQL concurrently across shards and handles batch operations like addBatch.

Result Merging

Merging handles four categories: simple iteration, sorting, aggregation, and grouping. Sorting uses merge‑sort; aggregation combines partial results (max/min, sum/count, avg via rewritten queries); grouping employs a map‑reduce style algorithm, which is memory‑intensive.

Performance

Single‑database tests show Sharding-JDBC achieves 99.8% of JDBC’s TPS for queries, 90.2% for inserts, and 93.1% for updates, indicating minimal overhead.

Multi‑database tests show roughly 94% query, 60% insert, and 89% update performance gains when using two databases versus one, demonstrating effective multi‑threaded and distributed execution.

Roadmap

Read‑write separation

Flexible distributed transactions

Distributed primary‑key generation

SQL rewrite optimization

SQL hints and small‑table broadcasting

High‑availability features

Traffic control

Database schema generation tools

Data migration utilities

Advanced SQL parsing (subqueries, stored procedures)

Support for Oracle and SQL Server

Configuration center

Open‑Source Philosophy

Many open‑source projects originate from internal use, leading to mature but sometimes poorly supported codebases. Sharding-JDBC aims for continuous community and internal support, full source snapshots on GitHub, and a simple core that encourages contributions while maintaining high test coverage.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

database sharding JDBC Horizontal Partitioning Sharding-JDBC

Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.