Sharding-JDBC Source Code Analysis and Custom Development

The article dissects Sharding‑JDBC’s five core engines—parsing, routing, rewriting, execution, and merging—highlights production pain points, and details custom extensions such as skipping sharding for specific tables, forcing master routing, runtime configuration refresh, batch‑update handling, sharding‑condition deduplication, full‑route validation, and a simplified component wrapper to ease integration and improve performance.

vivo Internet Technology

Mar 6, 2024

Author: vivo IT Platform Team - Xiong Huanxin

Sharding-JDBC is a JDBC‑layer database middleware widely used in sharding scenarios. This article parses the five core engines of Sharding-JDBC (parsing, routing, rewriting, execution, merging), shares pain points encountered in production, and presents custom development and refactoring solutions.

Business Background

Rapid daily data growth in marketing inventory, transaction orders, finance ledgers, and attendance records puts pressure on single‑node databases. Distributing unrelated data across multiple databases and tables reduces that pressure and improves query performance.

Technical Selection

Sharding-JDBC was chosen over other middleware for its lightweight nature, better performance, and lower integration difficulty. However, several limitations were discovered during use.

1. Parsing Engine

The parsing engine lexes and parses SQL into an abstract syntax tree (AST), which is the basis for routing and rewriting. Example of JDBC execution code:

```java
// Obtain a database connection
try (Connection conn = DriverManager.getConnection("mysqlUrl", "userName", "password")) {
    String sql = "SELECT * FROM t_user WHERE name = ?";
    // Precompile the SQL
    try (PreparedStatement preparedStatement = conn.prepareStatement(sql)) {
        preparedStatement.setString(1, "vivo");
        // Use the no-argument execute(); passing SQL to a PreparedStatement throws SQLException
        preparedStatement.execute();
        try (ResultSet resultSet = preparedStatement.getResultSet()) {
            while (resultSet.next()) {
                // Process the result
            }
        }
    }
}
```

The core parsing method, shown here with the custom skip‑sharding branch described under Custom Development Highlights:

```java
// org.apache.shardingsphere.sql.parser.SQLParserEngine#parse0
private SQLStatement parse0(final String sql, final boolean useCache) {
    ParseTree parseTree = new SQLParserExecutor(databaseTypeName, sql).execute().getRootNode();
    SQLStatement result;
    // Custom branch: non-SELECT statements flagged as skip-sharding bypass the visitor and go to the master
    if (RuleContextManager.isSkipSharding() && !VisitorRule.SELECT.equals(VisitorRule.valueOf(parseTree.getClass()))) {
        RuleContextManager.setMasterRoute(true);
        result = new SkipShardingStatement();
    } else {
        result = (SQLStatement) ParseTreeVisitorFactory.newInstance(databaseTypeName, VisitorRule.valueOf(parseTree.getClass())).visit(parseTree);
    }
    return result;
}
```

2. Routing Engine

Routing determines which physical data nodes a SQL should be executed on. Example of route result creation:

```java
// org.apache.shardingsphere.sharding.route.engine.ShardingRouteDecorator#decorate
private RouteContext decorate(final RouteContext routeContext, final ShardingSphereMetaData metaData, final ShardingRule shardingRule, final ConfigurationProperties properties) {
    ShardingConditions shardingConditions = getShardingConditions(parameters, sqlStatementContext, metaData.getSchema(), shardingRule);
    // ... routing logic ...
    return new RouteContext(sqlStatementContext, parameters, routeResult);
}
```

Sharding and master‑slave routing are combined when needed.
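To make the routing step concrete, the following is a minimal sketch of how an equality condition on a sharding key can resolve to one data node, and how a missing key falls back to every node. The class `ModuloRoutingSketch`, the `ds_N.t_user_N` naming, and the modulo rule are assumptions chosen for illustration; they are not Sharding-JDBC classes or its configured algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of modulo routing; not part of Sharding-JDBC. */
final class ModuloRoutingSketch {

    // Assumed layout: 2 databases (ds_0, ds_1), each holding 4 tables (t_user_0..t_user_3).
    static final long DATABASE_COUNT = 2L;
    static final long TABLE_COUNT = 4L;

    /** An equality condition on the sharding key resolves to exactly one data node. */
    static String route(long userId) {
        long database = Math.floorMod(userId, DATABASE_COUNT);
        long table = Math.floorMod(userId, TABLE_COUNT);
        return "ds_" + database + ".t_user_" + table;
    }

    /** Without a sharding key, every data node is a candidate (full route). */
    static List<String> routeAll() {
        List<String> nodes = new ArrayList<>();
        for (long database = 0; database < DATABASE_COUNT; database++) {
            for (long table = 0; table < TABLE_COUNT; table++) {
                nodes.add("ds_" + database + ".t_user_" + table);
            }
        }
        return nodes;
    }
}
```

With this assumed layout, `route(5)` targets a single node while `routeAll()` returns all eight, which is why SQL without a sharding key is far more expensive to execute.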

3. Rewrite Engine

After routing, logical table names are rewritten to physical ones and the SQL is split for each target node.

```java
// org.apache.shardingsphere.sharding.rewrite.SQLRewriteEntry#createSQLRewriteContext
public SQLRewriteContext createSQLRewriteContext(final String sql, final List<Object> parameters, final SQLStatementContext sqlStatementContext, final RouteContext routeContext) {
    SQLRewriteContext result = new SQLRewriteContext(schemaMetaData, sqlStatementContext, sql, parameters);
    result.generateSQLTokens();
    return result;
}
```

The rewrite result is turned into ExecutionUnit objects:

```java
public final class ExecutionUnit {
    private final String dataSourceName;
    private final SQLUnit sqlUnit;
}
```
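As a rough illustration of the table-name rewrite, the sketch below replaces a logical table token with the physical table of each routed data node. `TableRewriteSketch` and its methods are hypothetical names, and locating the token with `indexOf` is a simplifying assumption; the real engine takes token positions from the parsed statement.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of token-based table rewriting; not part of Sharding-JDBC. */
final class TableRewriteSketch {

    /** Replaces the characters in [startIndex, stopIndex] (inclusive) with the physical table name. */
    static String replaceToken(String logicSql, int startIndex, int stopIndex, String actualTable) {
        return logicSql.substring(0, startIndex) + actualTable + logicSql.substring(stopIndex + 1);
    }

    /** Produces one rewritten SQL per data node, keyed by "dataSource.table". */
    static Map<String, String> rewrite(String logicSql, String logicTable, List<String> dataNodes) {
        // Assumption: the first occurrence of the logical table name is the table token.
        int startIndex = logicSql.indexOf(logicTable);
        if (startIndex < 0) {
            throw new IllegalArgumentException("Logical table not found: " + logicTable);
        }
        int stopIndex = startIndex + logicTable.length() - 1;
        Map<String, String> result = new LinkedHashMap<>();
        for (String dataNode : dataNodes) {
            String actualTable = dataNode.substring(dataNode.indexOf('.') + 1);
            result.put(dataNode, replaceToken(logicSql, startIndex, stopIndex, actualTable));
        }
        return result;
    }
}
```

Each map entry corresponds loosely to one ExecutionUnit: a data source name paired with the SQL to run there.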

4. Execution Engine

Execution groups SQL units by data source and creates connections according to maxConnectionsSizePerQuery. When a data source has more SQL units than maxConnectionsSizePerQuery, CONNECTION_STRICTLY mode is used and several SQL units share one connection; otherwise MEMORY_STRICTLY mode gives each SQL unit its own connection.

```java
// org.apache.shardingsphere.sharding.execute.sql.prepare.SQLExecutePrepareTemplate#getSQLExecuteGroups
private List<InputGroup<StatementExecuteUnit>> getSQLExecuteGroups(final String dataSourceName, final List<SQLUnit> sqlUnits, final SQLExecutePrepareCallback callback) throws SQLException {
    // Number of SQL units per connection: ceil(sqlUnits.size() / maxConnectionsSizePerQuery), at least 1
    int desiredPartitionSize = Math.max(0 == sqlUnits.size() % maxConnectionsSizePerQuery ? sqlUnits.size() / maxConnectionsSizePerQuery : sqlUnits.size() / maxConnectionsSizePerQuery + 1, 1);
    List<List<SQLUnit>> sqlUnitPartitions = Lists.partition(sqlUnits, desiredPartitionSize);
    ConnectionMode connectionMode = maxConnectionsSizePerQuery < sqlUnits.size() ? ConnectionMode.CONNECTION_STRICTLY : ConnectionMode.MEMORY_STRICTLY;
    List<Connection> connections = callback.getConnections(connectionMode, dataSourceName, sqlUnitPartitions.size());
    // ... create StatementExecuteUnit for each partition ...
}
```
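The arithmetic in getSQLExecuteGroups is easier to follow with numbers. The sketch below reproduces only the partition-size, connection-count, and mode calculations from the excerpt above; `ConnectionModeSketch` and its method names are hypothetical, and the mode is returned as a plain string rather than the library's enum.

```java
/** Hypothetical re-statement of the grouping arithmetic; not part of Sharding-JDBC. */
final class ConnectionModeSketch {

    /** SQL units per connection: ceil(sqlUnitCount / maxConnectionsSizePerQuery), at least 1. */
    static int partitionSize(int sqlUnitCount, int maxConnectionsSizePerQuery) {
        int size = sqlUnitCount % maxConnectionsSizePerQuery == 0
                ? sqlUnitCount / maxConnectionsSizePerQuery
                : sqlUnitCount / maxConnectionsSizePerQuery + 1;
        return Math.max(size, 1);
    }

    /** Connections requested: the number of partitions of the given size. */
    static int connectionCount(int sqlUnitCount, int maxConnectionsSizePerQuery) {
        int size = partitionSize(sqlUnitCount, maxConnectionsSizePerQuery);
        return (sqlUnitCount + size - 1) / size;
    }

    /** More SQL units than allowed connections means connections must be shared. */
    static String connectionMode(int sqlUnitCount, int maxConnectionsSizePerQuery) {
        return maxConnectionsSizePerQuery < sqlUnitCount ? "CONNECTION_STRICTLY" : "MEMORY_STRICTLY";
    }
}
```

For example, 10 SQL units with maxConnectionsSizePerQuery = 3 give a partition size of 4, so 3 connections are requested in CONNECTION_STRICTLY mode; 2 SQL units with the same limit give a partition size of 1, so 2 connections are requested in MEMORY_STRICTLY mode.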

Result merging is performed based on the SQL type.

5. Merge Engine

Merge combines result sets from multiple nodes. For SELECT statements, ShardingDQLResultMerger handles grouping, ordering, and pagination.

```java
// org.apache.shardingsphere.sharding.merge.dql.ShardingDQLResultMerger#merge
public MergedResult merge(final List<QueryResult> queryResults, final SQLStatementContext sqlStatementContext, final SchemaMetaData schemaMetaData) throws SQLException {
    if (queryResults.size() == 1) {
        return new IteratorStreamMergedResult(queryResults);
    }
    // selectStatementContext and columnLabelIndexMap are prepared in code elided from this excerpt
    // build group‑by, order‑by or iterator merged result
    MergedResult mergedResult = build(queryResults, selectStatementContext, columnLabelIndexMap, schemaMetaData);
    return decorate(queryResults, selectStatementContext, mergedResult);
}
```
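One way to picture an order-by merge is a k-way merge over result sets that each shard has already sorted. The sketch below uses a priority queue of cursors over in-memory lists; `OrderByMergeSketch` and `Cursor` are hypothetical names, and integers stand in for rows, so this shows the general technique rather than the library's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of per-shard sorted results; not part of Sharding-JDBC. */
final class OrderByMergeSketch {

    /** A position inside one shard's sorted result. */
    private static final class Cursor {
        private final Iterator<Integer> rows;
        private Integer current;

        Cursor(Iterator<Integer> rows) {
            this.rows = rows;
            this.current = rows.next();
        }

        /** Moves to the next row; returns false when the shard is exhausted. */
        boolean advance() {
            if (!rows.hasNext()) {
                return false;
            }
            current = rows.next();
            return true;
        }
    }

    /** Merges ascending per-shard results into one ascending list. */
    static List<Integer> merge(List<List<Integer>> shardResults) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.current));
        for (List<Integer> each : shardResults) {
            if (!each.isEmpty()) {
                queue.add(new Cursor(each.iterator()));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Take the smallest head row, then re-queue its cursor if rows remain.
            Cursor head = queue.poll();
            merged.add(head.current);
            if (head.advance()) {
                queue.add(head);
            }
        }
        return merged;
    }
}
```

Because only one row per shard sits in the queue at a time, this style of merge does not need to load every result set into memory first.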

Custom Development Highlights

1) Skip Sharding Syntax Restrictions – For non‑sharding tables the parsing, routing, rewriting, and merging steps are bypassed. The decision is stored in RuleContextManager (ThreadLocal) and the execution path returns a SkipShardingStatement and a manually constructed ExecutionUnit.

```java
public final class RuleContextManager {
    private static final ThreadLocal<RuleContextManager> SKIP_CONTEXT_HOLDER = ThreadLocal.withInitial(RuleContextManager::new);
    private boolean skipSharding;
    private boolean masterRoute;
    public static boolean isSkipSharding() { return SKIP_CONTEXT_HOLDER.get().skipSharding; }
    public static void setSkipSharding(boolean skip) { SKIP_CONTEXT_HOLDER.get().skipSharding = skip; }
    // Other accessors (for example setMasterRoute, used in parse0) are omitted in this excerpt
}
```
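Because the flag lives in a ThreadLocal, it can leak into later requests when threads are pooled unless it is reset. The sketch below shows one way to scope such a flag with try/finally; `SkipShardingContextSketch` and `callSkippingSharding` are hypothetical names for illustration, and the article does not say how RuleContextManager is cleared in practice.

```java
import java.util.function.Supplier;

/** Hypothetical scoped ThreadLocal flag; not the article's RuleContextManager. */
final class SkipShardingContextSketch {

    private static final ThreadLocal<Boolean> SKIP = ThreadLocal.withInitial(() -> Boolean.FALSE);

    static boolean isSkipSharding() {
        return SKIP.get();
    }

    /** Runs the action with the flag set, then restores the previous value even if the action throws. */
    static <T> T callSkippingSharding(Supplier<T> action) {
        boolean previous = SKIP.get();
        SKIP.set(Boolean.TRUE);
        try {
            return action.get();
        } finally {
            SKIP.set(previous);
        }
    }
}
```

A caller would wrap the data-access call for a non-sharded table in `callSkippingSharding(...)`, so the flag is visible only for the duration of that call on the current thread.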

2) Force Master Routing – When the configuration property MASTER_ROUTE_ONLY is true, MasterVisitedManager.setMasterVisited() is called so that MasterSlaveDataSourceRouter always routes to the master.

```java
// org.apache.shardingsphere.masterslave.route.engine.MasterSlaveRouteDecorator#decorate
if (properties.<Boolean>getValue(ConfigurationPropertyKey.MASTER_ROUTE_ONLY)) {
    MasterVisitedManager.setMasterVisited();
}
```

3) Dynamic Configuration Refresh – Added TypedProperties.refreshValue(String key, String value) to update configuration at runtime, exposed through the datasource’s runtime context.

```java
public boolean refreshValue(String key, String value) {
    for (E each : enumConstants) {
        if (each.getKey().equals(key)) {
            // Re-parse the new value and replace both the typed cache entry and the raw property
            TypedPropertyValue typedPropertyValue = new TypedPropertyValue(each, value);
            cache.put(each, typedPropertyValue);
            props.put(key, value);
            return true;
        }
    }
    // Unknown key: nothing was refreshed
    return false;
}
```

4) Batch UPDATE Support – The original prepareBatch method was extended to split multi‑statement batches, route each statement individually, and keep an EXECUTION_UNIT_LIST in ExecutionContext to preserve all units.

```java
private ExecutionContext prepareBatch(List<String> splitSqlList, final List<Object> allParameters) {
    List<String> sqlList = splitSqlList.stream().distinct().collect(Collectors.toList());
    String sql = sqlList.get(0);
    Collection<ExecutionUnit> globalExecutionUnitList = new ArrayList<>();
    // eachSqlParameterListList and executionContextResult are built in code elided from this excerpt
    // route each parameter set
    for (List<Object> eachSqlParameterList : eachSqlParameterListList) {
        RouteContext routeContext = executeRoute(sql, eachSqlParameterList);
        globalExecutionUnitList.addAll(executeRewrite(sql, eachSqlParameterList, routeContext));
    }
    executionContextResult.getExtendMap().put(EXECUTION_UNIT_LIST, globalExecutionUnitList);
    return executionContextResult;
}
```

5) ShardingCondition Deduplication – Added @EqualsAndHashCode to ListRouteValue and RangeRouteValue, then deduplicated the list of ShardingCondition objects.

```java
private Collection<ShardingCondition> createShardingConditions(...) {
    // ... build conditions ...
    Collection<ShardingCondition> distinctResult = result.stream().distinct().collect(Collectors.toCollection(LinkedList::new));
    return distinctResult;
}
```
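The deduplication only works because `Stream.distinct()` compares elements with `equals` and `hashCode`, which is what the added @EqualsAndHashCode annotations supply. The sketch below shows the same effect with a hand-written value class; `ConditionDedupSketch` and `RouteValue` are hypothetical stand-ins, not the library's ListRouteValue or ShardingCondition.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical illustration of value-based deduplication; not part of Sharding-JDBC. */
final class ConditionDedupSketch {

    /** Stand-in for a route value: a column name plus the values it must match. */
    static final class RouteValue {
        private final String column;
        private final List<Long> values;

        RouteValue(String column, List<Long> values) {
            this.column = column;
            this.values = values;
        }

        // Value-based equality; without these two methods distinct() would compare identities.
        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof RouteValue)) {
                return false;
            }
            RouteValue that = (RouteValue) other;
            return column.equals(that.column) && values.equals(that.values);
        }

        @Override
        public int hashCode() {
            return Objects.hash(column, values);
        }
    }

    /** Removes duplicate conditions while keeping the first occurrence of each. */
    static List<RouteValue> distinct(List<RouteValue> conditions) {
        return conditions.stream().distinct().collect(Collectors.toCollection(LinkedList::new));
    }
}
```

Fewer distinct conditions mean fewer routing computations for statements such as large IN lists that repeat the same sharding value.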

6) Full‑Route Validation – When ALLOW_EMPTY_SHARDING_CONDITIONS is false, the router throws an exception if a DML statement lacks sharding keys.

```java
if (!properties.<Boolean>getValue(ConfigurationPropertyKey.ALLOW_EMPTY_SHARDING_CONDITIONS)) {
    if (sqlStatementContext.getSqlStatement() instanceof DMLStatement) {
        if (shardingConditions.getConditions().isEmpty()) {
            throw new ShardingSphereException("SQL does not contain sharding key");
        }
    }
}
```

7) Component Packaging – A wrapper component simplifies datasource configuration, sharding rule definition, and provides built‑in listeners for dynamic properties.

Usage Recommendations

• Always include sharding keys in SQL to avoid full routing.
• Avoid routing a single SQL to multiple databases.
• Do not apply functions or arithmetic to sharding keys.
• Do not use sub‑queries on sharded tables.
• CASE, HAVING, and UNION (ALL) on sharded tables are supported only when the SQL routes to a single node.

Additional best‑practice suggestions include using globally unique IDs, aligning GROUP BY and ORDER BY columns, and using incremental IDs for efficient pagination.

Conclusion

The article walks through Sharding‑JDBC’s core engines and describes custom extensions that bypass syntax restrictions, force master routing, enable dynamic configuration, support batch updates, and improve performance. The customizations aim to lower integration effort and reduce the impact on existing SQL while maintaining the robustness of Sharding‑JDBC.
