Databases 25 min read

Design and Implementation of Table Sharding for Loan and Repayment Applications Using ShardingSphere and Spring Boot

The article describes a year‑long development experience where a senior architect designs, plans, and implements a sharding solution for loan and repayment tables using ShardingSphere, Spring Boot 3, MySQL, and custom synchronization code to migrate billions of rows while ensuring data consistency and operational continuity.

Top Architect

Oct 4, 2024

Design and Implementation of Table Sharding for Loan and Repayment Applications Using ShardingSphere and Spring Boot

Preface

Since joining the company I have been writing business code for a year, and suddenly received a technical request from leadership to split the loan and repayment application tables. The loan table contains only tens of millions of rows, while the repayment table already has over two hundred million rows, so sharding is needed to improve query efficiency and reduce database pressure. This is my first non‑business requirement after a year of work.

Design Plan

Before coding we need a system design covering several points to align with leadership.

Sharding Database Basic Information

Use a new instance for the sharding database, independent data source.

No need for separate databases, only tables. 50 tables are sufficient.

Sharding database name: tcbiz_ins_cash_mas_split (named by leadership, no special meaning).

Total number of tables: 50.

Table naming pattern: CashRepayApplySplit0${0..9}, CashRepayApplySplit${10..49} (the company uses camel‑case for table and column names).

Sharding column: memberId. Sharding strategy: take the last two digits of memberId % 50, pad with 0 when the remainder is less than 10 because the table suffix is two digits.

Table structure is identical to the original table, including fields and indexes. Notably, the original uptTime column has no index, so we need to add one in the sharding tables while keeping the main table unchanged.

Historical Data Synchronization

After creating fifty tables, new business data must be written according to the sharding strategy, and existing data must be migrated to the sharded tables using the same strategy for tables CashRepayApplySplit0${0..9} and CashRepayApplySplit${10..49}.

Previously the company performed a migration from CashRepayInfo (original table) to CashRepayApply (current main table). The current system writes to CashRepayApply and synchronizes to CashRepayInfo to keep a full copy of data. The goal now is to synchronize the full data from CashRepayInfo to the sharded tables.

Steps:

Synchronize the missing early data from CashRepayInfo to CashRepayApplySplit.

Synchronize the remaining data from CashRepayApply to CashRepayApplySplit.

Q&A

Why not sync directly from CashRepayInfo to the sharded tables? Because the full data volume exceeds two hundred million rows, which cannot be fully processed by developers; the DBA must handle it. However, the DBA requires identical field structures between source and target tables, and CashRepayInfo and CashRepayApply have differences, so a hybrid approach is needed.

Specific Details

To find the starting id for migration, we locate the intersection point of the black dotted line with CashRepayInfo.id. All data before that id must be migrated to the sharded tables.

Existing Backend Query Refactor

The current admin/operations backend uses single‑table join queries. After sharding, the original query pattern will no longer work. The plan is to keep the last two‑to‑three years of data in the original CashRepayApply table (still queryable by memberId), while older data will be queried from the sharded tables using memberId as a mandatory condition.

External Department Notification

After full data migration, the old CashRepayInfo table will be gradually retired. Other departments (risk control, big data, etc.) must be informed to switch their reports to the new sharded tables.

DBA Synchronization Process for New Business Data

The first step (developer‑driven) synchronizes a portion of data. Then a start id is given to the DBA to sync the remaining data. Because new business data keeps being inserted, the DBA cannot know the exact end id. The solution is to limit the DBA sync to records with uptTime older than operatorDate - 5 days, providing a deterministic end id. Any data generated during the DBA operation will be synced later by developers (e.g., at 23:00 when the repayment service is closed).

Three‑Write Synchronization

During the transition period, all three tables ( CashRepayInfo, CashRepayApply, CashRepayApplySplit) must stay consistent. The write order is: first write to CashRepayApply, then to CashRepayInfo, and finally to CashRepayApplySplit. Because the sharding database is a separate data source, a dedicated transaction manager must be declared for the sharding writes.

@Transactional(propagation = Propagation.REQUIRES_NEW, transactionManager = "transactionManagerSplit")

We avoid using MySQL binlog listeners for real‑time sync because the DBA indicated it is difficult to implement.

Dynamic Switch for Synchronization

A runtime switch is needed to enable or disable the double‑write from CashRepayApply to the sharded tables. The switch will be turned on after the historical data migration is complete.

Timed Task Scanning

Since the double‑write occurs in separate transactions, there is a risk that a write succeeds in the main table but fails in the sharded table. A daily scheduled task scans today’s repayment applications, compares CashRepayApply and CashRepayApplySplit, and raises alerts if inconsistencies are found.

Demo Road

The project uses a relatively low version of ShardingSphere, so we upgraded to a newer version compatible with Spring Boot 3. The official sharding‑jdbc‑spring‑boot‑starter stopped updating after version 5.3.0; we therefore use version 5.3.0 or later with the ShardingSphereDriver.

Component Versions

SpringBoot 3.2.4 + JDK19 + MySQL8 + ShardingSphere 5.4.1 + MyBatisPlus 3.5.5

Maven Dependencies

<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>8.0.19</version>
</dependency>

<dependency>
  <groupId>org.apache.shardingsphere</groupId>
  <artifactId>shardingsphere-jdbc-core</artifactId>
  <version>5.4.1</version>
</dependency>

<!-- MyBatis‑Plus 3 -->
<dependency>
  <groupId>com.baomidou</groupId>
  <artifactId>mybatis-plus-spring-boot3-starter</artifactId>
  <version>3.5.5</version>
</dependency>

<!-- JAXB needed for Java 19 -->
<dependency>
  <groupId>com.sun.xml.bind</groupId>
  <artifactId>jaxb-impl</artifactId>
  <version>2.3.3</version>
</dependency>

application.yml

spring:
  application:
    name: sharding-jdbc-demo
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding-config.yaml # sharding config file
  profiles:
    active: default

mybatis-plus:
  configuration:
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl

sharding-config.yaml

dataSources:
  ds0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/tcbiz_ins?characterEncoding=utf8&useAffectedRows=true&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    username: root
    password: your_password

rules:
  - !SHARDING
    tables:
      cash_repay_apply:
        actualDataNodes: ds0.cash_repay_apply_0${0..9},ds0.cash_repay_apply_${10..49}
        tableStrategy:
          standard:
            shardingColumn: member_id
            shardingAlgorithmName: memberId-suffix-mode-50
    shardingAlgorithms:
      memberId-suffix-mode-50:
        type: INLINE
        props:
          algorithm-expression: cash_repay_apply_${Long.parseLong(member_id) % 50}.padLeft(2,'0')

props:
  sql-show: true

Mapper Example

@Mapper
public interface CashRepayApplyMapper extends BaseMapper<CashRepayApply> {
    @Select("select * from cash_repay_apply where member_id = #{memberId}")
    List<CashRepayApply> testSharding(@Param("memberId") String memberId);
}

Calling this method lets ShardingSphere automatically route to the correct sharded table.

Pitfalls

Spring Boot 3.2.4 requires the JAXB dependency.

The class org.yaml.snakeyaml.representer.Representer lost its no‑arg constructor; we added a wrapper class with a default constructor to fix the NoSuchMethodError.

Groovy inline sharding expressions must cast the sharding column to Long before applying the modulo operation, otherwise a MissingMethodException occurs.

Data Synchronization Code

Below is a simplified single‑thread batch synchronization that processes 500 rows per iteration, checks for termination flags in Redis, and writes to the sharded table with proper transaction handling.

/**
  * Synchronization example code
  */
public void dataHandle(Long startId, Long endId) throws AppBizException {
    log.info("CashRepayApplyServiceImpl#dataHandle start startId-{},endId-{}", startId, endId);
    if (endId <= 0L) {
        endId = null;
    } else if (startId > endId) {
        throw new AppBizException(ErrorCodeEnum.PARAM_CHECK_ERROR.getCode(), "Start ID cannot be greater than end ID");
    }
    QueryCashRepayInfoCondition condition = new QueryCashRepayInfoCondition();
    condition.setGteId(startId);
    condition.setLteId(endId);
    condition.setOrders("id+"); // ascending order
    List<CashRepayInfo> infoList = cashRepayInfoDao.query(condition, 0, -1); // max 500 rows per query
    long lastId;
    while (CollectionUtil.isNotEmpty(infoList)) {
        lastId = infoList.get(infoList.size() - 1).getId() + 1;
        infoList.forEach(history -> {
            try {
                if (StringUtil.isBlank(history.getMemberId()) || StringUtil.isBlank(history.getRepayNo())) {
                    log.error("CashRepayApplyServiceImpl#dataHandle error memberId or repayNo is null id-{}", history.getId());
                    return;
                }
                QueryCashRepayApplySplitCond splitCond = new QueryCashRepayApplySplitCond();
                splitCond.setMemberId(history.getMemberId());
                splitCond.setRepayApplyNo(history.getRepayNo());
                CashRepayApplySplit exist = cashRepayApplySplitDao.getUnique(splitCond);
                CashRepayApplySplit splitData = buildCashRepayApplySplit(history);
                if (exist == null) {
                    cashRepayApplySplitDao.add(splitData);
                } else {
                    splitData.setId(exist.getId());
                    cashRepayApplySplitDao.update(splitData);
                }
            } catch (Exception e) {
                log.error("CashRepayApplyServiceImpl#dataHandle error id-{},repayNo-{}", history.getId(), history.getRepayNo());
                throw new RuntimeException(e);
            }
        });
        LOG.info("dataHandle finish startId-{},endId-{}", condition.getGteId(), endId);
        String redisCache = RedisCacheUtils.getRedisCache(CashApplyRedisKeyConsts.TERMINATE_SPLIT_SYNC_DATA);
        if (StringUtil.isNotEmpty(redisCache)) {
            LOG.info("CashRepayApplyServiceImpl#dataHandle need terminate loop startId-{}", condition.getGteId());
            break;
        }
        condition.setGteId(lastId);
        infoList = cashRepayInfoDao.query(condition, 0, -1);
    }
}

private CashRepayApplySplit buildCashRepayApplySplit(CashRepayInfo history) {
    CashRepayApplySplit split = new CashRepayApplySplit();
    CashRepayApply apply = cashRepayApplyDao.get(history.getRepayNo());
    if (apply != null) {
        BeanUtils.copyProperties(apply, split, "id");
        return split;
    }
    // ... omitted: map fields from CashRepayInfo to split entity ...
    return split;
}

Conclusion

The author reflects that many interviewers ask about sharding without practical experience, but after actually implementing the solution, the process turned out to be straightforward: read the documentation, configure the rules, and let the framework handle routing. Nevertheless, the real challenge lies in designing the overall migration plan, coordinating multiple departments, and ensuring that the sharding does not disrupt existing services.

In the end, the team gained valuable experience with billion‑row level sharding, which is a noteworthy achievement.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java MySQL ShardingSphere database sharding SpringBoot

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.