
Design and Implementation of Table Sharding for Cash Repayment System Using ShardingSphere and SpringBoot

This article describes how a loan/repayment service was refactored by introducing a 50‑table sharding scheme with ShardingSphere, covering the design decisions, historical data migration, code changes, configuration files, pitfalls, and runtime synchronization strategies needed to keep queries efficient over hundreds of millions of rows.


Preface

After a year of writing business code since joining the company, I suddenly received a technical requirement from leadership: shard the loan and repayment application tables. The loan table contains tens of millions of rows and the repayment table over two hundred million; sharding will improve query efficiency and reduce database pressure. This is my first non‑business requirement in a year.

Design Plan

Before coding, we need a system design covering the following items so we can align with leadership.

Sharding Database Basic Info

The sharding database uses a new instance with an independent data source.

No need to shard databases, only tables; 50 tables are sufficient.

Sharding database name: tcbiz_ins_cash_mas_split (named by leadership).

Total number of tables: 50.

Table name pattern: CashRepayApplySplit0${0..9}, CashRepayApplySplit${10..49} (camel‑case).

Sharding column: memberId. Strategy: compute memberId % 50 and left‑pad the result with 0 to two digits, because the table suffix is two digits.

Table structure identical to the original, including fields and indexes. Notably the original uptTime column has no index, which must be added in the sharding tables.
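The suffix computation described above can be sketched in plain Java. The class and method names here are illustrative, not the production code:

```java
public final class ShardSuffix {
    private static final int TABLE_COUNT = 50;

    /** Two-digit table suffix for a memberId: memberId % 50, left-padded with 0. */
    public static String suffixFor(String memberId) {
        long mod = Long.parseLong(memberId) % TABLE_COUNT;
        return String.format("%02d", mod); // e.g. 7 -> "07", 23 -> "23"
    }
}
```

For example, a memberId of 1000000123 routes to CashRepayApplySplit23, while a memberId of 7 routes to CashRepayApplySplit07.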

Historical Data Synchronization

After creating fifty tables, new business data must be routed according to the sharding strategy, and existing data must be migrated to the sharding tables CashRepayApplySplit0${0..9} , CashRepayApplySplit${10..49} .

Previously we migrated data from the original CashRepayInfo (old table) to CashRepayApply (current main table). The current system writes to CashRepayApply and synchronizes to CashRepayInfo so that CashRepayInfo holds the complete data set.

Therefore the migration steps are:

Synchronize the missing early data from CashRepayInfo to CashRepayApplySplit .

Synchronize the remaining data from CashRepayApply to CashRepayApplySplit .

Dynamic Switch for Data Synchronization

A dynamic switch is needed to control whether new business data from CashRepayApply is written to the sharding tables. The switch will be turned on after the historical data migration is complete.

Existing Backend Query Refactoring

The current admin/operations backend queries the single table with JOINs. After sharding, those queries cannot be used directly, so the plan is to keep the most recent two to three years of data in the original CashRepayApply table and continue querying it with memberId joins, while older data is queried from the sharding tables with memberId as a mandatory condition.

External Department Notification

After the full data migration, the oldest table CashRepayInfo will be gradually deprecated. Other departments (risk control, big data, etc.) must be informed to switch their reports to the new sharding tables CashRepayApplySplit .

DBA‑Assisted Synchronization Process

Because the data volume exceeds two hundred million rows, DBA‑assisted bulk migration is required. The DBA needs a start id and an end id to migrate. The end id is determined by filtering rows whose uptTime is earlier than operationDate - 5 days , ensuring a deterministic range. Any data generated during the DBA run will be synchronized later by the development team.
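The cutoff that determines the end id can be sketched as follows; operationDate and the five‑day window come from the process above, while the class and method names are illustrative:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public final class MigrationCutoff {
    /** Rows with uptTime before this instant fall inside the DBA bulk-migration range. */
    public static LocalDateTime cutoff(LocalDate operationDate) {
        return operationDate.minusDays(5).atStartOfDay();
    }
}
```

Anything written after the cutoff is deliberately left out of the bulk load and picked up later by the development team's own synchronization.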

Three‑Write Synchronization

During the transition period, three tables ( CashRepayInfo , CashRepayApply , CashRepayApplySplit ) must stay consistent. Writes first go to CashRepayApply , then to CashRepayInfo , and finally to CashRepayApplySplit . A dedicated transaction manager is required for the sharding datasource.

@Transactional(propagation = Propagation.REQUIRES_NEW, transactionManager = "transactionManagerSplit")

Real‑time synchronization is performed in application code rather than through middleware such as MySQL binlog replication, because the DBA indicated that binlog‑based sync is not feasible in our environment.
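As a minimal sketch of the write ordering, here is a hypothetical class with the DAO calls stubbed out as log entries so the sequence is visible; in the real service the split step carries the REQUIRES_NEW annotation shown above so it runs in its own transaction on the sharding datasource:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the three-write ordering; DAO calls are stubbed. */
public final class ThreeWriteSketch {
    private final List<String> writes = new ArrayList<>();

    public List<String> save(String repayNo) {
        writes.add("CashRepayApply:" + repayNo);      // 1. main table first
        writes.add("CashRepayInfo:" + repayNo);       // 2. legacy table second
        writes.add("CashRepayApplySplit:" + repayNo); // 3. sharding tables last, own transaction in production
        return writes;
    }
}
```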

Dynamic Switch Implementation

The switch controls whether new data from CashRepayApply is also written to the sharding tables. The switch will be opened after the historical migration and DBA bulk load are finished.

Scheduled Scanning Task

A daily scheduled task scans today’s newly created repayment records, compares the main table and the sharding tables, and raises an alert if any discrepancy is found.
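The core comparison inside such a task can be sketched as a set difference over the day's record ids (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class RepayReconciler {
    /** Ids present in the main table but missing from the sharding tables. */
    public static List<Long> missingIds(Collection<Long> mainIds, Collection<Long> splitIds) {
        Set<Long> split = new HashSet<>(splitIds);
        List<Long> missing = new ArrayList<>();
        for (Long id : mainIds) {
            if (!split.contains(id)) {
                missing.add(id); // discrepancy -> raise an alert
            }
        }
        return missing;
    }
}
```

An empty result means the main table and the sharding tables agree for the day; any surviving ids are flagged for investigation.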

Difficult Demo Journey

Because the production project runs an old version of ShardingSphere, I built a separate demo with SpringBoot 3.2.4, MySQL 8, and ShardingSphere 5.4.1. The official spring-boot-starter for ShardingSphere stopped updating after version 5.3.0, so the demo uses the JDBC driver‑based integration instead.

Component Versions

SpringBoot 3.2.4 + JDK19 + MySQL8 + ShardingSphere 5.4.1 + MyBatisPlus 3.5.5

Maven Dependencies

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.19</version>
</dependency>

<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core</artifactId>
    <version>5.4.1</version>
</dependency>

<!-- mybatis‑plus 3 -->
<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>mybatis-plus-spring-boot3-starter</artifactId>
    <version>3.5.5</version>
</dependency>

<!-- required for JDK19 -->
<dependency>
    <groupId>com.sun.xml.bind</groupId>
    <artifactId>jaxb-impl</artifactId>
    <version>2.3.3</version>
</dependency>

application.yml

spring:
  application:
    name: sharding-jdbc-demo
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding-config.yaml # sharding config file
  profiles:
    active: default

mybatis-plus:
  configuration:
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl

sharding-config.yaml

dataSources:
  ds0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/tcbiz_ins?characterEncoding=utf8&useAffectedRows=true&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    username: root
    password: your_db_password

rules:
  - !SHARDING
    tables:
      cash_repay_apply:
        actualDataNodes: ds0.cash_repay_apply_0$->{0..9},ds0.cash_repay_apply_$->{10..49}
        tableStrategy:
          standard:
            shardingColumn: member_id
            shardingAlgorithmName: memberId-suffix-mode-50
    shardingAlgorithms:
      memberId-suffix-mode-50:
        type: INLINE
        props:
          algorithm-expression: cash_repay_apply_$->{(Long.parseLong(member_id) % 50).toString().padLeft(2, '0')}

props:
  sql-show: true

Mapper Example

@Mapper
public interface CashRepayApplyMapper extends BaseMapper<CashRepayApply> {

    @Select("select * from cash_repay_apply where member_id = #{memberId}")
    List<CashRepayApply> testSharding(@Param("memberId") String memberId);
}

Calling the mapper method lets ShardingSphere automatically route to the correct sharding table.

Runtime Pitfalls

Running with SpringBoot 3.2.4 throws java.lang.NoSuchMethodError: 'void org.yaml.snakeyaml.representer.Representer.<init>()' because the Representer class on the classpath lacks a no‑arg constructor. The quick fix is to copy the class into the project under the same package and add the missing constructor:

public Representer() {
    super(new DumperOptions());
    this.representers.put(null, new RepresentJavaBean());
}

After the fix, the sharding based on member_id works correctly.

More Pitfalls

SpringBoot 3.2.4 requires the JAXB implementation dependency.

The original Representer class lacks a no‑arg constructor, causing the above error.

The Groovy inline sharding expression member_id % 50 fails because member_id is a VARCHAR and Groovy treats the operand as a String. The fix is to parse it first: Long.parseLong(member_id) % 50. Because the table suffix is two digits, the result is also left‑padded with padLeft(2, '0'), giving algorithm-expression: cash_repay_apply_$->{(Long.parseLong(member_id) % 50).toString().padLeft(2, '0')}

Demo Source Code Download

The demo source code is available on GitHub.

Data Synchronization Code

The following method shows a single‑threaded batch synchronization that processes 500 rows at a time until completion.

/**
 * Synchronize data example
 */
public void dataHandle(Long startId, Long endId) throws AppBizException {
    log.info("CashRepayApplyServiceImpl#dataHandle start startId-{},endId-{}", startId, endId);
    if (endId <= 0L) {
        endId = null;
    } else if (startId > endId) {
        throw new AppBizException(ErrorCodeEnum.PARAM_CHECK_ERROR.getCode(), "Start ID cannot be greater than end ID");
    }
    QueryCashRepayInfoCondition condition = new QueryCashRepayInfoCondition();
    condition.setGteId(startId);
    condition.setLteId(endId);
    condition.setOrders("id+"); // ascending order
    List
infoList = cashRepayInfoDao.query(condition, 0, -1); // max 500 rows
    long lastId;
    while (CollectionUtil.isNotEmpty(infoList)) {
        lastId = infoList.get(infoList.size() - 1).getId() + 1;
        infoList.forEach(history -> {
            try {
                if (StringUtil.isBlank(history.getMemberId()) || StringUtil.isBlank(history.getRepayNo())) {
                    log.error("CashRepayApplyServiceImpl#dataHandle error memberId or repayNo is null id-{}", history.getId());
                    return;
                }
                QueryCashRepayApplySplitCond splitCond = new QueryCashRepayApplySplitCond();
                splitCond.setMemberId(history.getMemberId());
                splitCond.setRepayApplyNo(history.getRepayNo());
                CashRepayApplySplit exist = cashRepayApplySplitDao.getUnique(splitCond);
                CashRepayApplySplit splitData = buildCashRepayApplySplit(history);
                if (exist == null) {
                    cashRepayApplySplitDao.add(splitData);
                } else {
                    splitData.setId(exist.getId());
                    cashRepayApplySplitDao.update(splitData);
                }
            } catch (Exception e) {
                log.error("CashRepayApplyServiceImpl#dataHandle error id-{},repayNo-{}", history.getId(), history.getRepayNo(), e);
                throw new RuntimeException(e);
            }
        });
        log.info("dataHandle finish startId-{},endId-{}", condition.getGteId(), endId);
        String redisCache = RedisCacheUtils.getRedisCache(CashApplyRedisKeyConsts.TERMINATE_SPLIT_SYNC_DATA);
        if (StringUtil.isNotEmpty(redisCache)) {
            log.info("CashRepayApplyServiceImpl#dataHandle need terminate loop startId-{}", condition.getGteId());
            break;
        }
        condition.setGteId(lastId);
        infoList = cashRepayInfoDao.query(condition, 0, -1);
    }
}

/**
 * Build a CashRepayApplySplit entity, preferring the CashRepayApply row
 * and falling back to the CashRepayInfo fields when no apply row exists
 */
private CashRepayApplySplit buildCashRepayApplySplit(CashRepayInfo history) {
    CashRepayApplySplit split = new CashRepayApplySplit();
    CashRepayApply apply = cashRepayApplyDao.get(history.getRepayNo());
    if (apply != null) {
        BeanUtils.copyProperties(apply, split, "id");
        return split;
    }
    // ... omitted: populate fields from CashRepayInfo
    return split;
}

Conclusion

Before this project I often faced interview questions about sharding without practical experience. After going through the whole process—design, stakeholder alignment, DBA coordination, code implementation, and handling numerous pitfalls—I realized that sharding is less about exotic technology and more about systematic design and cross‑team collaboration.

Even though the business requirement was relatively simple, we now have real‑world experience sharding tables with hundreds of millions of rows.

For more articles, follow the author and join the architecture community.

Tags: data migration, backend development, Sharding, MySQL, ShardingSphere, SpringBoot, database partitioning
Written by Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
