Design and Implementation of Table Sharding for Cash Repayment System Using ShardingSphere and SpringBoot
This article describes how the loan and repayment tables of a cash loan service were split into a 50‑table sharding scheme with ShardingSphere, detailing the design decisions, historical data migration, code changes, configuration files, pitfalls, and runtime synchronization strategies used to keep queries efficient across hundreds of millions of rows.
Preface
Since joining the company I have spent a year writing business code, and suddenly received a technical requirement from leadership: split the loan and repayment application tables. The loan table holds tens of millions of rows and the repayment table over two hundred million; splitting them should improve query efficiency and reduce database pressure. This is my first non‑business requirement in a year here.
Design Plan
Before writing any code we need a system design covering the following items, to be aligned with leadership.
Sharding Database Basic Info
Sharding database uses a new instance, independent data source.
No need to shard databases, only tables; 50 tables are sufficient.
Sharding database name tcbiz_ins_cash_mas_split (named by leadership).
Total number of tables 50 .
Table name pattern CashRepayApplySplit0${0..9} , CashRepayApplySplit${10..49} (camel‑case).
Sharding column memberId ; strategy: compute memberId % 50 and left‑pad the result with 0 to two digits, because the table suffix is two digits.
Table structure identical to the original, including fields and indexes. Notably the original uptTime column has no index, which must be added in the sharding tables.
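The suffix rule above can be sketched in plain Java. This is an illustrative helper, not code from the project; the class and method names are assumptions.

```java
// Hypothetical helper illustrating the sharding strategy described above:
// suffix = memberId % 50, left-padded with 0 to two digits.
public class ShardSuffix {
    static String tableFor(String memberId) {
        long mod = Long.parseLong(memberId) % 50;           // memberId is stored as VARCHAR, so parse first
        return "CashRepayApplySplit" + String.format("%02d", mod); // pad to the two-digit table suffix
    }

    public static void main(String[] args) {
        System.out.println(tableFor("1000000007")); // 1000000007 % 50 = 7  -> CashRepayApplySplit07
        System.out.println(tableFor("123"));        // 123 % 50 = 23        -> CashRepayApplySplit23
    }
}
```

A memberId ending in the same residue class always lands in the same table, so all of one member's repayment records stay co-located.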
Historical Data Synchronization
After creating fifty tables, new business data must be routed according to the sharding strategy, and existing data must be migrated to the sharding tables CashRepayApplySplit0${0..9} , CashRepayApplySplit${10..49} .
Previously we migrated data from the original CashRepayInfo (old table) to CashRepayApply (current main table). The current system writes to CashRepayApply and synchronizes to CashRepayInfo so that CashRepayInfo holds the complete data set.
Therefore the migration steps are:
Synchronize the missing early data from CashRepayInfo to CashRepayApplySplit .
Synchronize the remaining data from CashRepayApply to CashRepayApplySplit .
Dynamic Switch for Data Synchronization
A dynamic switch is needed to control whether new business data from CashRepayApply is written to the sharding tables. The switch will be turned on after the historical data migration is complete.
Existing Backend Query Refactoring
The current admin/operations backend queries the single table with JOIN . After sharding, the original query cannot be used directly, so the plan is to keep the most recent two to three years of data in the original table CashRepayApply and continue using memberId joins there, while older data will be queried from the sharding tables with memberId as a mandatory condition.
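The routing rule described above can be sketched as a small decision function. The three-year window, class, and method names here are assumptions for illustration, not the project's actual code.

```java
import java.time.LocalDate;

// Hypothetical router for the backend query split: records created within the retention
// window stay queryable in the main table; anything older must go to the sharding
// tables, which additionally require the shard key memberId.
public class QueryRouter {
    enum Source { MAIN_TABLE, SHARD_TABLES }

    static Source route(LocalDate createDate, LocalDate today, String memberId) {
        if (createDate.isAfter(today.minusYears(3))) {
            return Source.MAIN_TABLE;                  // recent data: keep the existing JOIN queries
        }
        if (memberId == null || memberId.isEmpty()) {  // shard key is mandatory for old data
            throw new IllegalArgumentException("memberId is required for sharded queries");
        }
        return Source.SHARD_TABLES;
    }
}
```

Forcing memberId on old-data queries guarantees ShardingSphere can route to a single table instead of broadcasting to all 50.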
External Department Notification
After the full data migration, the oldest table CashRepayInfo will be gradually deprecated. Other departments (risk control, big data, etc.) must be informed to switch their reports to the new sharding tables CashRepayApplySplit .
DBA‑Assisted Synchronization Process
Because the data volume exceeds two hundred million rows, DBA‑assisted bulk migration is required. The DBA needs a start id and an end id to migrate. The end id is determined by filtering rows whose uptTime is earlier than operationDate - 5 days , ensuring a deterministic range. Any data generated during the DBA run will be synchronized later by the development team.
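The end‑id rule above can be sketched as follows: the end id is the largest row id whose uptTime falls strictly before operationDate minus 5 days, so the DBA works on a range that in‑flight writes can no longer enter. The record and method names are assumptions for illustration.

```java
import java.time.LocalDateTime;
import java.util.List;

// Illustrative sketch of picking a deterministic migration end id.
public class EndIdPicker {
    record Row(long id, LocalDateTime uptTime) {}

    static long pickEndId(List<Row> rows, LocalDateTime operationDate) {
        LocalDateTime cutoff = operationDate.minusDays(5);
        return rows.stream()
                .filter(r -> r.uptTime().isBefore(cutoff)) // only rows safely older than the cutoff
                .mapToLong(Row::id)
                .max()
                .orElseThrow(() -> new IllegalStateException("no rows before cutoff"));
    }
}
```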
Three‑Write Synchronization
During the transition period, three tables ( CashRepayInfo , CashRepayApply , CashRepayApplySplit ) must stay consistent. Writes first go to CashRepayApply , then to CashRepayInfo , and finally to CashRepayApplySplit . A dedicated transaction manager is required for the sharding datasource.
@Transactional(propagation = Propagation.REQUIRES_NEW, transactionManager = "transactionManagerSplit")
Real‑time synchronization is performed in code rather than via middleware such as MySQL binlog, because the DBA indicated that binlog‑based sync is not feasible in our environment.
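The write ordering can be sketched without Spring. In the real service each dao is a MyBatis mapper and the shard write runs in its own REQUIRES_NEW transaction via transactionManagerSplit ; here the daos are stand‑ins so only the ordering and the switch gating are shown. All names are illustrative.

```java
// Sketch of the three-write ordering: main table first, legacy table second,
// sharding tables last (gated by the dynamic switch). Spring wiring omitted.
public class TripleWriter {
    interface Dao { void save(String repayNo); }

    private final Dao applyDao, infoDao, splitDao;
    private final boolean splitSwitchOn; // dynamic switch gating the shard write

    TripleWriter(Dao applyDao, Dao infoDao, Dao splitDao, boolean splitSwitchOn) {
        this.applyDao = applyDao;
        this.infoDao = infoDao;
        this.splitDao = splitDao;
        this.splitSwitchOn = splitSwitchOn;
    }

    void write(String repayNo) {
        applyDao.save(repayNo);     // 1. main table CashRepayApply
        infoDao.save(repayNo);      // 2. legacy table CashRepayInfo
        if (splitSwitchOn) {
            splitDao.save(repayNo); // 3. sharding tables, own transaction in production
        }
    }
}
```

Keeping the shard write last (and in a separate transaction) means a shard failure cannot roll back the main-table write, which matches the "main table is the source of truth" design.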
Dynamic Switch Implementation
The switch controls whether new data from CashRepayApply is also written to the sharding tables. The switch will be opened after the historical migration and DBA bulk load are finished.
Scheduled Scanning Task
A daily scheduled task scans today’s newly created repayment records, compares the main table and the sharding tables, and raises an alert if any discrepancy is found.
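The comparison step of that task can be sketched as a pure function: load today's records from the main table and the sharding tables keyed by repayNo, then report any key missing on either side or differing between the two. In production this would run under a scheduler such as @Scheduled ; the names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the daily reconciliation between the main table and the sharding tables.
public class SplitReconciler {
    static List<String> findDiscrepancies(Map<String, String> mainRows, Map<String, String> splitRows) {
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, String> e : mainRows.entrySet()) {
            String splitValue = splitRows.get(e.getKey());
            if (splitValue == null) {
                diffs.add("missing in split: " + e.getKey());
            } else if (!splitValue.equals(e.getValue())) {
                diffs.add("mismatch: " + e.getKey());   // same repayNo, different content
            }
        }
        for (String key : splitRows.keySet()) {
            if (!mainRows.containsKey(key)) {
                diffs.add("missing in main: " + key);
            }
        }
        return diffs;                                   // non-empty result triggers an alert
    }
}
```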
Difficult Demo Journey
The production project uses an old version of ShardingSphere, but the demo here is built with SpringBoot 3.2.4, MySQL 8, and ShardingSphere 5.4.1. The official Spring Boot starter for ShardingSphere stopped updating after version 5.3.0, so we integrate through the ShardingSphereDriver instead.
Component Versions
SpringBoot 3.2.4 + JDK19 + MySQL8 + ShardingSphere 5.4.1 + MyBatisPlus 3.5.5
Maven Dependencies
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.19</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc-core</artifactId>
<version>5.4.1</version>
</dependency>
<!-- mybatis‑plus 3 -->
<dependency>
<groupId>com.baomidou</groupId>
<artifactId>mybatis-plus-spring-boot3-starter</artifactId>
<version>3.5.5</version>
</dependency>
<!-- required for JDK19 -->
<dependency>
<groupId>com.sun.xml.bind</groupId>
<artifactId>jaxb-impl</artifactId>
<version>2.3.3</version>
</dependency>
application.yml
spring:
application:
name: sharding-jdbc-demo
datasource:
driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
url: jdbc:shardingsphere:classpath:sharding-config.yaml # sharding config file
profiles:
active: default
mybatis-plus:
configuration:
log-impl: org.apache.ibatis.logging.stdout.StdOutImpl
sharding-config.yaml
dataSources:
ds0:
dataSourceClassName: com.zaxxer.hikari.HikariDataSource
driverClassName: com.mysql.cj.jdbc.Driver
jdbcUrl: jdbc:mysql://localhost:3306/tcbiz_ins?characterEncoding=utf8&useAffectedRows=true&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
username: root
password: your_db_password
rules:
- !SHARDING
tables:
cash_repay_apply:
actualDataNodes: ds0.cash_repay_apply_0$->{0..9},ds0.cash_repay_apply_$->{10..49}
tableStrategy:
standard:
shardingColumn: member_id
shardingAlgorithmName: memberId-suffix-mode-50
shardingAlgorithms:
memberId-suffix-mode-50:
type: INLINE
props:
algorithm-expression: cash_repay_apply_$->{(Long.parseLong(member_id) % 50).toString().padLeft(2, '0')}
props:
sql-show: true
Mapper Example
@Mapper
public interface CashRepayApplyMapper extends BaseMapper<CashRepayApply> {
    @Select("select * from cash_repay_apply where member_id = #{memberId}")
    List<CashRepayApply> testSharding(@Param("memberId") String memberId);
}
Calling the mapper method lets ShardingSphere automatically route the query to the correct sharding table.
Runtime Pitfalls
Running with SpringBoot 3.2.4 throws java.lang.NoSuchMethodError: 'void org.yaml.snakeyaml.representer.Representer.<init>()' . SpringBoot 3.2.x ships snakeyaml 2.x, whose Representer class no longer has a no‑arg constructor, while ShardingSphere 5.4.1 still calls it. The quick fix is to copy the class into the project and restore the empty constructor:
public Representer() {
    super(new DumperOptions());
    this.representers.put(null, new RepresentJavaBean());
}
After this fix, sharding based on member_id works correctly.
More Pitfalls
SpringBoot 3.2.4 requires the JAXB implementation dependency.
The original Representer class lacks a no‑arg constructor, causing the above error.
The Groovy inline sharding expression member_id % 50 fails because member_id is a VARCHAR and Groovy treats it as a String. The fix is to parse it to a long first: algorithm-expression: cash_repay_apply_$->{Long.parseLong(member_id) % 50} . Because the table suffix is two digits, the result must also be left‑padded, e.g. (Long.parseLong(member_id) % 50).toString().padLeft(2, '0') .
Demo Source Code Download
The full demo project is available on GitHub.
Data Synchronization Code
The following method shows a single‑threaded batch synchronization that processes up to 500 rows per batch until all rows are migrated, with a Redis flag available to terminate the loop early.
/**
* Synchronize data example
*/
public void dataHandle(Long startId, Long endId) throws AppBizException {
log.info("CashRepayApplyServiceImpl#dataHandle start startId-{},endId-{}", startId, endId);
if (endId <= 0L) {
endId = null;
} else if (startId > endId) {
throw new AppBizException(ErrorCodeEnum.PARAM_CHECK_ERROR.getCode(), "Start ID cannot be greater than end ID");
}
QueryCashRepayInfoCondition condition = new QueryCashRepayInfoCondition();
condition.setGteId(startId);
condition.setLteId(endId);
condition.setOrders("id+"); // ascending by id
List<CashRepayInfo> infoList = cashRepayInfoDao.query(condition, 0, -1); // page size capped at 500 rows
long lastId;
while (CollectionUtil.isNotEmpty(infoList)) {
lastId = infoList.get(infoList.size() - 1).getId() + 1;
infoList.forEach(history -> {
try {
if (StringUtil.isBlank(history.getMemberId()) || StringUtil.isBlank(history.getRepayNo())) {
log.error("CashRepayApplyServiceImpl#dataHandle error memberId or repayNo is null id-{}", history.getId());
return;
}
QueryCashRepayApplySplitCond splitCond = new QueryCashRepayApplySplitCond();
splitCond.setMemberId(history.getMemberId());
splitCond.setRepayApplyNo(history.getRepayNo());
CashRepayApplySplit exist = cashRepayApplySplitDao.getUnique(splitCond);
CashRepayApplySplit splitData = buildCashRepayApplySplit(history);
if (exist == null) {
cashRepayApplySplitDao.add(splitData);
} else {
splitData.setId(exist.getId());
cashRepayApplySplitDao.update(splitData);
}
} catch (Exception e) {
log.error("CashRepayApplyServiceImpl#dataHandle error id-{},repayNo-{}", history.getId(), history.getRepayNo(), e);
throw new RuntimeException(e);
}
});
log.info("dataHandle finish startId-{},endId-{}", condition.getGteId(), endId);
String redisCache = RedisCacheUtils.getRedisCache(CashApplyRedisKeyConsts.TERMINATE_SPLIT_SYNC_DATA);
if (StringUtil.isNotEmpty(redisCache)) {
log.info("CashRepayApplyServiceImpl#dataHandle need terminate loop startId-{}", condition.getGteId());
break;
}
condition.setGteId(lastId);
infoList = cashRepayInfoDao.query(condition, 0, -1);
}
}
/**
* Convert CashRepayInfo to CashRepayApplySplit entity
*/
private CashRepayApplySplit buildCashRepayApplySplit(CashRepayInfo history) {
CashRepayApplySplit split = new CashRepayApplySplit();
CashRepayApply apply = cashRepayApplyDao.get(history.getRepayNo());
if (apply != null) {
BeanUtils.copyProperties(apply, split, "id");
return split;
}
// ... omitted: populate fields from CashRepayInfo
return split;
}Conclusion
Before this project I often faced interview questions about sharding without practical experience. After going through the whole process—design, stakeholder alignment, DBA coordination, code implementation, and handling numerous pitfalls—I realized that sharding is less about exotic technology and more about systematic design and cross‑team collaboration.
Even though the business requirement was relatively simple, we now have real‑world experience sharding tables at the hundred‑million‑row scale.