Design and Implementation of Table Sharding for Cash Repayment Applications Using ShardingSphere and Spring Boot
This article details the end‑to‑end design, configuration, and code implementation of a 50‑table sharding solution for cash repayment and loan application data, covering database schema planning, historical data migration, backend query adaptation, dynamic switches, scheduled consistency checks, and practical pitfalls encountered with Spring Boot 3 and ShardingSphere.
Introduction
After a year of writing business code, I received a technical request from leadership to split the loan and repayment application tables to improve query efficiency and reduce database pressure. The loan table contains tens of millions of rows, while the repayment table exceeds two hundred million, making sharding a viable solution.
Design Plan
Before coding, we need to align on several design aspects with leadership.
Basic Information of Sharded Database
Use a new instance with an independent data source.
No need for separate databases, only tables; 50 tables are sufficient.
Database name: tcbiz_ins_cash_mas_split (named by leadership).
Total number of tables: 50 .
Table naming pattern: CashRepayApplySplit0${0..9} , CashRepayApplySplit${10..49} (camel‑case naming).
Sharding column: memberId , sharding strategy: memberId % 50 , with remainders below 10 left‑padded with 0 so the result always matches the two‑digit table suffix.
Table structure identical to the original, including fields and indexes. Notably, the original uptTime column lacks an index, which must be added in the sharded tables.
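The routing rule above can be sketched in plain Java. This is illustrative only — in production the routing is performed by ShardingSphere's inline algorithm in the configuration shown later, not by hand‑written code:

```java
// Compute the two-digit table suffix for a memberId under the
// "memberId % 50, zero-padded" strategy, e.g. suffix "07" -> CashRepayApplySplit07.
class ShardSuffix {
    static String of(String memberId) {
        long remainder = Long.parseLong(memberId) % 50;
        // Pad remainders below 10 with a leading zero: 7 -> "07"
        return String.format("%02d", remainder);
    }
}
```

For example, memberId 12345 yields remainder 45 and routes to CashRepayApplySplit45.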
Historical Data Synchronization
After creating fifty tables, both new business data and existing data must be written to the appropriate sharded tables according to the sharding strategy.
Historically, repayment data was migrated from CashRepayInfo to CashRepayApply . The current double‑write flow writes to CashRepayApply first, then synchronizes to CashRepayInfo to keep a complete historical copy.
We need to:
Synchronize the missing early CashRepayInfo data to the new sharded tables CashRepayApplySplit .
Synchronize the remaining CashRepayApply data to CashRepayApplySplit .
FAQ
Why not sync directly from CashRepayInfo to the sharded tables? The table holds over two hundred million rows, so a full code‑driven sync is impractical and the bulk copy must be done by the DBA; but schema differences between CashRepayInfo and CashRepayApply mean the DBA cannot migrate it directly either, hence the two‑step plan above.
Specific Details
To locate the starting id for the migration, we find where the vertical black dashed line (from the timeline diagram in the original post, not reproduced here) intersects CashRepayInfo.id . This id marks the earliest row that must be copied to the sharded tables.
Using this id , we can locate the corresponding primary key in CashRepayApply and let the DBA synchronize data up to that point.
Backend Query Refactoring
The current admin system queries the single table directly. After sharding, such queries are no longer possible across all the data, so we keep the most recent two‑to‑three years in CashRepayApply and continue to query it by memberId . Older data (more than three years) must be accessed via the sharded tables, which requires a new query page where the memberId parameter is mandatory.
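The memberId requirement exists because a query without the sharding key would fan out to all 50 tables. The mandatory‑parameter rule can be sketched as a small guard; this helper is an illustration, not part of the original code:

```java
// Queries against the sharded tables must carry the sharding key memberId;
// without it, ShardingSphere would broadcast the query to all 50 tables.
class SplitQueryGuard {
    static void requireMemberId(String memberId) {
        if (memberId == null || memberId.isBlank()) {
            throw new IllegalArgumentException("memberId is required when querying sharded tables");
        }
    }
}
```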
External Department Notification
Once all data is fully synchronized to the sharded tables, the legacy CashRepayInfo will be gradually decommissioned. Other departments (risk control, big data, etc.) must be informed to switch their reports to query CashRepayApplySplit .
DBA‑Driven Synchronization of New Business Data
We first let developers sync a portion of data, then provide the DBA with a start id . The DBA syncs up to a deterministic end id based on uptTime < operationDate - 5 days . Any data generated during the DBA operation is later synced by developers at night (e.g., 23:00) after the system is closed for repayments.
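The DBA's end boundary can be sketched as a small date helper, assuming the five‑day window is measured from the start of the operation day (the article only states the rule as uptTime < operationDate - 5 days ; the exact timestamp semantics are an assumption):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Rows with uptTime strictly before this cutoff are stable enough for the
// DBA to bulk-copy; anything newer is left to the developers' night sync.
class DbaCutoff {
    static LocalDateTime cutoff(LocalDate operationDate) {
        return operationDate.minusDays(5).atStartOfDay();
    }
}
```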
Three‑Write Synchronization
During the transition period, we maintain three copies: write to CashRepayApply , then to CashRepayInfo , and finally to CashRepayApplySplit . Because the sharded database uses a different data source, a specific transaction manager must be declared:
@Transactional(propagation = Propagation.REQUIRES_NEW, transactionManager = "transactionManagerSplit")

We avoid using MySQL binlog listeners for real‑time sync due to DBA constraints.
Dynamic Switch for Data Sync
A dynamic switch controls whether new business data is written to the sharded tables. The migration proceeds in three phases: (1) developers sync historical data, (2) DBA syncs a bounded range, (3) developers sync the remaining new data before finally turning on the switch for seamless operation.
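The switch can be sketched as a guard around the split‑table write. The value source (a config center, a Redis key, etc.) is an assumption and is abstracted behind a Supplier here so the guard logic stands alone:

```java
import java.util.function.Supplier;

// Guards the write to CashRepayApplySplit behind a runtime-toggleable switch,
// so each migration phase can be enabled without a redeploy.
class SplitWriteSwitch {
    private final Supplier<Boolean> switchOn;

    SplitWriteSwitch(Supplier<Boolean> switchOn) {
        this.switchOn = switchOn;
    }

    // Runs the split write only when the switch is on; returns true if it ran.
    boolean writeIfEnabled(Runnable splitWrite) {
        if (Boolean.TRUE.equals(switchOn.get())) {
            splitWrite.run();
            return true;
        }
        return false;
    }
}
```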
Scheduled Task Scanning
Since double‑write operations span different data sources, occasional failures may cause mismatches. A daily scheduled task compares today's data between CashRepayApply and CashRepayApplySplit , raising alerts if discrepancies are found.
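The core of the daily check is a set difference over business keys. A sketch, with the DAO queries assumed and replaced by plain Sets of repayNo values:

```java
import java.util.HashSet;
import java.util.Set;

// Compares today's repayNos from CashRepayApply against CashRepayApplySplit;
// a non-empty result means the double-write diverged and should raise an alert.
class ConsistencyCheck {
    static Set<String> missingInSplit(Set<String> sourceRepayNos, Set<String> splitRepayNos) {
        Set<String> missing = new HashSet<>(sourceRepayNos);
        missing.removeAll(splitRepayNos);
        return missing;
    }
}
```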
Demo Journey
Our project used an older ShardingSphere version, so we upgraded to a release compatible with Spring Boot 3. The official Spring Boot starter was dropped starting with 5.3.0, so we integrate ShardingSphere through its JDBC driver instead.
Component Versions
Spring Boot 3.2.4 + JDK 19 + MySQL 8 + ShardingSphere 5.4.1 + MyBatis‑Plus 3.5.5
Maven Dependencies
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.19</version>
</dependency>
<dependency>
<groupId>org.apache.shardingsphere</groupId>
<artifactId>shardingsphere-jdbc-core</artifactId>
<version>5.4.1</version>
</dependency>
<!--mybatisplus3-->
<dependency>
<groupId>com.baomidou</groupId>
<artifactId>mybatis-plus-spring-boot3-starter</artifactId>
<version>3.5.5</version>
</dependency>
<dependency>
<groupId>com.sun.xml.bind</groupId>
<artifactId>jaxb-impl</artifactId>
<version>2.3.3</version>
</dependency>application.yml
spring:
  application:
    name: sharding-jdbc-demo
  datasource:
    driver-class-name: org.apache.shardingsphere.driver.ShardingSphereDriver
    url: jdbc:shardingsphere:classpath:sharding-config.yaml # sharding config file
  profiles:
    active: default

mybatis-plus:
  configuration:
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl

sharding-config.yaml
dataSources:
  ds0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    jdbcUrl: jdbc:mysql://localhost:3306/tcbiz_ins?characterEncoding=utf8&useAffectedRows=true&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
    username: root
    password: your_db_password
rules:
- !SHARDING
  tables:
    cash_repay_apply:
      actualDataNodes: ds0.cash_repay_apply_0${0..9},ds0.cash_repay_apply_${10..49}
      tableStrategy:
        standard:
          shardingColumn: member_id
          shardingAlgorithmName: memberId-suffix-mode-50
  shardingAlgorithms:
    memberId-suffix-mode-50:
      type: INLINE
      props:
        algorithm-expression: cash_repay_apply_${(Long.parseLong(member_id) % 50).toString().padLeft(2, '0')}
props:
  sql-show: true

After configuration, we can access the database via a mapper:
@Mapper
public interface CashRepayApplyMapper extends BaseMapper<CashRepayApply> {

    @Select("select * from cash_repay_apply where member_id = #{memberId}")
    List<CashRepayApply> testSharding(@Param("memberId") String memberId);
}

Running with Spring Boot 3.2.4 initially caused a NoSuchMethodError in org.yaml.snakeyaml.representer.Representer : Spring Boot 3.2 manages SnakeYAML 2.x, which removed the no‑arg Representer constructor that ShardingSphere still calls. The fix was to add a custom class with a no‑arg constructor that shadows the missing one.
Another issue was that the inline sharding expression did not match the column type: member_id is a string column, so it must be parsed with Long.parseLong(member_id) before applying % 50 . Changing the expression accordingly resolved the error.
Demo Source Code Download
The complete demo is available on GitHub: https://github.com/yanzhisishui/sharding-jdbc-demo
Developer Data Synchronization Code
For the portion of data that developers must sync themselves, we use a single‑threaded loop that migrates data in batches of 500 rows:
/**
 * Synchronization example code
 */
public void dataHandle(Long startId, Long endId) throws AppBizException {
    log.info("CashRepayApplyServiceImpl#dataHandle start startId-{},endId-{}", startId, endId);
    if (endId == null || endId <= 0L) {
        endId = null; // no upper bound: sync to the end of the table
    } else if (startId > endId) {
        throw new AppBizException(ErrorCodeEnum.PARAM_CHECK_ERROR.getCode(), "Start ID cannot be greater than end ID");
    }
    QueryCashRepayInfoCondition condition = new QueryCashRepayInfoCondition();
    condition.setGteId(startId);
    condition.setLteId(endId);
    condition.setOrders("id+"); // ascending order
    List<CashRepayInfo> infoList = cashRepayInfoDao.query(condition, 0, -1); // max 500 rows per batch
    long lastId;
    while (CollectionUtil.isNotEmpty(infoList)) {
        lastId = infoList.get(infoList.size() - 1).getId() + 1;
        infoList.forEach(history -> {
            try {
                if (StringUtil.isBlank(history.getMemberId()) || StringUtil.isBlank(history.getRepayNo())) {
                    log.error("CashRepayApplyServiceImpl#dataHandle error memberId or repayNo is null id-{}", history.getId());
                    return; // skip rows missing the sharding key or business key
                }
                QueryCashRepayApplySplitCond splitCond = new QueryCashRepayApplySplitCond();
                splitCond.setMemberId(history.getMemberId());
                splitCond.setRepayApplyNo(history.getRepayNo());
                CashRepayApplySplit exist = cashRepayApplySplitDao.getUnique(splitCond);
                CashRepayApplySplit splitData = buildCashRepayApplySplit(history);
                if (exist == null) {
                    cashRepayApplySplitDao.add(splitData);
                } else {
                    // idempotent re-run: update instead of inserting a duplicate
                    splitData.setId(exist.getId());
                    cashRepayApplySplitDao.update(splitData);
                }
            } catch (Exception e) {
                log.error("CashRepayApplyServiceImpl#dataHandle error id-{},repayNo-{}", history.getId(), history.getRepayNo());
                throw new RuntimeException(e);
            }
        });
        log.info("dataHandle finish startId-{},endId-{}", condition.getGteId(), endId);
        String redisCache = RedisCacheUtils.getRedisCache(CashApplyRedisKeyConsts.TERMINATE_SPLIT_SYNC_DATA);
        if (StringUtil.isNotEmpty(redisCache)) {
            log.info("CashRepayApplyServiceImpl#dataHandle need terminate loop startId-{}", condition.getGteId());
            break; // emergency stop flag set in Redis
        }
        condition.setGteId(lastId);
        infoList = cashRepayInfoDao.query(condition, 0, -1);
    }
}

Helper method to build the sharded entity:
/**
 * Convert CashRepayInfo to CashRepayApplySplit entity
 */
private CashRepayApplySplit buildCashRepayApplySplit(CashRepayInfo history) {
    CashRepayApplySplit split = new CashRepayApplySplit();
    CashRepayApply apply = cashRepayApplyDao.get(history.getRepayNo());
    if (apply != null) {
        BeanUtils.copyProperties(apply, split, "id");
        return split;
    }
    // ... omitted: populate fields from CashRepayInfo
    return split;
}

Conclusion
Before joining the company, I often faced interview questions about database sharding without practical experience. After implementing this solution, I realized that sharding is not as daunting as it seems—mostly configuration and API calls. The real challenge lies in designing a migration plan that coordinates multiple teams, preserves existing functionality, and ensures data integrity at the company‑wide level.
Even though our sharding scenario is relatively simple, we now have hands‑on experience partitioning data at the hundred‑million‑row scale.
References
[1] Is ShardingSphere 4.1.1 version compatible with Spring Boot 3.0.5 version? https://github.com/apache/shardingsphere/issues/27597
[2] Demo download: https://github.com/yanzhisishui/sharding-jdbc-demo
[3] Record of sending tens of millions of messages in a scheduled task: https://juejin.cn/post/7125399759589736479