Optimizing Bulk Data Import into MySQL with MyBatis: From Simple List Insertion to Multi‑Threaded Batch Processing
This article demonstrates how to dramatically speed up importing tens of thousands of records into MySQL by evolving a naïve list‑to‑MySQL approach into grouped batch inserts and finally a multi‑threaded MyBatis solution, while also addressing packet size limits and configuration tweaks.
Initially, the author tried the straightforward method of inserting a list of objects directly into MySQL using MyBatis batch operations, which took more than a minute for 20,000 rows.
The simple list‑to‑MySQL code looks like this:
@Transactional(rollbackFor = Exception.class)
public int addFreshStudentsNew2(List<FreshStudentAndStudentModel> list, String schoolNo) {
    if (list == null || list.isEmpty()) {
        return 0;
    }
    List<StudentEntity> studentEntityList = new LinkedList<>();
    List<EnrollStudentEntity> enrollStudentEntityList = new LinkedList<>();
    List<AllusersEntity> allusersEntityList = new LinkedList<>();
    for (FreshStudentAndStudentModel model : list) {
        // copy properties, set IDs, build entities
        // ...
        studentEntityList.add(studentEntity);
        enrollStudentEntityList.add(enrollStudentEntity);
        allusersEntityList.add(allusersEntity);
    }
    // each call sends the whole list to the database as a single statement
    int enResult = enrollStudentDao.insertAll(enrollStudentEntityList);
    int stuResult = studentDao.insertAll(studentEntityList);
    boolean allResult = allusersFacade.insertUserList(allusersEntityList);
    if (enResult > 0 && stuResult > 0 && allResult) {
        return 10;
    }
    return -10;
}
The corresponding Mapper.xml uses a foreach to build one massive INSERT statement. With enough rows, that statement can exceed MySQL's max_allowed_packet limit (4 MB on the author's server, which is the default in MySQL 5.7) and cause the error:
Packet for query is too large (6071393 > 4194304). You can change this value on the server by setting the max_allowed_packet variable.
To avoid this, the author first groups the list into smaller batches (e.g., 100 records per batch) and inserts each batch sequentially:
@Transactional(rollbackFor = Exception.class)
public int addFreshStudentsNew2(List<FreshStudentAndStudentModel> list, String schoolNo) {
    // ... same preparation as before ...
    int enResult = 0;
    int stuResult = 0;
    boolean allResult = false;
    int batchSize = 100;
    int total = enrollStudentEntityList.size();
    int batchCount = total / batchSize;
    int remainder = total % batchSize;
    // insert the full batches of batchSize records
    for (int i = batchSize; i <= batchSize * batchCount; i += batchSize) {
        enResult = enrollStudentDao.insertAll(enrollStudentEntityList.subList(i - batchSize, i));
        stuResult = studentDao.insertAll(studentEntityList.subList(i - batchSize, i));
        allResult = allusersFacade.insertUserList(allusersEntityList.subList(i - batchSize, i));
    }
    if (remainder != 0) {
        // insert the remaining records (fewer than batchSize)
        enResult = enrollStudentDao.insertAll(enrollStudentEntityList.subList(total - remainder, total));
        stuResult = studentDao.insertAll(studentEntityList.subList(total - remainder, total));
        allResult = allusersFacade.insertUserList(allusersEntityList.subList(total - remainder, total));
    }
    // return based on results
}
While this prevents the packet‑size error, it introduces additional round‑trips and may still hit timeout limits. Therefore, the author further improves the process by employing multithreading.
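The grouping step can also be expressed as a small reusable helper. The following is a minimal sketch, not the author's code: the class name `BatchPartitioner` and the method `partition` are hypothetical, and the database call is replaced by a print statement so the example runs without MySQL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original article's code.
class BatchPartitioner {

    /** Splits source into consecutive subList views of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            // Math.min keeps the final, shorter batch inside the list bounds.
            int to = Math.min(from + batchSize, source.size());
            batches.add(source.subList(from, to));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add(i);
        }
        for (List<Integer> batch : partition(rows, 100)) {
            // In the article's code, each batch would be passed to insertAll(...).
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With 250 rows and a batch size of 100, the helper yields batches of 100, 100, and 50 records, so the remainder needs no separate branch. Because `subList` returns views rather than copies, the source list should not be structurally modified while the batches are in use.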
In the multithreaded version, the list is divided among a fixed‑size thread pool (e.g., 50 threads). Each thread receives a sub‑list and performs the batch insert concurrently:
@Transactional(rollbackFor = Exception.class)
public int addFreshStudentsNew(List<FreshStudentAndStudentModel> list, String schoolNo) {
    if (list == null || list.isEmpty()) return 0;
    // build entity lists as before
    // Note: Spring binds a transaction to the calling thread, so the inserts
    // running on pool threads are not covered by this method's @Transactional.
    int nThreads = 50;
    int size = enrollStudentEntityList.size();
    // Ceiling division: plain size / nThreads would drop the last size % nThreads
    // records, and every record when size < nThreads.
    int chunk = (size + nThreads - 1) / nThreads;
    ExecutorService executor = Executors.newFixedThreadPool(nThreads);
    List<Future<Integer>> futures = new ArrayList<>(nThreads);
    for (int from = 0; from < size; from += chunk) {
        int to = Math.min(from + chunk, size);
        final List<EnrollStudentEntity> partEnroll = enrollStudentEntityList.subList(from, to);
        final List<StudentEntity> partStudent = studentEntityList.subList(from, to);
        final List<AllusersEntity> partAllusers = allusersEntityList.subList(from, to);
        Callable<Integer> task = () -> {
            studentSave.saveStudent(partEnroll, partStudent, partAllusers);
            return 1;
        };
        futures.add(executor.submit(task));
    }
    executor.shutdown();
    try {
        // Wait for every task; get() rethrows a worker failure as ExecutionException.
        for (Future<Integer> future : futures) {
            future.get();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return -10;
    } catch (ExecutionException e) {
        return -10;
    }
    return 10;
}
This approach shortens total import time by running the batch inserts concurrently, provided the server has sufficient resources to handle the concurrent connections.
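The fan-out and wait pattern behind the multithreaded version can be tried without a database. The sketch below is illustrative only: `ParallelBatchRunner`, `runInParallel`, and the `saver` callback are hypothetical names, and the callback stands in for the article's `studentSave.saveStudent(...)` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of the original article's code.
class ParallelBatchRunner {

    /**
     * Splits rows into chunks, hands each chunk to saver on a pool thread,
     * waits for all tasks, and returns the number of rows processed.
     */
    static <T> int runInParallel(List<T> rows, int nThreads, int chunkSize, Consumer<List<T>> saver) {
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int from = 0; from < rows.size(); from += chunkSize) {
                int to = Math.min(from + chunkSize, rows.size());
                List<T> chunk = rows.subList(from, to);
                Callable<Integer> task = () -> {
                    saver.accept(chunk);
                    return chunk.size();
                };
                futures.add(executor.submit(task));
            }
            int processed = 0;
            for (Future<Integer> future : futures) {
                // get() blocks until the task finishes and surfaces its failure, if any.
                processed += future.get();
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for inserts", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            rows.add(i);
        }
        // The no-op lambda is a stand-in for the real insert; no database is involved.
        int processed = runInParallel(rows, 8, 500, chunk -> { });
        System.out.println("processed " + processed + " rows");
    }
}
```

Waiting on each `Future` is the design choice that matters here: the caller learns whether every chunk actually succeeded instead of returning as soon as the tasks are queued. The pool size of 8 in the usage example is arbitrary; in practice it is bounded by how many connections the pool and the MySQL server can serve.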
Finally, the author concludes that the progressive optimizations—from single‑list insertion to grouped batches and then to multithreaded processing—significantly improve import performance, turning a minute‑long operation into a sub‑10‑second task.