
Batch Processing with Multithreading in Java: Splitting Large Collections and Using Thread Pools

This article explains how to efficiently handle massive data batch updates in Java by splitting large collections into smaller chunks, processing them concurrently with a configurable ThreadPoolExecutor, and controlling execution order, while providing reusable utility code and practical implementation examples.

Java Architect Essentials

When developers need to perform batch operations on massive data sets, efficiency becomes a critical concern; using multithreading to process the data in parallel is a straightforward yet powerful solution.

The overall workflow involves three main steps: (1) split the large collection into N smaller sub‑collections, (2) configure and start a thread pool to process each sub‑collection, and (3) control the order of execution so that follow‑up work starts only after every sub‑task has finished.
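The three steps can be sketched with nothing but the JDK. The snippet below is a minimal illustration, not the article's implementation: the class name `BatchWorkflowSketch`, the method `processInChunks`, the summing task, and the pool size of 4 are all assumptions made for the example. It relies on `ExecutorService.invokeAll`, which submits every task and returns only after all of them have completed, so steps 2 and 3 collapse into a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the split / process / wait workflow (all names are illustrative).
class BatchWorkflowSketch {

    // Sums each chunk on a worker thread and returns the per-chunk sums in chunk order.
    static List<Integer> processInChunks(List<Integer> data, int chunkSize, int threads) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Step 1: split the large list into consecutive chunks.
        List<List<Integer>> chunks = new ArrayList<>();
        for (int from = 0; from < data.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, data.size());
            chunks.add(new ArrayList<>(data.subList(from, to)));
        }

        // Step 2: wrap each chunk in a task for the thread pool.
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            tasks.add(() -> {
                int sum = 0;
                for (int value : chunk) {
                    sum += value;
                }
                return sum;
            });
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Step 3: invokeAll blocks until every task has completed.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            data.add(i);
        }
        // Chunks of 4: [1..4], [5..8], [9, 10]
        System.out.println(processInChunks(data, 4, 4)); // prints [10, 26, 19]
    }
}
```

Because `invokeAll` returns its futures in the same order as the submitted tasks, the results line up with the chunks even though the chunks run concurrently.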

Below is a reusable utility class SplitListUtils that leverages Guava and Apache Commons to divide a list into sub‑lists of a specified size.

import com.google.common.collect.Lists;
import org.apache.commons.collections.CollectionUtils;
import java.util.List;

/**
 * Utility class for splitting collections.
 *
 * @author shiwen
 * @date 2020/12/27
 */
public class SplitListUtils {
    /**
     * Splits a list into consecutive sub-lists.
     * Note: this code relies on the collection utilities from Guava and Apache Commons.
     *
     * @param <T> the element type
     * @param resList the list to split
     * @param subListLength the number of elements in each sub-list
     * @return a list made up of the sub-lists produced by the split
     */
    public static <T> List<List<T>> split(List<T> resList, int subListLength) {
        if (CollectionUtils.isEmpty(resList) || subListLength <= 0) {
            return Lists.newArrayList();
        }
        List<List<T>> ret = Lists.newArrayList();
        int size = resList.size();
        if (size <= subListLength) {
            // Fewer elements than subListLength: return the original list as the only chunk
            ret.add(resList);
        } else {
            int pre = size / subListLength;
            int last = size % subListLength;
            // The first pre sub-lists each hold exactly subListLength elements
            for (int i = 0; i < pre; i++) {
                List<T> itemList = Lists.newArrayList();
                for (int j = 0; j < subListLength; j++) {
                    itemList.add(resList.get(i * subListLength + j));
                }
                ret.add(itemList);
            }
            // Put the remaining last elements into a final, shorter sub-list
            if (last > 0) {
                List<T> itemList = Lists.newArrayList();
                for (int i = 0; i < last; i++) {
                    itemList.add(resList.get(pre * subListLength + i));
                }
                ret.add(itemList);
            }
        }
        return ret;
    }

    // Usage example
    public static void main(String[] args) {
        List<String> list = Lists.newArrayList();
        int size = 1099;
        for (int i = 0; i < size; i++) {
            list.add("hello-" + i);
        }
        List<List<String>> temps = split(list, 100);
        int j = 0;
        for (List<String> obj : temps) {
            System.out.println(String.format("row:%s -> size:%s,data:%s", ++j, obj.size(), obj));
        }
    }
}
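With 1,099 elements and a sub-list size of 100, the example `main` produces 11 sub-lists: ten of 100 elements and one of 99, which matches ceil(1099 / 100) = 11. For comparison, the sketch below performs the same split with only the JDK by copying `List.subList` ranges. The class `SubListSplitSketch` and its method names are made up for this illustration and are not part of the original utility. Unlike `SplitListUtils.split`, this variant always returns copies, even when the input fits in a single chunk. Guava's `Lists.partition(list, size)` is another option if views over the original list are acceptable.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative JDK-only alternative; not part of the original utility class.
class SubListSplitSketch {

    // Number of chunks needed: the ceiling of size / chunkSize in integer arithmetic.
    static int chunkCount(int size, int chunkSize) {
        return (size + chunkSize - 1) / chunkSize;
    }

    // Copies consecutive subList ranges so each chunk is independent of the source list.
    static <T> List<List<T>> split(List<T> source, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        if (source == null || source.isEmpty() || chunkSize <= 0) {
            return chunks;
        }
        for (int from = 0; from < source.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, source.size());
            chunks.add(new ArrayList<>(source.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 1099; i++) {
            list.add("hello-" + i);
        }
        List<List<String>> chunks = split(list, 100);
        System.out.println(chunks.size());         // prints 11
        System.out.println(chunks.get(10).size()); // prints 99
        System.out.println(chunkCount(1099, 100)); // prints 11
    }
}
```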

To execute the batch updates asynchronously, a ThreadPoolExecutor is created with tuned parameters, the large collection is split, and each sub‑list is submitted to the pool as a task. A CountDownLatch makes the main thread wait for all tasks to finish before it performs the final batch write through MyBatis; for that to work, each task must call countDown() itself when it finishes, rather than the loop that submits the tasks.

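import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public void threadMethod() {
    // Worker threads append to this list concurrently, so it must be thread-safe
    List<Entity> updateList = Collections.synchronizedList(new ArrayList<>());
    // Initialize the thread pool. Be sure to tune these parameters! With AbortPolicy,
    // execute() throws RejectedExecutionException once all 50 threads and all 10 queue slots are busy.
    ThreadPoolExecutor threadPool = new ThreadPoolExecutor(20, 50,
            4, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10), new ThreadPoolExecutor.AbortPolicy());
    // Split the large collection into N smaller ones; the chunk size can be fairly small (100 worked well here).
    // totalList is the full data set and is defined elsewhere.
    List<List<Entity>> splitNList = SplitListUtils.split(totalList, 100);
    // One count per task
    CountDownLatch countDownLatch = new CountDownLatch(splitNList.size());
    try {
        // Process the chunks in batches: split first, then execute on multiple threads
        for (List<Entity> singleList : splitNList) {
            // Hand the Runnable straight to the pool; wrapping it in a new Thread is unnecessary
            threadPool.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        for (Entity yangshiwen : singleList) {
                            // Wrap each object's data and add it to the list that stores the records to update
                            // ......
                        }
                    } finally {
                        // Task count - 1, inside the task; await() wakes up when the count reaches 0
                        countDownLatch.countDown();
                    }
                }
            });
        }
        // Block the current thread until the latch count reaches zero
        countDownLatch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new BusinessLogException(ResponseEnum.FAIL);
    } finally {
        threadPool.shutdown();
    }
    // Write the data through MyBatis's batch insert; an emptiness check is still needed at this step
    if (GeneralUtil.listNotNull(updateList)) {
        batchUpdateEntity(updateList);
        LogUtil.info("xxxxxxxxxxxxxxx");
    }
}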
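For the latch to do its job, `countDown()` has to run inside each task, ideally in a `finally` block, and the shared result list has to be safe for concurrent writes. The following is a minimal, self-contained sketch of that pattern under stated assumptions: `LatchBatchSketch`, `transformAll`, the upper-casing task, the small pool sizes, and the choice of `CallerRunsPolicy` (so that a saturated pool slows the submitting thread down instead of rejecting tasks) are illustrative and not taken from the article's project.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: all names and pool parameters are assumptions for illustration.
class LatchBatchSketch {

    // Upper-cases every string on pool threads and returns once all chunks are done.
    static List<String> transformAll(List<List<String>> chunks) {
        // Synchronized wrapper: several worker threads append to this list concurrently.
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 4, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2),
                // When the pool and queue are full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch latch = new CountDownLatch(chunks.size());
        try {
            for (List<String> chunk : chunks) {
                pool.execute(() -> {
                    try {
                        for (String item : chunk) {
                            results.add(item.toUpperCase());
                        }
                    } finally {
                        // Count down inside the task, even if processing throws.
                        latch.countDown();
                    }
                });
            }
            // Blocks until every task has counted down.
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for batch tasks", e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = new ArrayList<>();
        for (int c = 0; c < 20; c++) {
            List<String> chunk = new ArrayList<>();
            for (int i = 0; i < 50; i++) {
                chunk.add("item-" + c + "-" + i);
            }
            chunks.add(chunk);
        }
        // The order of results is not deterministic, but the count is.
        System.out.println(transformAll(chunks).size()); // prints 1000
    }
}
```

With 20 chunks and room for only 4 threads plus 2 queued tasks, an `AbortPolicy` pool of the same size could reject submissions; `CallerRunsPolicy` trades that failure for back-pressure on the submitting thread. Which policy fits depends on whether dropping work or slowing the caller is the lesser problem.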

Finally, the article reminds readers that multithreading in Java is challenging but enjoyable, encouraging them to practice and share the knowledge.
