Improving Million-Scale Data Insertion Efficiency with Multithreaded Batch Processing in Spring Boot

This article shows how to speed up the insertion of more than two million records using Spring Boot, MyBatis-Plus and a ThreadPoolTaskExecutor that runs batch inserts on multiple threads. Performance tests show the total time dropping from 5.75 minutes to 1.67 minutes.

Architecture Digest

Development purpose: Speed up inserting data at the scale of millions of rows.

Solution adopted: Multithreaded batch insertion using ThreadPoolTaskExecutor.

Technology stack:

Spring Boot 2.1.1

MyBatis‑Plus 3.0.6

Swagger 2.5.0

Lombok 1.18.4

PostgreSQL

ThreadPoolTaskExecutor

Thread pool configuration (application‑dev.properties):

# Asynchronous thread configuration
# Core thread count
async.executor.thread.core_pool_size = 30
# Maximum thread count
async.executor.thread.max_pool_size = 30
# Queue capacity
async.executor.thread.queue_capacity = 99988
# Thread name prefix
async.executor.thread.name.prefix = async-importDB-
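To see what these four properties amount to outside Spring, here is a minimal plain-JDK sketch of roughly the pool that ThreadPoolTaskExecutor builds from them. The class name ImportPoolFactory is hypothetical, and the 60-second keep-alive is Spring's default rather than a value the article sets.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a plain-JDK pool shaped like the async.executor.* properties.
class ImportPoolFactory {

    static ThreadPoolExecutor create(int corePoolSize, int maxPoolSize,
                                     int queueCapacity, String namePrefix) {
        AtomicInteger threadNumber = new AtomicInteger(1);
        // Threads are named "<prefix>1", "<prefix>2", ...
        ThreadFactory threadFactory =
                runnable -> new Thread(runnable, namePrefix + threadNumber.getAndIncrement());
        return new ThreadPoolExecutor(
                corePoolSize, maxPoolSize,
                60L, TimeUnit.SECONDS,                      // idle timeout for threads above core size
                new LinkedBlockingQueue<>(queueCapacity),   // bounded queue of pending batches
                threadFactory,
                new ThreadPoolExecutor.CallerRunsPolicy()); // the rejection policy ExecutorConfig sets
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(30, 30, 99988, "async-importDB-");
        pool.execute(() -> System.out.println("running on " + Thread.currentThread().getName()));
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Because core_pool_size equals max_pool_size, the pool is effectively fixed at 30 threads: a ThreadPoolExecutor only adds threads beyond the core size once its queue is full. With 2,000,003 records split into batches of 100 there are 20,001 batches, well below the 99,988-slot queue, so the queue never fills during this test.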

Spring bean for the thread pool:

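import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
@Slf4j
public class ExecutorConfig {
    @Value("${async.executor.thread.core_pool_size}")
    private int corePoolSize;
    @Value("${async.executor.thread.max_pool_size}")
    private int maxPoolSize;
    @Value("${async.executor.thread.queue_capacity}")
    private int queueCapacity;
    @Value("${async.executor.thread.name.prefix}")
    private String namePrefix;

    @Bean(name = "asyncServiceExecutor")
    public Executor asyncServiceExecutor() {
        log.warn("start asyncServiceExecutor");
        // The original instantiates VisiableThreadPoolTaskExecutor, a custom subclass
        // whose source the article does not include; the stock class is used here.
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(corePoolSize);
        executor.setMaxPoolSize(maxPoolSize);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix(namePrefix);
        // When both the pool and the queue are full, run the task on the submitting thread
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}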
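The CallerRunsPolicy line decides what happens when a batch cannot be queued: instead of throwing RejectedExecutionException, the task runs on the thread that submitted it, which also slows the producer down. The sketch below forces that situation with a deliberately tiny pool (one thread, one queue slot); CallerRunsDemo is a hypothetical name, and the sizes are chosen only to make the rejection deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: CallerRunsPolicy runs an overflow task on the submitting thread.
class CallerRunsDemo {

    static boolean overflowRanOnCaller() {
        // One worker thread and a one-slot queue, so the third task must be rejected
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<Thread> overflowRunner = new AtomicReference<>();
        try {
            // Task 1 occupies the only worker until 'release' is opened
            pool.execute(() -> {
                workerBusy.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerBusy.await();
            // Task 2 takes the single queue slot
            pool.execute(() -> { });
            // Task 3 cannot be queued and no thread can be added, so the caller runs it
            pool.execute(() -> overflowRunner.set(Thread.currentThread()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return overflowRunner.get() == Thread.currentThread();
    }

    public static void main(String[] args) {
        System.out.println("overflow task ran on caller: " + overflowRanOnCaller());
    }
}
```

In the article's configuration the queue is larger than the total number of batches, so this policy acts only as a safety net rather than as part of the normal flow.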

Asynchronous service implementation:

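import java.util.List;
import java.util.concurrent.CountDownLatch;

import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class AsyncServiceImpl implements AsyncService {
    @Override
    @Async("asyncServiceExecutor") // Runs on the pool defined in ExecutorConfig
    public void executeAsync(List<LogOutputResult> logOutputResults,
                             LogOutputResultMapper logOutputResultMapper,
                             CountDownLatch countDownLatch) {
        try {
            log.warn("start executeAsync");
            // Insert this sub-list with a single batch mapper call
            logOutputResultMapper.addLogOutputResultBatch(logOutputResults);
            log.warn("end executeAsync");
        } finally {
            countDownLatch.countDown(); // Ensure latch is released even on exception
        }
    }
}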
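The finally block is what keeps countDownLatch.await() from blocking forever when a batch insert throws. Here is a minimal sketch of the same pattern, with a plain ExecutorService standing in for @Async and a simulated failure standing in for a database error; LatchInFinallyDemo and the "negative value fails" rule are assumptions for illustration only.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of the "count down in finally" pattern used by executeAsync.
class LatchInFinallyDemo {

    // Processes each batch on a pool thread; a batch containing a negative value
    // simulates a failed insert. Returns true if await() saw the latch reach zero.
    static boolean allBatchesReleased(List<List<Integer>> batches) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch latch = new CountDownLatch(batches.size());
        try {
            for (List<Integer> batch : batches) {
                // submit() keeps the simulated exception inside the returned Future
                pool.submit(() -> {
                    try {
                        if (batch.stream().anyMatch(v -> v < 0)) {
                            throw new IllegalStateException("simulated insert failure");
                        }
                        // A real task would insert the batch here
                    } finally {
                        latch.countDown(); // Runs whether the "insert" succeeded or not
                    }
                });
            }
            // Bounded wait so a missing countDown() shows up as 'false' instead of a hang
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }
}
```

If countDown() were placed after the insert call instead of in finally, the failing batch would leave the latch at 1: this bounded await would return false, and the article's unbounded await() would hang.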

Multithreaded batch insertion method:

@Override
public int testMultiThread() {
    List<LogOutputResult> logOutputResults = getTestData();
    // Split data into sub-lists of 100 records each
    List<List<LogOutputResult>> lists = ConvertHandler.splitList(logOutputResults, 100);
    // One latch count per sub-list; each async task counts down once when it finishes
    CountDownLatch countDownLatch = new CountDownLatch(lists.size());
    for (List<LogOutputResult> listSub : lists) {
        asyncService.executeAsync(listSub, logOutputResultMapper, countDownLatch);
    }
    try {
        countDownLatch.await(); // Wait for all threads to finish
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Preserve the interrupt status
        log.error("Blocking exception:" + e.getMessage());
    }
    return logOutputResults.size();
}
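ConvertHandler.splitList is not shown in the article. A minimal sketch of what such a helper might look like, assuming it returns consecutive, order-preserving chunks of at most the given size (ListSplitter is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ConvertHandler.splitList; the article does not show its source.
final class ListSplitter {

    private ListSplitter() {
    }

    // Splits 'source' into consecutive chunks of at most 'chunkSize' elements,
    // preserving order; the last chunk holds whatever is left over.
    static <T> List<List<T>> splitList(List<T> source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        List<List<T>> chunks = new ArrayList<>((source.size() + chunkSize - 1) / chunkSize);
        for (int from = 0; from < source.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, source.size());
            // Copy the view so each chunk is independent of the source list
            chunks.add(new ArrayList<>(source.subList(from, to)));
        }
        return chunks;
    }
}
```

With the article's 2,000,003 test records and a chunk size of 100, this produces 20,001 chunks (20,000 full ones plus one of 3 records), so the CountDownLatch starts at 20,001. Returning the subList views directly would also work, as long as nothing modifies the source list while the inserts run.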

The test inserted 2,000,003 records with 30 concurrent threads in 1.67 minutes; the same insert on a single thread took 5.75 minutes. Further runs with different thread counts showed that adding threads does not always improve performance. A practical rule of thumb is CPU cores × 2 + 2 threads.
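To put the reported timings in perspective, the sketch below converts them to throughput and speedup and evaluates the rule of thumb for the current machine. InsertBenchmarkMath is a hypothetical helper; the inputs are the article's figures.

```java
// Hypothetical helper for reading the benchmark numbers and the thread-count rule of thumb.
final class InsertBenchmarkMath {

    private InsertBenchmarkMath() {
    }

    // Rows per second for a run that took 'minutes' minutes
    static double rowsPerSecond(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    // How many times faster the multithreaded run was than the single-threaded one
    static double speedup(double singleThreadMinutes, double multiThreadMinutes) {
        return singleThreadMinutes / multiThreadMinutes;
    }

    // Rule of thumb from the text: CPU cores x 2 + 2
    static int suggestedThreads(int cpuCores) {
        return cpuCores * 2 + 2;
    }

    public static void main(String[] args) {
        long rows = 2_000_003L;
        System.out.printf("single thread: %.0f rows/s%n", rowsPerSecond(rows, 5.75));
        System.out.printf("30 threads:    %.0f rows/s%n", rowsPerSecond(rows, 1.67));
        System.out.printf("speedup:       %.2fx%n", speedup(5.75, 1.67));
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("suggested threads here: " + suggestedThreads(cores));
    }
}
```

The figures work out to about 5,797 rows/s on one thread versus about 19,960 rows/s with 30 threads, a speedup of roughly 3.44×. That is far short of 30×, which fits the observation that extra threads bring diminishing returns.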

Data integrity checks after the multithreaded run confirmed that every record was inserted and that no ID was duplicated.
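The article does not show how integrity was checked. One way to express the two conditions it reports (every record present, no duplicate IDs) is the hypothetical helper below, where ids stands for the primary keys read back from the table. On two million rows it is usually cheaper to let PostgreSQL compare COUNT(*) with COUNT(DISTINCT id) in one query; the Java version simply makes the logic explicit.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical integrity check; 'ids' stands for the primary keys read back after the import.
final class IntegrityCheck {

    private IntegrityCheck() {
    }

    // Returns the first ID seen twice, or null if every ID is unique
    static Long firstDuplicate(List<Long> ids) {
        Set<Long> seen = new HashSet<>();
        for (Long id : ids) {
            if (!seen.add(id)) {
                return id;
            }
        }
        return null;
    }

    // True when the expected number of rows arrived and no ID repeats
    static boolean isComplete(List<Long> ids, long expectedRows) {
        return ids.size() == expectedRows && firstDuplicate(ids) == null;
    }
}
```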

Conclusion: A properly sized ThreadPoolTaskExecutor in a Spring Boot application can substantially reduce bulk-insert time for large datasets (here from 5.75 to 1.67 minutes), but the optimal thread count depends on the hardware and should be tuned empirically.
