
Improving Million-Scale Data Insertion Efficiency with Multithreaded Batch Processing in Spring Boot

This article shows how to speed up inserting more than two million records with Spring Boot, MyBatis-Plus and a ThreadPoolTaskExecutor that runs batch inserts on multiple threads. In the author's performance tests, total insertion time dropped from 5.75 minutes to 1.67 minutes.

Architecture Digest

Jul 24, 2024


Development purpose: Speed up inserting data at the scale of millions of records.

Solution adopted: Use multithreaded batch insertion via ThreadPoolTaskExecutor.

Technology stack:

Spring Boot 2.1.1

MyBatis‑Plus 3.0.6

Swagger 2.5.0

Lombok 1.18.4

PostgreSQL

ThreadPoolTaskExecutor

Thread pool configuration (application‑dev.properties):

# Asynchronous thread configuration
# Core thread count
async.executor.thread.core_pool_size = 30
# Maximum thread count
async.executor.thread.max_pool_size = 30
# Queue capacity
async.executor.thread.queue_capacity = 99988
# Thread name prefix
async.executor.thread.name.prefix = async-importDB-

Spring bean for the thread pool:

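import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
@Slf4j
public class ExecutorConfig {
    // Values come from the async.executor.* properties in application-dev.properties
    @Value("${async.executor.thread.core_pool_size}")
    private int corePoolSize;
    @Value("${async.executor.thread.max_pool_size}")
    private int maxPoolSize;
    @Value("${async.executor.thread.queue_capacity}")
    private int queueCapacity;
    @Value("${async.executor.thread.name.prefix}")
    private String namePrefix;

    @Bean(name = "asyncServiceExecutor")
    public Executor asyncServiceExecutor() {
        log.warn("start asyncServiceExecutor");
        // The original uses a custom subclass, VisiableThreadPoolTaskExecutor, whose source is not shown;
        // the standard ThreadPoolTaskExecutor is used here so this class compiles on its own.
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(corePoolSize);
        executor.setMaxPoolSize(maxPoolSize);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix(namePrefix);
        // When the queue is full, run the task on the submitting thread instead of rejecting it
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}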
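Spring's ThreadPoolTaskExecutor is a wrapper around java.util.concurrent.ThreadPoolExecutor. To show what the four properties actually configure, here is a minimal plain-JDK sketch of an equivalent pool. The names ImportPoolFactory and newImportPool are hypothetical, and the 60-second keep-alive mirrors Spring's default rather than anything stated in the article.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical plain-JDK equivalent of the "asyncServiceExecutor" bean.
class ImportPoolFactory {

    static ThreadPoolExecutor newImportPool(int corePoolSize, int maxPoolSize,
                                            int queueCapacity, String namePrefix) {
        AtomicInteger counter = new AtomicInteger(1);
        // Name threads "<prefix>1", "<prefix>2", ... so they are easy to find in logs
        ThreadFactory factory = runnable -> new Thread(runnable, namePrefix + counter.getAndIncrement());
        return new ThreadPoolExecutor(
                corePoolSize,
                maxPoolSize,
                60L, TimeUnit.SECONDS,                       // idle timeout for threads above core size
                new LinkedBlockingQueue<>(queueCapacity),    // bounded queue of pending batches
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy());  // when saturated, the caller runs the task
    }
}
```

With core and maximum size both set to 30, the pool never grows beyond 30 threads: extra batches wait in the 99,988-slot queue, and CallerRunsPolicy only takes effect once that queue is full.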

Asynchronous service implementation:

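import java.util.List;
import java.util.concurrent.CountDownLatch;

import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

// LogOutputResult, LogOutputResultMapper and AsyncService are project classes (imports omitted)
@Service
@Slf4j
public class AsyncServiceImpl implements AsyncService {
    @Override
    @Async("asyncServiceExecutor") // Run on the "asyncServiceExecutor" thread pool bean
    public void executeAsync(List<LogOutputResult> logOutputResults, LogOutputResultMapper logOutputResultMapper, CountDownLatch countDownLatch) {
        try {
            log.warn("start executeAsync");
            // Insert this batch through the mapper's batch-insert method
            logOutputResultMapper.addLogOutputResultBatch(logOutputResults);
            log.warn("end executeAsync");
        } finally {
            countDownLatch.countDown(); // Ensure the latch is released even on exception
        }
    }
}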
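The finally block is what keeps countDownLatch.await() from hanging when a batch insert throws. The pure-JDK sketch below reproduces the same fan-out and latch pattern without Spring so it can run in isolation. LatchedBatchRunner, runAll and the failure counter are hypothetical additions, and the Consumer stands in for the mapper call.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class LatchedBatchRunner {

    /**
     * Submits each batch to the executor, waits until every batch has finished,
     * and returns how many batches threw an exception.
     */
    static <T> int runAll(ExecutorService executor, List<List<T>> batches,
                          Consumer<List<T>> insertBatch) {
        CountDownLatch latch = new CountDownLatch(batches.size());
        AtomicInteger failures = new AtomicInteger();
        for (List<T> batch : batches) {
            executor.execute(() -> {
                try {
                    insertBatch.accept(batch);      // stands in for the mapper's batch insert
                } catch (RuntimeException e) {
                    failures.incrementAndGet();     // count the failure instead of losing it
                } finally {
                    latch.countDown();              // always release, or await() would block forever
                }
            });
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // keep the interrupt visible to the caller
            throw new IllegalStateException("Interrupted while waiting for batches", e);
        }
        return failures.get();
    }
}
```

Returning a failure count is one way to surface errors to the caller; with a void @Async method, exceptions otherwise go only to Spring's uncaught-exception handler.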

Multithreaded batch insertion method:

@Override
public int testMultiThread() {
    List<LogOutputResult> logOutputResults = getTestData();
    // Split the data into sub-lists of 100 records each
    List<List<LogOutputResult>> lists = ConvertHandler.splitList(logOutputResults, 100);
    // One latch count per batch; each async task counts down when it finishes
    CountDownLatch countDownLatch = new CountDownLatch(lists.size());
    for (List<LogOutputResult> listSub : lists) {
        asyncService.executeAsync(listSub, logOutputResultMapper, countDownLatch);
    }
    try {
        countDownLatch.await(); // Block until every batch has called countDown()
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Preserve the interrupt status for callers
        log.error("Interrupted while waiting for batch inserts", e);
    }
    return logOutputResults.size();
}
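The article does not show ConvertHandler.splitList. A hypothetical implementation could cut the list into consecutive slices of at most batchSize elements, as sketched below (ListSplitter is an illustrative name). With 2,000,003 records and a batch size of 100 this yields ⌈2,000,003 / 100⌉ = 20,001 batches, the last holding 3 records, which fits within the configured queue capacity of 99,988.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ConvertHandler.splitList, whose source is not shown.
class ListSplitter {

    static <T> List<List<T>> splitList(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>((source.size() + batchSize - 1) / batchSize);
        int from = 0;
        while (from < source.size()) {
            int to = from + Math.min(batchSize, source.size() - from);
            // Copy the slice so each batch handed to a worker thread is independent of the source list
            batches.add(new ArrayList<>(source.subList(from, to)));
            from = to;
        }
        return batches;
    }
}
```

Copying each slice costs a little memory but avoids sharing subList views of one backing list across threads.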

The test inserted 2,000,003 records with 30 concurrent threads in 1.67 minutes, compared with 5.75 minutes for a single-threaded run, roughly a 3.4× speed-up. Further experiments with different thread counts showed that more threads do not always mean better performance; a practical starting point is CPU cores × 2 + 2 threads.
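The rule of thumb and the reported timings reduce to simple arithmetic. This sketch (PoolSizing and its methods are illustrative names) computes the suggested thread count from Runtime.availableProcessors(), which reports the processors available to the JVM, and the speed-up from the two measurements: 5.75 / 1.67 ≈ 3.44. The suggested count is only a starting point for the empirical tuning the article recommends.

```java
// Illustrative helpers for the "CPU cores x 2 + 2" rule of thumb and the measured speed-up.
class PoolSizing {

    static int suggestedThreads(int cpuCores) {
        return cpuCores * 2 + 2;
    }

    /** Applies the rule to the processors visible to this JVM (may be limited in containers). */
    static int suggestedThreadsForThisMachine() {
        return suggestedThreads(Runtime.getRuntime().availableProcessors());
    }

    /** Speed-up of the multithreaded run relative to the single-threaded baseline. */
    static double speedup(double baselineMinutes, double parallelMinutes) {
        return baselineMinutes / parallelMinutes;
    }
}
```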

After the multithreaded run, integrity checks confirmed that every record had been inserted and that no IDs were duplicated.
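The article does not say how the integrity check was performed. Assuming the inserted IDs have already been read back into memory (the query that loads them is not shown), the two checks, completeness and uniqueness, could be sketched as follows; ImportIntegrityCheck is a hypothetical name. For a table with two million rows, comparing COUNT(*) with COUNT(DISTINCT id) in the database avoids loading the IDs at all.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical post-import check; assumes the IDs were already read back from the target table.
class ImportIntegrityCheck {

    /** Counts every occurrence of an ID beyond its first. */
    static long countDuplicates(Collection<Long> ids) {
        Set<Long> seen = new HashSet<>();
        long duplicates = 0;
        for (Long id : ids) {
            if (!seen.add(id)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    /** True when exactly the expected number of rows exists and no ID repeats. */
    static boolean isComplete(long expectedCount, Collection<Long> ids) {
        return ids.size() == expectedCount && countDuplicates(ids) == 0;
    }
}
```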

Conclusion: Leveraging a properly sized ThreadPoolTaskExecutor in a Spring Boot application can dramatically reduce bulk insertion time for massive datasets, but the optimal thread count depends on the hardware and should be tuned empirically.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
