
Improving Million‑Scale Data Insert Efficiency with Spring Boot ThreadPoolTaskExecutor

This article demonstrates how to boost the insertion speed of over two million records by configuring a Spring Boot ThreadPoolTaskExecutor, integrating it with MyBatis‑Plus and PostgreSQL, and measuring multi‑threaded versus single‑threaded performance to determine the optimal thread count.

Architecture Digest

Purpose: Increase the efficiency of inserting data at a million‑scale level.

Solution: Use ThreadPoolTaskExecutor to perform multi‑threaded batch inserts.

Technology Stack: Spring Boot 2.1.1, MyBatis‑Plus 3.0.6, Swagger 2.5.0, Lombok 1.18.4, PostgreSQL, and ThreadPoolTaskExecutor.

Thread Pool Configuration (application‑dev.properties):

# Async thread pool settings
# Core pool size
async.executor.thread.core_pool_size = 30
# Maximum pool size
async.executor.thread.max_pool_size = 30
# Queue capacity
async.executor.thread.queue_capacity = 99988
# Name prefix for threads created by the pool
async.executor.thread.name.prefix = async-importDB-

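Spring Bean Definition:

import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
@Slf4j
public class ExecutorConfig {
    @Value("${async.executor.thread.core_pool_size}")
    private int corePoolSize;
    @Value("${async.executor.thread.max_pool_size}")
    private int maxPoolSize;
    @Value("${async.executor.thread.queue_capacity}")
    private int queueCapacity;
    @Value("${async.executor.thread.name.prefix}")
    private String namePrefix;

    @Bean(name = "asyncServiceExecutor")
    public Executor asyncServiceExecutor() {
        log.warn("start asyncServiceExecutor");
        // VisiableThreadPoolTaskExecutor is a project-specific subclass of
        // ThreadPoolTaskExecutor; its definition is not included in this article.
        ThreadPoolTaskExecutor executor = new VisiableThreadPoolTaskExecutor();
        executor.setCorePoolSize(corePoolSize);
        executor.setMaxPoolSize(maxPoolSize);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix(namePrefix);
        // When the pool and queue are both full, run the task on the submitting thread
        // instead of rejecting it.
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}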
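The effect of CallerRunsPolicy is easier to see with a deliberately tiny pool. The following is a minimal sketch using the plain java.util.concurrent.ThreadPoolExecutor rather than the Spring wrapper; the class name, the pool size of 1, the queue size of 1, and the 100 ms sleep are all assumptions chosen for illustration, not values from the article.

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsSketch {

    /** Submits tasks to a tiny pool and returns the names of the threads that ran them. */
    static Set<String> runTasks(int taskCount) {
        // Hypothetical sizes for illustration only: one worker thread, one queue slot.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    threadNames.add(Thread.currentThread().getName());
                    Thread.sleep(100); // simulate a slow batch insert
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every task has run somewhere
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return threadNames;
    }

    public static void main(String[] args) {
        // The printed set normally contains the pool thread and the submitting thread.
        System.out.println(runTasks(5));
    }
}
```

Once the single worker is busy and the queue slot is taken, the next task runs on the submitting thread, which slows the producer down instead of discarding work. With the article's queue capacity of 99988 this fallback would only trigger if far more tasks were queued than in the test described here.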

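Asynchronous Service Implementation:

import java.util.List;
import java.util.concurrent.CountDownLatch;

import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class AsyncServiceImpl implements AsyncService {
    @Override
    @Async("asyncServiceExecutor")
    public void executeAsync(List<LogOutputResult> logOutputResults,
                             LogOutputResultMapper logOutputResultMapper,
                             CountDownLatch countDownLatch) {
        try {
            log.warn("start executeAsync");
            // asynchronous work: insert one batch
            logOutputResultMapper.addLogOutputResultBatch(logOutputResults);
            log.warn("end executeAsync");
        } finally {
            countDownLatch.countDown(); // always release the latch, even if the insert fails
        }
    }
}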
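The try/finally around the insert matters because a failed batch would otherwise leave the caller waiting on the latch forever. A minimal sketch of the same pattern without Spring, assuming a plain ExecutorService in place of @Async and a hypothetical insertBatch method in place of the mapper call:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LatchSketch {

    /** Runs every batch on a pool and returns how many rows were "inserted". */
    static int insertAll(List<List<Integer>> batches) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // hypothetical size
        CountDownLatch latch = new CountDownLatch(batches.size());
        AtomicInteger inserted = new AtomicInteger();

        for (List<Integer> batch : batches) {
            pool.execute(() -> {
                try {
                    inserted.addAndGet(insertBatch(batch));
                } catch (RuntimeException e) {
                    System.err.println("batch failed: " + e.getMessage());
                } finally {
                    latch.countDown(); // released on success and on failure
                }
            });
        }
        try {
            if (!latch.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for batches");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return inserted.get();
    }

    /** Hypothetical stand-in for the mapper call; an empty batch simulates a failure. */
    static int insertBatch(List<Integer> batch) {
        if (batch.isEmpty()) {
            throw new IllegalArgumentException("empty batch");
        }
        return batch.size();
    }
}
```

Calling insertAll with three batches, one of them empty, still returns once all three tasks have counted down; the failed batch is logged and simply contributes no rows. The bounded await is a design choice of this sketch, not something the article's code does.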

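Multi‑Threaded Batch Insert Method:

@Override
public int testMultiThread() {
    List<LogOutputResult> logOutputResults = getTestData();
    // split the data into sub-lists of 100 records each
    List<List<LogOutputResult>> lists = ConvertHandler.splitList(logOutputResults, 100);
    // one latch count per sub-list; each async task counts down when it finishes
    CountDownLatch countDownLatch = new CountDownLatch(lists.size());
    for (List<LogOutputResult> listSub : lists) {
        asyncService.executeAsync(listSub, logOutputResultMapper, countDownLatch);
    }
    try {
        countDownLatch.await(); // block until every batch has been processed
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        log.error("Interrupted while waiting for batch inserts: " + e.getMessage());
    }
    return logOutputResults.size();
}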
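The article does not show ConvertHandler.splitList. A minimal sketch of what such a helper might look like is given below; the class name SplitListSketch and the exact behavior (the last batch holds the remainder, each batch is an independent copy) are assumptions, not the author's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class SplitListSketch {

    /** Splits source into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> splitList(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            // Copy the view so each batch stays valid if the source list changes later.
            batches.add(new ArrayList<>(source.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            data.add(i);
        }
        System.out.println(splitList(data, 100).size() + " batches");
    }
}
```

With a batch size of 100, the 2,000,003 records in the test below would produce 20,001 tasks, which fits comfortably within the configured queue capacity of 99988.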

Test Results:

2000003 records inserted with 30 threads: 1.67 minutes.

Same data inserted with a single thread: 5.75 minutes.

Various thread counts were tested; performance does not improve indefinitely with more threads.

In these tests, throughput was best at roughly CPU cores × 2 + 2 threads, which serves as a practical rule of thumb.
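As a small illustration of the rule of thumb, the thread count can be derived from the number of available processors at startup. This is a sketch only; the class and method names are hypothetical, and the result is a starting point to validate by measurement rather than a guaranteed optimum.

```java
class PoolSizeSketch {

    /** Applies the "cores * 2 + 2" rule of thumb from the test observations. */
    static int suggestedThreads(int cpuCores) {
        if (cpuCores < 1) {
            throw new IllegalArgumentException("cpuCores must be at least 1");
        }
        return cpuCores * 2 + 2;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " cores -> " + suggestedThreads(cores) + " threads");
    }
}
```

Under this rule, the 30 threads used in the configuration would correspond to a 14-core machine; the article does not state the core count of the test hardware, so that is only an inference.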

Conclusion: Multi‑threaded insertion using a properly configured ThreadPoolTaskExecutor substantially reduces processing time for large data sets. The optimal thread count depends on the hardware, and the “cores × 2 + 2” guideline is a reasonable starting point.
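To put the reported timings in numbers, the following sketch computes the speedup and approximate throughput from the two measurements above. The class and method names are hypothetical; only the record count and the two durations come from the test results.

```java
class SpeedupSketch {

    /** Ratio of single-threaded time to multi-threaded time. */
    static double speedup(double singleMinutes, double multiMinutes) {
        return singleMinutes / multiMinutes;
    }

    /** Average insert rate in records per second. */
    static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long records = 2_000_003L;
        System.out.printf("speedup: %.2fx%n", speedup(5.75, 1.67));
        System.out.printf("30 threads: %.0f records/s%n", recordsPerSecond(records, 1.67));
        System.out.printf("1 thread:   %.0f records/s%n", recordsPerSecond(records, 5.75));
    }
}
```

The figures work out to a speedup of roughly 3.4×, or about 20,000 records per second with 30 threads versus about 5,800 with a single thread.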
