
Optimizing Large-Scale Excel Import Performance in Java Backend Applications

This article details a step‑by‑step optimization of a Java backend Excel import pipeline, covering data validation caching, batch insertion with MySQL VALUES, switching to EasyExcel, parallel stream insertion, and logging reduction to achieve sub‑minute import times for hundreds of thousands of rows.

Architecture Digest

Requirement Description

The project requires importing payment records from an Excel template into the system, validating and converting the data, then storing arrears, invoices, and invoice details in a MySQL database. The original implementation handled only tens of rows, but version 4.0 expects over 100,000 rows and more than 300,000 database rows, demanding significant performance improvements.

Key Details

Data import uses an .xlsx template, which supports more rows than the 65,536-row limit of the legacy .xls format.

Two types of validation: lightweight field checks (regex, length) and expensive duplicate checks that query the database.

Database: MySQL 5.7 with Druid connection pool, no sharding.
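The two validation tiers can be sketched as follows. This is an illustrative sketch, not the original code: the class, field, and pattern names are hypothetical, and the preloaded key set stands in for the expensive duplicate-check query.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the two validation tiers: cheap per-row field checks
// (pre-compiled regex, length limits) and a duplicate check against a key set
// loaded from the database once per import instead of once per row.
public class RowValidator {
    // Compile patterns once; compiling inside the row loop wastes CPU.
    private static final Pattern PHONE = Pattern.compile("\\d{11}");

    private final Set<String> existingBillKeys; // preloaded from MySQL

    public RowValidator(Set<String> existingBillKeys) {
        this.existingBillKeys = existingBillKeys;
    }

    // Lightweight field check: no I/O, safe to run for every row.
    public boolean fieldsValid(String phone, String houseNo) {
        return phone != null && PHONE.matcher(phone).matches()
                && houseNo != null && houseNo.length() <= 32;
    }

    // Duplicate check against the cached key set; replaces a per-row SELECT.
    public boolean isDuplicate(String billKey) {
        return existingBillKeys.contains(billKey);
    }
}
```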

Iteration Records

First Version: POI + Row‑by‑Row Query + Row‑by‑Row Insert

Read Excel rows into a List, then for each row perform field validation, database lookups, and immediate insertion. This approach generated massive network I/O and was only viable for very small data volumes.

Second Version: EasyPOI + Cached Database Queries + Batch Insert

Improvements:

Cache frequently queried data (e.g., house information) in a HashMap to eliminate per‑row database lookups.

Custom SessionMapper using MapResultHandler to load query results directly into a map.

Batch insertion using MySQL multi-row VALUES syntax via MyBatis foreach.

@Repository
public class SessionMapper extends SqlSessionDaoSupport {
    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }
    // MapResultHandler is a custom ResultHandler that collects each result
    // row into a map instead of a list.
    @SuppressWarnings("unchecked")
    public Map getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(
                BaseUnitMapper.class.getName() + ".getHouseMapByAreaId",
                areaId, handler);
        return handler.getMappedResults();
    }
}
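The MyBatis foreach mapping expands each batch into a single multi-row INSERT, so one network round trip covers many rows. As a rough illustration of the SQL shape it produces (table and column names here are hypothetical, and placeholders stand in for the bound values):

```java
import java.util.List;

// Hypothetical sketch of the SQL a MyBatis <foreach> batch insert generates:
// one INSERT statement with a VALUES tuple per row.
public class BatchSqlBuilder {
    public static String buildInsert(String table, List<String> columns, int rowCount) {
        String cols = String.join(", ", columns);
        // One "(?, ?, ...)" tuple with a placeholder per column.
        String tuple = "(" + "?, ".repeat(columns.size() - 1) + "?)";
        StringBuilder sql = new StringBuilder(
                "INSERT INTO " + table + " (" + cols + ") VALUES ");
        for (int i = 0; i < rowCount; i++) {
            sql.append(i == 0 ? tuple : ", " + tuple);
        }
        return sql.toString();
    }
}
```

For three rows of two columns this yields `INSERT INTO t (a, b) VALUES (?, ?), (?, ?), (?, ?)` — the shape MySQL executes far faster than three separate single-row INSERTs.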

Third Version: EasyExcel + Cached Queries + Batch Insert

Switched from EasyPOI to Alibaba's EasyExcel for large files (e.g., 410,000 rows, 25 columns, 45.5 MB), achieving a read time of roughly 50 s. The API is similar, making migration straightforward.

Fourth Version: Optimized Batch Size and Parallel Insertion

Experimented with batch sizes; 1,000 rows per INSERT yielded the best performance due to InnoDB buffer limits. Added a parallel‑stream based insertion utility to utilize CPU cores and hide network I/O latency.

public class InsertConsumer {
    // Batch size matching the optimal multi-row INSERT size found above.
    private static final int SIZE = 1000;

    static {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) {
            return;
        }
        // Partition the list into SIZE-row batches.
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int end = Math.min(i + SIZE, list.size());
            batches.add(list.subList(i, end));
        }
        // Each batch becomes one multi-row INSERT, executed in parallel.
        batches.parallelStream().forEach(consumer);
    }
}

Usage example:

InsertConsumer.insertData(feeList, arrearageMapper::insertList);
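To sanity-check the batching behavior, the same partitioning logic can be exercised with a counting consumer instead of a real mapper. This is an illustrative, self-contained sketch (the class and method names are not from the original code); it confirms how many INSERT statements a given list would produce, e.g. 410,000 rows at 1,000 per batch yields 410 statements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Throwaway sketch: count batches instead of inserting, using the same
// partitioning scheme as the parallel insertion utility.
public class BatchCountDemo {
    private static final int SIZE = 1000;

    public static int countBatches(int totalRows) {
        // nCopies is a constant-space view, so large row counts are cheap here.
        List<Integer> rows = new ArrayList<>(Collections.nCopies(totalRows, 0));
        AtomicInteger batches = new AtomicInteger();
        Consumer<List<Integer>> countingConsumer = batch -> batches.incrementAndGet();

        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += SIZE) {
            parts.add(rows.subList(i, Math.min(i + SIZE, rows.size())));
        }
        parts.parallelStream().forEach(countingConsumer);
        return batches.get();
    }
}
```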

Other Performance Factors

Logging

Excessive info logging inside loops increased execution time tenfold. Disabling these logs dramatically reduced import duration.
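One common fix, sketched below with java.util.logging (the original project's logging framework is not specified): demote per-row messages to a guarded fine/debug level so message strings are never built in production, and keep a single aggregate summary at info level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the logging fix: per-row messages are demoted to a guarded
// fine/debug level, and only an aggregate summary is logged at info level.
public class ImportLogging {
    private static final Logger LOG = Logger.getLogger(ImportLogging.class.getName());

    public static void importRows(int rowCount) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < rowCount; i++) {
            // Guard avoids even building the message when FINE is disabled
            // (the default level is INFO, so this costs almost nothing).
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("imported row " + i);
            }
        }
        LOG.info("imported " + rowCount + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```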

Summary

Adopt a fast Excel reading library (preferably EasyExcel).

Cache data needed for validation to replace per‑row database queries.

Use multi‑row INSERT statements with an optimal batch size (≈1,000 rows).

Leverage multithreading (e.g., parallel streams) to overlap I/O and CPU work.

Avoid unnecessary logging inside tight loops.

Tags: Java, performance, MySQL, MyBatis, EasyExcel, Excel, ParallelStream
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
