
Optimizing Large-Scale Excel Import Performance in Java Backend Applications

This article details a step‑by‑step optimization of a Java backend Excel import pipeline, covering data validation caching, batch insertion with MySQL VALUES, switching to EasyExcel, parallel stream insertion, and logging reduction to achieve sub‑minute import times for hundreds of thousands of rows.

Architecture Digest

Requirement Description

The project requires importing payment records from an Excel template into the system, validating and converting the data, then storing arrears, invoices, and invoice details in a MySQL database. The original implementation handled only tens of rows, but version 4.0 expects over 100,000 rows and more than 300,000 database rows, demanding significant performance improvements.

Key Details

Data import uses an .xlsx template, which supports more rows than the 65,536-row limit of the legacy .xls format.

Two types of validation: lightweight field checks (regex, length) and expensive duplicate checks that query the database.

Database: MySQL 5.7 with Druid connection pool, no sharding.
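The two validation tiers can be sketched as follows. This is an illustrative sketch, not the original code: the class, field, and pattern names are hypothetical, and the preloaded key set stands in for the expensive duplicate-check query.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the two validation tiers: cheap per-row field checks
// (pre-compiled regex, length limits) and a duplicate check against a key set
// loaded from the database once per import instead of once per row.
public class RowValidator {
    // Compile patterns once; compiling inside the row loop wastes CPU.
    private static final Pattern PHONE = Pattern.compile("\\d{11}");

    private final Set<String> existingBillKeys; // preloaded from MySQL

    public RowValidator(Set<String> existingBillKeys) {
        this.existingBillKeys = existingBillKeys;
    }

    // Lightweight field check: no I/O, safe to run for every row.
    public boolean fieldsValid(String phone, String houseNo) {
        return phone != null && PHONE.matcher(phone).matches()
                && houseNo != null && houseNo.length() <= 32;
    }

    // Duplicate check against the cached key set; replaces a per-row SELECT.
    public boolean isDuplicate(String billKey) {
        return existingBillKeys.contains(billKey);
    }
}
```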

Iteration Records

First Version: POI + Row‑by‑Row Query + Row‑by‑Row Insert

Read Excel rows into a List, then for each row perform field validation, database lookups, and immediate insertion. This approach generated massive network I/O and was only viable for very small data volumes.

Second Version: EasyPOI + Cached Database Queries + Batch Insert

Improvements:

Cache frequently queried data (e.g., house information) in a HashMap to eliminate per‑row database lookups.

Custom SessionMapper using MapResultHandler to load query results directly into a map.

Batch insertion using MySQL multi-row VALUES syntax via MyBatis foreach.

@Repository
public class SessionMapper extends SqlSessionDaoSupport {
    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }
    // MapResultHandler is a custom ResultHandler that collects each result
    // row into a map instead of a list.
    @SuppressWarnings("unchecked")
    public Map getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(
                BaseUnitMapper.class.getName() + ".getHouseMapByAreaId",
                areaId, handler);
        return handler.getMappedResults();
    }
}
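The MyBatis foreach mapping expands each batch into a single multi-row INSERT, so one network round trip covers many rows. As a rough illustration of the SQL shape it produces (table and column names here are hypothetical, and placeholders stand in for the bound values):

```java
import java.util.List;

// Hypothetical sketch of the SQL a MyBatis <foreach> batch insert generates:
// one INSERT statement with a VALUES tuple per row.
public class BatchSqlBuilder {
    public static String buildInsert(String table, List<String> columns, int rowCount) {
        String cols = String.join(", ", columns);
        // One "(?, ?, ...)" tuple with a placeholder per column.
        String tuple = "(" + "?, ".repeat(columns.size() - 1) + "?)";
        StringBuilder sql = new StringBuilder(
                "INSERT INTO " + table + " (" + cols + ") VALUES ");
        for (int i = 0; i < rowCount; i++) {
            sql.append(i == 0 ? tuple : ", " + tuple);
        }
        return sql.toString();
    }
}
```

For three rows of two columns this yields `INSERT INTO t (a, b) VALUES (?, ?), (?, ?), (?, ?)` — the shape MySQL executes far faster than three separate single-row INSERTs.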

Third Version: EasyExcel + Cached Queries + Batch Insert

Switched from EasyPOI to Alibaba's EasyExcel for large files (e.g., 410,000 rows, 25 columns, 45.5 MB), achieving a read time of roughly 50 s. The API is similar, making migration straightforward.

Fourth Version: Optimized Batch Size and Parallel Insertion

Experimented with batch sizes; 1,000 rows per INSERT yielded the best performance due to InnoDB buffer limits. Added a parallel‑stream based insertion utility to utilize CPU cores and hide network I/O latency.

public class InsertConsumer {
    // Batch size matching the optimal multi-row INSERT size found above.
    private static final int SIZE = 1000;

    static {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) {
            return;
        }
        // Partition the list into SIZE-row batches.
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int end = Math.min(i + SIZE, list.size());
            batches.add(list.subList(i, end));
        }
        // Each batch becomes one multi-row INSERT, executed in parallel.
        batches.parallelStream().forEach(consumer);
    }
}

Usage example:

InsertConsumer.insertData(feeList, arrearageMapper::insertList);
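To sanity-check the batching behavior, the same partitioning logic can be exercised with a counting consumer instead of a real mapper. This is an illustrative, self-contained sketch (the class and method names are not from the original code); it confirms how many INSERT statements a given list would produce, e.g. 410,000 rows at 1,000 per batch yields 410 statements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Throwaway sketch: count batches instead of inserting, using the same
// partitioning scheme as the parallel insertion utility.
public class BatchCountDemo {
    private static final int SIZE = 1000;

    public static int countBatches(int totalRows) {
        // nCopies is a constant-space view, so large row counts are cheap here.
        List<Integer> rows = new ArrayList<>(Collections.nCopies(totalRows, 0));
        AtomicInteger batches = new AtomicInteger();
        Consumer<List<Integer>> countingConsumer = batch -> batches.incrementAndGet();

        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += SIZE) {
            parts.add(rows.subList(i, Math.min(i + SIZE, rows.size())));
        }
        parts.parallelStream().forEach(countingConsumer);
        return batches.get();
    }
}
```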

Other Performance Factors

Logging

Excessive info logging inside loops increased execution time tenfold. Disabling these logs dramatically reduced import duration.
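One common fix, sketched below with java.util.logging (the original project's logging framework is not specified): demote per-row messages to a guarded fine/debug level so message strings are never built in production, and keep a single aggregate summary at info level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the logging fix: per-row messages are demoted to a guarded
// fine/debug level, and only an aggregate summary is logged at info level.
public class ImportLogging {
    private static final Logger LOG = Logger.getLogger(ImportLogging.class.getName());

    public static void importRows(int rowCount) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < rowCount; i++) {
            // Guard avoids even building the message when FINE is disabled
            // (the default level is INFO, so this costs almost nothing).
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("imported row " + i);
            }
        }
        LOG.info("imported " + rowCount + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```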

Summary

Adopt a fast Excel reading library (preferably EasyExcel).

Cache data needed for validation to replace per‑row database queries.

Use multi‑row INSERT statements with an optimal batch size (≈1,000 rows).

Leverage multithreading (e.g., parallel streams) to overlap I/O and CPU work.

Avoid unnecessary logging inside tight loops.

Tags: Java, performance, MySQL, MyBatis, EasyExcel, Excel, ParallelStream
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
