Backend Development 12 min read

How to Speed Up Large Excel Imports in Java: From POI to EasyExcel and Parallel Insertion

This article walks through a step‑by‑step optimization of a Java‑based Excel import pipeline, covering the transition from raw POI to EasyPOI and EasyExcel, caching database lookups, batch inserts, optimal batch sizes, parallel streams, and logging reductions to achieve sub‑minute import times for hundreds of thousands of rows.

Programmer DD

Mar 3, 2021

How to Speed Up Large Excel Imports in Java: From POI to EasyExcel and Parallel Insertion

Requirement Overview

The system must import payment records from an Excel template filled by users, validate the data, transform it, and store overdue records, invoices, and invoice details into a MySQL database.

Initial Implementation (Version 1)

Used raw Apache POI to read rows into a List, performed per‑row database validation (e.g., duplicate invoice numbers), and inserted each row individually. This approach caused massive network I/O and slow performance, suitable only for tens of rows.

Problems Identified

Each row triggered a separate DB query for validation, multiplying I/O latency.

Rows were inserted one by one, repeating the same overhead.

POI code was verbose and hard to maintain.

Version 2 – EasyPOI + Cached Validation + Batch Insert

Introduced three improvements:

Cache Validation Data

All house information for a community (≤5000 entries) is loaded once into a HashMap<String, Long> where the key is a concatenated address and the value is the house ID, eliminating per‑row DB lookups.

@Repository
public class SessionMapper extends SqlSessionDaoSupport {
    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }
    @SuppressWarnings("unchecked")
    public Map<String, Long> getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(BaseUnitMapper.class.getName()+".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

public class MapResultHandler implements ResultHandler {
    private final Map mappedResults = new HashMap();
    @Override
    public void handleResult(ResultContext context) {
        Map map = (Map)context.getResultObject();
        mappedResults.put(map.get("key"), map.get("value"));
    }
    public Map getMappedResults() { return mappedResults; }
}

Batch Insert with MyBatis

Used MySQL INSERT ... VALUES (...),(...) syntax combined with MyBatis <foreach> to insert many rows in a single statement.

<insert id="insertList">
    insert into table(col1, col2)
    values
    <foreach collection="list" item="item" separator=",">
        (#{item.col1}, #{item.col2})
    </foreach>
</insert>

Version 3 – EasyExcel + Cached Validation + Batch Insert

Switched to Alibaba's EasyExcel for reading large files. EasyExcel reads 410,000 rows (25 columns, 45.5 MB) in about 50 seconds, far faster than EasyPOI, which ran out of memory on the same dataset.

Version 4 – Optimizing Insert Speed

Experimented with different batch sizes for the VALUES statement. A batch of 1,000 rows yielded the best performance, likely because larger batches exceeded the InnoDB buffer pool and caused disk swapping.

Implemented parallel insertion using Java 8 parallel streams to keep the CPU busy while waiting for network I/O.

/**
 * Utility for fast parallel insertion using streams.
 */
public class InsertConsumer {
    private static final int SIZE = 1000;
    static { System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4"); }
    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) return;
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int end = Math.min(i + SIZE, list.size());
            partitions.add(list.subList(i, end));
        }
        partitions.parallelStream().forEach(consumer);
    }
}

Usage example:

InsertConsumer.insertData(feeList, arrearageMapper::insertList);

Other Performance Tips

Logging

Excessive info logging inside loops dramatically slows processing; reducing logs cut execution time to roughly one‑tenth of the original.

Summary of Improvements

Adopt a faster Excel parser (recommend EasyExcel).

Cache data needed for validation to avoid per‑row DB queries.

Use bulk INSERT ... VALUES statements.

Insert in batches of ~1,000 rows and leverage parallel streams for concurrency.

Suppress unnecessary logging inside processing loops.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

backend Java mysql EasyExcel Excel parallelStream

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.