
Optimizing Large‑Scale Excel Import in Java: From POI to EasyExcel, Caching, Batch Inserts and Parallel Processing

This article details how to dramatically speed up Excel‑driven fee‑record imports by replacing naïve POI row‑by‑row processing with EasyExcel, caching database look‑ups, using MySQL batch INSERTs, and finally applying parallel streams for multi‑threaded insertion, while also trimming excessive logging.


The original requirement was to import fee‑payment records from an Excel template into a system, validate the data, and store overdue records, invoices and details in a MySQL 5.7 database.

The initial implementation (Version 1) used raw Apache POI to read each row into a List<ArrayList>, performed per-row database validation (e.g., checking for duplicate invoice numbers), and inserted rows one at a time. This worked for a few dozen rows but became a severe bottleneck once the workbook grew to 100 k+ rows, resulting in millions of database operations.
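A minimal sketch of that Version 1 shape (the mapper names, countByInvoiceNo, and toEntity are illustrative placeholders, not the author's actual code):

```java
// Version 1 (simplified): every row costs one SELECT and one INSERT.
try (Workbook wb = WorkbookFactory.create(new File("fees.xlsx"))) {
    for (Row row : wb.getSheetAt(0)) {
        String invoiceNo = row.getCell(0).getStringCellValue();
        if (invoiceMapper.countByInvoiceNo(invoiceNo) > 0) {
            continue;                          // per-row duplicate check against MySQL
        }
        arrearageMapper.insert(toEntity(row)); // per-row single-statement INSERT
    }
}
```

With 100 k rows, this issues roughly 200 k round-trips to the database, which is where all the time goes.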

Version 2 – EasyPOI + Cache + Batch Insert

To address the three main problems (repeated DB queries, per‑row inserts, and verbose POI code), the author introduced:

Caching of reference data (e.g., house information) in a HashMap to replace per‑row queries.

A custom SessionMapper that uses a MapResultHandler to load the cache with a single SQL call.

MyBatis batch insertion via INSERT … VALUES with a foreach loop.
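A batch-insert mapped statement of that shape might look as follows (table and column names here are illustrative, not the author's actual schema):

```xml
<!-- Illustrative mapper fragment: one INSERT statement carries a whole batch. -->
<insert id="insertList">
    INSERT INTO t_arrearage (house_id, invoice_no, amount)
    VALUES
    <foreach collection="list" item="item" separator=",">
        (#{item.houseId}, #{item.invoiceNo}, #{item.amount})
    </foreach>
</insert>
```

MyBatis expands the foreach into one multi-row VALUES list, so a 1 000-row batch costs a single round-trip instead of 1 000.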

import java.util.HashMap;
import java.util.Map;
import javax.annotation.Resource;
import org.apache.ibatis.session.ResultContext;
import org.apache.ibatis.session.ResultHandler;
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.support.SqlSessionDaoSupport;
import org.springframework.stereotype.Repository;

@Repository
public class SessionMapper extends SqlSessionDaoSupport {

    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }

    // Area/building/unit/room number -> house ID
    public Map<String, Long> getHouseMapByAreaId(Long areaId) {
        MapResultHandler<String, Long> handler = new MapResultHandler<>();
        this.getSqlSession().select(
                BaseUnitMapper.class.getName() + ".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

public class MapResultHandler<K, V> implements ResultHandler<Map<String, Object>> {

    private final Map<K, V> mappedResults = new HashMap<>();

    // Each result row arrives as a Map with "key" and "value" columns;
    // collect them into one lookup map in a single pass.
    @Override
    @SuppressWarnings("unchecked")
    public void handleResult(ResultContext<? extends Map<String, Object>> context) {
        Map<String, Object> map = context.getResultObject();
        mappedResults.put((K) map.get("key"), (V) map.get("value"));
    }

    public Map<K, V> getMappedResults() {
        return mappedResults;
    }
}
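The mapped statement the SessionMapper refers to might look like this (table and column names are illustrative; what matters is that the columns are aliased to "key" and "value", which is what MapResultHandler reads):

```xml
<!-- Illustrative: one query loads the whole area's house lookup table. -->
<select id="getHouseMapByAreaId" resultType="java.util.HashMap">
    SELECT CONCAT(area_name, building, unit, room_no) AS `key`,
           house_id AS `value`
    FROM base_house
    WHERE area_id = #{areaId}
</select>
```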

Version 3 – EasyExcel + Cache + Batch Insert

When the Excel file grew to 410 k rows, EasyPOI ran out of memory (OOM) and drove CPU usage up, because it materializes the whole workbook. Switching to Alibaba's EasyExcel, which streams rows through a listener, cut memory consumption and read all 410 k rows (25 columns, ~45 MB) in about 50 seconds.
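The EasyExcel read path can be sketched like this (FeeRow, FeeRowListener, and saveBatch are illustrative names, not the author's actual classes):

```java
// Sketch: rows stream through invoke() one at a time; the sheet is never
// fully held in memory.
public class FeeRowListener extends AnalysisEventListener<FeeRow> {

    private static final int BATCH_SIZE = 1000;
    private final List<FeeRow> buffer = new ArrayList<>(BATCH_SIZE);

    @Override
    public void invoke(FeeRow row, AnalysisContext context) {
        buffer.add(row);
        if (buffer.size() >= BATCH_SIZE) {
            saveBatch(buffer);   // e.g. a MyBatis multi-row INSERT
            buffer.clear();
        }
    }

    @Override
    public void doAfterAllAnalysed(AnalysisContext context) {
        if (!buffer.isEmpty()) {
            saveBatch(buffer);   // flush the final partial batch
        }
    }

    private void saveBatch(List<FeeRow> batch) { /* delegate to the mapper */ }
}

// Kicking off the streaming read:
// EasyExcel.read("fees.xlsx", FeeRow.class, new FeeRowListener()).sheet().doRead();
```

Validation against the cached lookup maps happens inside invoke(), so reading, validating, and batching overlap in a single pass.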

Version 4 – Optimized Batch Insertion

Batch size was tuned from 30 000 down to 1 000 rows per INSERT. Experiments showed 1 000 rows gave the best throughput, likely because larger statements exceeded the InnoDB buffer pool and triggered disk swapping.

To further hide network I/O latency, the author wrapped the batch insert in a parallel‑stream utility:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Utility for fast parallel insertion using Java 8 streams.
 */
public class InsertConsumer {

    /** Rows per batch; matches the 1 000-row sweet spot found above. */
    private static final int SIZE = 1000;

    static {
        // Cap the common ForkJoinPool so parallel inserts do not exhaust the
        // DB connection pool. Must run before the common pool is first used.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) {
            return;
        }
        // Split the list into SIZE-row sublist views (no copying).
        List<List<T>> streamList = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            streamList.add(list.subList(i, Math.min(i + SIZE, list.size())));
        }
        // Each batch is handed to the consumer (e.g. a MyBatis batch insert) in parallel.
        streamList.parallelStream().forEach(consumer);
    }
}

Usage example:

InsertConsumer.insertData(feeList, arrearageMapper::insertList);

Other Performance Tips

Excessive logging inside loops dramatically slowed the process; removing info statements reduced total runtime to roughly one‑tenth of the original.
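The fix can be sketched in plain Java (ImportLogging and its row handling are hypothetical, shown only to illustrate the pattern): guard any per-row log behind a level check so the message string is never built in the hot path, and emit a single summary line per batch.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ImportLogging {

    private static final Logger LOG = Logger.getLogger(ImportLogging.class.getName());

    /** Processes rows, logging once per batch instead of once per row; returns the count. */
    public static int importRows(List<String> rows) {
        int imported = 0;
        for (String row : rows) {
            // ... validate and insert the row ...
            imported++;
            // Guarded debug log: the concatenation only happens if FINE is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("processed row: " + row);
            }
        }
        // One summary line replaces hundreds of thousands of per-row info logs.
        LOG.info("imported " + imported + " rows");
        return imported;
    }
}
```

At default log levels the loop body does no logging work at all, which is what recovered most of the runtime.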

Conclusion

Prefer fast Excel readers like EasyExcel.

Cache reference data to avoid per‑row DB look‑ups.

Use INSERT … VALUES batch statements (≈1 000 rows per batch).

Leverage parallel streams or other multithreading to overlap I/O.

Avoid logging inside tight loops.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java, MySQL, MyBatis, EasyExcel, Excel, Parallel Processing
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
