Backend Development 11 min read

Optimizing Large‑Scale Excel Import in Java: From POI to EasyExcel, Caching, Batch Inserts, and Parallel Streams

This article details a step-by-step optimization of a Java-based Excel import process, covering migration from POI to EasyExcel, caching database lookups, using batch inserts with VALUES, and leveraging parallel streams to dramatically reduce import time for massive datasets.

Java Captain
Java Captain
Java Captain
Optimizing Large‑Scale Excel Import in Java: From POI to EasyExcel, Caching, Batch Inserts, and Parallel Streams

The project requires importing payment‑record Excel files, validating data, converting it into arrears, bills, and details, and persisting everything into a MySQL 5.7 database; the original POI implementation processed each row sequentially, causing severe performance bottlenecks when handling 100k+ rows.

First version used raw POI to read rows into List<ArrayList> , performed per‑row field validation, database lookups (e.g., house existence), and inserted each row individually. This resulted in a linear increase of network I/O and CPU usage, making the solution unsuitable beyond a few dozen rows.

Second version switched to EasyPOI and introduced caching of database queries. All house information for a community was loaded once into a HashMap<String, Long> via a custom SessionMapper and MapResultHandler , eliminating repeated lookups. Batch insertion was performed with MySQL VALUES syntax using MyBatis foreach to insert multiple rows per statement.

@Repository
public class SessionMapper extends SqlSessionDaoSupport {
    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }
    @SuppressWarnings("unchecked")
    public Map
getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(BaseUnitMapper.class.getName()+".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

public class MapResultHandler implements ResultHandler {
    private final Map mappedResults = new HashMap();
    @Override
    public void handleResult(ResultContext context) {
        Map map = (Map)context.getResultObject();
        mappedResults.put(map.get("key"), map.get("value"));
    }
    public Map getMappedResults() { return mappedResults; }
}

Third version adopted Alibaba's EasyExcel, which reads large Excel files (e.g., 410 000 rows, 25 columns, 45.5 MB) in about 50 seconds, far faster than EasyPOI and without OOM errors.

Fourth version focused on insert performance: the optimal batch size was found to be 1 000 rows per INSERT . A functional utility class InsertConsumer splits the data list into sub‑lists of this size and inserts them in parallel using Java 8 parallel streams, achieving 72 seconds for 100 k rows and a total import time of 95 seconds.

public class InsertConsumer {
    private static final int SIZE = 1000;
    static { System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4"); }
    public static
void insertData(List
list, Consumer
> consumer) {
        if (list == null || list.isEmpty()) return;
        List
> streamList = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int j = Math.min(i + SIZE, list.size());
            streamList.add(list.subList(i, j));
        }
        streamList.parallelStream().forEach(consumer);
    }
}

InsertConsumer.insertData(feeList, arrearageMapper::insertList);

Additional performance tips include disabling excessive info logging inside loops, as logging can consume up to ten times more time than the same logic without logging.

Summary of recommendations :

Use a fast Excel library such as EasyExcel for large files.

Cache data needed for validation to replace per‑row database queries.

Batch insert rows with the VALUES syntax.

Leverage multithreading (e.g., parallel streams) to hide I/O latency.

Avoid unnecessary logging inside tight loops.

JavaperformanceMySQLbatch insertexcelParallel Stream
Java Captain
Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.