Backend Development 11 min read

Optimizing Large-Scale Excel Import Performance in Java Backend Applications

This article details how to dramatically speed up the import of massive Excel files in a Java backend by switching to EasyExcel, caching database lookups, using batch inserts, and applying parallel streams while eliminating excessive logging.

Code Ape Tech Column
Code Ape Tech Column
Code Ape Tech Column
Optimizing Large-Scale Excel Import Performance in Java Backend Applications

The project requires importing fee‑payment records from Excel templates into a Java backend system, where each row may generate arrears data, bills, and details stored in a MySQL database; with the upcoming 4.0 version the Excel file can exceed 100,000 rows and generate over 300,000 database rows, making the original import code unbearably slow.

In the first implementation (POI + per‑row query + per‑row insert) every row performed multiple database lookups for validation (e.g., checking if a house exists) and inserted data one row at a time, leading to an N‑fold increase in network I/O and execution time, as well as verbose, hard‑to‑maintain code.

Version two replaces raw POI with EasyPOI, introduces a HashMap cache for all validation data, and adds a custom SessionMapper that uses a MapResultHandler to load house‑id mappings into memory, eliminating per‑row database queries. Batch insertion is achieved with MySQL VALUES syntax combined with MyBatis foreach :

@Repository
public class SessionMapper extends SqlSessionDaoSupport {
    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }
    @SuppressWarnings("unchecked")
    public Map
getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(BaseUnitMapper.class.getName()+".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

The accompanying MapResultHandler and MyBatis mapper XML store the query result as key (concatenated area/build/unit/house) and value (house_id), enabling fast in‑memory validation.

Version three switches to Alibaba's EasyExcel, which reads large XLSX files (e.g., 410,000 rows, 25 columns, 45.5 MB) in about 50 seconds, avoiding the OOM issues encountered with EasyPOI on the same data size.

Version four fine‑tunes the insert phase: the batch size is reduced to 1,000 rows per INSERT statement (larger batches caused disk swapping), and a parallel‑stream utility InsertConsumer distributes the batches across multiple threads (defaulting to four parallel threads). Sample utility code:

public class InsertConsumer {
    private static final int SIZE = 1000;
    static { System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4"); }
    public static
void insertData(List
list, Consumer
> consumer) {
        if (list == null || list.isEmpty()) return;
        List
> streamList = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int j = Math.min(i + SIZE, list.size());
            streamList.add(list.subList(i, j));
        }
        streamList.parallelStream().forEach(consumer);
    }
}

Additional performance tips include disabling excessive info logging inside loops, which was shown to reduce processing time by a factor of ten.

In summary, the article recommends using a faster Excel parser (EasyExcel), caching database‑related validation data, employing bulk VALUES inserts, leveraging multithreaded batch insertion via parallel streams, and minimizing logging to achieve sub‑two‑minute imports for massive Excel datasets.

Javaperformance optimizationMySQLbatch inserteasyexcelexcelParallel Stream
Code Ape Tech Column
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.