Optimizing Large‑Scale Excel Import in Java: From POI to EasyExcel, Caching, Batch Inserts, and Parallel Streams
This article walks through the step-by-step optimization of a Java backend that imports massive Excel files into MySQL: switching Excel libraries, caching reference data, batching INSERT statements, and parallelizing the writes with parallel streams, cutting total processing time from many minutes to under two minutes.
The starting point was a real-world requirement: import Excel files of more than 100,000 rows of payment records into a MySQL database, with the business asking for the import time to drop from minutes to seconds.
The first version used Apache POI with row-by-row validation and insertion; every one of the N rows triggered its own database round trip, so the import slowed to a crawl.
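The cost of that pattern is easy to see in isolation. The sketch below (class and method names are hypothetical, not from the original code) abstracts the database call behind a `Consumer` so the round-trip count per row is explicit:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the first version's shape: one INSERT per row.
// Each insertOne call stands in for a separate network round trip to MySQL,
// so importing N rows costs N round trips.
public class RowByRowImport {

    /** Validates and inserts each row individually; rows.size() round trips. */
    public static <T> int importRows(List<T> rows, Consumer<T> insertOne) {
        int inserted = 0;
        for (T row : rows) {
            insertOne.accept(row); // one DB round trip per row
            inserted++;
        }
        return inserted;
    }
}
```

With 100,000+ rows, the per-call network latency alone dominates the import time, which is what the later versions attack.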
The second version introduced EasyPOI, cached lookup data in a HashMap, and batched writes using MySQL's multi-row INSERT ... VALUES (...),(...) syntax. A custom SessionMapper and MapResultHandler were added to load the house data into the cache:
@Repository
public class SessionMapper extends SqlSessionDaoSupport {

    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }

    /** Loads all houses of an area into a map for O(1) lookups during import. */
    @SuppressWarnings("unchecked")
    public Map<Object, Object> getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(
                BaseUnitMapper.class.getName() + ".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

public class MapResultHandler implements ResultHandler<Map<String, Object>> {

    private final Map<Object, Object> mappedResults = new HashMap<>();

    /** Collects each result row's "key"/"value" columns into the map. */
    @Override
    public void handleResult(ResultContext<? extends Map<String, Object>> context) {
        Map<String, Object> map = context.getResultObject();
        mappedResults.put(map.get("key"), map.get("value"));
    }

    public Map<Object, Object> getMappedResults() {
        return mappedResults;
    }
}

The third version switched to Alibaba EasyExcel for faster reading of large XLSX files, cutting the read time to about 50 seconds for 410,000 rows.
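The multi-row INSERT from the second version can be sketched as a plain SQL builder. This is an illustrative helper, not the article's actual mapper (which would typically use a MyBatis foreach); the table and column names in the usage are hypothetical:

```java
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

// Sketch of the batching idea: one statement carrying many VALUES tuples,
// so 1,000 rows cost one round trip instead of 1,000.
public class BatchSqlBuilder {

    /** Builds "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ..." for rowCount rows. */
    public static String multiRowInsert(String table, List<String> columns, int rowCount) {
        String cols = String.join(", ", columns);
        String placeholders = "("
                + String.join(", ", Collections.nCopies(columns.size(), "?"))
                + ")";
        StringJoiner values = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) {
            values.add(placeholders);
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES " + values;
    }
}
```

For example, `multiRowInsert("t_fee", Arrays.asList("house_id", "amount"), 1000)` yields a single statement that inserts 1,000 rows; the resulting string would then be prepared and its placeholders bound row by row.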
The fourth version tuned the batch size (1,000 rows per statement proved optimal) and added a parallel-stream utility, InsertConsumer, that splits the list into chunks and inserts them concurrently, bringing the total import time down to about 95 seconds:
public class InsertConsumer {

    /** Rows per chunk; 1,000 rows per multi-row INSERT proved optimal. */
    private static final int SIZE = 1000;

    static {
        // Cap the common ForkJoinPool so parallelStream() uses 4 worker threads.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    /** Splits the list into SIZE-row chunks and hands each chunk to the consumer in parallel. */
    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) {
            return;
        }
        List<List<T>> streamList = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int j = Math.min(i + SIZE, list.size());
            streamList.add(list.subList(i, j));
        }
        streamList.parallelStream().forEach(consumer);
    }
}

Usage example:

InsertConsumer.insertData(feeList, arrearageMapper::insertList);

Two further tips: disable per-row INFO logging inside hot loops, and use multithreading to hide database I/O latency.
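The logging tip can be applied without touching the code. As a sketch, assuming Logback is the logging backend and `com.example.importer` stands in for the import package (both hypothetical here), a one-line logger override keeps per-row INFO output off the hot path:

```xml
<!-- Hypothetical logback.xml fragment: silence per-row INFO logs from the import package -->
<logger name="com.example.importer" level="WARN"/>
```

Emitting a log line per imported row is itself synchronous I/O inside the loop, so raising the level for just that package recovers noticeable time on six-figure row counts.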
Conclusion: Use a high‑performance Excel library, cache reference data, batch insert with an appropriate size, and parallelize the write phase to dramatically speed up large Excel imports.
Code Ape Tech Column
Former Ant Group P8 engineer and pure technologist, sharing full-stack Java content, interview prep, and career advice through a column. Site: java-family.cn