How to Speed Up Massive Excel Imports: From POI to EasyExcel and Parallel Inserts
This article walks through the step‑by‑step optimization of a large‑scale Excel import workflow: moving from raw POI to EasyExcel, caching database lookups, batching INSERT statements, and using parallel streams, reducing a 100k‑row import (over 400k database rows) to roughly 95 seconds end to end.
Requirement Description
The project imports payment records from an Excel template: users copy data from other systems into the template, and the application reads, validates, transforms, and stores the resulting overdue records, invoices, and details in the database.
In version 4.0 the Excel file may exceed 100,000 rows, generating more than 300,000 database rows, so the original import code had to be optimized. The sections below walk through the performance problems and their solutions.
Details
Data import uses an xlsx template supporting >65,535 rows.
Data validation includes field length/regex checks (in‑memory) and duplicate checks that query the database, which heavily impacts performance.
Data insertion uses MySQL 5.7 with Druid connection pool, no sharding.
Iteration Records
1. First version: POI + row‑by‑row validation + row‑by‑row insert
Reads Excel with POI, maps rows to an ArrayList, then for each row performs field checks, database lookups, and inserts the row. Problems:
Each row triggers a separate database query, multiplying network I/O.
Rows are inserted one by one, causing the same overhead.
Raw POI code is verbose and hard to maintain.
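The cost of the first version is easiest to see in a sketch. The loop below (all names hypothetical, not the project's actual code) pays one duplicate‑check query plus one INSERT per row, so an n‑row file costs 2n network round trips:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of the v1 import loop: one query + one insert per row.
// For n rows this is 2n database round trips, which dominates the runtime.
public class RowByRowImport {
    public static <T> int importRows(List<T> rows,
                                     Function<T, Boolean> existsInDb,   // per-row duplicate query
                                     Consumer<T> insertOne) {           // per-row INSERT
        int inserted = 0;
        for (T row : rows) {
            if (!existsInDb.apply(row)) {  // network round trip #1
                insertOne.accept(row);     // network round trip #2
                inserted++;
            }
        }
        return inserted;
    }
}
```

The later iterations attack exactly these two per‑row round trips: caching removes the query, batching amortizes the insert.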
2. Second version: EasyPOI + cache database queries + batch insert
Cache data to trade space for time
All data needed for validation is loaded into a HashMap, eliminating per‑row database queries. Example: house information is cached with a composite key (area/building/unit/room) mapping to the house ID.
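A minimal sketch of such a cache (the key format, area + building + unit + room concatenated, is assumed from the article; class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: load all houses for an area once, then validate each row with an O(1)
// map lookup instead of one SELECT per row.
public class HouseCache {
    private final Map<String, Long> houseIdByKey = new HashMap<>();

    public void put(String area, String building, String unit, String room, long houseId) {
        houseIdByKey.put(key(area, building, unit, room), houseId);
    }

    // Returns the house ID, or null if the row references an unknown house
    // (i.e. a validation failure).
    public Long lookup(String area, String building, String unit, String room) {
        return houseIdByKey.get(key(area, building, unit, room));
    }

    private static String key(String area, String building, String unit, String room) {
        return area + building + unit + room;
    }
}
```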
Custom SessionMapper
MyBatis does not directly map query results into a HashMap, so a custom SessionMapper using a MapResultHandler is created.
@Repository
public class SessionMapper extends SqlSessionDaoSupport {

    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }

    // area + building + unit + room -> houseId
    @SuppressWarnings("unchecked")
    public Map<String, Long> getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(BaseUnitMapper.class.getName() + ".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}
MapResultHandler stores each result row into a HashMap.
public class MapResultHandler implements ResultHandler {

    private final Map mappedResults = new HashMap();

    @Override
    public void handleResult(ResultContext context) {
        // Each result row arrives as a Map; the resultMap referenced in the XML
        // (mapResultLong, definition not shown) maps the SQL aliases k/v onto
        // the "key" and "value" entries read here.
        Map map = (Map) context.getResultObject();
        mappedResults.put(map.get("key"), map.get("value"));
    }

    public Map getMappedResults() {
        return mappedResults;
    }
}
The mapper interface and its XML define the query that returns the composite key mapped to the house ID.
@Mapper
@Repository
public interface BaseUnitMapper {
    // area + building + unit + room -> houseId
    Map<String, Long> getHouseMapByAreaId(@Param("areaId") Long areaId);
}
<select id="getHouseMapByAreaId" resultMap="mapResultLong">
    SELECT CONCAT(h.build_area_name, h.build_name, h.unit_name, h.house_num) k,
           h.house_id v
    FROM base_house h
    WHERE h.area_id = #{areaId}
    GROUP BY h.house_id
</select>
Batch insert with VALUES
MySQL supports inserting multiple rows with a single INSERT statement. MyBatis foreach can generate the VALUES list.
<insert id="insertList">
    insert into table_name (col1, col2)
    values
    <foreach collection="list" item="item" separator=",">
        (#{item.col1}, #{item.col2})
    </foreach>
</insert>
Using EasyExcel for large files
EasyExcel reads Excel with annotation‑based mapping and handles 410,000 rows (45 MB) in about 50 seconds, far faster than EasyPOI.
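Part of what keeps EasyExcel fast is its streaming model: rows are pushed one at a time to a listener (its `AnalysisEventListener` exposes `invoke()` per row and `doAfterAllAnalysed()` at the end), so the whole file is never held in memory at once. A library‑free sketch of that batching pattern, with hypothetical method names standing in for the EasyExcel hooks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Library-free sketch of the EasyExcel listener pattern: rows arrive one by one
// via onRow() (EasyExcel calls this invoke()); we buffer BATCH rows, flush them
// to a consumer (e.g. a batch-insert mapper call), and clear the buffer, so
// memory stays bounded regardless of file size.
public class BatchingRowListener<T> {
    private static final int BATCH = 1000;
    private final List<T> buffer = new ArrayList<>(BATCH);
    private final Consumer<List<T>> flush;

    public BatchingRowListener(Consumer<List<T>> flush) {
        this.flush = flush;
    }

    public void onRow(T row) {
        buffer.add(row);
        if (buffer.size() >= BATCH) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Equivalent of EasyExcel's doAfterAllAnalysed(): flush the final partial batch.
    public void onFinish() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```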
3. Third version: EasyExcel + cache + batch insert
Switching from EasyPOI to EasyExcel reduced memory consumption and improved speed for the 410k‑row file.
4. Fourth version: Optimizing insert speed
Batch size was tuned from 30,000 down to 1,000 rows per INSERT; 1,000 gave the best performance, likely because larger statements exceeded the InnoDB buffer pool and caused disk swapping.
Parallel insertion using Java parallel streams further hides network I/O latency. Example results: 100,000 rows, 420,000 detail records, 16 threads, 1,000 rows per batch → 72 s insert time, total import 95 s.
Parallel Insert Utility
A functional‑style utility class wraps parallel stream insertion.
/**
 * Fast parallel insertion using parallel streams.
 */
public class InsertConsumer {

    /** Rows per INSERT statement; 1,000 performed best in testing. */
    private static final int SIZE = 1000;

    static {
        // Note: this must run before the common ForkJoinPool is first used
        // anywhere in the JVM, otherwise the property has no effect.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) {
            return;
        }
        // Split the list into SIZE-row batches, then run the inserts in parallel.
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int end = Math.min(i + SIZE, list.size());
            batches.add(list.subList(i, end));
        }
        batches.parallelStream().forEach(consumer);
    }
}
Usage example:
InsertConsumer.insertData(feeList, arrearageMapper::insertList);
Other performance tips
Logging
Printing info logs inside loops dramatically slows processing; disabling them reduced the validation phase to one‑tenth of the original time.
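The cheapest fix is to log one summary line after the loop instead of one line per row; where a per‑row message must stay, guard it with a level check so the message string is never even built on the hot path. A sketch using `java.util.logging` (the project presumably uses SLF4J/Logback, where `logger.isDebugEnabled()` plays the same role; the validation logic here is a placeholder):

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: keep per-row logging off the hot path. The level guard skips both the
// string concatenation and the I/O unless FINE logging is actually enabled.
public class QuietValidator {
    private static final Logger LOG = Logger.getLogger(QuietValidator.class.getName());

    public static int validate(List<String> rows) {
        int failures = 0;
        for (String row : rows) {
            boolean ok = !row.isEmpty();  // placeholder for the real field checks
            if (!ok) {
                failures++;
                if (LOG.isLoggable(Level.FINE)) {   // guard: no work unless enabled
                    LOG.fine("validation failed for row: " + row);
                }
            }
        }
        // One summary line instead of 100k per-row lines.
        LOG.info("validated " + rows.size() + " rows, " + failures + " failures");
        return failures;
    }
}
```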
Summary
Use a fast Excel library such as EasyExcel.
Cache data needed for database‑related validation.
Batch insert with VALUES(...),(...).
Leverage multithreading (parallel streams) to hide I/O latency.
Avoid excessive logging inside loops.