Optimizing Large-Scale Excel Import Performance in Java Backend Applications
This article details how to dramatically speed up the import of massive Excel fee‑record files in a Java backend by switching to EasyExcel, caching database lookups, using batch INSERT statements, and applying parallel streams while eliminating excessive logging.
The project requires importing fee‑record data from an Excel template into the system, converting each row into arrears, invoices, and invoice details, and persisting them to a MySQL database; with the upcoming 4.0 version the Excel file may exceed 100,000 rows, generating over 300,000 database rows, making performance optimization essential.
Two kinds of validation are performed: simple field‑length/regex checks that run entirely in memory, and duplicate‑record checks (e.g., invoice number existence) that query the database and heavily impact speed. The database runs MySQL 5.7 with Druid connection pooling.
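The in-memory checks are cheap compared to the database lookups. A minimal sketch of the first kind of validation (the field names and rules here are hypothetical, not from the original project):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RowValidator {
    // Hypothetical rule: invoice numbers are 8-20 digits
    private static final Pattern INVOICE_NO = Pattern.compile("\\d{8,20}");

    /** Returns a list of error messages; an empty list means the row passed. */
    public static List<String> validate(String houseName, String invoiceNo) {
        List<String> errors = new ArrayList<>();
        if (houseName == null || houseName.isEmpty() || houseName.length() > 50) {
            errors.add("houseName must be 1-50 characters");
        }
        if (invoiceNo == null || !INVOICE_NO.matcher(invoiceNo).matches()) {
            errors.add("invoiceNo must be 8-20 digits");
        }
        return errors;
    }
}
```

Checks like these run entirely in memory, so they cost microseconds per row; it is the second kind, the per-row database queries, that dominates import time.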
Version 1 – POI + row-by-row validation + row-by-row insert: The original implementation read the Excel file with raw POI, mapped each row into an ArrayList, and then, for each row, performed the field checks, the database lookups, and an individual INSERT. Network I/O grew linearly with row count, so the approach was only workable for files of a few dozen rows.
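The cost difference is visible in round-trip counts alone: one INSERT per row means one network round trip per row, while batching (as the later versions do, at roughly 1,000 rows per statement) cuts the statement count by three orders of magnitude. A toy calculation, not a benchmark:

```java
public class RoundTrips {
    /** One statement (and one network round trip) per row. */
    public static long perRow(long rows) {
        return rows;
    }

    /** One multi-row statement per batch, using ceiling division. */
    public static long batched(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }
}
```

For the 300,000 database rows mentioned above, per-row insertion issues 300,000 statements while 1,000-row batches issue only 300.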
Version 2 – EasyPOI + cached database queries + batch insert:
To eliminate per‑row database queries, all reference data (e.g., house information) is loaded once into a HashMap keyed by a composite of area/building/unit/house number, allowing O(1) lookups. A custom SessionMapper uses MapResultHandler to fill the map:
@Repository
public class SessionMapper extends SqlSessionDaoSupport {

    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }

    @SuppressWarnings("unchecked")
    public Map<String, Long> getHouseMapByAreaId(Long areaId) {
        MapResultHandler<String, Long> handler = new MapResultHandler<>();
        this.getSqlSession().select(BaseUnitMapper.class.getName() + ".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

The result-handler implementation:
public class MapResultHandler<K, V> implements ResultHandler<Map<String, Object>> {

    private final Map<K, V> mappedResults = new HashMap<>();

    @Override
    @SuppressWarnings("unchecked")
    public void handleResult(ResultContext<? extends Map<String, Object>> context) {
        // Each row comes back as a map with the aliased "key" and "value" columns
        Map<String, Object> map = context.getResultObject();
        mappedResults.put((K) map.get("key"), (V) map.get("value"));
    }

    public Map<K, V> getMappedResults() {
        return mappedResults;
    }
}

Mapper interface and XML mapping:
@Mapper
@Repository
public interface BaseUnitMapper {
    Map<String, Long> getHouseMapByAreaId(@Param("areaId") Long areaId);
}

<select id="getHouseMapByAreaId" resultMap="mapResultLong">
SELECT CONCAT(h.bulid_area_name, h.build_name, h.unit_name, h.house_num) k,
h.house_id v
FROM base_house h
WHERE h.area_id = #{areaId}
GROUP BY h.house_id
</select>
<resultMap id="mapResultLong" type="java.util.HashMap">
<result property="key" column="k" javaType="string" jdbcType="VARCHAR"/>
<result property="value" column="v" javaType="long" jdbcType="INTEGER"/>
</resultMap>

Batch insertion is performed with MySQL's multi-row VALUES syntax combined with a MyBatis <foreach>:
<insert id="insertList">
insert into table(col1, col2)
values
<foreach collection="list" item="item" separator=",">
(#{item.col1}, #{item.col2})
</foreach>
</insert>

Version 3 – EasyExcel + cache + batch insert: Switching to Alibaba's EasyExcel dramatically reduced memory consumption and improved read speed: 410,000 rows (45.5 MB) are read in about 50 seconds, while EasyPOI ran out of memory on the same file.
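EasyExcel achieves this by reading the sheet as a stream and handing each parsed row to a listener, so the importer can accumulate rows and flush them in batches instead of holding the whole file in memory. The pattern can be sketched without the library (the RowListener-style class and flush size below are illustrative, not EasyExcel's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Library-independent sketch of a listener-style, batch-flushing Excel read. */
public class BatchingListener<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;   // e.g. a batch-insert mapper call
    private final List<T> buffer = new ArrayList<>();

    public BatchingListener(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    /** Called once per parsed row. */
    public void onRow(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();   // keeps memory flat regardless of file size
        }
    }

    /** Called after the last row to drain the remainder. */
    public void onFinish() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With the real library the same shape appears as an AnalysisEventListener whose invoke method buffers each row and whose doAfterAllAnalysed method drains the remaining buffer.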
Version 4 – Optimizing insert speed: Experiments showed that batches of about 1,000 rows per INSERT gave the best throughput; larger batches likely exceeded the InnoDB buffer pool and spilled to disk. Batches are then inserted concurrently with parallel streams, with the parallelism of the common ForkJoinPool set via a system property.
public class InsertConsumer {

    /** Rows per batch; ~1,000 gave the best throughput in testing. */
    private static final int SIZE = 1000;

    static {
        // Cap the common ForkJoinPool used by parallel streams
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.isEmpty()) {
            return;
        }
        // Split the full list into SIZE-row sublists
        List<List<T>> streamList = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int j = Math.min(i + SIZE, list.size());
            streamList.add(list.subList(i, j));
        }
        // Each sublist is inserted by a separate task in the common pool
        streamList.parallelStream().forEach(consumer);
    }
}

Usage example:
InsertConsumer.insertData(feeList, arrearageMapper::insertList);

A further tip: avoid per-row info-level logging inside the import loop; in testing it inflated total processing time roughly tenfold.
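Much of that logging cost comes from building the message (string concatenation, formatting) on every iteration even when the level is disabled; guarding the call, or using a logger API that defers message construction, avoids the work entirely. A minimal demonstration with java.util.logging (the counter exists only to make the cost visible):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoopLogging {
    static int formatCalls = 0;

    static String describeRow(int i) {
        formatCalls++;                 // stands in for expensive message building
        return "imported row " + i;
    }

    public static void run(int rows) {
        Logger log = Logger.getLogger("import");
        log.setLevel(Level.INFO);      // FINE messages are discarded

        for (int i = 0; i < rows; i++) {
            // Unguarded: log.fine(describeRow(i)) would build the string on
            // every iteration even though nothing is ever written.

            // Guarded: the message is only built when the level is enabled
            if (log.isLoggable(Level.FINE)) {
                log.fine(describeRow(i));
            }
        }
    }
}
```

With the guard in place, a 100,000-iteration loop never calls describeRow at all; java.util.logging's `log.fine(Supplier<String>)` overload achieves the same deferral without an explicit guard.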
Summary: To accelerate Excel imports, use a streaming reader such as EasyExcel, cache reference data to avoid per-row database queries, use bulk INSERT ... VALUES statements in batches of around 1,000 rows, parallelize the batch inserts with parallel streams, and suppress unnecessary logging inside loops.