How to Efficiently Import and Export Millions of Records with EasyExcel and POI
This article explains the challenges of large‑scale Excel import/export in Java, compares POI workbook implementations, introduces EasyExcel for high‑performance handling of hundreds of thousands to millions of rows, and provides detailed code examples for batch querying, sheet management, and JDBC batch insertion with transaction control.
Background
In many projects data import from Excel to a database and export from a database to Excel are required. The author encountered massive data import/export tasks at work and decided to solve the problem rather than avoid it.
1. Comparison of Traditional POI Versions
POI provides three main workbook implementations:
HSSFWorkbook – works with Excel 2003 and earlier (.xls). It can export up to 65,535 rows, but does not cause memory overflow for small datasets.
XSSFWorkbook – works with Excel 2007+ (.xlsx). It supports up to 1,048,576 rows and 16,384 columns, but keeps all data in memory, leading to possible OutOfMemory errors.
SXSSFWorkbook – introduced in POI 3.8, based on XSSF with low memory usage. It writes data to disk when memory reaches a threshold, allowing creation of very large Excel files, but has limitations such as no sheet cloning, no formula evaluation, and limited access to rows already persisted to disk.
Choosing the Appropriate Workbook
Guidelines:
If the data volume is under 70,000 rows, HSSFWorkbook or XSSFWorkbook can be used.
If the volume exceeds 70,000 rows and no styling, formulas, or complex formatting is needed, SXSSFWorkbook is recommended.
If the volume exceeds 70,000 rows and styling or formulas are required, use XSSFWorkbook together with batch queries and batch writes.
2. Import and Export of Millions of Records (Main Course)
Problem Analysis
The author faced the following issues:
Traditional POI caused memory overflow and low efficiency for huge datasets.
Fetching all rows with a single SELECT was too slow.
Writing all rows into a single sheet would produce a massive file that is slow to open.
Row‑by‑row I/O operations are inefficient.
Inserting rows one by one into the database is unacceptable.
MyBatis batch insertion is essentially a loop and also slow.
Solution Overview
Use Alibaba's EasyExcel library to simplify POI usage, perform batch queries, write data to multiple sheets, and insert data into the database using JDBC batch operations with manual transaction control.
EasyExcel Introduction
GitHub: https://github.com/alibaba/easyexcel
300 W Data Export
Export logic:
Query the database in batches (e.g., 200 k rows per batch).
Write each batch to Excel using EasyExcel.
When a sheet reaches 1 M rows, start a new sheet.
Repeat until all data are exported.
public void dataExport300w(HttpServletResponse response) {
OutputStream outputStream = null;
try {
long startTime = System.currentTimeMillis();
System.out.println("Export start time:" + startTime);
outputStream = response.getOutputStream();
ExcelWriter writer = new ExcelWriter(outputStream, ExcelTypeEnum.XLSX);
String fileName = new String(("excel100w").getBytes(), "UTF-8");
Table table = new Table(1);
List<List<String>> titles = new ArrayList<>();
titles.add(Arrays.asList("onlineseqid"));
// ... other column titles ...
table.setHead(titles);
int count = 3000001;
Integer totalCount = actResultLogMapper.findActResultLogByCondations(count);
Integer sheetDataRows = ExcelConstants.PER_SHEET_ROW_COUNT;
Integer writeDataRows = ExcelConstants.PER_WRITE_ROW_COUNT;
Integer sheetNum = totalCount % sheetDataRows == 0 ? (totalCount / sheetDataRows) : (totalCount / sheetDataRows + 1);
Integer oneSheetWriteCount = sheetDataRows / writeDataRows;
Integer lastSheetWriteCount = totalCount % sheetDataRows == 0 ? oneSheetWriteCount : (totalCount % sheetDataRows % writeDataRows == 0 ? (totalCount / sheetDataRows / writeDataRows) : (totalCount / sheetDataRows / writeDataRows + 1));
List<List<String>> dataList = new ArrayList<>();
for (int i = 0; i < sheetNum; i++) {
Sheet sheet = new Sheet(i, 0);
sheet.setSheetName("TestSheet" + i);
for (int j = 0; j < (i != sheetNum - 1 ? oneSheetWriteCount : lastSheetWriteCount); j++) {
dataList.clear();
PageHelper.startPage(j + 1 + oneSheetWriteCount * i, writeDataRows);
List<ActResultLog> resultList = actResultLogMapper.findByPage100w();
if (!CollectionUtils.isEmpty(resultList)) {
resultList.forEach(item -> dataList.add(Arrays.asList(item.getOnlineseqid(), item.getBusinessid(), item.getBecifno(), item.getIvisresult(), item.getCreatedby(), Calendar.getInstance().getTime().toString(), item.getUpdateby(), Calendar.getInstance().getTime().toString(), item.getRisklevel())));
}
writer.write0(dataList, sheet, table);
}
}
response.setHeader("Content-Disposition", "attachment;filename=" + new String((fileName).getBytes("gb2312"), "ISO-8859-1") + ".xlsx");
response.setContentType("multipart/form-data");
response.setCharacterEncoding("utf-8");
writer.finish();
outputStream.flush();
long endTime = System.currentTimeMillis();
System.out.println("Export end time:" + endTime + "ms");
System.out.println("Export duration:" + (endTime - startTime) / 1000 + " seconds");
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (outputStream != null) {
try { outputStream.close(); } catch (Exception e) { e.printStackTrace(); }
}
}
}Test environment: a machine with typical specifications (screenshots omitted).
Database Version
The tests used Oracle 19c, but MySQL would give similar performance for data under 100 M rows.
300 W Data Import
Import strategy:
Read Excel in batches (e.g., 200 k rows per batch) using EasyExcel.
Accumulate rows in a collection; when the collection reaches a threshold, perform a JDBC batch insert within a transaction.
// EasyExcel read example
@Test
public void import2DBFromExcel10wTest() {
String fileName = "D:\\StudyWorkspace\\...\\excel300w.xlsx";
long startReadTime = System.currentTimeMillis();
System.out.println("Start reading time:" + startReadTime + "ms");
EasyExcel.read(fileName, new EasyExceGeneralDatalListener(actResultLogService2)).doReadAll();
long endReadTime = System.currentTimeMillis();
System.out.println("End reading time:" + endReadTime + "ms");
}
public class EasyExceGeneralDatalListener extends AnalysisEventListener<Map<Integer, String>> {
private ActResultLogService2 actResultLogService2;
private List<Map<Integer, String>> dataList = new ArrayList<>();
public EasyExceGeneralDatalListener() {}
public EasyExceGeneralDatalListener(ActResultLogService2 service) { this.actResultLogService2 = service; }
@Override
public void invoke(Map<Integer, String> data, AnalysisContext context) {
dataList.add(data);
if (dataList.size() >= ExcelConstants.GENERAL_ONCE_SAVE_TO_DB_ROWS) {
saveData();
dataList.clear();
}
}
private void saveData() { actResultLogService2.import2DBFromExcel10w(dataList); }
@Override
public void doAfterAllAnalysed(AnalysisContext context) { saveData(); dataList.clear(); }
}
public Map<String, Object> import2DBFromExcel10w(List<Map<Integer, String>> dataList) {
Map<String, Object> result = new HashMap<>();
if (dataList.isEmpty()) { result.put("empty", "0000"); return result; }
Connection conn = null;
PreparedStatement ps = null;
try {
long startTime = System.currentTimeMillis();
conn = JDBCDruidUtils.getConnection();
conn.setAutoCommit(false);
String sql = "insert into ACT_RESULT_LOG (onlineseqid,businessid,becifno,ivisresult,createdby,createddate,updateby,updateddate,risklevel) values (?,?,?,?,?,?,?,?,?)";
ps = conn.prepareStatement(sql);
for (Map<Integer, String> item : dataList) {
ps.setString(1, item.get(0));
ps.setString(2, item.get(1));
ps.setString(3, item.get(2));
ps.setString(4, item.get(3));
ps.setString(5, item.get(4));
ps.setTimestamp(6, new Timestamp(System.currentTimeMillis()));
ps.setString(7, item.get(6));
ps.setTimestamp(8, new Timestamp(System.currentTimeMillis()));
ps.setString(9, item.get(8));
ps.addBatch();
}
ps.executeBatch();
conn.commit();
long endTime = System.currentTimeMillis();
System.out.println("Inserted " + dataList.size() + " rows in " + (endTime - startTime) + " ms");
result.put("success", "1111");
} catch (Exception e) {
result.put("exception", "0000");
e.printStackTrace();
} finally {
JDBCDruidUtils.close(conn, ps);
}
return result;
}Test results showed that reading 300 W rows with EasyExcel took about 83 seconds, while batch inserting them with JDBC took only about 8 seconds.
Conclusion
The author successfully handled million‑level Excel import/export by combining EasyExcel for low‑memory Excel processing, batch database queries, and JDBC batch inserts with transaction control. The approach dramatically reduces memory consumption and execution time, making it suitable for production environments dealing with massive data volumes.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
