How to Efficiently Import and Export Millions of Records with POI and EasyExcel
This article explains how to handle large Excel import/export tasks in Java by comparing POI workbook implementations, selecting the appropriate class for the data size, and using EasyExcel with batch processing, JDBC transactions, and pagination to move millions of rows quickly with low memory usage.
Background
Large‑scale import/export between Excel files and relational databases often exceeds the limits of Apache POI, leading to OutOfMemory errors.
POI Workbook Implementations
HSSFWorkbook – works with .xls files, maximum 65,536 rows, low memory usage.
XSSFWorkbook – works with .xlsx files, up to 1,048,576 rows, keeps all data in memory.
SXSSFWorkbook – streaming write introduced in POI 3.8; keeps only a sliding window of rows in memory and flushes the rest to temporary files, but does not support sheet cloning, formula evaluation, or modifying rows that have already been flushed.
Selection Guidelines
When the row count is ≤70,000, use HSSFWorkbook or XSSFWorkbook.
When the row count exceeds 70,000 and no complex styling or formulas are required, use SXSSFWorkbook.
When the row count exceeds 70,000 and styling, formulas, or header manipulation are needed, use XSSFWorkbook in batches.
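The guidelines above boil down to a two-question decision; a minimal sketch, where the class name, method name, and threshold constant are illustrative helpers rather than anything in POI:

```java
public class WorkbookChooser {
    // 70,000-row threshold taken from the guidelines above; tune for your heap size.
    static final int SMALL_LIMIT = 70_000;

    /** Suggests a POI workbook implementation for a given export job. */
    static String choose(long rowCount, boolean needsStylesOrFormulas) {
        if (rowCount <= SMALL_LIMIT) {
            // Small data: an in-memory implementation is simplest.
            return "XSSFWorkbook";
        }
        // Large data: stream unless styling/formulas force in-memory batching.
        return needsStylesOrFormulas ? "XSSFWorkbook (in batches)" : "SXSSFWorkbook";
    }

    public static void main(String[] args) {
        System.out.println(choose(50_000, false));
        System.out.println(choose(500_000, false));
        System.out.println(choose(500_000, true));
    }
}
```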
EasyExcel Overview
EasyExcel is an Alibaba‑maintained wrapper around POI that reduces memory consumption and simplifies API usage. Repository: https://github.com/alibaba/easyexcel
Exporting Millions of Rows
Strategy:
Query the database in fixed‑size pages (e.g., 200 k rows per query).
Write each page to the current sheet using EasyExcel.
When a sheet reaches 1,000,000 rows, start a new sheet.
Core export code (simplified):
public void export(HttpServletResponse response) throws IOException {
    long start = System.currentTimeMillis();
    OutputStream out = response.getOutputStream();
    ExcelWriter writer = new ExcelWriter(out, ExcelTypeEnum.XLSX);
    // configure table head, sheet handling, batch loops …
    // writer.write0(dataList, sheet, table);
    writer.finish();
    System.out.println("Export time: " + (System.currentTimeMillis() - start) / 1000 + "s");
}
Calculate the number of sheets and the number of writes per sheet based on total row count and page size.
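That bookkeeping is plain ceiling-division arithmetic; a sketch under the numbers used in this article (the class and constant names are illustrative):

```java
public class ExportPlan {
    static final int SHEET_CAP = 1_000_000; // rows per sheet before starting a new one
    static final int PAGE_SIZE = 200_000;   // rows fetched per database query

    /** Number of sheets needed for the given total row count (ceiling division). */
    static int sheetCount(long totalRows) {
        return (int) ((totalRows + SHEET_CAP - 1) / SHEET_CAP);
    }

    /** Number of paged queries/writes needed to fill one full sheet. */
    static int writesPerSheet() {
        return (SHEET_CAP + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        // 3,000,000 rows → 3 sheets, filled by 5 writes of 200,000 rows each.
        System.out.println(sheetCount(3_000_000) + " sheets, "
                + writesPerSheet() + " writes per full sheet");
    }
}
```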
Importing Millions of Rows
Strategy:
Read Excel sheets in chunks (e.g., 200 k rows per read) using EasyExcel’s listener.
Accumulate rows in a list until a threshold (e.g., 100 k) is reached.
Insert the batch into the database with plain JDBC prepared statements inside a manual transaction, then clear the list for GC.
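The accumulate-and-flush pattern in these steps is independent of EasyExcel itself and can be sketched in plain Java; the generic class below is an illustration of the idea, not part of either library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchBuffer<T> {
    private final List<T> rows = new ArrayList<>();
    private final int threshold;
    private final Consumer<List<T>> flusher; // e.g. a JDBC batch insert

    public BatchBuffer(int threshold, Consumer<List<T>> flusher) {
        this.threshold = threshold;
        this.flusher = flusher;
    }

    /** Add one row; flush automatically once the threshold is reached. */
    public void add(T row) {
        rows.add(row);
        if (rows.size() >= threshold) {
            flush();
        }
    }

    /** Flush whatever is buffered, then clear the list so it can be GC'd. */
    public void flush() {
        if (!rows.isEmpty()) {
            flusher.accept(new ArrayList<>(rows));
            rows.clear();
        }
    }
}
```

With a threshold of 100,000 this reproduces the listener's behavior: full batches go to the database as they fill, and a final partial batch is flushed at the end of the read.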
Listener example:
public class GeneralDataListener extends AnalysisEventListener<Map<Integer, String>> {
    private final List<Map<Integer, String>> buffer = new ArrayList<>();
    private final ActResultLogService service;

    public GeneralDataListener(ActResultLogService service) {
        this.service = service;
    }

    @Override
    public void invoke(Map<Integer, String> data, AnalysisContext ctx) {
        buffer.add(data);
        if (buffer.size() >= ExcelConstants.GENERAL_ONCE_SAVE_TO_DB_ROWS) {
            save();
            buffer.clear(); // release the batch for GC
        }
    }

    private void save() {
        service.import2DBFromExcel10w(buffer);
    }

    @Override
    public void doAfterAllAnalysed(AnalysisContext ctx) {
        save(); // flush any remaining rows after the last full batch
    }
}
JDBC batch insert example:
String sql = "INSERT INTO ACT_RESULT_LOG (onlineseqid,businessid,becifno,ivisresult,createdby,createddate,updateby,updateddate,risklevel) VALUES (?,?,?,?,?,?,?,?,?)";
conn.setAutoCommit(false); // manual transaction: one commit per batch
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (Map<Integer, String> row : dataList) {
        ps.setString(1, row.get(0));
        // set other parameters …
        ps.addBatch();
    }
    ps.executeBatch();
    conn.commit();
}
Test Environment and Results
Database: Oracle 19c (MySQL gives similar performance for < 100 M rows). Simulated query: SELECT * FROM ACT_RESULT_LOG WHERE rownum < 3000001.
Export of 3 M rows (no styling, simple header) took about 2 minutes 15 seconds. Import of the same dataset through the listener‑JDBC pipeline took roughly 8.2 seconds.
Performance Summary
Export: 3 M rows → 2 min 15 s.
Import: 3 M rows → 8.209 s total.
Key Takeaways
EasyExcel combined with batch JDBC and manual transaction control provides a fast, low‑memory solution for both export and import of massive datasets. The approach scales by paging database queries, streaming writes, and chunked reads, making it suitable for production scenarios that require moving millions of rows between Excel and a relational database.