How to Efficiently Import and Export Millions of Records with EasyExcel and POI
This article analyzes traditional POI implementations, compares their strengths and weaknesses, and presents a high‑performance solution using EasyExcel for exporting and importing up to 3 million rows, including batch querying, sheet management, JDBC batch inserts, and detailed performance benchmarks.
Background
Large‑scale data import/export between Excel files and relational databases often exceeds the limits of the classic Apache POI APIs (HSSFWorkbook, XSSFWorkbook, SXSSFWorkbook). Memory consumption and row‑count caps cause OutOfMemory errors when processing millions of records.
Traditional POI implementations
HSSFWorkbook : Supports the legacy .xls format (Excel 2003). Maximum 65,535 rows. Low memory usage but cannot handle larger datasets.
XSSFWorkbook : Supports the .xlsx format (Excel 2007+). Up to 1,048,576 rows. Higher memory consumption because the entire workbook is kept in memory.
SXSSFWorkbook (POI 3.8+): Streaming writer for .xlsx. Writes data to temporary files to keep memory low, but random access, sheet cloning, formula evaluation and header modifications are not supported.
Choosing the appropriate workbook
If the row count ≤ 70,000, HSSFWorkbook (for .xls) or XSSFWorkbook (for .xlsx) can be used.
If rows > 70,000 and no complex styling or formulas are required, SXSSFWorkbook is preferred.
If rows > 70,000 and styling/formulas are needed, use XSSFWorkbook together with batch writes.
Exporting millions of rows with EasyExcel
Alibaba’s EasyExcel library (GitHub: https://github.com/alibaba/easyexcel) provides a low‑memory streaming API that avoids the drawbacks of native POI. The export strategy is:
Query the database in batches (e.g., 200,000 rows per query).
Write each batch to an Excel sheet; create a new sheet after every 1,000,000 rows.
Define the header once with Table and reuse the same ExcelWriter instance.
public void dataExport300w(HttpServletResponse response) throws IOException {
OutputStream out = response.getOutputStream();
ExcelWriter writer = new ExcelWriter(out, ExcelTypeEnum.XLSX);
Table table = new Table(1);
List<List<String>> head = new ArrayList<>();
head.add(Arrays.asList("onlineseqid"));
head.add(Arrays.asList("businessid"));
// ... other column names ...
table.setHead(head);
int total = actResultLogMapper.findActResultLogByCondations(3000001);
int rowsPerSheet = 1_000_000;
int rowsPerWrite = 200_000;
int sheetCount = (total + rowsPerSheet - 1) / rowsPerSheet;
int writesPerSheet = rowsPerSheet / rowsPerWrite;
int lastSheetWrites = (total % rowsPerSheet == 0) ? writesPerSheet
: (total % rowsPerSheet + rowsPerWrite - 1) / rowsPerWrite;
for (int s = 0; s < sheetCount; s++) {
Sheet sheet = new Sheet(s, 0);
sheet.setSheetName("Sheet" + s);
int writeTimes = (s == sheetCount - 1) ? lastSheetWrites : writesPerSheet;
for (int w = 0; w < writeTimes; w++) {
List<List<Object>> data = new ArrayList<>();
PageHelper.startPage(w + 1 + writesPerSheet * s, rowsPerWrite);
List<ActResultLog> batch = actResultLogMapper.findByPage100w();
for (ActResultLog r : batch) {
data.add(Arrays.asList(
r.getOnlineseqid(), r.getBusinessid(), r.getBecifno(),
r.getIvisresult(), r.getCreatedby(), new Date(),
r.getUpdateby(), new Date(), r.getRisklevel()));
}
writer.write0(data, sheet, table);
}
}
response.setHeader("Content-Disposition", "attachment;filename=excel300w.xlsx");
response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
writer.finish();
out.flush();
}Exporting 3 million rows (≈163 MB) completes in about 2 minutes 15 seconds when no cell styling is applied.
Importing millions of rows with EasyExcel
The import process reads the Excel file in batches and inserts the data into the database using JDBC batch statements inside a single transaction.
Implement an AnalysisEventListener that collects rows into a list.
When the list reaches the configured batch size (e.g., 100,000 rows), execute a JDBC batch insert and clear the list.
After the last row, flush any remaining data.
@Test
public void import2DBFromExcelTest() {
String file = "D:/excel300w.xlsx";
long start = System.currentTimeMillis();
EasyExcel.read(file, new EasyExceGeneralDatalListener(actResultLogService)).doReadAll();
long end = System.currentTimeMillis();
System.out.println("Read time: " + (end - start) + " ms");
}
public class EasyExceGeneralDatalListener extends AnalysisEventListener<Map<Integer, String>> {
private final ActResultLogService service;
private final List<Map<Integer, String>> buffer = new ArrayList<>();
private static final int BATCH = 100_000;
public EasyExceGeneralDatalListener(ActResultLogService service) { this.service = service; }
@Override
public void invoke(Map<Integer, String> data, AnalysisContext ctx) {
buffer.add(data);
if (buffer.size() >= BATCH) {
service.batchInsert(buffer);
buffer.clear();
}
}
@Override
public void doAfterAllAnalysed(AnalysisContext ctx) {
if (!buffer.isEmpty()) {
service.batchInsert(buffer);
buffer.clear();
}
}
}
public Map<String, String> batchInsert(List<Map<Integer, String>> rows) {
if (rows.isEmpty()) return Collections.singletonMap("empty", "0");
Connection conn = JDBCDruidUtils.getConnection();
try {
conn.setAutoCommit(false);
String sql = "INSERT INTO ACT_RESULT_LOG (onlineseqid,businessid,becifno,ivisresult,createdby,createddate,updateby,updateddate,risklevel) VALUES (?,?,?,?,?,?,?,?,?)";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
for (Map<Integer, String> row : rows) {
ps.setString(1, row.get(0));
ps.setString(2, row.get(1));
ps.setString(3, row.get(2));
ps.setString(4, row.get(3));
ps.setString(5, row.get(4));
ps.setTimestamp(6, new Timestamp(System.currentTimeMillis()));
ps.setString(7, row.get(6));
ps.setTimestamp(8, new Timestamp(System.currentTimeMillis()));
ps.setString(9, row.get(8));
ps.addBatch();
}
ps.executeBatch();
conn.commit();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
JDBCDruidUtils.close(conn, null);
}
return Collections.singletonMap("success", "1");
}Reading 3 million rows takes ~83 seconds; the corresponding JDBC batch insert finishes in ~8 seconds.
Database schema (example)
CREATE TABLE ACT_RESULT_LOG (
onlineseqid VARCHAR2(32),
businessid VARCHAR2(32),
becifno VARCHAR2(32),
ivisresult VARCHAR2(32),
createdby VARCHAR2(32),
createddate DATE,
updateby VARCHAR2(32),
updateddate DATE,
risklevel VARCHAR2(32)
) TABLESPACE STUDY_KAY;Performance summary
Export 3 M rows → 2 min 15 s, file size ≈163 MB (no styling).
Import 3 M rows → total ~91 s (≈83 s read + 8 s batch insert).
Using EasyExcel for both export and import, combined with batch database queries and JDBC batch inserts, eliminates POI memory overflow, reduces I/O operations, and scales to millions of records while keeping execution time within practical limits.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
