How to Export Millions of Excel Rows in Seconds: High‑Performance Java Strategies
Learn how to overcome memory and speed bottlenecks when exporting massive datasets to Excel by using streaming APIs like SXSSFWorkbook and EasyExcel, optimizing database pagination, tuning JVM and connection pools, and applying asynchronous shard processing to achieve stable sub‑200 MB memory usage for millions of rows.
Introduction
In daily work we often need to export data to Excel, but large volumes can cause performance and memory problems. This article presents high‑performance export solutions.
Problems with Traditional Approach
Many developers use Apache POI HSSF/XSSF. When rows exceed 50 000, performance drops sharply because each Cell object occupies about 1 KB, leading to JVM heap explosion.
Typical memory‑intensive code:
// Typical memory killer
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet();
for (int i = 0; i < 1000000; i++) {
Row row = sheet.createRow(i);
row.createCell(0).setCellValue("data"+i);
}This creates roughly 1 M Row objects and 10 M Cell objects, consuming >1 GB memory and triggering frequent Full GC, which may cause OOM.
Streaming Processing Architecture
The core is balancing memory and disk.
Solution 1: SXSSFWorkbook
SXSSFWorkbook is an enhanced POI class that keeps only a sliding window of rows in memory.
// Keep only 1000 rows in memory
SXSSFWorkbook workbook = new SXSSFWorkbook(1000);
Sheet sheet = workbook.createSheet();
for (int i = 0; i < 1000000; i++) {
Row row = sheet.createRow(i);
if (i % 1000 == 0) {
((SXSSFSheet)sheet).flushRows(1000);
}
}With a 1000‑row window, memory stays under 200 MB for a million rows.
Solution 2: EasyExcel (FastExcel)
EasyExcel, an Alibaba open‑source framework, provides a fast streaming API.
String fileName = "data.xlsx";
EasyExcel.write(fileName, DataModel.class)
.registerWriteHandler(new LongestMatchColumnWidthStyleStrategy())
.sheet("Sheet1")
.doWrite(data -> {
int page = 0;
while (true) {
List<DataModel> list = queryByPage(page, 5000);
if (CollectionUtils.isEmpty(list)) break;
data.write(list);
page++;
}
});It uses an event‑driven model and object‑reuse pool, keeping memory below 50 MB for millions of rows.
Automatic batch loading (default 5000 rows per batch)
Reflection cache eliminates repeated object creation
Built‑in style optimization avoids memory fragmentation
Database Query Golden Rules
Even with an optimized exporter, slow queries limit overall performance. Three key optimizations:
3.1 Solve Deep Pagination
Traditional LIMIT offset becomes slower with large offsets.
SELECT * FROM table LIMIT 900000, 1000; -- slows downCorrect approach: use cursor based on incremental ID.
Long lastId = 0L;
int pageSize = 5000;
do {
List<Data> list = jdbcTemplate.query(
"SELECT * FROM table WHERE id > ? ORDER BY id LIMIT ?",
new BeanPropertyRowMapper<>(Data.class),
lastId, pageSize);
if (list.isEmpty()) break;
// process data...
lastId = list.get(list.size() - 1).getId();
} while (true);3.2 Reduce Field Count
Query only required columns instead of SELECT *.
-- Bad
SELECT * FROM big_table;
-- Good
SELECT id, name, create_time FROM big_table;Reducing fields from 20 to 5 cuts query time by 40 % and network traffic by 70 %.
3.3 Connection Pool Tuning
Use an independent connection pool for export tasks.
spring:
datasource:
hikari:
maximum-pool-size: 20
connection-timeout: 30000
idle-timeout: 600000
max-lifetime: 1800000Thread count formula: threads = CPU cores × 2 + disk count.
Production‑Level Advanced Techniques
4.1 Asynchronous Sharding Export
Split the export into shards and process them in parallel.
@Async("exportExecutor")
public CompletableFuture<String> asyncExport(ExportParam param) {
int total = dataService.count(param);
int shardSize = total / 100000;
List<CompletableFuture<Void>> futures = new ArrayList<>();
for (int i = 0; i < shardSize; i++) {
int finalI = i;
futures.add(CompletableFuture.runAsync(() ->
exportShard(param, finalI * 100000, 100000), forkJoinPool.commonPool()));
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
.thenApply(v -> mergeFiles(shardSize));
return CompletableFuture.completedFuture(taskId);
}Divide‑and‑conquer reduces execution time while keeping resource usage controllable.
4.2 JVM Parameter Configuration
-Xmx4g -Xms4g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=4
-XX:ConcGCThreads=2
-XX:InitiatingHeapOccupancyPercent=35Proper GC and heap settings improve stability.
Young/old generation ratio ≈ 2:1
Avoid objects >50 KB
Reuse DTO instances via object pool
4.3 End‑to‑End Solution Diagram
Overall architecture:
User clicks export → task ID created → asynchronous worker pages data → writes to Excel (multi‑thread) → stores file in cloud storage → updates task status → notifies user via WebSocket.
Conclusion
After extensive practice, the formula for high‑performance Excel export is:
High performance = Streaming engine + Paginated query optimization + Resource control
Key take‑aways: avoid premature optimization, monitor export latency and memory, and always provide a CSV fallback.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
