How to Export Millions of Excel Rows in Seconds: High‑Performance Java Strategies

Learn how to overcome memory and speed bottlenecks when exporting massive datasets to Excel by using streaming APIs like SXSSFWorkbook and EasyExcel, optimizing database pagination, tuning JVM and connection pools, and applying asynchronous shard processing to achieve stable sub‑200 MB memory usage for millions of rows.

Su San Talks Tech
Su San Talks Tech
Su San Talks Tech
How to Export Millions of Excel Rows in Seconds: High‑Performance Java Strategies

Introduction

In daily work we often need to export data to Excel, but large volumes can cause performance and memory problems. This article presents high‑performance export solutions.

Problems with Traditional Approach

Many developers use Apache POI HSSF/XSSF. When rows exceed 50 000, performance drops sharply because each Cell object occupies about 1 KB, leading to JVM heap explosion.

Typical memory‑intensive code:

// Typical memory killer
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet();
for (int i = 0; i < 1000000; i++) {
    Row row = sheet.createRow(i);
    row.createCell(0).setCellValue("data"+i);
}

This creates roughly 1 M Row objects and 10 M Cell objects, consuming >1 GB memory and triggering frequent Full GC, which may cause OOM.

Streaming Processing Architecture

The core is balancing memory and disk.

Solution 1: SXSSFWorkbook

SXSSFWorkbook is an enhanced POI class that keeps only a sliding window of rows in memory.

// Keep only 1000 rows in memory
SXSSFWorkbook workbook = new SXSSFWorkbook(1000);
Sheet sheet = workbook.createSheet();
for (int i = 0; i < 1000000; i++) {
    Row row = sheet.createRow(i);
    if (i % 1000 == 0) {
        ((SXSSFSheet)sheet).flushRows(1000);
    }
}

With a 1000‑row window, memory stays under 200 MB for a million rows.

Solution 2: EasyExcel (FastExcel)

EasyExcel, an Alibaba open‑source framework, provides a fast streaming API.

String fileName = "data.xlsx";
EasyExcel.write(fileName, DataModel.class)
    .registerWriteHandler(new LongestMatchColumnWidthStyleStrategy())
    .sheet("Sheet1")
    .doWrite(data -> {
        int page = 0;
        while (true) {
            List<DataModel> list = queryByPage(page, 5000);
            if (CollectionUtils.isEmpty(list)) break;
            data.write(list);
            page++;
        }
    });

It uses an event‑driven model and object‑reuse pool, keeping memory below 50 MB for millions of rows.

Automatic batch loading (default 5000 rows per batch)

Reflection cache eliminates repeated object creation

Built‑in style optimization avoids memory fragmentation

Database Query Golden Rules

Even with an optimized exporter, slow queries limit overall performance. Three key optimizations:

3.1 Solve Deep Pagination

Traditional LIMIT offset becomes slower with large offsets.

SELECT * FROM table LIMIT 900000, 1000; -- slows down

Correct approach: use cursor based on incremental ID.

Long lastId = 0L;
int pageSize = 5000;
do {
    List<Data> list = jdbcTemplate.query(
        "SELECT * FROM table WHERE id > ? ORDER BY id LIMIT ?",
        new BeanPropertyRowMapper<>(Data.class),
        lastId, pageSize);
    if (list.isEmpty()) break;
    // process data...
    lastId = list.get(list.size() - 1).getId();
} while (true);

3.2 Reduce Field Count

Query only required columns instead of SELECT *.

-- Bad
SELECT * FROM big_table;
-- Good
SELECT id, name, create_time FROM big_table;

Reducing fields from 20 to 5 cuts query time by 40 % and network traffic by 70 %.

3.3 Connection Pool Tuning

Use an independent connection pool for export tasks.

spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000

Thread count formula: threads = CPU cores × 2 + disk count.

Production‑Level Advanced Techniques

4.1 Asynchronous Sharding Export

Split the export into shards and process them in parallel.

@Async("exportExecutor")
public CompletableFuture<String> asyncExport(ExportParam param) {
    int total = dataService.count(param);
    int shardSize = total / 100000;
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (int i = 0; i < shardSize; i++) {
        int finalI = i;
        futures.add(CompletableFuture.runAsync(() ->
            exportShard(param, finalI * 100000, 100000), forkJoinPool.commonPool()));
    }
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
        .thenApply(v -> mergeFiles(shardSize));
    return CompletableFuture.completedFuture(taskId);
}

Divide‑and‑conquer reduces execution time while keeping resource usage controllable.

4.2 JVM Parameter Configuration

-Xmx4g -Xms4g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=4
-XX:ConcGCThreads=2
-XX:InitiatingHeapOccupancyPercent=35

Proper GC and heap settings improve stability.

Young/old generation ratio ≈ 2:1

Avoid objects >50 KB

Reuse DTO instances via object pool

4.3 End‑to‑End Solution Diagram

Overall architecture:

Architecture diagram
Architecture diagram

User clicks export → task ID created → asynchronous worker pages data → writes to Excel (multi‑thread) → stores file in cloud storage → updates task status → notifies user via WebSocket.

Conclusion

After extensive practice, the formula for high‑performance Excel export is:

High performance = Streaming engine + Paginated query optimization + Resource control

Key take‑aways: avoid premature optimization, monitor export latency and memory, and always provide a CSV fallback.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaStreamingjvm-tuningasynchronous processingexcel-exportDatabase pagination
Su San Talks Tech
Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.