Avoid OOM: Stream Massive Excel Files with Minimal Memory Using Excel‑Streaming‑Reader

This article explains how to prevent out‑of‑memory errors when importing huge Excel files in Java by using the open‑source excel‑streaming‑reader library, showing Maven setup, streaming code examples, memory‑saving configurations for shared strings and comments, and tips for handling extremely large workbooks.

Spring Full-Stack Practical Cases

Environment: Spring Boot 3.5.0.

Business applications often need to import Excel files containing hundreds of thousands or even millions of rows. The traditional Apache POI user-model approach (for example, XSSFWorkbook) loads the entire workbook into memory, which easily leads to Out‑Of‑Memory (OOM) errors, long pauses, or crashes. To address this problem, this article introduces the open‑source Excel Streaming Reader component, which reads the file as a stream and keeps only a small window of rows in memory, so memory consumption stays low without sacrificing read performance.

The Excel Streaming Reader used here (com.github.pjfanning:excel-streaming-reader) is a fork of monitorjbl/excel-streaming-reader. It supports Apache POI 5.x and requires Java 8 or newer; projects still on POI 4.x should use the 2.3.x line instead.

Dependency (Maven):

<dependency>
  <groupId>com.github.pjfanning</groupId>
  <artifactId>excel-streaming-reader</artifactId>
  <version>5.2.0</version>
</dependency>

2.1 Reading 500,000 rows

The library can be used with only a few lines of code. The following example reads an Excel file containing 500 000 rows, prints each cell value, and sleeps 500 ms between rows to make the memory usage observable.

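import com.github.pjfanning.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

// Declare both resources so the workbook (and any temp files it holds) is closed as well
try (InputStream is = new FileInputStream("d:/users.xlsx");
     Workbook workbook = StreamingReader.builder()
         .rowCacheSize(100)               // rows kept in memory (default 10)
         .bufferSize(4096)                // buffer size in bytes (default 1024)
         .open(is)) {
    Sheet sheet = workbook.getSheetAt(0);
    for (Row r : sheet) {                 // forward-only iteration over the streamed rows
        if (r.getRowNum() == 0) continue; // skip header
        for (Cell c : r) {
            System.err.print(c.getStringCellValue() + "\t");
        }
        System.err.println();
        TimeUnit.MILLISECONDS.sleep(500); // slow down only to make memory usage observable
    }
} catch (Exception e) {
    e.printStackTrace();
}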

Running the program shows a very low and stable memory footprint (see the two screenshots below).
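
If no profiler is at hand, the footprint can also be logged from inside the loop. The following is a minimal sketch using only the JDK; HeapSampler and its method names are hypothetical helpers for illustration and are not part of excel-streaming-reader. The reported figure is approximate, since it includes garbage that has not yet been collected.

```java
import java.util.Locale;

/** Hypothetical helper (not part of excel-streaming-reader) for logging heap usage. */
final class HeapSampler {
    private HeapSampler() {}

    /** Heap currently in use by this JVM, in mebibytes (includes uncollected garbage). */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** True when the given row number falls on a sampling boundary. */
    static boolean shouldSample(long rowNumber, long interval) {
        return interval > 0 && rowNumber % interval == 0;
    }

    /** One log line describing the heap usage observed at the given row. */
    static String describe(long rowNumber) {
        return String.format(Locale.ROOT, "row=%d usedHeap=%d MiB", rowNumber, usedHeapMiB());
    }
}
```

Inside the row loop, a call such as `if (HeapSampler.shouldSample(r.getRowNum(), 10_000)) System.err.println(HeapSampler.describe(r.getRowNum()));` would print one line every 10,000 rows, which is usually enough to see whether usage stays flat.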

Because the implementation is streaming, only a limited number of rows (controlled by rowCacheSize) are kept in memory at any time. Random access within a cached row is possible, but random access across rows is not.
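
Since rows can only be visited once and in order, import logic is usually written as a single forward pass that hands rows to downstream code in fixed-size batches (for example, one database batch insert per chunk). Below is a minimal sketch of that pattern using only the JDK; RowBatcher and forEachBatch are hypothetical names introduced for illustration. Because POI's Sheet is an Iterable of Row, a sheet could be passed in directly, although converting each row to your own value object first is probably the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: consumes any Iterable once, in order, in fixed-size batches. */
final class RowBatcher {
    private RowBatcher() {}

    /**
     * Walks the source a single time and passes each full batch (and the final
     * partial batch, if any) to the handler. Returns the number of batches delivered.
     */
    static <T> int forEachBatch(Iterable<T> rows, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        for (T row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list so the handler may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

The batch size bounds how many rows the application itself holds on to, independently of rowCacheSize, so it is worth keeping it modest.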

2.2 Temporary‑file backed shared strings

By default, the shared‑string table (/xl/sharedStrings.xml) is loaded fully into memory, which can also cause OOM. The library allows storing this table in a temporary file:

Workbook workbook = StreamingReader.builder()
    .setSharedStringsImplementationType(SharedStringsImplementationType.TEMP_FILE_BACKED)
    .setEncryptSstTempFile(false) // optional encryption, may affect speed
    .setFullFormatRichText(true)   // keep rich‑text format when using temp files
    .open(is);

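There is also a map‑backed implementation (SharedStringsImplementationType.CUSTOM_MAP_BACKED) that keeps the strings in memory, so no temporary file is needed, but stores them more compactly than POI’s default table.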
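
To decide whether the shared‑string table is large enough to matter, one option is to look at its uncompressed size before opening the workbook, since an .xlsx file is an ordinary ZIP archive. The sketch below uses only java.util.zip; SharedStringsInspector, its methods, and the idea of comparing against a threshold are illustrative assumptions, not part of the library. It assumes an unencrypted file on disk whose table lives at the usual xl/sharedStrings.xml path.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: inspects the shared-string part of an .xlsx without parsing it. */
final class SharedStringsInspector {
    /** Conventional location of the shared-string table inside the package. */
    static final String SHARED_STRINGS_ENTRY = "xl/sharedStrings.xml";

    private SharedStringsInspector() {}

    /** Uncompressed size in bytes, or -1 if the entry is absent or its size is unknown. */
    static long uncompressedSize(Path xlsx) throws IOException {
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            ZipEntry entry = zip.getEntry(SHARED_STRINGS_ENTRY);
            return entry == null ? -1L : entry.getSize();
        }
    }

    /** True when the table is larger than the caller's (arbitrary) threshold. */
    static boolean preferTempFile(Path xlsx, long thresholdBytes) throws IOException {
        return uncompressedSize(xlsx) > thresholdBytes;
    }
}
```

The result could then drive which SharedStringsImplementationType is passed to the builder; a suitable threshold depends on the heap available and is best found by measurement.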

2.3 Handling extremely large workbooks

When dealing with very large files, avoid calling setAvoidTempFiles(true). Instead, adjust POI’s internal thresholds to encourage temporary‑file usage:

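import org.apache.poi.openxml4j.opc.ZipPackage;
import org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource;

// Zip entries larger than this threshold are spooled to temp files instead of being held in memory
ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(16384); // 16 KB
// Store package parts in temp files rather than in memory
ZipPackage.setUseTempFilePackageParts(true);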

2.4 Reading comments

Comments are stored in separate parts of the XLSX package and are not read by default. They can be enabled and stored either in memory or in a temporary file:

Workbook workbook = StreamingReader.builder()
    .setReadComments(true)
    .setCommentsImplementationType(CommentsImplementationType.TEMP_FILE_BACKED)
    .setEncryptCommentsTempFile(false)
    .setFullFormatRichText(true)
    .open(is);

Both the shared‑string and comment examples require the additional Maven dependency:

<dependency>
  <groupId>com.github.pjfanning</groupId>
  <artifactId>poi-shared-strings</artifactId>
  <version>2.10.0</version>
</dependency>

By following the steps above, developers can reliably import massive Excel files in Spring Boot applications without exhausting JVM memory.

Written by

Spring Full-Stack Practical Cases

Full-stack Java development with Vue 2/3 front-end suite; hands-on examples and source code analysis for Spring, Spring Boot 2/3, and Spring Cloud.
