Efficient Large File Reading in Java: Memory‑Friendly Approaches and Concurrency
This article explains how to read large files in Java without running out of memory. It compares loading the whole file at once against three line‑by‑line approaches — BufferedReader, Apache Commons IO, and Java 8 streams — and then shows how to boost throughput with batch processing on a thread pool and with multithreaded file splitting.
When a Java application needs to read data from a file and store it into a database, loading the entire file into memory works for small files but quickly leads to Out‑Of‑Memory (OOM) errors for large files.
Memory Reading
The initial implementation reads all lines into a List&lt;String&gt; using Apache Commons IO's FileUtils.readLines, then processes each line. Because every line becomes a separate String object plus list overhead, this can consume far more memory than the file itself occupies on disk, causing OOM for a 740 MB test file with 2 million lines.
Stopwatch stopwatch = Stopwatch.createStarted();
// read all lines into memory at once
List<String> lines = FileUtils.readLines(new File("temp/test.txt"), Charset.defaultCharset());
for (String line : lines) {
// pass
}
stopwatch.stop();
System.out.println("read all lines spend " + stopwatch.elapsed(TimeUnit.SECONDS) + " s");
logMemory();
The memory‑logging method uses MemoryMXBean to print heap usage:
MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
MemoryUsage memoryUsage = memoryMXBean.getHeapMemoryUsage();
long totalMemorySize = memoryUsage.getInit();
long usedMemorySize = memoryUsage.getUsed();
System.out.println("Initial Heap: " + totalMemorySize / (1024 * 1024) + " MB"); // getInit() is the initial heap size, not the ceiling
System.out.println("Used Heap: " + usedMemorySize / (1024 * 1024) + " MB");
Line‑by‑Line Reading
To avoid OOM, the article introduces three line‑by‑line techniques.
BufferedReader
try (BufferedReader fileBufferReader = new BufferedReader(new FileReader("temp/test.txt"))) { // FileReader uses the platform default charset; use Files.newBufferedReader for explicit UTF-8
String fileLineContent;
while ((fileLineContent = fileBufferReader.readLine()) != null) {
// process the line.
}
} catch (IOException e) { // FileNotFoundException is a subclass of IOException, so one catch suffices
e.printStackTrace();
}
Apache Commons IO
Stopwatch stopwatch = Stopwatch.createStarted();
LineIterator fileContents = FileUtils.lineIterator(new File("temp/test.txt"), StandardCharsets.UTF_8.name());
while (fileContents.hasNext()) {
fileContents.nextLine();
// pass
}
logMemory();
fileContents.close();
stopwatch.stop();
System.out.println("read all lines spend " + stopwatch.elapsed(TimeUnit.SECONDS) + " s");
Java 8 Stream
Stopwatch stopwatch = Stopwatch.createStarted();
try (Stream<String> inputStream = Files.lines(Paths.get("temp/test.txt"), StandardCharsets.UTF_8)) {
inputStream
.filter(str -> str.length() > 5) // filter data
.forEach(o -> {
// pass do sample logic
});
}
logMemory();
stopwatch.stop();
System.out.println("read all lines spend " + stopwatch.elapsed(TimeUnit.SECONDS) + " s");
Concurrent Reading
Processing lines sequentially can be slow for massive files, so two parallel strategies are presented.
Batch Packaging with ThreadPool
@SneakyThrows
public static void readInApacheIOWithThreadPool() {
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(10, 10, 60L, TimeUnit.SECONDS, new LinkedBlockingDeque<>(100));
LineIterator fileContents = FileUtils.lineIterator(new File("temp/test.txt"), StandardCharsets.UTF_8.name());
List<String> lines = Lists.newArrayList();
while (fileContents.hasNext()) {
String nextLine = fileContents.nextLine();
lines.add(nextLine);
if (lines.size() == 100000) {
List<List<String>> partition = Lists.partition(lines, 50000);
List<Future<?>> futureList = Lists.newArrayList();
for (List<String> strings : partition) {
Future<?> future = threadPoolExecutor.submit(() -> {
processTask(strings);
});
futureList.add(future);
}
for (Future<?> future : futureList) {
future.get();
}
lines.clear();
}
}
if (!lines.isEmpty()) {
processTask(lines);
}
fileContents.close(); // release the underlying file handle
threadPoolExecutor.shutdown();
}
private static void processTask(List<String> strings) {
for (String line : strings) {
try { TimeUnit.MILLISECONDS.sleep(10L); } catch (InterruptedException e) { e.printStackTrace(); } // simulate 10 ms of work per line
}
}
Splitting Large File into Small Files
public static void splitFileAndRead() throws Exception {
List<File> fileList = splitLargeFile("temp/test.txt");
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(10, 10, 60L, TimeUnit.SECONDS, new LinkedBlockingDeque<>(100));
List<Future<?>> futureList = Lists.newArrayList();
for (File file : fileList) {
Future<?> future = threadPoolExecutor.submit(() -> {
try (Stream<String> inputStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
inputStream.forEach(o -> {
try { TimeUnit.MILLISECONDS.sleep(10L); } catch (InterruptedException e) { e.printStackTrace(); } // simulate 10 ms of work per line
});
} catch (IOException e) { e.printStackTrace(); }
});
futureList.add(future);
}
for (Future<?> future : futureList) { future.get(); }
threadPoolExecutor.shutdown();
}
private static List<File> splitLargeFile(String largeFileName) throws IOException {
LineIterator fileContents = FileUtils.lineIterator(new File(largeFileName), StandardCharsets.UTF_8.name());
List<String> lines = Lists.newArrayList();
int num = 1;
List<File> files = Lists.newArrayList();
while (fileContents.hasNext()) {
String nextLine = fileContents.nextLine();
lines.add(nextLine);
if (lines.size() == 100000) {
createSmallFile(lines, num, files);
lines.clear(); // reset the buffer; without this every part file would keep growing
num++;
}
}
if (!lines.isEmpty()) { createSmallFile(lines, num, files); }
fileContents.close(); // release the underlying file handle
return files;
}
Alternatively, a simple shell command can split the file: split -l 100000 test.txt.
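The createSmallFile helper is referenced above but never shown. A minimal sketch might look like the following — the part‑file naming scheme under temp/ is an assumption, not something the article specifies:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class SmallFileWriter {

    // Writes the current buffer of lines to a numbered part file and records
    // it in the result list; the caller clears the buffer before refilling it.
    static void createSmallFile(List<String> lines, int num, List<File> files) throws IOException {
        File smallFile = new File("temp/test-part-" + num + ".txt"); // assumed naming scheme
        smallFile.getParentFile().mkdirs();                          // make sure temp/ exists
        Files.write(smallFile.toPath(), lines, StandardCharsets.UTF_8);
        files.add(smallFile);
    }
}
```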
Conclusion
For modest‑size files, loading the entire file into memory is acceptable and fast. For large files, line‑by‑line reading prevents OOM, and combining it with multithreading—either by batching lines or by splitting the file—significantly improves processing speed.
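As a closing illustration, the two ideas above can be combined in one compact sketch: stream the file line by line, hand fixed‑size batches to a thread pool, and cap the number of in‑flight batches so memory stays bounded. The class and constant names here are illustrative, not from the article:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BatchedFileProcessor {

    static final int BATCH_SIZE = 100_000; // lines per batch; tune to heap size and per-line cost
    static final int POOL_SIZE = 10;

    static void process(Path file, Consumer<List<String>> task)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        List<Future<?>> inFlight = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    List<String> handoff = batch;             // ownership moves to the task
                    inFlight.add(pool.submit(() -> task.accept(handoff)));
                    batch = new ArrayList<>(BATCH_SIZE);      // fresh buffer for the reader
                    if (inFlight.size() == POOL_SIZE) {       // cap in-flight batches
                        for (Future<?> f : inFlight) f.get(); // wait, so memory stays bounded
                        inFlight.clear();
                    }
                }
            }
            if (!batch.isEmpty()) task.accept(batch);         // process the remainder inline
            for (Future<?> f : inFlight) f.get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting on the in‑flight futures before reading further mirrors the article's per‑100k‑line synchronization point, which is what prevents batches from piling up in the executor's queue.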