How Zero‑Copy Can Speed Up Large File Splitting in Java
This article explains why a naïve BufferedReader/Writer approach to splitting large text files is inefficient, demonstrates a zero‑copy solution using FileChannel.transferTo with line‑preserving logic, and presents benchmark results in which the zero‑copy version is roughly 15× faster and allocates far less memory.
Introduction
Zero‑copy techniques often feel abstract, but they can be applied directly in everyday Java development. This article demystifies zero‑copy for file splitting and provides a practical implementation.
Inefficient Example
The following code reads a large text file line by line, writes each line to the current output file, and starts a new file whenever the next line would push the current chunk over a size limit. Although functionally correct, it allocates a String and a byte array for every line, repeatedly converts between bytes and characters, and moves every byte through user space on the way in and again on the way out.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

private static final long MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

public void split(Path inputFile, Path outputDir) throws IOException {
    if (!Files.exists(inputFile)) {
        throw new IOException("Input file does not exist: " + inputFile);
    }
    if (Files.size(inputFile) == 0) {
        throw new IOException("Input file is empty: " + inputFile);
    }
    Files.createDirectories(outputDir);
    try (BufferedReader reader = Files.newBufferedReader(inputFile, StandardCharsets.UTF_8)) {
        int fileIndex = 0;
        long currentSize = 0;
        BufferedWriter writer = newWriter(outputDir, fileIndex++);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Encode the line only to measure its size: one extra String and byte[] per line.
                // Use the same charset as the writer so the measured size matches what is written.
                byte[] lineBytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                // Start a new part if this line would overflow the current one (never leave a part empty)
                if (currentSize > 0 && currentSize + lineBytes.length > MAX_FILE_SIZE_BYTES) {
                    writer.close();
                    writer = newWriter(outputDir, fileIndex++);
                    currentSize = 0;
                }
                writer.write(line);
                writer.newLine(); // writes System.lineSeparator(), as counted above
                currentSize += lineBytes.length;
            }
        } finally {
            writer.close();
        }
    }
}

private BufferedWriter newWriter(Path dir, int index) throws IOException {
    Path filePath = dir.resolve("part_" + index + ".txt");
    return Files.newBufferedWriter(filePath, StandardCharsets.UTF_8);
}
Efficiency Analysis
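The code creates many short‑lived heap objects (Strings, char arrays, byte arrays) and copies every byte between kernel and user space twice. BufferedReader refills its buffer through the underlying reader, whose read system calls copy data from the kernel's page cache into a user‑space byte buffer; those bytes are then decoded into characters and finally into a Java String per line. String.getBytes() allocates yet another byte array per line just to measure its size. On the way out, BufferedWriter encodes the characters back into bytes and issues write calls that copy them into kernel buffers again. This results in:
Excessive memory bandwidth usage from repeated buffer copies and charset decoding/encoding.
High CPU utilization for what is fundamentally a disk‑to‑disk byte copy.
A lost opportunity to let the OS copy the data entirely inside the kernel, where it can use DMA and skip the CPU copies through user space.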
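For contrast, here is a sketch of a conventional byte‑range copy with String decoding already removed. The class UserSpaceCopy and its copyRange helper are hypothetical, not part of the original code. Even in this leaner form, every byte is read into a buffer owned by the application and then written back out, which is the round trip transferTo is meant to avoid.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating a user-space copy loop (not the article's code).
final class UserSpaceCopy {

    // Copies bytes [position, position + count) of source into target via an application buffer.
    public static void copyRange(Path source, Path target, long position, long count) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024); // 64 KB, arbitrary size
            long remaining = count;
            long readPos = position;
            while (remaining > 0) {
                buffer.clear();
                buffer.limit((int) Math.min(buffer.capacity(), remaining));
                int n = in.read(buffer, readPos); // copy #1: kernel page cache -> user-space buffer
                if (n < 0) {
                    break; // source ended before count bytes were copied
                }
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer); // copy #2: user-space buffer -> kernel
                }
                readPos += n;
                remaining -= n;
            }
        }
    }
}
```

For a file containing "alpha\nbeta\ngamma\n", copyRange(src, dst, 6, 5) writes "beta\n" to dst.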
High‑Performance Solution
To avoid these overheads, we use FileChannel.transferTo. Where the operating system supports it (sendfile on Linux is the classic example), the JDK lets the kernel move bytes from the source file to the target channel without copying them through a user‑space buffer; otherwise it falls back to a less direct copy. Because transferTo works on raw byte ranges, it knows nothing about lines, so the algorithm must choose chunk boundaries itself: for each chunk it finds the last newline before the maximum chunk size and ends the chunk just after it.
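The boundary rule can be seen in isolation on an in‑memory byte array, which makes it easy to test. This is a sketch under that simplification; the class LineBoundary and its methods lastLineEnd and chunkEnds are hypothetical names, and the file‑based version works on file offsets instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the chunk-boundary rule (not the article's code).
final class LineBoundary {

    // Returns the offset just past the last '\n' in data[start, max), or -1 if there is none.
    // Cutting there keeps every line, including "\r\n" endings, whole in one chunk.
    public static int lastLineEnd(byte[] data, int start, int max) {
        for (int i = max - 1; i >= start; i--) {
            if (data[i] == '\n') {
                return i + 1;
            }
        }
        return -1;
    }

    // Plans chunk end offsets of at most maxChunk bytes each; every chunk except
    // possibly the last ends on a line break.
    public static List<Integer> chunkEnds(byte[] data, int maxChunk) {
        List<Integer> ends = new ArrayList<>();
        int position = 0;
        while (position < data.length) {
            int targetEnd = Math.min(position + maxChunk, data.length);
            int end = targetEnd;
            if (end < data.length) {
                end = lastLineEnd(data, position, targetEnd);
                if (end < 0) {
                    throw new IllegalArgumentException("Line longer than maxChunk at offset " + position);
                }
            }
            ends.add(end);
            position = end;
        }
        return ends;
    }
}
```

For "aa\nbbb\nc\n" and maxChunk = 5, chunkEnds returns [3, 7, 9], that is, the chunks "aa\n", "bbb\n" and "c\n".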
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

private static final int LINE_ENDING_SEARCH_WINDOW = 8 * 1024;
private long maxSizePerFileInBytes;
private Path outputDirectory;
private Path tempDir;

private void split(Path fileToSplit) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(fileToSplit.toFile(), "r");
         FileChannel inputChannel = raf.getChannel()) {
        long fileSize = raf.length();
        long position = 0;
        int fileCounter = 1;
        while (position < fileSize) {
            long targetEnd = Math.min(position + maxSizePerFileInBytes, fileSize);
            long end = targetEnd;
            // Except for the final chunk, move the boundary back to just after the last '\n'
            if (end < fileSize) {
                end = findLastLineEndBeforePosition(raf, position, targetEnd);
            }
            long chunkSize = end - position;
            Path outPath = tempDir.resolve("_part" + fileCounter);
            try (FileOutputStream fos = new FileOutputStream(outPath.toFile());
                 FileChannel outputChannel = fos.getChannel()) {
                // transferTo may copy fewer bytes than requested, so loop until the chunk is written
                long transferred = 0;
                while (transferred < chunkSize) {
                    transferred += inputChannel.transferTo(position + transferred,
                            chunkSize - transferred, outputChannel);
                }
            }
            position = end;
            fileCounter++;
        }
    }
}
// Returns the offset just past the last '\n' in [start, max), scanning backwards in
// LINE_ENDING_SEARCH_WINDOW-sized blocks. Restores the file pointer before returning.
private long findLastLineEndBeforePosition(RandomAccessFile raf, long start, long max) throws IOException {
    long originalPos = raf.getFilePointer();
    try {
        int bufferSize = (int) Math.min(LINE_ENDING_SEARCH_WINDOW, max - start);
        byte[] buffer = new byte[bufferSize];
        long searchPos = max;
        while (searchPos > start) {
            int toRead = (int) Math.min(bufferSize, searchPos - start);
            long readPos = searchPos - toRead;
            raf.seek(readPos);
            raf.readFully(buffer, 0, toRead); // plain read() may return fewer bytes than asked for
            for (int i = toRead - 1; i >= 0; i--) {
                if (buffer[i] == '\n') {
                    return readPos + i + 1; // the boundary sits just after the newline
                }
            }
            searchPos = readPos;
        }
        throw new IllegalArgumentException("Cannot split: no newline found between byte offsets "
                + start + " and " + max + "; a single line is longer than the chunk size.");
    } finally {
        raf.seek(originalPos);
    }
}
Cutting just after a '\n' byte keeps lines intact for both Unix (\n) and Windows (\r\n) line endings, because \r\n ends in \n. It is also safe for UTF‑8 text, where the byte 0x0A never occurs inside a multi‑byte character, but not for UTF‑16 or for files that use a bare \r as the line terminator. If a single line is longer than the maximum chunk size, the search range contains no newline and the method throws, so the approach is best suited to log files and datasets made of many short lines.
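Putting both methods together, the following is a self‑contained sketch that can be compiled and run as is. It is not the article's original class: the class name ZeroCopySplitter, the part_N file names and the explicit maxBytes parameter are hypothetical, and it uses FileChannel.open with positional reads instead of RandomAccessFile, so the newline search never moves the channel's position.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained variant of the article's zero-copy splitter.
final class ZeroCopySplitter {

    private static final int SEARCH_WINDOW = 8 * 1024;

    // Splits input into parts of at most maxBytes each, cutting only after '\n'.
    // Returns the paths of the parts it wrote, in order.
    public static List<Path> split(Path input, Path outDir, long maxBytes) throws IOException {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        try (FileChannel in = FileChannel.open(input, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                long targetEnd = Math.min(position + maxBytes, size);
                long end = targetEnd < size ? lastLineEnd(in, position, targetEnd) : targetEnd;
                Path part = outDir.resolve("part_" + (parts.size() + 1));
                try (FileChannel out = FileChannel.open(part, StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                    long done = 0;
                    // transferTo may copy fewer bytes than asked for, so keep going
                    while (done < end - position) {
                        done += in.transferTo(position + done, end - position - done, out);
                    }
                }
                parts.add(part);
                position = end;
            }
        }
        return parts;
    }

    // Scans backwards from max in SEARCH_WINDOW blocks and returns the offset after the
    // last '\n' in [start, max). Positional reads leave the channel's position untouched.
    private static long lastLineEnd(FileChannel ch, long start, long max) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) Math.min(SEARCH_WINDOW, max - start));
        long searchPos = max;
        while (searchPos > start) {
            int toRead = (int) Math.min(buf.capacity(), searchPos - start);
            long readPos = searchPos - toRead;
            buf.clear();
            buf.limit(toRead);
            while (buf.hasRemaining()) {
                if (ch.read(buf, readPos + buf.position()) < 0) {
                    throw new IOException("Unexpected end of file");
                }
            }
            for (int i = toRead - 1; i >= 0; i--) {
                if (buf.get(i) == '\n') {
                    return readPos + i + 1;
                }
            }
            searchPos = readPos;
        }
        throw new IllegalArgumentException("No newline between offsets " + start + " and " + max);
    }
}
```

Calling ZeroCopySplitter.split(input, outDir, 10L * 1024 * 1024) would write parts of at most 10 MB, each ending on a line break except possibly the last. Like the original, it throws when a single line is longer than maxBytes.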
Performance Benchmark
Running both implementations under JMH on a 200 MB test file yields the following results (splitFile is the BufferedReader/Writer version, splitFileZeroCopy the transferTo version):
Benchmark                                               Mode  Cnt     Score    Error   Units
FileSplitterBenchmark.splitFile                         avgt   15  1179.429 ± 54.271   ms/op
FileSplitterBenchmark.splitFile:·gc.alloc.rate          avgt   15  1349.613 ± 60.903  MB/sec
FileSplitterBenchmark.splitFileZeroCopy                 avgt   15    77.352 ±  1.339   ms/op
FileSplitterBenchmark.splitFileZeroCopy:·gc.alloc.rate  avgt   15    23.759 ±  0.465  MB/sec
The zero‑copy version is roughly 15× faster (77 ms vs 1179 ms per split), and its allocation rate is about 57× lower (23.8 MB/s vs 1349.6 MB/s).
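Because gc.alloc.rate is a rate (MB per second of run time) and the two versions run for very different times, multiplying it by the average time per operation gives a rough per‑split allocation figure. This is back‑of‑the‑envelope arithmetic on the numbers above, not a separate measurement; JMH's gc.alloc.rate.norm metric would report a per‑operation value directly.

$$
\begin{aligned}
\text{splitFile:}\quad & 1349.6\ \tfrac{\text{MB}}{\text{s}} \times 1.179\ \text{s} \approx 1590\ \text{MB per split} \\
\text{splitFileZeroCopy:}\quad & 23.76\ \tfrac{\text{MB}}{\text{s}} \times 0.0774\ \text{s} \approx 1.8\ \text{MB per split}
\end{aligned}
$$

By this estimate the naïve version produces about 8× the 200 MB input size in garbage per split, while the zero‑copy version allocates under 2 MB.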
Conclusion
Splitting large text files efficiently depends on thinking about I/O at the system level. By replacing the naïve BufferedReader/Writer approach with a zero‑copy FileChannel.transferTo implementation that preserves line boundaries, we achieved a roughly 15× speedup and much lower memory pressure in this benchmark, which shows the practical payoff of understanding how I/O works underneath the Java APIs.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!