How to Process Massive Log Files in Java Without Running Out of Memory
This article shows how to handle log files that are too large to fit in memory by reading them line by line with Java streams, using a Counter class backed by a BitSet to track per-service usage and generate a top-10 report.
Reading a whole file into memory with the Files API works for small files, but files larger than the available heap require a different strategy: processing them incrementally, line by line.
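To make the difference concrete, here is a minimal sketch (the class name LazyRead and the demo file are illustrative, not from the original code): Files.readAllLines materializes every line in a List, while Files.lines returns a lazy Stream that reads from disk only as it is consumed, so memory use stays flat regardless of file size.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class LazyRead {

    // Lazy: lines are streamed from disk as they are consumed.
    // The try-with-resources block is important: the stream holds
    // an open file handle that must be closed.
    public static long countLines(final Path log) {
        try (Stream<String> lines = Files.lines(log)) {
            return lines.count();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) throws IOException {
        final Path log = Files.createTempFile("demo", ".log");
        Files.write(log, List.of("line1", "line2", "line3"));

        // Eager alternative (loads the whole file at once):
        // List<String> all = Files.readAllLines(log);

        System.out.println(countLines(log)); // prints 3
    }
}
```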
Scenario
Suppose we need to analyze daily server log files and produce a report of the top 10 most used services, where a service must appear in every day's log to qualify.
2024-02-25T00:00:00.000+GMT host7 492 products 0.0.3 PUT 73.182.150.152 ...
2024-02-25T00:00:00.016+GMT host6 123 logout 2.0.3 GET 34.235.76.94 ...
2024-02-25T00:00:00.033+GMT host6 50 payments/:id 0.4.6 PUT 148.241.146.59 ...
2024-02-25T00:00:00.050+GMT host2 547 orders 1.5.0 PUT 6.232.116.248 ...
2024-02-25T00:00:00.067+GMT host4 400 suggestions 0.8.6 DELETE 149.138.227.154 ...
2024-02-25T00:00:00.084+GMT host2 644 login 6.90 GET 208.158.145.204 ...
2024-02-25T00:00:00.101+GMT host5 339 suggestions 0.8.9 PUT 173.109.21.97 ...
2024-02-25T00:00:00.118+GMT host9 87 products 2.6.3 POST 220.252.90.140 ...
2024-02-25T00:00:00.134+GMT host0 845 products 9.4.6 GET 136.79.178.188 ...
2024-02-25T00:00:00.151+GMT host4 675 login 0.89 DELETE 32.159.65.239 ...

The initial implementation loads all files into memory, builds maps of dates to log lines, extracts service names, computes statistics, and selects the top ten. This approach can cause an OutOfMemoryError.
public void processFiles(final List<File> fileList) {
    final Map<LocalDate, List<LogLine>> fileContent = getFileContent(fileList);
    final List<String> serviceList = getServiceList(fileContent);
    final List<Statistics> statisticsList = getStatistics(fileContent, serviceList);
    final List<Statistics> topCalls = getTop10(statisticsList);
    print(topCalls);
}

To avoid loading everything at once, we switch to a line-by-line processing model.
private void processFiles(final List<File> fileList) {
    final Map<String, Counter> compiledMap = new HashMap<>();
    for (int i = 0; i < fileList.size(); i++) {
        processFile(fileList, compiledMap, i);
    }
    final List<Counter> topCalls = compiledMap.values().stream()
            .filter(Counter::allDaysSet)
            .sorted(Comparator.comparing(Counter::getNumberOfCalls).reversed())
            .limit(10)
            .toList();
    print(topCalls);
}

The Counter class stores the service name, the number of calls, and a BitSet indicating on which days the service was called.
public class Counter {
    @Getter private final String serviceName;
    @Getter private long numberOfCalls;
    private final BitSet daysWithCalls;
    private final int numberOfDays;

    public Counter(final String serviceName, final int numberOfDays) {
        this.serviceName = serviceName;
        this.numberOfCalls = 0L;
        this.numberOfDays = numberOfDays;
        this.daysWithCalls = new BitSet(numberOfDays);
    }

    public void add() { numberOfCalls++; }

    public void setDay(final int dayNumber) { daysWithCalls.set(dayNumber); }

    public boolean allDaysSet() {
        // Note: BitSet.stream() yields only the indices of *set* bits, so
        // AND-reducing their values would always return true. Comparing the
        // number of set bits with the expected number of days is the
        // correct check.
        return daysWithCalls.cardinality() == numberOfDays;
    }
}

The processFile method reads a file lazily with Files.lines, converts each line to a LogLine, updates or creates a Counter for the service, increments the call count, and marks the day.
private void processFile(final List<File> fileList,
                         final Map<String, Counter> compiledMap,
                         final int dayNumber) {
    try (Stream<String> lineStream = Files.lines(fileList.get(dayNumber).toPath())) {
        lineStream.map(this::toLogLine).forEach(logLine -> {
            final Counter counter = compiledMap.computeIfAbsent(
                    logLine.serviceName(),
                    serviceName -> new Counter(serviceName, fileList.size()));
            counter.add();
            counter.setDay(dayNumber);
        });
    } catch (final IOException e) {
        throw new UncheckedIOException(e);
    }
}

Using Java's lazy stream API and a compact BitSet allows processing arbitrarily large log files without exhausting memory.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"