Why removeAll Slows Down Large Lists and How a Map-Based Solution Speeds Up
This article compares two Java lists using removeAll and a Map‑based approach, shows performance tests that reveal removeAll becomes prohibitively slow for large collections, explains the underlying algorithmic cost, and demonstrates how counting with a Map dramatically improves speed.
Consider two list objects listA and listB and the need to find elements that appear only in each list, producing onlyListA and onlyListB.
removeAll implementation
The straightforward way uses List.removeAll to compute the differences.
List<String> listA = new ArrayList<>();
List<String> listB = new ArrayList<>();
// only in A
List<String> onlyListA = new ArrayList<>(listA);
onlyListA.removeAll(listB);
// only in B
List<String> onlyListB = new ArrayList<>(listB);
onlyListB.removeAll(listA);Performance tests for element counts of 1 000, 10 000, 100 000 and 1 000 000 show that removeAll scales poorly, taking up to several million milliseconds for a million‑element list.
Map‑based optimization
A more efficient method counts occurrences with a Map and separates unique elements.
public static <E> Tuple2<List<E>, List<E>> getDiffListBtMapCompare(List<E> listA, List<E> listB) {
List<E> onlyAList = new ArrayList<>();
List<E> onlyBList = new ArrayList<>();
Map<E, Integer> countMap = new HashMap<>(Math.max(listA.size(), listB.size()));
for (E e : listA) {
countMap.put(e, 1);
}
for (E e : listB) {
countMap.put(e, 1 + countMap.getOrDefault(e, -2));
}
countMap.forEach((k, v) -> {
if (v == 1) {
onlyAList.add(k);
} else if (v == -1) {
onlyBList.add(k);
}
});
return Tuple.of(onlyAList, onlyBList);
}Testing the same sizes shows dramatically lower execution times (e.g., 8 ms for 1 000 elements, 96 ms for 1 000 000, and 5 320 ms for 10 000 000).
Analysis of removeAll cost
removeAlldelegates to batchRemove, which iterates over the source collection and calls contains() for each element. contains() is implemented via indexOf(), which itself performs a linear scan of the internal array.
Consequently, for collections of sizes m and n , the algorithm executes roughly m × n / 2 comparisons, leading to billions of operations for ten‑million‑size lists.
Conclusion: For large lists, avoid removeAll and prefer a Map‑based diff algorithm to achieve orders‑of‑magnitude better performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
