Why removeAll Slows Down Large Lists and How a Map-Based Solution Speeds Up

This article compares two Java lists using removeAll and a Map‑based approach, shows performance tests that reveal removeAll becomes prohibitively slow for large collections, explains the underlying algorithmic cost, and demonstrates how counting with a Map dramatically improves speed.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Why removeAll Slows Down Large Lists and How a Map-Based Solution Speeds Up

Consider two list objects listA and listB and the need to find elements that appear only in each list, producing onlyListA and onlyListB.

removeAll implementation

The straightforward way uses List.removeAll to compute the differences.

List<String> listA = new ArrayList<>();
List<String> listB = new ArrayList<>();
// only in A
List<String> onlyListA = new ArrayList<>(listA);
onlyListA.removeAll(listB);
// only in B
List<String> onlyListB = new ArrayList<>(listB);
onlyListB.removeAll(listA);

Performance tests for element counts of 1 000, 10 000, 100 000 and 1 000 000 show that removeAll scales poorly, taking up to several million milliseconds for a million‑element list.

Map‑based optimization

A more efficient method counts occurrences with a Map and separates unique elements.

public static <E> Tuple2<List<E>, List<E>> getDiffListBtMapCompare(List<E> listA, List<E> listB) {
    List<E> onlyAList = new ArrayList<>();
    List<E> onlyBList = new ArrayList<>();
    Map<E, Integer> countMap = new HashMap<>(Math.max(listA.size(), listB.size()));
    for (E e : listA) {
        countMap.put(e, 1);
    }
    for (E e : listB) {
        countMap.put(e, 1 + countMap.getOrDefault(e, -2));
    }
    countMap.forEach((k, v) -> {
        if (v == 1) {
            onlyAList.add(k);
        } else if (v == -1) {
            onlyBList.add(k);
        }
    });
    return Tuple.of(onlyAList, onlyBList);
}

Testing the same sizes shows dramatically lower execution times (e.g., 8 ms for 1 000 elements, 96 ms for 1 000 000, and 5 320 ms for 10 000 000).

Analysis of removeAll cost

removeAll

delegates to batchRemove, which iterates over the source collection and calls contains() for each element. contains() is implemented via indexOf(), which itself performs a linear scan of the internal array.

Consequently, for collections of sizes m and n , the algorithm executes roughly m × n / 2 comparisons, leading to billions of operations for ten‑million‑size lists.

Conclusion: For large lists, avoid removeAll and prefer a Map‑based diff algorithm to achieve orders‑of‑magnitude better performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

CollectionsMAPListremoveAll
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.