Backend Development 7 min read

Why list.contains Is So Slow: Java Deduplication Performance Showdown

This article compares several Java duplicate‑removal techniques—including list.contains, HashSet, double‑loop removal, and Stream.distinct—by generating a 20 000‑element test list, measuring execution time, and explaining the underlying algorithmic complexities that make some approaches dramatically faster than others.

Programmer DD

Jun 13, 2023

Why list.contains Is So Slow: Java Deduplication Performance Showdown

1. Test Data Generation

We first create a List<String> with 20 000 entries, where half of the elements are duplicates, using the getTestList() method.

public static List<String> getTestList() {
    List<String> list = new ArrayList<>();
    for (int i = 1; i <= 10000; i++) {
        list.add(String.valueOf(i));
    }
    for (int i = 10000; i >= 1; i--) {
        list.add(String.valueOf(i));
    }
    return list;
}

2. Deduplication with list.contains

Implementation:

private static void useContain2Distinct(List<String> testList) {
    System.out.println("contains 开始去重，条数：" + testList.size());
    List<String> testListDistinctResult = new ArrayList<>();
    for (String str : testList) {
        if (!testListDistinctResult.contains(str)) {
            testListDistinctResult.add(str);
        }
    }
    System.out.println("contains 去重完毕，条数：" + testListDistinctResult.size());
}

Timing result (using StopWatch) shows a noticeable delay, leading to the evaluation: list.contains is inefficient; avoid it.

3. Deduplication with HashSet

Implementation:

private static void useSetDistinct(List<String> testList) {
    System.out.println("HashSet.add 开始去重，条数：" + testList.size());
    List<String> testListDistinctResult = new ArrayList<>(new HashSet(testList));
    System.out.println("HashSet.add 去重完毕，条数：" + testListDistinctResult.size());
}

Timing shows a much shorter execution time, so the recommendation is: use HashSet for deduplication.

4. Why the Performance Gap?

The source of list.contains(o) ultimately calls indexOf(o), which scans the list linearly, giving O(n) complexity.

In contrast, HashSet.add(o) computes a hash and places the element directly, achieving O(1) average complexity.

Even HashSet.contains(o) operates in O(1) time.

5. Other Deduplication Methods

Double‑for‑loop removal (nested loops with list.remove(j)) is extremely slow and produces messy code.

private static void use2ForDistinct(List<String> testList) {
    System.out.println("list 双循环 开始去重，条数：" + testList.size());
    for (int i = 0; i < testList.size(); i++) {
        for (int j = i + 1; j < testList.size(); j++) {
            if (testList.get(i).equals(testList.get(j))) {
                testList.remove(j);
            }
        }
    }
    System.out.println("list 双循环  去重完毕，条数：" + testList.size());
}

Timing confirms it is the slowest approach.

Stream distinct offers concise syntax and decent performance.

private static void useStreamDistinct(List<String> testList) {
    System.out.println("stream 开始去重，条数：" + testList.size());
    List<String> testListDistinctResult = testList.stream()
        .distinct()
        .collect(Collectors.toList());
    System.out.println("stream 去重完毕，条数：" + testListDistinctResult.size());
}

Its execution time is acceptable, making it a viable choice when code brevity matters.

6. Conclusion

For Java duplicate removal, HashSet (or Stream.distinct) provides the best performance, while list.contains and double‑loop approaches should be avoided due to their O(n) or worse complexities.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Deduplication List Stream hashset

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.