Why list.contains Is So Slow: Java Deduplication Performance Showdown
This article compares several Java duplicate‑removal techniques—including list.contains, HashSet, double‑loop removal, and Stream.distinct—by generating a 20 000‑element test list, measuring execution time, and explaining the underlying algorithmic complexities that make some approaches dramatically faster than others.
1. Test Data Generation
We first create a List<String> with 20 000 entries, where half of the elements are duplicates, using the getTestList() method.
public static List<String> getTestList() {
List<String> list = new ArrayList<>();
for (int i = 1; i <= 10000; i++) {
list.add(String.valueOf(i));
}
for (int i = 10000; i >= 1; i--) {
list.add(String.valueOf(i));
}
return list;
}2. Deduplication with list.contains
Implementation:
private static void useContain2Distinct(List<String> testList) {
System.out.println("contains 开始去重,条数:" + testList.size());
List<String> testListDistinctResult = new ArrayList<>();
for (String str : testList) {
if (!testListDistinctResult.contains(str)) {
testListDistinctResult.add(str);
}
}
System.out.println("contains 去重完毕,条数:" + testListDistinctResult.size());
}Timing result (using StopWatch) shows a noticeable delay, leading to the evaluation: list.contains is inefficient; avoid it.
3. Deduplication with HashSet
Implementation:
private static void useSetDistinct(List<String> testList) {
System.out.println("HashSet.add 开始去重,条数:" + testList.size());
List<String> testListDistinctResult = new ArrayList<>(new HashSet(testList));
System.out.println("HashSet.add 去重完毕,条数:" + testListDistinctResult.size());
}Timing shows a much shorter execution time, so the recommendation is: use HashSet for deduplication.
4. Why the Performance Gap?
The source of list.contains(o) ultimately calls indexOf(o), which scans the list linearly, giving O(n) complexity.
In contrast, HashSet.add(o) computes a hash and places the element directly, achieving O(1) average complexity.
Even HashSet.contains(o) operates in O(1) time.
5. Other Deduplication Methods
Double‑for‑loop removal (nested loops with list.remove(j)) is extremely slow and produces messy code.
private static void use2ForDistinct(List<String> testList) {
System.out.println("list 双循环 开始去重,条数:" + testList.size());
for (int i = 0; i < testList.size(); i++) {
for (int j = i + 1; j < testList.size(); j++) {
if (testList.get(i).equals(testList.get(j))) {
testList.remove(j);
}
}
}
System.out.println("list 双循环 去重完毕,条数:" + testList.size());
}Timing confirms it is the slowest approach.
Stream distinct offers concise syntax and decent performance.
private static void useStreamDistinct(List<String> testList) {
System.out.println("stream 开始去重,条数:" + testList.size());
List<String> testListDistinctResult = testList.stream()
.distinct()
.collect(Collectors.toList());
System.out.println("stream 去重完毕,条数:" + testListDistinctResult.size());
}Its execution time is acceptable, making it a viable choice when code brevity matters.
6. Conclusion
For Java duplicate removal, HashSet (or Stream.distinct) provides the best performance, while list.contains and double‑loop approaches should be avoided due to their O(n) or worse complexities.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
