Why List.contains Is So Slow for Java Deduplication – Benchmark Results
This article benchmarks several Java deduplication techniques—including List.contains, HashSet, double-loop removal, and Stream.distinct—using a 20,000‑element list, analyzes their execution times, explains underlying time complexities, and provides practical recommendations on which method to use.
Test Data Generation
A helper method creates a List<String> with 20,000 entries, where the first 10,000 are unique and the next 10,000 are duplicates in reverse order.
public static List<String> getTestList() {
List<String> list = new ArrayList<>();
for (int i = 1; i <= 10000; i++) {
list.add(String.valueOf(i));
}
for (int i = 10000; i >= 1; i--) {
list.add(String.valueOf(i));
}
return list;
}Deduplication Using List.contains
/**
* Use list.contains for deduplication
*/
private static void useContain2Distinct(List<String> testList) {
System.out.println("contains 开始去重,条数:" + testList.size());
List<String> testListDistinctResult = new ArrayList<>();
for (String str : testList) {
if (!testListDistinctResult.contains(str)) {
testListDistinctResult.add(str);
}
}
System.out.println("contains 去重完毕,条数:" + testListDistinctResult.size());
}Timing wrapper:
public static void main(String[] args) {
List<String> testList = getTestList();
StopWatch stopWatch = new StopWatch();
stopWatch.start();
useContain2Distinct(testList);
stopWatch.stop();
System.out.println("去重 最终耗时" + stopWatch.getTotalTimeMillis());
}Result (milliseconds) is shown in the following screenshot:
Evaluation: The list.contains approach is extremely slow; avoid it for deduplication.
Deduplication Using HashSet
/**
* Use HashSet for deduplication
*/
private static void useSetDistinct(List<String> testList) {
System.out.println("HashSet.add 开始去重,条数:" + testList.size());
List<String> testListDistinctResult = new ArrayList<>(new HashSet(testList));
System.out.println("HashSet.add 去重完毕,条数:" + testListDistinctResult.size());
}Timing wrapper (same as above) calls useSetDistinct.
Evaluation: HashSet is fast and recommended for deduplication.
Deduplication Using Double For‑Loop
/**
* Use double for‑loop for deduplication
*/
private static void use2ForDistinct(List<String> testList) {
System.out.println("list 双循环 开始去重,条数:" + testList.size());
for (int i = 0; i < testList.size(); i++) {
for (int j = i + 1; j < testList.size(); j++) {
if (testList.get(i).equals(testList.get(j))) {
testList.remove(j);
}
}
}
System.out.println("list 双循环 去重完毕,条数:" + testList.size());
}Timing wrapper calls use2ForDistinct.
Evaluation: This method is very slow, produces messy code, and should be avoided.
Deduplication Using Stream.distinct
/**
* Use Stream.distinct for deduplication
*/
private static void useStreamDistinct(List<String> testList) {
System.out.println("stream 开始去重,条数:" + testList.size());
List<String> testListDistinctResult = testList.stream().distinct().collect(Collectors.toList());
System.out.println("stream 去重完毕,条数:" + testListDistinctResult.size());
}Timing wrapper calls useStreamDistinct.
Evaluation: Stream.distinct offers decent performance and concise code; it is a reasonable choice when readability matters.
Complexity Analysis
The source of list.contains ultimately calls indexOf, which scans the list linearly, giving O(n) time per lookup.
HashSet’s add hashes the element and places it directly into a bucket, resulting in O(1) average time.
Even HashSet’s contains operates in O(1) average time.
Therefore, the performance gap observed in the benchmarks is explained by the difference between linear (O(n)) and constant‑time (O(1)) operations.
Conclusion
For deduplicating large Java lists, avoid list.contains and manual double‑loop approaches. Prefer HashSet for the best speed, or use Stream.distinct when you value concise, functional‑style code and acceptable performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Architect Essentials
Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
