Fundamentals 30 min read

Overview of Java Collection Framework and Core Concepts

This article provides a comprehensive overview of Java collections, covering the definition, characteristics, differences from arrays, advantages, common collection classes, underlying data structures, fail‑fast mechanism, detailed List, Set, Queue, and Map interfaces, as well as HashMap and ConcurrentHashMap implementations and best practices.

Full-Stack Internet Architecture

Sep 25, 2021

Overview of Java Collection Framework and Core Concepts

Collection Container Overview

What is a Collection?

A collection is a container used to store data.

The collection framework defines a unified architecture for representing and operating on collections, consisting of three major parts: external interfaces, interface implementations, and algorithms for collection operations.

Characteristics of Collections

Encapsulate data; multiple objects often need a collection for storage.

Collections have variable length, unlike fixed‑size arrays.

Difference Between Collections and Arrays

Arrays have fixed length; collections are dynamically sized.

Arrays can store primitive types and reference types; collections store only reference types.

All elements in an array must be of the same type; a collection can hold objects of different types.

Each collection type has its own internal data structure, forming a hierarchy of collection systems.

Advantages of the Collection Framework

Automatic capacity growth.

High‑performance data structures and algorithms.

Easy extensibility and code reuse.

Using JDK‑provided collection classes reduces learning cost and maintenance effort.

Common Collection Classes

Java containers are divided into Collection and Map . The Collection sub‑interfaces include Set , List , and Queue . The most frequently used are Set and List; Map is not a sub‑interface of Collection.

List : ordered container, allows duplicates and nulls, implementations include ArrayList, LinkedList, Vector.

Set : unordered container, no duplicates, allows at most one null, implementations include HashSet, LinkedHashSet, TreeSet.

Map : key‑value pair container, keys are unique, values may duplicate; common implementations are HashMap, TreeMap, Hashtable, LinkedHashMap, ConcurrentHashMap.

Underlying Data Structures

Collection

List

ArrayList – backed by an Object array.

Vector – backed by an Object array.

LinkedList – doubly‑linked circular list.

Set

HashSet – based on HashMap, stores elements as keys with a constant dummy value.

LinkedHashSet – extends HashSet, maintains insertion order via a linked list.

TreeSet – ordered, unique elements stored in a red‑black tree.

Map

HashMap : before JDK 1.8, an array + linked list; from JDK 1.8, array + linked list + red‑black tree when a bucket exceeds a threshold.

LinkedHashMap : extends HashMap, adds a doubly‑linked list to preserve insertion or access order.

Hashtable : array + linked list, synchronized.

TreeMap : red‑black tree implementation.

Fail‑Fast Mechanism

When multiple threads modify a collection structurally during iteration, the iterator detects a change in the internal modCount and throws ConcurrentModificationException. Solutions include synchronizing modifications or using CopyOnWriteArrayList.

List Interface

What is an Iterator?

The Iterator interface provides a way to traverse any Collection, allowing element removal during iteration.

List<String> list = new ArrayList<>();
Iterator<String> it = list.iterator();
while (it.hasNext()) {
    String obj = it.next();
    System.out.println(obj);
}

How to Remove Elements While Traversing?

Iterator<Integer> it = list.iterator();
while (it.hasNext()) {
    // do something
    it.remove();
}

A common mistake is using a for‑each loop and calling list.remove(i), which triggers ConcurrentModificationException.

Array ↔ List Conversion

Array to List: Arrays.asList(array).

List to Array: list.toArray().

ArrayList vs LinkedList

ArrayList uses a dynamic array; LinkedList uses a doubly‑linked list.

Random access is faster in ArrayList; insert/delete in the middle is faster in LinkedList.

LinkedList consumes more memory due to node overhead.

Both are not thread‑safe.

Use ArrayList for frequent reads, LinkedList for frequent insertions/deletions.

Why is elementData in ArrayList marked transient ?

ArrayList implements Serializable. The transient modifier prevents the internal array from being serialized directly; only the actual stored elements are written, reducing size and improving speed.

CopyOnWriteArrayList

A thread‑safe variant of ArrayList that copies the underlying array on each write, suitable for read‑heavy, write‑light scenarios.

List, Set, Map Differences

List : ordered, allows duplicates.

Set : unordered, no duplicates.

Map : key‑value pairs, keys unique.

Set Interface

HashSet Implementation

HashSet is a thin wrapper around a HashMap. Uniqueness relies on proper hashCode() and equals() implementations.

Queue Interface

What is BlockingQueue?

java.util.concurrent.BlockingQueue

blocks on retrieval when empty and blocks on insertion when full, supporting producer‑consumer patterns. Implementations include ArrayBlockingQueue, LinkedBlockingQueue, PriorityBlockingQueue, SynchronousQueue, etc.

Difference Between poll() and remove()

Both return and remove the head element. poll() returns null if the queue is empty; remove() throws NoSuchElementException.

Map Interface

The overall structure of a Map is illustrated below:

HashMap Implementation

In JDK 1.7, HashMap uses an array + linked list. Since JDK 1.8, when a bucket’s chain exceeds 8 nodes, it is transformed into a red‑black tree.

Key insertion steps:

Compute the hash of the key.

Determine the bucket index.

If the bucket is empty, create a new node.

If the bucket contains a node with the same key, replace the value.

If a collision occurs, add the node to the linked list or tree.

After insertion, check if resizing is needed.

JDK 1.7 vs JDK 1.8

Resize optimization.

Introduction of red‑black trees to limit chain length.

Improved handling of concurrent resize loops (still not thread‑safe).

Aspect

JDK 1.7

JDK 1.8

Storage structure

Array + Linked List

Array + Linked List + Red‑Black Tree

Initialization

Separate inflateTable() Integrated into resize() Hash calculation

9 perturbations

2 perturbations

Insertion method

Head insertion

Tail insertion

How HashMap Avoids Collisions

HashMap applies a perturbation function: h ^ (h >>> 16), mixing high and low bits to produce a more uniform distribution.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

HashMap put Process

The put method computes the key’s hash, finds the bucket, handles existing keys, converts long chains to trees, and triggers resize when the load factor is exceeded.

Resize Operation

Resize doubles the capacity; in JDK 1.8 the new position is either the original index or original index + old capacity, avoiding full rehash.

Can Any Class Be Used as a Key?

Yes, but the class should override equals() and hashCode(), preferably be immutable to ensure stable hash values.

Why String and Integer Are Good Keys

They are final and immutable, and provide reliable equals() and hashCode() implementations.

Why HashMap Does Not Use hashCode() Directly as Index

Because hashCode() can produce values outside the array bounds; the hash is further processed and masked with length‑1 (length is a power of two) to obtain a valid index.

Why HashMap Length Is a Power of Two

Using a power‑of‑two length allows the index calculation to be performed with a fast bitwise AND operation instead of modulo.

HashMap vs ConcurrentHashMap

ConcurrentHashMap partitions the bucket array into segments (JDK 1.7) or uses fine‑grained locking with CAS (JDK 1.8), providing higher concurrency.

ConcurrentHashMap does not allow null keys or values.

ConcurrentHashMap Implementation (JDK 1.7)

Data is divided into Segment objects, each protecting a portion of the hash table with its own lock.

ConcurrentHashMap Implementation (JDK 1.8)

Segment design is removed; concurrency is achieved with Node level CAS, synchronized on the first node of a bin, and volatile fields for visibility.

Common Pitfalls with ConcurrentHashMap

Methods like containsKey followed by put are not atomic; use computeIfAbsent or LongAdder for thread‑safe counting.

// Thread‑safe counting example
ConcurrentHashMap<String, LongAdder> freqs = new ConcurrentHashMap<>();
freqs.computeIfAbsent(key, k -> new LongAdder()).increment();