Mastering Java HashMap: From Basics to Interview Mastery
This article explains the fundamentals of Java's HashMap, covering its relationship to Set and Map families, internal implementation details, hashCode/equals contracts, collision resolution strategies, basic operations with code examples, and common interview questions such as Top‑K and LRU cache design.
Preface
HashMap is a data structure that appears frequently in both work and interview settings.
For example, the optimal solution to a variant of LeetCode's first problem "Two Sum" requires a HashMap, and the classic LRU Cache question relies on LinkedHashMap.
This article dives deep into HashMap, guiding you through its origins and helping you ace interview questions.
What is the difference between == and equals()? Why must hashCode() be overridden when equals() is overridden? Differences and relationships among Hashtable, HashSet, and HashMap How are hash collisions handled in Java? Which method is used and why? Implement a HashMap from scratch
The article is organized into the following sections:
Introduction to the Set and Map families
HashMap implementation principles
Discussion of hashCode() and equals()
Detailed analysis of hash collisions
Basic HashMap operations
Analysis of common interview questions
Set Family
Before discussing Map, let's look at Set.
A Set is a collection that does not allow duplicate elements, similar to the mathematical concept.
In Java, Set is an interface in the
java.utilpackage, with several implementations:
Common implementations include:
HashSet : Stores elements as keys in a HashMap; unordered with O(1) basic operations.
LinkedHashSet : Combines a HashSet with a linked list to preserve insertion order while retaining O(1) performance.
TreeSet : Uses a red‑black tree to maintain order; slower lookups compared to HashSet.
Map Family
A Map stores key‑value pairs, where keys must be unique.
Key implementations are:
HashMap : Unordered, O(1) average performance.
LinkedHashMap : A HashMap plus a doubly linked list, preserving insertion order.
TreeMap : Ordered implementation based on a binary search tree.
HashMap Implementation Principles
Each key is processed by a hash function to produce a hash value, which determines the bucket index in an internal array (buckets). The index is calculated as
hash % array.lengthand the entry is stored at that position.
If different keys produce the same hash value, how are they stored?
Answer: This is a hash collision , where multiple keys map to the same bucket.
How does HashMap ensure element uniqueness? Could identical elements produce different hash values?
Answer: Uniqueness is guaranteed by the
hashCode()and
equals()methods.
If the number of pairs grows while the bucket count stays low, what happens?
Answer: Rehashing occurs—when collisions become excessive, the bucket array is resized (typically doubled), changing the index calculation and distributing entries more evenly.
When does rehashing trigger? How is bucket crowding measured?
Answer: By the load factor , defined as
number_of_entries / number_of_buckets. Java's default load factor is
0.75f; exceeding this triggers rehashing.
About hashCode() and equals()
If two keys have the same
hashCode(), a collision may occur;
equals()is then used to determine actual equality.
hashCode() determines the bucket index; equals() compares two objects for equality.
Classic interview question: "Why must you override
hashCode()when you override
equals()?"
Because if two equal objects have different hash codes, they would be placed in different buckets, breaking the contract between the two methods.
Both methods are originally defined in
java.lang.Object. The default
equals()compares object references (similar to
==), but can be overridden to compare logical content, as shown with
Stringexamples.
<code>str1 = "tianxiaoqi";
str2 = new String("tianxiaoqi");
str1 == str2; // false
str1.equals(str2); // true</code>Source code snippets from the JDK illustrate that
equals()ultimately relies on reference comparison unless overridden.
Hash Collision Details
Two main collision‑resolution strategies exist:
Separate chaining Open addressing
Java uses separate chaining . In JDK 1.6/1.7, each bucket stores a linked list; in JDK 1.8, if a chain exceeds eight nodes, it is transformed into a red‑black tree for faster lookups.
Open addressing (e.g., linear probing) searches sequentially for the next free bucket when a collision occurs.
HashMap Basic Operations
All data structures support the four basic operations—Create, Read, Update, Delete (CRUD). For HashMap:
Add: put(K key, V value) Delete: remove(Object key) Update: also put(K key, V value) Read: get(Object key) / containsKey(Object key)
The
putmethod works as follows:
Compute the bucket index using the hash of the key.
Traverse the linked list (or tree) at that bucket.
If the key exists, update its value; otherwise, insert a new node at the head of the list.
<code>public V put(K key, V value) {
int index = getIndex(key);
Node<K, V> node = array[index];
Node<K, V> head = node;
while (node != null) {
// key already exists, update value
if (checkEquals(key, node)) {
V prev = node.value;
node.value = value;
return prev;
}
node = node.next;
}
// key not found, insert new node at head
Node<K, V> newNode = new Node(key, value);
newNode.next = head;
array[index] = newNode;
return null;
}</code>Further details such as rehashing and load factor can be examined in the source code. Practicing with LeetCode problem 706 ("Design HashMap") is recommended.
Difference from Hashtable
Hashtable is an older, thread‑safe counterpart to HashMap. HashMap is not synchronized, offering better performance in single‑threaded contexts. Additionally, HashMap permits
nullkeys, whereas Hashtable does not.
Top K Problem
A frequently asked interview question is to find the top k most frequent words in a list.
Given a list of words, return the k words with the highest frequency.
Solution outline:
Use a
HashMap<String, Integer>to count word frequencies (O(n)).
Maintain a min‑heap of size k to keep the current top‑k entries, ordering first by frequency then lexicographically.
Iterate over the map, inserting into the heap and evicting lower‑frequency entries as needed (overall O(n log k)).
<code>class Solution {
public List<String> topKFrequent(String[] words, int k) {
Map<String, Integer> map = new HashMap<>();
for (String word : words) {
map.put(word, map.getOrDefault(word, 0) + 1);
}
PriorityQueue<Map.Entry<String, Integer>> minHeap = new PriorityQueue<>(k + 1,
(e1, e2) -> {
if (e1.getValue().equals(e2.getValue())) {
return e2.getKey().compareTo(e1.getKey());
}
return e1.getValue().compareTo(e2.getValue());
});
for (Map.Entry<String, Integer> entry : map.entrySet()) {
minHeap.offer(entry);
if (minHeap.size() > k) {
minHeap.poll();
}
}
List<String> res = new ArrayList<>();
while (!minHeap.isEmpty()) {
res.add(minHeap.poll().getKey());
}
Collections.reverse(res);
return res;
}
}</code>LRU Cache
This classic interview problem also relies heavily on HashMap combined with a doubly linked list to achieve O(1) get and put operations.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.