Why Java’s HashMap and C# Dictionary Differ: Deep Dive into Their Source Code
This article examines the source‑code implementations of Java HashMap and C# Dictionary, explaining their data structures, initialization, insertion, resizing, treeification, and deletion mechanisms, and compares their design choices to highlight performance and memory trade‑offs.
Background
This article analyzes the source‑code implementations of Java HashMap and C# Dictionary , sharing the underlying principles and differences of hash‑based collections in the two languages.
Common Hash Collection Concepts
Typical hash collections store a bucket array where each bucket holds a linked list (or tree) of key‑value pairs. Two core aspects are a good hash function for uniform distribution and a balanced resizing strategy that trades space for operation cost.
Java HashMap (JDK 1.8)
Data Structure
JDK 1.8 uses an array plus linked list or red‑black tree (when a list reaches 8 nodes) to store entries. The first node of each list/tree is kept in the table array, and subsequent nodes are linked via next (or left/right for trees).
Key constants defined in HashMap include:
static final int DEFAULT_INITIAL_CAPACITY = 16; // default initial capacity
static final int MAXIMUM_CAPACITY = 1073741824; // maximum capacity
static final float DEFAULT_LOAD_FACTOR = 0.75F; // default load factor
static final int TREEIFY_THRESHOLD = 8; // list‑to‑tree length
static final int UNTREEIFY_THRESHOLD = 6; // tree‑to‑list length
static final int MIN_TREEIFY_CAPACITY = 64; // minimum capacity for treeify
transient HashMap.Node<K,V>[] table; // bucket array
transient int size; // number of entries
int threshold; // capacity threshold for resize
final float loadFactor; // load factor for resizeImportant terms:
Bucket array : the table array that stores the first node of each list/tree.
Treeify : conversion of a long list into a red‑black tree.
Initialization
HashMap uses lazy initialization: the bucket array is allocated only on the first put operation. Default capacity is 16 with a load factor of 0.75 . If an initial capacity is supplied, the actual size is the next power of two greater than the requested value.
The index for a key is computed as (n‑1) & hash, which is equivalent to hash % n but faster because n is a power of two.
Insertion (put)
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1)
treeifyBin(tab, hash);
break;
}
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) {
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}Concurrent inserts can overwrite each other when two threads compute the same bucket and both try to set p.next.
Resizing
When the number of entries exceeds threshold = capacity * loadFactor (default 75% of the bucket array), resize() doubles the array size and rehashes existing entries.
Choosing an appropriate initial capacity (e.g., 2048 for an expected 1000 entries) reduces the number of costly resizes.
Treeify
When a bucket’s list length reaches 8 , treeifyBin() converts it to a red‑black tree, provided the overall capacity is at least 64 . Below that, the map simply resizes.
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
do {
TreeNode<K,V> p = replacementTreeNode(e, null);
if (tl == null) hd = p; else { p.prev = tl; tl.next = p; }
tl = p;
} while ((e = e.next) != null);
if ((tab[index] = hd) != null)
hd.treeify(tab);
}
}Tree nodes are about twice the size of regular nodes, so they are used only when the benefit outweighs the memory cost.
Deletion
Single‑node bucket: set entry to null.
Linked‑list bucket: unlink the node.
Tree bucket: remove the node and rebalance; if the tree shrinks below 6 nodes it converts back to a list.
C# Dictionary
Data Structure
Dictionary stores all entries in a contiguous entries array. The buckets array holds the index of the first entry for each bucket. This layout improves cache locality.
private int[] buckets; // bucket heads (indices into entries)
private Entry[] entries; // actual key‑value entries
private int count; // number of used entries
private int version; // version for iterator safety
private int freeList; // index of first free entry
private int freeCount; // number of free entries
private IEqualityComparer<TKey> comparer;
private struct Entry {
public int hashCode; // -1 if unused
public int next; // index of next entry in bucket (-1 if last)
public TKey key;
public TValue value;
}Initialization
Default capacity is 3 . When a custom capacity is supplied, the actual size is the next prime number.
Insertion
private int FindEntry(TKey key) {
if (key == null) ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
if (buckets != null) {
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
return i;
}
}
return -1;
}Bucket index is computed as hashCode % buckets.Length, where buckets.Length is a prime to reduce collisions.
Resizing
private void Resize(int newSize, bool forceNewHashCodes) {
int[] newBuckets = new int[newSize];
for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
Entry[] newEntries = new Entry[newSize];
Array.Copy(entries, 0, newEntries, 0, count);
if (forceNewHashCodes) {
for (int i = 0; i < count; i++) {
if (newEntries[i].hashCode != -1)
newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
}
}
for (int i = 0; i < count; i++) {
if (newEntries[i].hashCode >= 0) {
int bucket = newEntries[i].hashCode % newSize;
newEntries[i].next = newBuckets[bucket];
newBuckets[bucket] = i;
}
}
buckets = newBuckets;
entries = newEntries;
}Resize occurs when the entries array is full or when a bucket’s chain length exceeds 100 . In the latter case, the resize may be performed without increasing the array size but with a re‑hash.
Deletion
Removed entries are not physically erased; their index is added to freeList . Subsequent inserts reuse these free slots, keeping the array compact.
Comparison
Aspect
HashMap
Dictionary
Analysis
Data Structure
Bucket array
table
holds head nodes; entries stored in linked lists or red‑black trees.
All entries stored in contiguous
entries
array;
buckets
hold indices of list heads.
Dictionary’s layout improves cache locality; HashMap gains tree‑based performance for long chains.
Hash Function
Uses (n‑1) & hash (requires power‑of‑two bucket size).
Uses hashCode % buckets.Length (requires prime bucket size).
Both aim to reduce collisions; prime size helps when key hash codes are poorly distributed.
Resize Mechanism
Triggered when load factor (0.75) is exceeded; capacity doubles.
Triggered when
entries
array fills or a chain exceeds 100; may also resize on high collision count.
Dictionary saves space but can suffer long chains; HashMap’s treeify mitigates that at the cost of extra memory.
Deletion
Removes node and frees space immediately.
Marks node as free and records its index in
freeList
for reuse.
Dictionary’s free‑list reuse reduces allocation overhead; HashMap’s immediate free can cause fragmentation under heavy churn.
Conclusion
Reading and dissecting the source code of hash‑based collections deepens understanding of data‑structure trade‑offs, performance characteristics, and implementation tricks, ultimately improving programming skills.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Tuhu Marketing Technology Team
Official tech channel of the Tuhu Marketing Technology Team, offering a developer community. We share the challenges, design concepts, and solutions from building our systems, aimed at internet developers. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
