Fundamentals 27 min read

Why Java’s HashMap and C# Dictionary Differ: Deep Dive into Their Source Code

This article examines the source‑code implementations of Java HashMap and C# Dictionary, explaining their data structures, initialization, insertion, resizing, treeification, and deletion mechanisms, and compares their design choices to highlight performance and memory trade‑offs.

Tuhu Marketing Technology Team

Jan 8, 2021

Why Java’s HashMap and C# Dictionary Differ: Deep Dive into Their Source Code

Background

This article analyzes the source‑code implementations of Java HashMap and C# Dictionary , sharing the underlying principles and differences of hash‑based collections in the two languages.

Common Hash Collection Concepts

Typical hash collections store a bucket array where each bucket holds a linked list (or tree) of key‑value pairs. Two core aspects are a good hash function for uniform distribution and a balanced resizing strategy that trades space for operation cost.

Java HashMap (JDK 1.8)

Data Structure

JDK 1.8 uses an array plus linked list or red‑black tree (when a list reaches 8 nodes) to store entries. The first node of each list/tree is kept in the table array, and subsequent nodes are linked via next (or left/right for trees).

Key constants defined in HashMap include:

static final int DEFAULT_INITIAL_CAPACITY = 16;  // default initial capacity
static final int MAXIMUM_CAPACITY = 1073741824;  // maximum capacity
static final float DEFAULT_LOAD_FACTOR = 0.75F;  // default load factor
static final int TREEIFY_THRESHOLD = 8;  // list‑to‑tree length
static final int UNTREEIFY_THRESHOLD = 6;  // tree‑to‑list length
static final int MIN_TREEIFY_CAPACITY = 64;  // minimum capacity for treeify
transient HashMap.Node<K,V>[] table;  // bucket array
transient int size;  // number of entries
int threshold;  // capacity threshold for resize
final float loadFactor;  // load factor for resize

Important terms:

Bucket array : the table array that stores the first node of each list/tree.

Treeify : conversion of a long list into a red‑black tree.

Initialization

HashMap uses lazy initialization: the bucket array is allocated only on the first put operation. Default capacity is 16 with a load factor of 0.75 . If an initial capacity is supplied, the actual size is the next power of two greater than the requested value.

The index for a key is computed as (n‑1) & hash, which is equivalent to hash % n but faster because n is a power of two.

Insertion (put)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Concurrent inserts can overwrite each other when two threads compute the same bucket and both try to set p.next.

Resizing

When the number of entries exceeds threshold = capacity * loadFactor (default 75% of the bucket array), resize() doubles the array size and rehashes existing entries.

Choosing an appropriate initial capacity (e.g., 2048 for an expected 1000 entries) reduces the number of costly resizes.

Treeify

When a bucket’s list length reaches 8 , treeifyBin() converts it to a red‑black tree, provided the overall capacity is at least 64 . Below that, the map simply resizes.

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null) hd = p; else { p.prev = tl; tl.next = p; }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

Tree nodes are about twice the size of regular nodes, so they are used only when the benefit outweighs the memory cost.

Deletion

Single‑node bucket: set entry to null.

Linked‑list bucket: unlink the node.

Tree bucket: remove the node and rebalance; if the tree shrinks below 6 nodes it converts back to a list.

C# Dictionary

Data Structure

Dictionary stores all entries in a contiguous entries array. The buckets array holds the index of the first entry for each bucket. This layout improves cache locality.

private int[] buckets;      // bucket heads (indices into entries)
private Entry[] entries;    // actual key‑value entries
private int count;          // number of used entries
private int version;        // version for iterator safety
private int freeList;       // index of first free entry
private int freeCount;      // number of free entries
private IEqualityComparer<TKey> comparer;
private struct Entry {
    public int hashCode;   // -1 if unused
    public int next;       // index of next entry in bucket (-1 if last)
    public TKey key;
    public TValue value;
}

Initialization

Default capacity is 3 . When a custom capacity is supplied, the actual size is the next prime number.

Insertion

private int FindEntry(TKey key) {
    if (key == null) ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
                return i;
        }
    }
    return -1;
}

Bucket index is computed as hashCode % buckets.Length, where buckets.Length is a prime to reduce collisions.

Resizing

private void Resize(int newSize, bool forceNewHashCodes) {
    int[] newBuckets = new int[newSize];
    for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
    Entry[] newEntries = new Entry[newSize];
    Array.Copy(entries, 0, newEntries, 0, count);
    if (forceNewHashCodes) {
        for (int i = 0; i < count; i++) {
            if (newEntries[i].hashCode != -1)
                newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
        }
    }
    for (int i = 0; i < count; i++) {
        if (newEntries[i].hashCode >= 0) {
            int bucket = newEntries[i].hashCode % newSize;
            newEntries[i].next = newBuckets[bucket];
            newBuckets[bucket] = i;
        }
    }
    buckets = newBuckets;
    entries = newEntries;
}

Resize occurs when the entries array is full or when a bucket’s chain length exceeds 100 . In the latter case, the resize may be performed without increasing the array size but with a re‑hash.

Deletion

Removed entries are not physically erased; their index is added to freeList . Subsequent inserts reuse these free slots, keeping the array compact.

Comparison

Aspect

HashMap

Dictionary

Analysis

Data Structure

Bucket array

table

holds head nodes; entries stored in linked lists or red‑black trees.

All entries stored in contiguous

entries

array;

buckets

hold indices of list heads.

Dictionary’s layout improves cache locality; HashMap gains tree‑based performance for long chains.

Hash Function

Uses (n‑1) & hash (requires power‑of‑two bucket size).

Uses hashCode % buckets.Length (requires prime bucket size).

Both aim to reduce collisions; prime size helps when key hash codes are poorly distributed.

Resize Mechanism

Triggered when load factor (0.75) is exceeded; capacity doubles.

Triggered when

entries

array fills or a chain exceeds 100; may also resize on high collision count.

Dictionary saves space but can suffer long chains; HashMap’s treeify mitigates that at the cost of extra memory.

Deletion

Removes node and frees space immediately.

Marks node as free and records its index in

freeList

for reuse.

Dictionary’s free‑list reuse reduces allocation overhead; HashMap’s immediate free can cause fragmentation under heavy churn.

Conclusion

Reading and dissecting the source code of hash‑based collections deepens understanding of data‑structure trade‑offs, performance characteristics, and implementation tricks, ultimately improving programming skills.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java performance DataStructure HashMap hashtable dictionary C#

Written by

Tuhu Marketing Technology Team

Official tech channel of the Tuhu Marketing Technology Team, offering a developer community. We share the challenges, design concepts, and solutions from building our systems, aimed at internet developers. Follow us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.