Fundamentals 26 min read

Unveiling Swift’s Dictionary: Inside the Hash Table and Memory Layout

This article dives deep into Swift's Dictionary implementation, explaining its underlying hash‑table structure, memory layout, variant storage, probing strategies, copy‑on‑write semantics, conflict resolution, load‑factor handling, and automatic resizing, all illustrated with real code and debugger screenshots.

Sohu Tech Products
Sohu Tech Products
Sohu Tech Products
Unveiling Swift’s Dictionary: Inside the Hash Table and Memory Layout

Introduction

After covering Swift's Array, we explore the inner workings of Dictionary, whose design shares similarities with Array but centers on hash‑based storage.

What is Dictionary

In Swift, Dictionary is a generic object that stores key‑value pairs, where the key must conform to Hashable.

Memory Inspection

Running a simple test program and setting a breakpoint on print("end") reveals that a Dictionary instance lives on the heap (e.g., address 0x00000001005469b0) and shares the same HeapObject layout as an Array. The debugger shows the stored element count but not the actual keys or values.

Dictionary Structure

The source defines struct Dictionary<Key: Hashable, Value> with a single stored property _variant of type _Variant. The _Variant struct contains a _BridgeStorage<__RawDictionaryStorage> object, mirroring the storage pattern used by Array.

4.1 Dictionary Definition

@frozen
public struct Dictionary<Key: Hashable, Value> {
    public typealias Element = (key: Key, value: Value)
    internal var _variant: _Variant
    internal init(_native: __owned _NativeDictionary<Key, Value>) {
        _variant = _Variant(native: _native)
    }
    #if _runtime(_ObjC)
    internal init(_cocoa: __owned __CocoaDictionary) {
        _variant = _Variant(cocoa: _cocoa)
    }
    #endif
}

The _Variant struct holds a single property object of type _BridgeStorage<__RawDictionaryStorage>, which in turn points to the raw storage used by the dictionary.

extension Dictionary {
    @frozen internal struct _Variant {
        internal var object: _BridgeStorage<__RawDictionaryStorage>
    }
}

4.2 Native Dictionary

The native storage is represented by _NativeDictionary, which contains a _storage field of type __RawDictionaryStorage. This raw storage holds the hash table metadata, keys, values, and auxiliary fields such as count, capacity, scale, and seed.

@frozen internal struct _NativeDictionary<Key: Hashable, Value> {
    internal var _storage: __RawDictionaryStorage
    internal init() { self._storage = __RawDictionaryStorage.empty }
    internal init(_ storage: __owned __RawDictionaryStorage) { self._storage = storage }
    internal init(capacity: Int) { /* allocate appropriate storage */ }
}

4.3 Raw Dictionary Storage

The class __RawDictionaryStorage tail‑allocates memory for a bitmap, keys, and values. It tracks the number of occupied entries ( _count), capacity ( _capacity), scale ( _scale), and a hash seed ( _seed).

internal class __RawDictionaryStorage: __SwiftNativeNSDictionary {
    internal var _count: Int
    internal var _capacity: Int
    internal var _scale: Int8
    internal var _reservedScale: Int8
    internal var _extra: Int16
    internal var _age: Int32
    internal var _seed: Int
    internal var _rawKeys: UnsafeMutableRawPointer
    internal var _rawValues: UnsafeMutableRawPointer
    internal var _hashTable: _HashTable {
        // computed property returning a HashTable view
    }
}

4.4 Hash Table

The hash table is a lightweight struct that stores an array of Word (a UInt) representing bucket occupancy and a mask used for fast modulo operations.

internal struct _HashTable {
    internal typealias Word = _UnsafeBitset.Word
    internal var words: UnsafeMutablePointer<Word>
    internal let bucketMask: Int
    internal init(words: UnsafeMutablePointer<Word>, bucketCount: Int) {
        self.words = words
        self.bucketMask = bucketCount - 1
    }
    internal func bucket(wrappedAfter bucket: Bucket) -> Bucket {
        return Bucket(offset: (bucket.offset + 1) & bucketMask)
    }
    internal func idealBucket(forHashValue hash: Int) -> Bucket {
        return Bucket(offset: hash & bucketMask)
    }
    internal func _isOccupied(_ bucket: Bucket) -> Bool { /* check bitmap */ }
}

4.5 Probing and Collision Resolution

Swift uses open addressing with quadratic probing (referred to as “linear probing” in the source comments) to resolve collisions. The find method walks buckets until it finds an empty slot or the matching key.

internal final func find<Key: Hashable>(_ key: Key, hashValue: Int) -> (bucket: _HashTable.Bucket, found: Bool) {
    var bucket = hashTable.idealBucket(forHashValue: hashValue)
    while hashTable._isOccupied(bucket) {
        if uncheckedKey(at: bucket) == key { return (bucket, true) }
        bucket = hashTable.bucket(wrappedAfter: bucket)
    }
    return (bucket, false)
}

4.6 Copy‑On‑Write

Although Dictionary is a struct, it employs reference semantics internally. Before a mutating operation, isUniquelyReferenced() checks whether the storage is shared; if not, a new copy of the storage is created.

internal mutating func setValue(_ value: __owned Value, forKey key: Key) {
    let isUnique = self.isUniquelyReferenced()
    asNative.setValue(value, forKey: key, isUnique: isUnique)
}

4.7 Load Factor and Resizing

The dictionary monitors its load factor (ratio of occupied slots to total slots). When the factor exceeds roughly 0.75, it allocates a larger storage (typically double the size), recomputes the hash for each key, and re‑inserts them (rehash).

static internal func allocate(scale: Int8, age: Int32?, seed: Int?) -> _DictionaryStorage {
    let bucketCount = 1 << scale
    // allocate tail‑elements for bitmap, keys, values
    // initialize metadata, seed, etc.
    return storage
}

4.8 Hash Function Basics

Swift’s built‑in hash function produces a 64‑bit value. The dictionary then maps it to a bucket using hashValue & bucketMask, which is equivalent to a modulo operation when the bucket count is a power of two. This fast bitwise masking replaces expensive division.

Conflict‑Resolution Strategies

Swift’s Dictionary uses open addressing (specifically quadratic probing) rather than separate chaining. This keeps all entries in a contiguous array, improving cache locality and maintaining average O(1) lookup and insertion.

Performance Optimizations

The implementation chooses bucket counts that are powers of two, enabling the cheap mask operation. It also pre‑computes a scale value based on the desired capacity and maximum load factor, ensuring the hash table grows efficiently.

Hash Functions in Practice

Beyond the built‑in hash, developers can use cryptographic hashes such as MD5, SHA‑1, or SHA‑256. When the hash output is larger than the table size, a simple modulo (or mask) reduces it to the appropriate range without losing the uniform distribution properties.

Conclusion

The article has walked through the essential components of Swift’s Dictionary: the public struct, its variant bridge, native storage, raw storage layout, hash table mechanics, probing, copy‑on‑write, load‑factor‑driven resizing, and the underlying hash function.

Dictionary memory layout
Dictionary memory layout
PerformanceSwiftData Structureshash tableCopy-on-Writedictionary
Sohu Tech Products
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.