Fundamentals 21 min read

Perfect Hash Functions and Their Use in High‑Performance HashMaps

The article explains perfect hash functions, their collision‑free construction methods such as FCH, CHD, and PTHash, compares them to conventional hash tables, reviews common and cryptographic hash functions, and shows how read‑only perfect‑hash maps deliver faster lookups and lower memory use for static key sets.

Tencent Cloud Developer
Tencent Cloud Developer
Tencent Cloud Developer
Perfect Hash Functions and Their Use in High‑Performance HashMaps

This article reviews the concept of perfect hash functions, explains how they resolve hash collisions, and compares ordinary hash tables with perfect hash tables. It then introduces common hash functions and perfect hash functions, followed by a discussion of their applications in various scenarios.

Hash Function Basics

A hash function maps arbitrary data to a small numeric fingerprint (hash value). It is a many‑to‑one, one‑way mapping that aims to be consistent, exhibit the avalanche effect, avoid collisions, and be irreversible. Typical properties include consistency, avalanche effect, one‑wayness, and low collision probability.

Common Hash Functions

CRC32 – fast 32‑bit hash used for data integrity checks.

SipHash – provides HashDoS protection, used as the default hash in Rust and Redis.

MurmurHash3 – classic fast hash producing 32‑ or 128‑bit values.

CityHash – Google’s hash, faster than MurmurHash, outputs 64/128/256‑bit values.

xxHash – extremely fast for small datasets, supports 32/64/128‑bit outputs.

These functions are evaluated in a benchmark that measures speed and collision resistance.

Secure Hash Functions

Cryptographic hash functions (e.g., SHA‑2, SHA‑3, BLAKE) provide collision resistance and pre‑image resistance, making them suitable for digital signatures, proof‑of‑work, and file integrity verification.

Perfect Hash Functions (PHF)

A perfect hash function maps a static key set S to a range [0, m) without collisions. When m = |S| it is a minimal perfect hash function (MPHF). PHFs require offline construction and are useful for static mappings such as dictionaries or pinyin tables.

Universal Hashing

Universal hashing selects a hash function randomly from a family to guarantee a low expected collision probability (≈1/m) even for adversarial inputs, mitigating HashDoS attacks.

Algorithmic Implementations

The article details three modern PHF construction algorithms:

1. FCH (Fast Construction of Minimal Perfect Hash)

FCH builds a two‑level hash: the first level distributes keys into buckets, and the second level resolves collisions within each bucket using a secondary hash. Expected space is O(n) with careful hash selection.

buckets.resize(m);
for (auto key : keys) {
  auto [h, h0, h1] = hash(key);
  buckets[h].hash = h;
  buckets[h].keys.push_back(make_tuple(h0, h1));
}

2. CHD (Compressed Hash‑Displacement)

CHD also uses a three‑phase process (Mapping, Ordering, Searching) but stores a compact displacement value for each bucket, achieving higher space efficiency at the cost of slightly slower lookups.

sort(buckets.begin(), buckets.end(), [](auto &lhs, auto &rhs){
  return lhs.keys.size() > rhs.keys.size();
});

3. PTHash

PTHash improves on FCH by using a XOR‑based position function and compact encodings (front‑back encoding) to reduce memory while keeping lookup speed comparable to FCH.

All three algorithms share a similar workflow: map keys to buckets, order buckets by conflict count, then search for displacement parameters that eliminate collisions.

bool_vector position_used, position_used_in_bucket;
vector
p; // result array

position_used.resize(table_size);
position_used_in_bucket.resize(buckets[0].keys.size());
p.resize(m);

for (auto &bucket : buckets) {
  if (bucket.keys.size() == 0) continue;
  int d0 = 0;
  int d1 = 0;
  while (true) {
    bool ok = true;
    position_used_in_bucket.clear();
    for (auto [h0, h1] : bucket.keys) {
      uint64 position = (h0 + (h1 * d1) + d0) % table_size;
      if (position_used[position] || position_used_in_bucket[position]) {
        ok = false;
        break;
      }
      position_used_in_bucket[position] = true;
    }
    if (ok) {
      position_used.union(position_used_in_bucket);
      p[bucket.h] = d0 * table_size + d1;
      break;
    }
    d1++;
    if (d1 >= table_size) { d1 = 0; d0++; if (d0 > table_size) throw ...; }
  }
}

Lookup using the constructed PHF is straightforward:

auto [h, h0, h1] = hash(key);
auto pilot = p[h];
auto d0 = pilot / table_size;
auto d1 = pilot % table_size;
return (h0 + (h1 * d1) + d0) % table_size;

HashMap Implementations

The article surveys several hash map designs:

Traditional hash maps (chaining, linear probing, tree‑based buckets).

F14/B16 SIMD‑accelerated hash maps (high load factor, dynamic insertion).

PerfectHashMap – read‑only hash map built on PHFs, offering O(1) lookups with minimal memory.

Benchmarks on a MacBook Pro M1 Pro compare unordered_map, Folly F14, Abseil Swiss Table, and various PTHashMap configurations for both string and uint64 keys. Results show that perfect‑hash‑based maps achieve superior speed and memory usage for static workloads.

Conclusion

Perfect hash extends the applicability of hash functions to read‑only, large‑scale scenarios. Recent algorithms such as PTHash provide fast construction and high space efficiency, making perfect‑hash‑based hash maps an attractive choice for performance‑critical, static key‑value stores.

References

What is a hash algorithm?

Secure hash functions.

MurmurHash Wikipedia.

Hash tables vs. perfect hash tables.

Universal hashing.

Perfect hash research.

Inverted index compression in 58 Search.

algorithmperfect hashData StructureBenchmarkhash functionHash Mapuniversal hashing
Tencent Cloud Developer
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.