
How to Build Ultra‑Fast Static Query Maps in Rust Without Runtime Overhead

This article explains why embedding large static JSON data is inefficient, introduces Minimal Perfect Hash Functions (MPHF) and various static map constructions, shows how to pack strings and avoid relocation overhead, and demonstrates the resulting improvements in binary size, lookup speed, memory usage, and compile time.

ByteDance Web Infra

When you need to embed a large amount of static, queryable data in an application, the naive approach is to embed a JSON file and parse it at runtime.

use std::collections::HashMap;
use std::sync::LazyLock;

static MAP: LazyLock<HashMap<&str, usize>> = LazyLock::new(|| {
    serde_json::from_str(include_str!("data.json")).unwrap()
});

This approach has three costs: the first access makes a blocking call, the parsed map occupies resident memory that is never released, and the binary must carry a full JSON parser.

We present techniques to construct a completely static, high‑performance map that needs no initialization, no parsing, and no memory allocation while keeping compile‑time speed and binary size minimal.

Efficient Static Queryable Map

MPHF (Minimal Perfect Hash Function) is a common technique for building efficient static maps.

Typical hash maps have O(1) average lookup but can degrade to O(N) when collisions occur, and they require extra space for buckets.

MPHF guarantees O(1) worst‑case lookup, no empty slots, and a compact layout.

hashmap illustration
phf illustration

Simplest MPHF

An MPHF is a bijective hash function that maps a given key set one‑to‑one onto a value domain of the same size.

The simplest construction is f(x) = h(seed, x) mod n: pick a seed, test the mapping, and rotate to the next seed on any collision.

Repeat until a seed yields a collision‑free mapping.

For large key sets a single seed that avoids collisions is unrealistic.
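The seed-rotation loop fits in a few lines. This is a minimal sketch; `hash_with_seed` is an illustrative stand‑in that mixes the seed into std's default hasher, not a specific production hash:

```rust
use std::collections::{hash_map::DefaultHasher, HashSet};
use std::hash::{Hash, Hasher};

// Illustrative seeded hash: mix the seed into std's default hasher.
fn hash_with_seed(seed: u64, key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

// Rotate the seed until h(seed, x) mod n maps every key to a distinct slot.
fn find_seed(keys: &[&str]) -> u64 {
    let n = keys.len() as u64;
    (0u64..)
        .find(|&seed| {
            let mut used = HashSet::new();
            keys.iter().all(|k| used.insert(hash_with_seed(seed, k) % n))
        })
        .expect("exhausted u64 seeds")
}
```

For a handful of keys a working seed turns up after a few iterations; with n keys the chance that any one seed is collision-free is roughly n!/nⁿ, which is why the expected search time explodes as n grows.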

More Practical MPHF Constructions

For many keys, the approach resembles a traditional hashmap: split keys into buckets, resolve collisions per bucket, and store a displacement value for each bucket.

When a bucket collides, adjust its displacement.

Repeat until all buckets are conflict‑free.

phf‑displace illustration

FCH : f(x) = (h(x) + d[i]) mod n. The displacement is a plain numeric offset, which gives only a small adjustment range and can make a suitable value hard to find.

CHD : f(x) = φσ[g(x)](x) mod n. Each bucket stores an index σ[g(x)] that selects an independent hash function, which gives good results but costs an extra hash computation.

PtHash : f(x) = (h(x) ⊕ h′(k[i])) mod n. Stores a numeric pilot k[i] per bucket and XORs its hash into the key's hash, combining the benefits of the previous two without an extra heavy hash of the key.
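A toy end‑to‑end version of the pilot search described above: per bucket, rotate a numeric pilot until (h(x) ⊕ h′(k[i])) mod n sends all of the bucket's keys to free slots. The hashes and the largest‑buckets‑first ordering are illustrative, not the paper's exact construction:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative seeded hash over anything hashable (keys and pilots).
fn h(seed: u64, key: impl Hash) -> u64 {
    let mut s = DefaultHasher::new();
    seed.hash(&mut s);
    key.hash(&mut s);
    s.finish()
}

/// Search one pilot per bucket so slot(x) = (h(x) XOR h'(pilot)) % n is
/// collision-free; big buckets are placed first, as they are hardest to fit.
fn build(keys: &[&str], n_buckets: usize) -> (Vec<u64>, Vec<Option<String>>) {
    let n = keys.len() as u64;
    let mut buckets: Vec<Vec<&str>> = vec![Vec::new(); n_buckets];
    for &k in keys {
        buckets[(h(0, k) % n_buckets as u64) as usize].push(k);
    }
    let mut order: Vec<usize> = (0..n_buckets).collect();
    order.sort_by_key(|&b| std::cmp::Reverse(buckets[b].len()));

    let mut pilots = vec![0u64; n_buckets];
    let mut slots: Vec<Option<String>> = vec![None; keys.len()];
    for &b in &order {
        for pilot in 0u64.. {
            let targets: Vec<usize> = buckets[b]
                .iter()
                .map(|k| ((h(1, k) ^ h(2, pilot)) % n) as usize)
                .collect();
            let mut unique = targets.clone();
            unique.sort_unstable();
            unique.dedup();
            // Accept the pilot only if the bucket's slots are distinct and free.
            if unique.len() == targets.len() && targets.iter().all(|&t| slots[t].is_none()) {
                for (k, &t) in buckets[b].iter().zip(&targets) {
                    slots[t] = Some(k.to_string());
                }
                pilots[b] = pilot;
                break;
            }
        }
    }
    (pilots, slots)
}

fn lookup(key: &str, pilots: &[u64], n: u64) -> usize {
    let b = (h(0, key) % pilots.len() as u64) as usize;
    ((h(1, key) ^ h(2, pilots[b])) % n) as usize
}
```

Because the pilot's hash re-randomizes all of a bucket's targets at once, the search terminates quickly in practice, unlike a purely additive displacement, which only rotates the same target pattern around the table.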

Advanced MPHF: PtrHash

PtrHash improves on PtHash by using a cuckoo‑hashing strategy for bucket adjustment.

When a key collides, the new key evicts the old one.

The evicted key is rehashed with a different function.

If the rehashed key collides again, the process repeats.

This behavior resembles a cuckoo’s “nest‑stealing” strategy.
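The eviction loop can be sketched on a plain two‑choice hash table. This is not PtrHash itself (the real algorithm evicts bucket pilots rather than keys), but the mechanics are the same; `h` is an illustrative seeded hash:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative seeded hash: mix the seed into std's default hasher.
fn h(seed: u64, key: &str) -> u64 {
    let mut s = DefaultHasher::new();
    seed.hash(&mut s);
    key.hash(&mut s);
    s.finish()
}

/// Insert with cuckoo eviction: a colliding key steals the slot, and the
/// evicted key is re-placed at its alternate position, possibly cascading.
fn cuckoo_insert(table: &mut [Option<String>], key: String, max_kicks: usize) -> bool {
    let n = table.len() as u64;
    let mut key = key;
    let mut which = 0u64; // which of the two hash functions to try next
    for _ in 0..max_kicks {
        let slot = (h(which, &key) % n) as usize;
        match table[slot].take() {
            None => {
                table[slot] = Some(key);
                return true;
            }
            Some(evicted) => {
                table[slot] = Some(key); // the new key steals the nest
                // The evicted key moves to whichever of its two slots this isn't.
                which = if slot == (h(0, &evicted) % n) as usize { 1 } else { 0 };
                key = evicted;
            }
        }
    }
    false // too many kicks; a real implementation would rebuild with new hashes
}
```

The `max_kicks` bound matters: eviction chains can cycle, and detecting that by capping the chain length, then rehashing, is the standard escape hatch.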

cuckoo hash illustration
cuckoo hash illustration
Relocation Overhead

Storing keys as a static &[&str] also costs load‑time relocations: every embedded pointer must be patched by the loader at startup. ELF's new RELR relocation scheme can reduce the per‑string overhead to just 2 bits. macOS and Windows relocation formats are slightly better than ELF's traditional RELA scheme but still worse than RELR.

String Packing and Manual Index

Because the pointer‑based approach adds 16 bytes per string, we can pack all strings into a single blob and store end positions as a 4‑byte index.

const STRINGS: &str = include_str!("string.bin");
const INDEX: &[u32] = &[4, 10, 15 /* ... */];
fn get(index: usize) -> Option<&'static str> {
    let end = *INDEX.get(index)? as usize;
    let start = index.checked_sub(1).map(|i| INDEX[i]).unwrap_or_default() as usize;
    Some(&STRINGS[start..end])
}

This reduces each string’s index overhead to 4 bytes and eliminates relocation costs.
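STRINGS and INDEX would be generated ahead of time, e.g. by a build script; a minimal sketch of that packing step (the function name is ours):

```rust
/// Pack strings into one blob plus an array of end offsets, matching the
/// STRINGS/INDEX layout above. A build script would write both to disk.
fn pack(strings: &[&str]) -> (String, Vec<u32>) {
    let mut blob = String::new();
    let mut index = Vec::with_capacity(strings.len());
    for s in strings {
        blob.push_str(s);
        index.push(blob.len() as u32); // store each string's end position
    }
    (blob, index)
}
```

Storing only end positions works because each string's start is the previous string's end (or 0 for the first), which is exactly what `get` reconstructs.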

String Pool

We can also build a custom string pool that deduplicates strings and stores offset + length in a single 32‑bit value (24‑bit offset, 8‑bit length), limiting each string to 255 bytes and the pool to 16 MiB while keeping the index at 4 bytes.

#[derive(Default)]
struct StrPool<'s> {
    pool: String,
    map: HashMap<&'s str, u32>,
}
impl<'s> StrPool<'s> {
    /// Returns the string's packed handle: low 24 bits = byte offset,
    /// high 8 bits = length. Duplicate strings share one pool entry.
    pub fn insert(&mut self, s: &'s str) -> u32 {
        let pool = &mut self.pool;
        *self.map.entry(s).or_insert_with(|| {
            let offset = pool.len();
            pool.push_str(s);
            let len: u8 = (pool.len() - offset).try_into().unwrap();
            let offset: u32 = offset.try_into().unwrap();
            // The offset must fit in 24 bits, or it would overlap the length byte.
            assert!(offset < (1 << 24), "string pool too large");
            offset | (u32::from(len) << 24)
        })
    }
}
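Reading a string back out is a couple of bit operations, assuming the layout `insert` produces (low 24 bits = offset, high 8 bits = length):

```rust
/// Decode a packed handle produced by StrPool::insert:
/// low 24 bits = byte offset into the pool, high 8 bits = length.
fn get(pool: &str, packed: u32) -> &str {
    let offset = (packed & 0x00FF_FFFF) as usize;
    let len = (packed >> 24) as usize;
    &pool[offset..offset + len]
}
```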

Compilation Speed

String packing also dramatically improves compile time and memory usage. The three heavy rustc phases for the naive &[&str] approach shrink to negligible amounts after packing.

Before:

time:   1.414; rss:  258MB ->  761MB ( +503MB)        type_check_crate
time:   2.325; rss:  712MB ->  651MB (  -61MB)        LLVM_passes
time:   2.217; rss:  476MB ->  597MB ( +121MB)        finish_ongoing_codegen

After:

time:   0.022; rss:  107MB ->  142MB (  +35MB)        type_check_crate
time:   0.100; rss:  194MB ->  183MB (  -11MB)        LLVM_passes
time:   0.098; rss:  159MB ->  177MB (  +18MB)        finish_ongoing_codegen

The reduction comes from no longer asking the compiler to type‑check, optimize, and emit relocations for thousands of individual pointer objects.

Summary

Embedding a static HashMap looks simple, but the naive solution incurs runtime parsing, memory allocation, and binary bloat. Using MPHF techniques together with string‑packing eliminates these costs while preserving O(1) lookup, a compact memory layout, and fast compilation.

Rust, static map, compile-time optimization, minimal perfect hash, string packing
Written by

ByteDance Web Infra

ByteDance Web Infra team, focused on delivering excellent technical solutions, building an open tech ecosystem, and advancing front-end technology within the company and the industry | The best way to predict the future is to create it
