SlimTrie: A Space‑Efficient Trie‑Based Index for Large‑Scale Storage Systems
This article presents SlimTrie, a trie‑based indexing structure that dramatically reduces memory consumption while maintaining fast query speeds, detailing its design, compression techniques, implementation steps, memory analysis, and performance comparisons with map and B‑Tree structures for large‑scale storage systems.
Recent discussions on the SlimTrie data structure have led to a detailed technical exposition that explains its motivation, design goals, and practical benefits for cloud storage indexing.
Background and requirements : Traditional index structures such as hash maps and tree‑based indexes (B‑Tree, RBTree, SkipList, LSM Tree) suffer from memory consumption that grows with both the number of keys (n) and key length (k), making them unsuitable for indexing billions of small files in memory‑constrained environments.
Design goals of SlimTrie : Correctly locate existing keys (no guarantee for non‑existent keys). Support range queries. Memory overhead depends only on the number of keys, not on key length. Provide fast query performance. Index only static data and limit key size to 16 KB.
Generation steps : Build a standard Trie for all keys. Compress the Trie by pruning single‑branch nodes (LeafParent and Inner nodes), reducing the space complexity from O(n·k) to O(n). Store the compressed structure using three compacted arrays (inners, leaf‑parents, steps) that replace pointer‑heavy representations with 16‑bit IDs and 4‑bit words.
Memory model : By replacing pointers with 16‑bit IDs and using bitmap‑based branch representation, the total memory consumption is roughly 6 bytes per key (plus a small constant overhead), independent of key length. The compacted arrays add about 0.15 × n + 8 bytes of overhead.
Performance evaluation : Experiments in Go compare SlimTrie with Go's built‑in map and Google’s B‑Tree implementation. Results show that SlimTrie uses significantly less memory than both alternatives and achieves query speeds about 2.6× faster than B‑Tree while being comparable to sorted array lookups. Memory usage remains stable (≈6–7 bytes per key) as the number of keys or key length varies.
Use case scenario : With 32 GB of RAM, SlimTrie can index roughly 3.2 billion files, demonstrating its suitability for petabyte‑scale storage systems.
Conclusion : SlimTrie offers a highly space‑efficient, fast‑query index that overcomes the limitations of classic data structures for massive key‑value workloads, making it a compelling choice for modern cloud storage architectures.
High Availability Architecture
Official account for High Availability Architecture.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.