Understanding Hash Sharding and Hash Tables in Distributed Databases (DBLE)

This article explains the principles of hash tables, how hash functions are used for data sharding in distributed databases such as DBLE, compares hash‑modulo and consistent‑hash approaches, and discusses their advantages, limitations, and practical configuration tips.

Aikesheng Open Source Community

Nov 2, 2020

Background

The author, a test lead on the DBLE project, often runs into bugs while testing distributed middleware and enjoys iterating on solutions.

When you first encounter the distributed database middleware DBLE, the concept of hash sharding can be confusing. "Hash" is usually associated with hash tables, which aim to map distinct inputs to distinct outputs, whereas sharding deliberately maps many values to the same node.

Concept – Hash Table

A hash table is a data structure that uses a hash function to map an input key to a numeric index into an array. A common layout is an array of linked lists (separate chaining): each array slot holds the list of entries whose keys hash to that index. With a well-designed hash function, the table combines the fast indexed lookup of arrays with the flexible insertion and deletion of linked lists, giving O(1) average-case lookup, insertion, and deletion.

Key characteristics of hash tables:

Map input to a number.

Different inputs should produce different outputs (when two inputs collide, their entries share a bucket).

Identical inputs produce identical outputs.

When the load factor exceeds a threshold, the table automatically expands.

Values are uniformly distributed.
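To make these characteristics concrete, here is a minimal Python sketch of a hash table with separate chaining. Python lists stand in for the linked lists, and the class name `ChainedHashTable`, the initial capacity of 8, and the 0.75 load-factor threshold are illustrative assumptions, not details from the original article.

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining (one list per bucket)."""

    def __init__(self, capacity=8, max_load=0.75):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load

    def _index(self, key):
        # Map the key to a bucket index; equal keys always land in the same bucket.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # existing key: overwrite in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key (possibly a collision): extend the chain
        self.size += 1
        if self.size / len(self.buckets) > self.max_load:
            self._resize(2 * len(self.buckets))

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def _resize(self, new_capacity):
        # Expansion re-hashes every entry, because the modulus has changed.
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(new_capacity)]
        for k, v in old_items:
            self.buckets[self._index(k)].append((k, v))


table = ChainedHashTable()
for n in range(20):
    table.put(f"key{n}", n)
print(table.get("key7"), len(table.buckets))  # prints: 7 32
```

The resize step shows why expansion is relatively expensive: every stored entry must be re-hashed against the new bucket count, the same cost that reappears when a sharded database changes its number of shards.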

Hash Sharding

In distributed databases, hash sharding follows similar principles: a given input always maps to the same node, data should be evenly distributed, and expansion should be easy.

Design points for hash sharding:

Fixed data maps to fixed slots.

Even data distribution.

Convenient scaling.
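The first two design points can be checked with a small sketch. It assumes string shard keys and uses an MD5 digest as a stable hash; the helper names `stable_hash` and `shard_for` are hypothetical, and this is not the hash function DBLE itself uses. A stable digest matters because Python's built-in `hash()` is salted per process for `str` keys, so it would not map fixed data to fixed shards across restarts.

```python
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    # Fixed digest: the same key yields the same number in every process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key: str, num_shards: int) -> int:
    # Deterministic mapping from key to shard index.
    return stable_hash(key) % num_shards


keys = [f"user-{i}" for i in range(10_000)]
counts = Counter(shard_for(k, 4) for k in keys)
print(sorted(counts.items()))  # roughly 2,500 keys per shard
```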

Dynamic scaling aims to move as little data as possible. Consistent hashing reduces data movement but can distribute data unevenly, which motivated the development of the jump consistent hash algorithm.
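The trade-off can be sketched in Python. The ring below is a basic consistent-hashing ring with one point per node (no virtual nodes), and the second function follows the published jump consistent hash algorithm by Lamping and Veach. The node names, the MD5-based `h64` helper, and the 5,000 test keys are illustrative assumptions, not DBLE code.

```python
import bisect
import hashlib


def h64(s: str) -> int:
    # Stable 64-bit position (first 8 bytes of an MD5 digest).
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with one point per node (no virtual nodes)."""

    def __init__(self, nodes):
        self.points = sorted((h64(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self.points, (h64(node), node))

    def lookup(self, key: str) -> str:
        # First node point at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self.points, (h64(key), ""))
        return self.points[i % len(self.points)][1]


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    # Jump consistent hash; key is treated as an unsigned 64-bit integer.
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


keys = [f"k{i}" for i in range(5000)]

ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
moved_ring = [k for k in keys if ring.lookup(k) != before[k]]

positions = [h64(k) for k in keys]
moved_jump = [p for p in positions
              if jump_consistent_hash(p, 4) != jump_consistent_hash(p, 3)]

# Ring: every moved key now belongs to node-d, but how many keys move depends on
# where node-d lands, so shares can be uneven. Jump hash: about 1/4 of keys move.
print(len(moved_ring), len(moved_jump))
```

In both schemes, keys only ever move to the newly added node or bucket. The ring needs extra tricks such as virtual nodes to even out the shares, while jump hash gets an even split from the algorithm itself, at the cost of only supporting buckets numbered 0 to n-1.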

Similarities between hash tables and hash sharding:

Deterministic mapping (same input → same output).

Uniform value distribution.

Ease of scaling by re‑hashing when expanding.

Both rely on a well‑designed hash function to achieve good performance.

Hash Modulo Sharding

The simplest hash function is modulo hashing. For example, with two shards, each row is assigned to shard key % 2. DBLE also supports variants that assign a contiguous range of values to each shard, which improves range-query efficiency.
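A minimal Python sketch of both ideas follows, assuming integer shard keys. The function names and the `range_length` parameter are hypothetical; the range variant only illustrates the general idea of contiguous ranges and is not DBLE's configuration syntax, which is described in its sharding documentation.

```python
def modulo_shard(key: int, num_shards: int) -> int:
    # Plain hash-modulo: consecutive keys alternate between shards.
    return key % num_shards


def range_slot_shard(key: int, num_shards: int, range_length: int) -> int:
    # Hypothetical range variant: key % (num_shards * range_length) selects a slot,
    # and each shard owns one contiguous block of range_length slots.
    slot = key % (num_shards * range_length)
    return slot // range_length


print([modulo_shard(k, 2) for k in range(6)])                           # [0, 1, 0, 1, 0, 1]
print([range_slot_shard(k, 2, 100) for k in (10, 99, 100, 199, 200)])  # [0, 0, 1, 1, 0]
```

With the range variant, a query over keys 10 to 50 touches a single shard, whereas plain modulo spreads the same range across both shards.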

Advantages of modulo hashing:

Simple implementation.

Good balance: data and hotspots are spread evenly across shards.

Easy to identify the shard of a given record.

Disadvantages:

Poor range-query performance (a range of consecutive keys is scattered across all shards).

Scaling is costly because changing the number of shards requires re‑hashing most of the data.
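The scaling cost is easy to quantify. The sketch below (the helper name `moved_fraction` is illustrative) counts how many keys change shard when the modulo base changes, using one full period of consecutive integer keys so the result is exact.

```python
def moved_fraction(old_shards: int, new_shards: int, keys) -> float:
    """Fraction of keys whose modulo shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)


print(moved_fraction(3, 4, range(12)))   # 0.75
print(moved_fraction(2, 4, range(8)))    # 0.5
print(moved_fraction(9, 10, range(90)))  # 0.9
```

Going from n to n + 1 shards moves n/(n + 1) of evenly spread keys, so adding a single shard moves most of the data. Doubling the shard count moves half of it.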

DBLE recommends keeping the modulo base at or below 2880 to limit the amount of data movement during scaling, and suggests choosing a base with many divisors (the original article lists them; 2880 itself is 2^6 × 3^2 × 5).
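Why a base with many divisors helps can be checked directly. With a fixed base, each key's slot (key % 2880) never changes, and any shard count that divides the base gives every shard an equal, whole number of slots. The helper below is a plain illustration, not DBLE code.

```python
def divisors(n: int) -> list[int]:
    # Collect divisor pairs (d, n // d) up to sqrt(n), returned in ascending order.
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


BASE = 2880  # = 2**6 * 3**2 * 5
divs = divisors(BASE)
print(len(divs))                                   # 42
print([n for n in range(1, 17) if BASE % n == 0])  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16]
print({n: BASE // n for n in (2, 4, 8)})           # {2: 1440, 4: 720, 8: 360}
```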

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
