Databases 33 min read

How NoSQL Databases Achieve Scalability: Distributed Strategies Explained

This article systematically explores the distributed characteristics of NoSQL databases, covering data consistency, placement, peer systems, anti‑entropy protocols, eventual consistency data types, sharding, fault detection, and coordinator election, illustrating how these strategies balance scalability, availability, latency, and fault tolerance.

21CTO

Oct 12, 2015

How NoSQL Databases Achieve Scalability: Distributed Strategies Explained

System scalability is the main driver of the NoSQL movement, encompassing distributed coordination, failover, resource management, and more. Although NoSQL has not fundamentally changed distributed data processing, it has spurred extensive research on protocols and algorithms, leading to effective database construction methods.

We examine distributed strategies such as replication, data placement, and peer systems, highlighted in bold and divided into three parts.

Data Consistency : NoSQL must balance consistency, fault tolerance, performance, low latency, and high availability, with consistency often being mandatory. This section discusses data replication and recovery.

Data Placement : A database product must handle various data distributions, cluster topologies, and hardware configurations. This section covers how to distribute and adjust data placement to ensure fault tolerance, persistence, efficient queries, and balanced resource usage.

Peer Systems : Techniques such as leader election are used to achieve fault tolerance and strong consistency. Even decentralized databases need to track global state, detect failures, and handle topology changes.

Data Consistency

Distributed systems often encounter network partitions or latency, making it impossible to maintain high availability without sacrificing consistency (CAP theorem). Consistency is expensive, so trade‑offs are made. We focus on replication characteristics.

Availability: the remaining partition can still serve reads/writes.

Read/Write latency: operations complete quickly.

Read/Write scalability: load is balanced across nodes.

Fault tolerance: processing does not depend on a single node.

Persistence: node failures do not cause data loss.

Consistency: more complex; we discuss several viewpoints without deep theoretical models.

Anti‑entropy Protocols and Gossip Algorithms

We examine anti‑entropy protocols where nodes periodically exchange digests to reconcile differences. Three styles exist: push, pull, and hybrid. Push sends data to a random node; pull requests data; hybrid combines both, offering better convergence.

Simulations show pull converges faster than push, and hybrid outperforms both, scaling logarithmically with cluster size.

Eventually Consistent Data Types

Achieving a semantically correct final state across replicas is challenging. Counter CRDTs illustrate this: each node maintains separate increment and decrement arrays, merging by taking element‑wise maxima.

class Counter {
  int[] plus;
  int[] minus;
  int NODE_ID;

  void increment() { plus[NODE_ID]++; }
  void decrement() { minus[NODE_ID]++; }
  int get() { return sum(plus) - sum(minus); }
  void merge(Counter other) {
    for (int i = 0; i < MAX_ID; i++) {
      plus[i] = max(plus[i], other.plus[i]);
      minus[i] = max(minus[i], other.minus[i]);
    }
  }
}

Other CRDTs include sets, graphs, and lists, each with limited functionality and performance overhead.

Data Placement

This section discusses algorithms for mapping data items to physical nodes, handling migration, and balancing resources such as memory and disk.

Balancing Data

When clusters expand or nodes fail, seamless data migration is required. Systems like Redis Cluster use redirection commands to route keys to the correct node, with permanent redirects cached by clients and temporary redirects used during migration.

Consistent Hashing

Consistent hashing maps keys to a ring of nodes, minimizing data movement when nodes are added or removed. Virtual nodes spread load and reduce hotspot migration.

Multi‑attribute Sharding

When queries involve multiple attributes, multidimensional sharding (e.g., HyperDex) maps each attribute axis to a space partition, reducing the number of nodes involved in a query.

Replica Damping

In memory‑intensive workloads, each shard may have an in‑memory copy and a disk‑based replica. When a node fails, replicas can be loaded into memory, allowing the cluster to survive with limited extra memory.

System Coordination

We discuss two practical coordination techniques: fault detection and leader election.

Fault Detection

Heartbeat‑based detectors monitor component liveness, adapting to network delays and topology changes. Accrual detectors compute the probability of failure based on heartbeat arrival statistics, allowing configurable false‑positive rates.

Leader Election (Bully Algorithm)

The Bully algorithm elects a coordinator by having each node announce itself; nodes with higher IDs win. MongoDB uses a variant of this algorithm.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

fault tolerance replication NoSQL

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.