What Exactly Is a Distributed Database? Definitions, Features, and Architecture Explained

This article defines distributed databases, examines their external traits such as write‑heavy, low‑latency, massive concurrency, massive storage and high reliability, explores internal architectures like client‑side sharding, proxy middleware and unit‑based designs, compares them with Amazon Aurora, and summarizes key takeaways.

ITPUB

Aug 7, 2023

1. What Is a Distributed DB?

With the rise of TiDB and similar systems, relational databases are gaining built‑in distribution features such as data sharding and distributed transactions, while developers keep using the familiar JDBC interface. ShardingSphere is an example of middleware that provides standardized sharding, distributed transaction, and governance capabilities.

De Facto Standard

When a technology dominates the market, it becomes the de facto benchmark; Oracle serves that role for relational databases. Distributed databases have not yet settled on a single benchmark, so the community has to define the concept itself, looking at it from both an external and an internal perspective.

2. External Perspective: Features

Distributed DBs aim to solve the pain points of two main workload types:

OLTP : transaction‑oriented workloads with small per‑transaction data size that require results within milliseconds (e.g., shopping, payment, transfer).

OLAP : analytics on large data sets such as annual statements or financial reports.

The article focuses on the OLTP scenario and assumes “DB” means a relational DB that supports SQL and ACID.
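To make the two workload types concrete, here is a minimal sketch using Python's built‑in sqlite3 module on a single node. The `account` table and the helper names `transfer` and `total_balance` are assumptions made up for illustration. The transfer is a short OLTP‑style transaction that touches two rows and is atomic; the aggregate is an OLAP‑style query that scans the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style transaction: two small row updates that commit or roll back together."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first so that a failed debit visibly undoes earlier work.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def total_balance(conn):
    """OLAP-style query: an aggregate computed over every row."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
```

Calling `transfer(conn, 1, 2, 500)` fails the balance check and leaves both rows untouched, which is the atomicity part of ACID that the rest of the article takes for granted.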

OLTP Characteristics

Write‑heavy, read‑light : many writes, few reads, low query complexity.

Low latency : typically < 500 ms per transaction, that is, comfortably sub‑second.

High concurrency : no theoretical upper bound on concurrent transactions.

Thus a distributed DB can be defined as a DB that serves write‑heavy, low‑latency, high‑concurrency OLTP workloads.

Massive Concurrency

Traditional single‑node relational DBs scale vertically (Scale‑Up) and are limited by the resources of one machine. Distributed DBs add machines horizontally, achieving “massive concurrency” that can exceed 10 000 TPS in practice.

High Reliability

Reliability is measured by failure‑rate statistics (e.g., Google’s “Failure Trends in a Large Disk Drive Population”) and by service‑level targets such as five‑nines availability (99.999 %). Five‑nines availability means at most about 5.26 minutes of downtime per year (365 × 24 × 60 × (1 − 0.99999) ≈ 5.26). Distributed DBs use replica mechanisms rather than RAID to meet RPO = 0 (no data loss) and RTO < 5 minutes (recovery within five minutes).
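The downtime budget behind an availability target is plain arithmetic: minutes in a year multiplied by the allowed unavailability. A small sketch, where the function name `max_downtime_minutes` is an assumption for illustration:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def max_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability ratio."""
    return MINUTES_PER_YEAR * (1.0 - availability)


if __name__ == "__main__":
    for label, ratio in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
        print(f"{label}: {max_downtime_minutes(ratio):.2f} minutes per year")
```

Each additional nine cuts the budget by a factor of ten, which is why five nines leaves room for only a few minutes of failover per year.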

Typical industry solutions include high‑end mainframes, specialized storage (e.g., EMC Symmetrix SRDF), or distributed replication that automatically fails over across data centers.

Massive Storage

Horizontal scaling also provides virtually unlimited storage using local disks on many nodes, making “massive storage” a standard attribute of distributed DBs.

4. Internal Perspective: Architecture

Various internal designs aim to hide distribution complexity from applications:

4.1 Client‑side Sharding (Sharding‑JDBC)

A library that adds sharding and routing logic in the application layer, allowing multiple single‑node DBs to be used together.
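The routing idea can be sketched in a few lines. This is not the Sharding‑JDBC API; the `ShardRouter` class, the shard names, and the choice of CRC32 modulo the shard count are all assumptions for illustration. The point is only that the application itself decides which single‑node DB a row lives on.

```python
import zlib


class ShardRouter:
    """Hypothetical client-side router: maps a sharding key to one backend name."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)

    def shard_for(self, sharding_key) -> str:
        # CRC32 is stable across processes, unlike Python's built-in hash() for strings.
        digest = zlib.crc32(str(sharding_key).encode("utf-8"))
        return self.shard_names[digest % len(self.shard_names)]


router = ShardRouter(["orders_db_0", "orders_db_1", "orders_db_2", "orders_db_3"])
target = router.shard_for(42)  # the application would open its connection to this backend
```

One consequence of plain modulo routing is that changing the number of shards remaps most keys, so real deployments usually plan the shard count ahead or use a scheme designed for resharding.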

4.2 Proxy Middleware (MyCat)

An independent process that handles routing and can provide distributed transaction support.
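With a proxy, the application talks to one endpoint and the routing happens behind it. The sketch below is a toy stand‑in, not MyCat: it runs in‑process rather than as a separate server, uses in‑memory SQLite databases as backends, and the `ShardingProxy` class and `orders` table are assumptions for illustration. It shows the two paths a proxy has to handle: a query that carries the sharding key goes to one backend, while a query without it is scattered to all backends and the partial results are merged.

```python
import sqlite3


class ShardingProxy:
    """Toy proxy: callers use one object, which owns and routes to the backends."""

    def __init__(self, shard_count: int):
        self.backends = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.backends:
            db.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")

    def _backend_for(self, user_id: int):
        return self.backends[user_id % len(self.backends)]

    def insert_order(self, user_id: int, amount: int) -> None:
        db = self._backend_for(user_id)
        with db:  # commit the single-shard write
            db.execute("INSERT INTO orders VALUES (?, ?)", (user_id, amount))

    def orders_of(self, user_id: int):
        # Sharding key present: route to exactly one backend.
        db = self._backend_for(user_id)
        rows = db.execute(
            "SELECT amount FROM orders WHERE user_id = ? ORDER BY amount", (user_id,)
        )
        return [row[0] for row in rows]

    def total_amount(self) -> int:
        # No sharding key: query every backend, then merge the partial sums.
        return sum(
            db.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
            for db in self.backends
        )
```

A write that spans several backends would additionally need a distributed transaction, which this sketch deliberately leaves out.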

4.3 Unit‑Based Architecture

Each business unit runs its own instance with a dedicated single‑node DB; cross‑unit transactions rely on distributed transaction frameworks such as TCC.
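TCC (Try, Confirm, Cancel) splits a cross‑unit transaction into a reservation phase and a completion phase. The following is a minimal in‑memory sketch; the `Account`, `Debit`, and `Credit` classes and the `run_tcc` coordinator are assumptions for illustration, and real frameworks additionally handle retries, idempotency, and coordinator crashes.

```python
class Account:
    def __init__(self, balance: int, is_open: bool = True):
        self.balance = balance
        self.reserved = 0  # funds frozen by a pending Try
        self.is_open = is_open


class Debit:
    def __init__(self, account: Account, amount: int):
        self.account, self.amount = account, amount

    def try_(self) -> bool:
        # Try: reserve the funds without spending them yet.
        if self.account.balance - self.account.reserved < self.amount:
            return False
        self.account.reserved += self.amount
        return True

    def confirm(self) -> None:
        # Confirm: turn the reservation into an actual debit.
        self.account.reserved -= self.amount
        self.account.balance -= self.amount

    def cancel(self) -> None:
        # Cancel: release the reservation.
        self.account.reserved -= self.amount


class Credit:
    def __init__(self, account: Account, amount: int):
        self.account, self.amount = account, amount

    def try_(self) -> bool:
        return self.account.is_open  # Try: only check that the credit can happen

    def confirm(self) -> None:
        self.account.balance += self.amount

    def cancel(self) -> None:
        pass  # nothing was reserved on the credit side


def run_tcc(participants) -> bool:
    """Try every participant; confirm all if every Try succeeded, else cancel the tried ones."""
    tried = []
    for p in participants:
        if not p.try_():
            for t in reversed(tried):
                t.cancel()
            return False
        tried.append(p)
    for p in participants:
        p.confirm()
    return True
```

The business code has to supply all three operations for every participant, which is the kind of application‑level burden this architecture carries.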

In contrast, a true distributed DB abstracts these details and presents a single logical DB to the application.

5. Amazon Aurora vs. Distributed DBs

Aurora uses a “shared‑storage” architecture with compute‑storage separation, presenting itself as a large monolithic DB that achieves fault tolerance through replication across availability zones. Its storage is sharded at the segment level, but it does not provide true horizontal write scaling, so it is not a full‑blown distributed DB.

Key Differences

Aurora’s storage is shared; on the compute side, write capacity grows mainly by scaling the node vertically.

Distributed DBs scale both compute and storage horizontally.

Aurora relies on majority‑vote writes across six storage replicas (a write needs four of the six to acknowledge it); distributed DBs typically replicate each shard with multi‑master or quorum protocols.
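The majority‑vote idea can be illustrated with a small quorum sketch. It assumes six replicas, a write quorum of four, and a read quorum of three, which matches the configuration Aurora's designers have described publicly; the `Replica` class and both functions are hypothetical and ignore networking, repair, and concurrent writers. Because the write and read quorums add up to more than the replica count, any read quorum overlaps the latest successful write.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None
        self.up = True


def quorum_write(replicas, version, value, w) -> bool:
    """Apply the write on every reachable replica; succeed only with at least w acks."""
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.version, rep.value = version, value
            acks += 1
    return acks >= w


def quorum_read(replicas, r):
    """Ask r reachable replicas and return the value carrying the highest version."""
    responders = [rep for rep in replicas if rep.up][:r]
    if len(responders) < r:
        return None  # not enough replicas to form a read quorum
    newest = max(responders, key=lambda rep: rep.version)
    return newest.value


N, W, R = 6, 4, 3
assert W + R > N and W > N / 2  # read/write overlap, and no two conflicting write quorums

replicas = [Replica() for _ in range(N)]
replicas[0].up = replicas[1].up = False       # two replicas are unreachable
wrote = quorum_write(replicas, 1, "v1", W)    # four acks: the write succeeds

replicas[0].up = replicas[1].up = True        # the stale replicas come back
replicas[4].up = replicas[5].up = False       # two up-to-date replicas go down
seen = quorum_read(replicas, R)               # reads replicas 0, 1, 2; replica 2 has "v1"
```

Even though two of the three replicas consulted by the read are stale, the version comparison picks the value from the one replica that took part in the write.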

6. Summary

Distributed DBs are defined by six external traits: write‑heavy, low‑latency, massive concurrency, massive storage, high reliability, and relational semantics. Alternative solutions exist but usually require deeper application changes, whereas distributed DBs aim to hide complexity and provide a unified, highly available relational service.
