
What Will the Third‑Generation Distributed Database Look Like? Key Features and Challenges

The article reviews 70 years of database evolution, outlines the two historic generations of distributed databases, highlights unresolved issues of correctness, performance, and usability, and proposes a forward‑looking “third‑generation” vision that targets 100% data correctness, maximum performance, and true ease of use.

ITPUB

Overview

At DTCC in August 2023, a talk titled “Third‑Generation Distributed Databases” surveyed the evolution of database technology, reflected on the sources of innovation, and identified three fundamental characteristics that a next‑generation distributed database must satisfy.

Historical Background

First Generation – Research Era (1960s – late 1990s)

During this period the focus was on defining concepts and building prototypes. Key techniques included:

Distributed query processing

Two‑phase commit (2PC) for atomic cross‑node transactions

Early consistency mechanisms

Although 2PC provided atomicity, many problems remained open: correctness anomalies, scalability limits, and the lack of high‑availability guarantees.
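The two phases of 2PC can be sketched as follows. This is a minimal in-memory illustration, not code from the talk: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and durable logging, timeouts, and coordinator recovery are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant (illustrative names only)."""
    name: str
    can_commit: bool = True
    state: str = "init"  # init -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A real node would persist its vote before replying.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if the transaction committed on every participant."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (completion): broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote aborts the transaction everywhere, which is what gives atomicity. The sketch also hints at the availability gap: if the coordinator fails between the two phases, prepared participants have no way to learn the decision on their own and must wait.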

Second Generation – Big‑Data Era (late 1990s – early 2020s)

Rapid data growth drove practical adoption of distributed databases. Landmark advances were:

The Paxos consensus protocol, enabling strong consistency across replicas.

The CAP theorem, clarifying the trade‑off among Consistency, Availability, and Partition tolerance: during a network partition, a system must give up either consistency or availability.

Google Spanner (2012), which combined Paxos replication with TrueTime, its clock‑uncertainty API, to achieve external consistency (strict serializability) in production.

These innovations delivered cross‑region fault tolerance and strong consistency, yet they did not fully resolve data‑correctness, extreme performance, or ease‑of‑use challenges.
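The fault tolerance of Paxos‑style replication rests on majority quorums: a value is chosen once a majority of replicas accept it, and any two majorities share at least one replica. The arithmetic can be checked with a small sketch; the function names below are illustrative and do not come from any particular system.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that may fail while a majority can still be formed."""
    return n - majority(n)


def quorums_always_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums overlaps."""
    q = majority(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)
```

With five replicas, `majority(5)` is 3 and two failures are tolerated. Going from three replicas to four adds no extra tolerance, which is one reason odd replica counts are common.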

Open Questions for the Third Generation

1. Data Correctness

Current distributed DBMSs still exhibit unresolved anomalies. The literature reports more than 30 distinct data anomalies (see Li et al., 2022). Weaker isolation levels (e.g., Read Committed) permit some of these anomalies, so developers resort to explicit locks (SELECT … FOR UPDATE), which compromise performance.
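One familiar anomaly of this kind is the lost update. The sketch below is a deterministic, single‑threaded model rather than a real database session: two transactions each read a balance, compute a new value in the application, and write it back. The function names and the account scenario are assumptions made for illustration.

```python
def run_interleaved(balance: int, debit_a: int, debit_b: int) -> int:
    """Both transactions read before either writes (possible under Read Committed)."""
    read_a = balance            # T1: SELECT balance
    read_b = balance            # T2: SELECT balance, sees the same committed value
    balance = read_a - debit_a  # T1: UPDATE with its computed value, COMMIT
    balance = read_b - debit_b  # T2: UPDATE overwrites T1's write, COMMIT
    return balance


def run_with_row_lock(balance: int, debit_a: int, debit_b: int) -> int:
    """Models a row lock taken at read time: T2's read waits until T1 commits."""
    read_a = balance            # T1: locking read
    balance = read_a - debit_a  # T1: UPDATE, COMMIT (lock released)
    read_b = balance            # T2: locking read now sees T1's result
    balance = read_b - debit_b  # T2: UPDATE, COMMIT
    return balance
```

Starting from 100 with debits of 30 and 50, the interleaved run ends at 50 and silently drops T1's debit, while the locked run ends at 20. The lock restores correctness by forcing the two transactions to run one after the other on that row, which is where the performance cost comes from.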

2. Performance vs. Correctness

Practitioners frequently lower isolation levels to improve latency, then add manual locking to preserve correctness. Benchmarks such as TPC‑C and YCSB show that user‑added locks can degrade throughput by an order of magnitude. The tension between strong consistency and low latency, as framed by the CAP and PACELC trade‑offs, remains unresolved, especially for transaction‑level high availability in HTAP workloads.

[Figure: User‑added locks cause a severe performance drop]

3. Usability

Operating a modern distributed database still requires deep expertise:

DBA and operations staff for deployment, scaling, and monitoring.

Schema design knowledge (ER modeling, normalization).

Understanding of isolation levels, anomaly patterns, and when to apply explicit locking.

Complex SQL tuning and parameter configuration.

These requirements raise operational costs and hinder broader adoption.

Proposed Direction for the Third Generation

The authors advocate a “predict‑then‑build” methodology: first define a concrete target set of features, then engineer the system to achieve them. The three target attributes are:

100% data correctness – eliminate known anomalies and provide serializable isolation without manual locking.

Maximum performance – sustain benchmark‑level throughput (e.g., TPC‑C, YCSB) while maintaining strong consistency.

True ease of use – reduce the need for specialized DBA skills, automate schema evolution, and expose simple APIs.

Future work will detail concrete design choices, such as consensus algorithms, transaction processing pipelines, and developer‑friendly interfaces that embody these goals.

Reference

[1] Li Haixiang, Li Xiaoyan, Liu Chang, Du Xiaoyong, Lu Wei, Pan Anqun. “Systematic Definition and Classification of Data Anomalies in DBMS”. Journal of Software, 2022, 33(3): 0.
