
From Paper Tape to Cloud‑Native Distributed Databases: Evolution Overview

This article traces the history of data management from early manual and file‑system storage through relational models to modern distributed databases, covering key concepts like CAP theorem, distributed transactions, HTAP, and cloud‑native deployment trends.

ITPUB

Jan 14, 2022

The Emergence and Development of Data Management Technology

Before discussing distributed databases, the article reviews the origins of databases, starting from manual data handling on paper tape and cards, moving to early file‑system storage with hierarchical and network models, and then the breakthrough of the relational model in the 1970s.

Manual data management

Before the 1950s, data was stored on paper tape, punched cards, or magnetic tape and was used mainly for scientific calculation. Programmers designed and managed the data layout themselves and had to adapt their code whenever the data changed, so data and programs were tightly coupled.

File system data management

In the 1950s and 1960s, disks became available, and data models such as the hierarchical and network models emerged. The hierarchical model handled 1‑to‑1 and 1‑to‑N relationships efficiently but struggled with many‑to‑many (N‑to‑M) relationships, in which a record needs more than one parent. The network model could describe such complex relationships, at the cost of being harder to implement.

Birth of the relational data model and database development

In 1970, IBM researcher E. F. Codd introduced the relational model, which organizes data into tables (relations) made up of tuples and attributes and identified by keys. The first commercial relational databases followed from the late 1970s through the 1980s (Oracle, DB2, SQL Server), with MySQL and PostgreSQL arriving in the 1990s. After 2000, growing data volumes exposed the limits of single‑node databases, prompting sharding and middleware solutions.

Big Data Drives the Emergence and Development of Distributed Databases

Birth of distributed databases

Google’s papers on GFS (2003), MapReduce (2004), and BigTable (2006) laid the foundation for the Hadoop ecosystem, and its later papers on Spanner and F1 introduced globally distributed transactions and scalable SQL services. These ideas spurred the creation of many NoSQL and distributed relational databases.

Key elements of distributed databases include:

Data scalability – the system must expand smoothly and automatically, minimizing impact on normal operations and providing automated rebalancing.

Data consistency – achieving financial‑grade ACID guarantees.

Availability – maintaining service when parts of the cluster or network fail.

The CAP theorem

The CAP theorem states that a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance at the same time. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability. The article explains the trade‑offs with examples and introduces the BASE approach (Basically Available, Soft state, Eventual consistency) as a practical compromise.

Consistency (all nodes see the latest data)
Availability (every request receives a response, not necessarily the latest)
Partition tolerance (system continues despite network partitions)
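
The trade‑off between these three properties can be made concrete with a toy model. The sketch below is not from the article: the `Replica` class, the `"CP"`/`"AP"` mode flag, and the version‑based healing rule are all hypothetical simplifications chosen for illustration. It shows two replicas that either refuse requests during a partition (favoring consistency) or keep answering with possibly stale data and converge afterwards (favoring availability, in the spirit of BASE).

```python
class Replica:
    """One node holding a single value. All names here are illustrative."""

    def __init__(self, name, mode):
        self.name, self.mode = name, mode      # mode is "CP" or "AP"
        self.value, self.version = None, 0
        self.peer = None
        self.partitioned = False               # True when the peer is unreachable

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Favor consistency: refuse rather than let the replicas diverge.
            raise RuntimeError(f"{self.name}: write refused during partition")
        self.value, self.version = value, self.version + 1
        if not self.partitioned:
            # Synchronous replication while the link is up.
            self.peer.value, self.peer.version = self.value, self.version

    def read(self):
        if self.partitioned and self.mode == "CP":
            # A cut-off node cannot prove that its copy is the latest.
            raise RuntimeError(f"{self.name}: read refused during partition")
        return self.value


def make_pair(mode):
    a, b = Replica("a", mode), Replica("b", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, down):
    a.partitioned = b.partitioned = down
    if not down:
        # Healing: converge on the higher version (a simplification that
        # ignores conflicting writes made on both sides of the partition).
        newest = max((a, b), key=lambda r: r.version)
        a.value = b.value = newest.value
        a.version = b.version = newest.version


a, b = make_pair("AP")
a.write("v1")
set_partition(a, b, True)
a.write("v2")                 # accepted locally; b is now stale
stale = b.read()              # "v1": available, but not the latest
set_partition(a, b, False)    # both replicas converge on "v2"
```

In `"CP"` mode the same sequence raises `RuntimeError` at the write made during the partition, which is the availability cost of keeping the replicas identical.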

Distributed transactions

Using a cross‑region bank transfer as the example, the article shows why full ACID guarantees are hard to provide across nodes. Common solutions include two‑phase commit (2PC), SAGA, and TCC.

(1) 2PC – a coordinator drives the participants through a Pre‑write (prepare) phase, in which each participant stages its change and votes, and a Decision phase, in which the coordinator tells all of them to commit or abort.
(2) SAGA – breaks a long transaction into short local transactions coordinated by a workflow engine; if a step fails, compensating actions undo the steps that already completed.
(3) TCC (Try/Confirm/Cancel) – each operation registers a Confirm and a Cancel action; Try reserves the resources it needs, so that a successful Try guarantees that Confirm can succeed.
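
The 2PC and SAGA approaches described above can be sketched for the bank‑transfer case as follows. This is a minimal in‑memory illustration under stated assumptions, not any product's implementation: `Account`, `two_phase_commit`, and `run_saga` are hypothetical names, and the sketch omits the durable logging, timeouts, and coordinator‑failure handling that a real protocol needs.

```python
class Account:
    """A participant holding one balance (stand-in for a database node)."""

    def __init__(self, balance):
        self.balance = balance
        self.pending = None          # staged delta awaiting the decision

    # Participant side of two-phase commit
    def prepare(self, delta):
        if self.pending is not None or self.balance + delta < 0:
            return False             # vote "no"
        self.pending = delta         # vote "yes": staged, not yet visible
        return True

    def commit(self):
        self.balance += self.pending
        self.pending = None

    def abort(self):
        self.pending = None

    # Local transaction used by the saga
    def apply(self, delta):
        if self.balance + delta < 0:
            raise ValueError("insufficient funds")
        self.balance += delta


def two_phase_commit(changes):
    """changes: list of (account, delta). Returns True if committed."""
    prepared = []
    for account, delta in changes:   # phase 1: collect votes
        if account.prepare(delta):
            prepared.append(account)
        else:
            for p in prepared:       # a single "no" vote aborts everyone
                p.abort()
            return False
    for account in prepared:         # phase 2: the decision is "commit"
        account.commit()
    return True


def run_saga(steps):
    """steps: list of (action, compensation) callables."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):   # compensate in reverse order
                undo()
            return False
    return True


alice, bob = Account(100), Account(20)
two_phase_commit([(alice, -30), (bob, +30)])    # commits: balances are 70 and 50
two_phase_commit([(bob, +500), (alice, -500)])  # alice votes "no"; bob's staged credit is aborted
```

The two differ in isolation. With 2PC, a staged change stays invisible until the decision is made. With a saga, each local step takes effect immediately and is only undone later by its compensation, so other readers can observe intermediate states.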

Distributed database development

NoSQL databases adopted distributed designs readily, whereas relational databases followed later because transactional guarantees are harder to preserve across nodes. Architectures can be classified as pseudo‑distributed (middleware over MySQL, e.g., Cobar, Mycat), shared‑storage distributed (separate compute and storage layers, e.g., Amazon Aurora, PolarDB), or decentralized (shared‑nothing, using consensus algorithms such as Paxos or Raft, e.g., TiDB, OceanBase).
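
To illustrate the pseudo‑distributed category, the sketch below shows the core job of a sharding middleware layer: mapping a shard key to one of several independent database instances. It is a generic hash‑modulo router written for illustration and does not reflect how Cobar or Mycat implement routing; `ShardRouter` and the shard names are hypothetical.

```python
import hashlib


class ShardRouter:
    """Routes keys to backend instances by hashing the shard key."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shards = list(shard_names)

    def shard_for(self, key):
        # Use a stable hash; Python's built-in hash() of strings varies
        # between processes and would route inconsistently.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def route(self, keys):
        """Group keys by target shard, as needed for a multi-row query."""
        plan = {}
        for key in keys:
            plan.setdefault(self.shard_for(key), []).append(key)
        return plan


router = ShardRouter(["mysql-0", "mysql-1", "mysql-2"])
target = router.shard_for(42)          # always the same shard for key 42
plan = router.route(range(10))         # shard name -> list of keys
```

With plain modulo hashing, changing the number of shards remaps most keys, and a transaction that touches keys on different shards is no longer a single local transaction. Both limitations are part of what the shared‑storage and decentralized designs set out to address.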

Future of Distributed Databases

Recent database conferences highlight a convergence of HTAP (Hybrid Transactional/Analytical Processing) and cloud‑native deployment. HTAP aims to serve both OLTP and OLAP workloads in real time, reducing the need for separate ETL pipelines.
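
As a rough illustration of the HTAP idea, the sketch below keeps a row‑oriented copy for point lookups and a column‑oriented copy for aggregates, both updated by the same write, so analytical queries see fresh data without an ETL step. The `HtapTable` class is hypothetical, and maintaining both copies synchronously in one process is an assumption made for brevity; real systems differ widely and often replicate to the columnar copy asynchronously.

```python
class HtapTable:
    """One logical table with a row store and a column store."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.row_store = {}                                  # pk -> full row (OLTP path)
        self.column_store = {c: {} for c in self.columns}    # column -> {pk: value} (OLAP path)

    def upsert(self, pk, row):
        missing = set(self.columns) - set(row)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        # The same write updates both layouts.
        self.row_store[pk] = {c: row[c] for c in self.columns}
        for c in self.columns:
            self.column_store[c][pk] = row[c]

    def get(self, pk):
        """Transactional-style point lookup served from the row store."""
        return self.row_store[pk]

    def total(self, column):
        """Analytical-style aggregate served from the column store."""
        return sum(self.column_store[column].values())


orders = HtapTable(["customer", "amount"])
orders.upsert(1, {"customer": "alice", "amount": 30})
orders.upsert(2, {"customer": "bob", "amount": 12})
orders.total("amount")        # aggregate reflects both writes immediately
```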

Cloud‑native databases rely on Kubernetes primitives (StatefulSets, Operators, local PersistentVolumes) to run stateful workloads with high performance and scalability. Many vendors already provide Operators for their distributed databases, and the trend points toward most distributed databases running in cloud environments.
