
Comprehensive Overview of Data Models, Storage Engines, Transactions, Consistency, and Replication in Modern Databases

This article summarizes core database concepts: data models (relational, document, graph), storage engine architectures (page-oriented B-tree, log-structured LSM-tree), transaction mechanisms, isolation levels, distributed transaction protocols, partitioning strategies, secondary indexing, consistency models, and event ordering and consensus (Lamport timestamps, Raft).

IT Architects Alliance

Databases traditionally offer strong transactional support but are limited by single‑node storage and performance, prompting the development of various distributed solutions. This summary, based on the book "Designing Data‑Intensive Applications," reviews key concepts for developers and architects.

Data Models – The article discusses why data modeling matters and covers relational models (tables, rows, and fields with a schema enforced on write), document models (JSON-like nested structures with schema-on-read), and graph models (vertices and edges for highly connected data). Each model's query language and typical use cases are outlined.
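As a rough illustration of the three models, the sketch below stores the same small dataset three ways using plain Python structures. The names (users, positions, user_doc, vertices, edges) and the sample data are hypothetical and chosen only for this example; a real database would add a schema, a query language, and indexes.

```python
# Relational: flat rows linked by a foreign key.
users = [{"user_id": 1, "name": "Ada"}]
positions = [
    {"user_id": 1, "title": "Engineer"},
    {"user_id": 1, "title": "Architect"},
]

# Document: one self-contained nested record, interpreted when it is read.
user_doc = {
    "user_id": 1,
    "name": "Ada",
    "positions": [{"title": "Engineer"}, {"title": "Architect"}],
}

# Graph: vertices plus labelled edges, suited to many-to-many relationships.
vertices = {"u1": {"name": "Ada"}, "u2": {"name": "Bob"}, "c1": {"name": "Acme"}}
edges = [("u1", "WORKS_AT", "c1"), ("u2", "WORKS_AT", "c1"), ("u1", "KNOWS", "u2")]


def titles_relational(user_id):
    """Join-style lookup: scan the positions table by foreign key."""
    return [p["title"] for p in positions if p["user_id"] == user_id]


def titles_document(doc):
    """No join needed: the one-to-many data is stored with its parent."""
    return [p["title"] for p in doc["positions"]]


def neighbours(vertex, label):
    """Follow outgoing edges that carry the given label."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]
```

The relational and document lookups return the same titles; the difference is whether the one-to-many data is reassembled at query time or stored together with its parent.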

Storage Engines – Two major storage architectures are described: page‑oriented (B‑tree) engines that map data to fixed‑size pages and log‑structured (LSM) engines that append writes to immutable files and merge them later. The mechanics of B‑tree page references, write‑ahead logging, and LSM components such as memtables, SSTables, Bloom filters, and background compaction are explained.
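The log-structured write path can be sketched in a few lines. The class below (TinyLSM, a hypothetical name) keeps an in-memory memtable, flushes it to an immutable sorted run standing in for an SSTable, reads newest-first, and merges runs during compaction. It is a simplified sketch that assumes string keys and omits the write-ahead log, Bloom filters, and on-disk files described above.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}            # mutable, in-memory writes
        self.sstables = []            # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a sorted, immutable run."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable first, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            # (key,) sorts just before any (key, value) pair with that key.
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        merged = {}
        for table in self.sstables:   # oldest first, so newer values win
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

Reads may have to consult several runs until compaction merges them, which is the read-amplification cost that Bloom filters and background compaction are meant to reduce.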

Transactions – The article enumerates failure scenarios (hardware, software, network, concurrency) and explains ACID properties. It differentiates single‑object from multi‑object transactions, discusses isolation levels (read‑uncommitted, read‑committed, repeatable‑read/snapshot, serializable) and their trade‑offs, and introduces two‑phase commit (2PC) for distributed transactions.
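To make snapshot isolation a little more concrete, here is a minimal multi-version sketch. MVCCStore and its methods are hypothetical names for this illustration; it only shows how a reader sees the versions committed at or before its snapshot, and it does not implement write-conflict detection, rollback, or garbage collection of old versions.

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ascending
        self.clock = 0       # last commit timestamp handed out

    def begin(self):
        """Start a transaction: its snapshot is the latest commit so far."""
        return self.clock

    def commit(self, writes):
        """Install all writes of a transaction under one new timestamp."""
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        value = None
        for commit_ts, candidate in self.versions.get(key, []):
            if commit_ts <= snapshot_ts:
                value = candidate
        return value
```

A transaction that took its snapshot before a later commit keeps reading the older value, which is what prevents non-repeatable reads under this isolation level.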

Distributed Transactions & Percolator – Two-phase commit is detailed, followed by a description of Google's Percolator system, which implements a coordinator-less 2PC on top of Bigtable, using single-row transactions and multiple timestamped versions of each cell to achieve snapshot isolation.
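The two phases of a classic 2PC round can be sketched as follows. Note that this is the textbook variant with an explicit coordinator function, not Percolator's coordinator-less design; Participant and two_phase_commit are hypothetical names, and failure handling (timeouts, coordinator crashes, durable logging of the decision) is left out.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes only if the local work can be made durable."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decide): commit only on a unanimous yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote aborts the whole transaction, and a participant that has voted "yes" must wait for the decision, which is why a coordinator failure at that point can block progress.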

Partitioning (Sharding) – Various partitioning strategies are covered: range‑based, hash‑based, and composite‑key approaches (e.g., Cassandra). The article discusses rebalancing techniques (fixed, dynamic, node‑proportional partitions) and the importance of balanced load distribution.
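The difference between range-based and hash-based placement can be shown with two small functions. Both are illustrative sketches: the function names, the boundary list, and the choice of MD5 as the hash are assumptions for this example, not taken from any particular database.

```python
import bisect
import hashlib


def range_partition(key, boundaries):
    """Range partitioning: boundaries is a sorted list of split keys.

    Keys below boundaries[0] go to partition 0, and so on; adjacent keys
    stay together, which makes range scans cheap but can create hot spots.
    """
    return bisect.bisect_right(boundaries, key)


def hash_partition(key, num_partitions):
    """Hash partitioning over a fixed number of partitions.

    A stable hash (not Python's per-process randomized hash()) spreads keys
    evenly, at the cost of losing key ordering across partitions.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

With a fixed partition count that is larger than the number of nodes, rebalancing can move whole partitions between nodes without changing which partition a key maps to.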

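Index Construction – It distinguishes local (document-partitioned) secondary indexes, which keep writes cheap but require scatter/gather reads across all partitions, from global (term-partitioned) secondary indexes, which make reads efficient but touch several partitions per write, so keeping them synchronously up to date requires a distributed transaction.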
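The read-path difference between the two index layouts can be sketched with in-memory dictionaries. Everything here (the cars dataset, the two-partition layout, and the toy partition functions) is hypothetical and only meant to show where each query has to look.

```python
from collections import defaultdict

NUM_PARTITIONS = 2
cars = {1: "red", 2: "blue", 3: "red", 4: "red"}   # doc_id -> colour


def doc_partition(doc_id):
    """Partition that stores the document itself."""
    return doc_id % NUM_PARTITIONS


def term_partition(term):
    """Partition that stores the global index entry for a term."""
    return sum(term.encode("utf-8")) % NUM_PARTITIONS


# Local index: each partition indexes only its own documents.
local_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]
# Global index: entries are partitioned by the indexed term instead.
global_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

for doc_id, colour in cars.items():
    local_index[doc_partition(doc_id)][colour].add(doc_id)
    global_index[term_partition(colour)][colour].add(doc_id)


def query_local(colour):
    """Scatter/gather: every partition must be asked."""
    result = set()
    for partition in local_index:
        result |= partition.get(colour, set())
    return result


def query_global(colour):
    """Single lookup: only the partition owning the term is asked."""
    return set(global_index[term_partition(colour)].get(colour, set()))
```

Both queries return the same document ids; the local layout pays on reads, while the global layout pays on writes because a document and its index entries may live on different partitions.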

Consistency Models – The piece surveys linearizability, sequential consistency, causal consistency, eventual consistency, and client‑centric models (monotonic read/write, read‑your‑writes, writes‑follow‑reads), explaining their guarantees and practical limitations.
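Two of the client-centric guarantees, read-your-writes and monotonic reads, can be approximated by having the client remember the newest version it has written or seen. The sketch below assumes each replica exposes a single version number; Replica and Session are hypothetical names, and real systems track this per key or with richer version metadata.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        """Apply a replicated write (replication itself is not modelled)."""
        self.version = version
        self.value = value


class Session:
    """Client-side session that refuses to read from stale replicas."""

    def __init__(self):
        self.min_version = 0   # newest version this client wrote or read

    def note_write(self, version):
        self.min_version = max(self.min_version, version)

    def read(self, replicas):
        # Use the first replica that is at least as new as what we have seen.
        for replica in replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")
```

A lagging follower is skipped rather than allowed to return older data, so the session never observes time moving backwards even though the replicas are only eventually consistent.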

Consensus Protocols – Lamport logical timestamps are introduced to order events without a global clock, leading to total‑order broadcast. The Raft consensus algorithm is summarized, highlighting its components (leader election, log replication, membership changes, safety) and its role in state‑machine replication for strongly consistent databases.
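A Lamport clock is small enough to sketch directly. In the example below, a timestamp is a (counter, node_id) pair, so Python's tuple comparison gives a total order with the node id as tie-breaker; the class and method names are chosen for this illustration.

```python
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Local event: advance the counter and return its timestamp."""
        self.counter += 1
        return (self.counter, self.node_id)

    def send(self):
        """Sending a message is an event; attach the timestamp to it."""
        return self.tick()

    def receive(self, message_ts):
        """On receipt, jump past the sender's counter if it is ahead."""
        self.counter = max(self.counter, message_ts[0]) + 1
        return (self.counter, self.node_id)
```

If one event causally precedes another, its timestamp is smaller. The converse does not hold, since concurrent events are also ordered, arbitrarily but consistently, by the node id tie-breaker.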

Replication, Transactions, Consistency, Distributed Databases, Partitioning, Consensus Algorithms, Storage Engines, Data Models
Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
