
Inside Amazon Aurora: Cloud‑Native Architecture, Write/Read Path, and Recovery

This article provides an in‑depth technical analysis of Amazon Aurora’s cloud‑native relational database design, covering its architectural models, log‑as‑database philosophy, optimized write and read paths, fault‑tolerant replication, group commit, and fast recovery mechanisms compared with traditional DBMS.

dbaplus Community

Introduction

Amazon Aurora is a cloud‑native relational database service that separates compute from storage, adopts a “log‑is‑database” model, and is built for high availability, low‑latency writes, and elastic scalability.

Database Architecture Models

Five representative clustering architectures are compared to illustrate Aurora’s design choice:

Shared Disk Failover (SDF) : a single active DB instance accesses a shared SAN; failover moves the service to another node. Replication occurs at the storage layer.

Shared Disk Parallel (SDP) : multiple instances concurrently access shared storage with distributed concurrency control (e.g., Oracle RAC).

Shared Nothing Active (SNA) : a middleware layer forwards client requests to independent replicas; reads are load‑balanced, writes are broadcast to all replicas.

Shared Nothing Certification‑Based (SNCB) : each replica executes transactions locally; a total‑order broadcast is used only at commit time.

Hybrid (Aurora‑style) : similar to SDP but all updates are performed by a single DB instance, eliminating distributed concurrency protocols.

Aurora System Design

The design provides three core advantages over traditional DBMSs:

Independent fault‑tolerant storage : a distributed storage service isolates the database from performance jitter and network/storage failures, preserving the Service Level Agreement (SLA).

Log‑is‑database philosophy : only redo logs are written to storage, reducing network IOPS by roughly an order of magnitude.

Offloaded complex functions : redo, backup, and recovery are handled asynchronously by the storage layer, enabling parallel processing and fast crash recovery.

Terminology

AZ (Availability Zone) : a data‑center within a single AWS region.

LSN (Log Sequence Number) : a unique, monotonically increasing identifier that the database assigns to each log record.

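VCL (Volume Complete LSN) : the highest LSN for which the storage service can guarantee that all earlier log records are available; records at or below the VCL may still belong to uncommitted transactions.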

SCL (Segment Complete LSN) : the highest continuous LSN for a data segment, used for inter‑node synchronization.

CPL (Consistency Point LSN) : the LSN of the last log record of a mini-transaction (MTR); each mini-transaction produces one CPL.

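VDL (Volume Durable LSN) : the greatest CPL that is less than or equal to the VCL. It marks the point up to which the volume is both complete and consistent; during recovery the database contacts a read quorum of storage nodes to recompute it and uses it as the safe recovery point.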
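
The relationship between VCL, CPL, and VDL can be illustrated with a small sketch. It assumes, purely for illustration, that LSNs are dense integers starting at 1; the function names are hypothetical and this is not Aurora's implementation.

```python
from bisect import bisect_right


def volume_complete_lsn(received: set[int], first_lsn: int = 1) -> int:
    """Highest LSN such that it and every earlier LSN have been received."""
    lsn = first_lsn - 1
    while lsn + 1 in received:
        lsn += 1
    return lsn


def volume_durable_lsn(vcl: int, cpls: list[int]) -> int:
    """Greatest CPL that is <= VCL, or 0 if no CPL qualifies."""
    ordered = sorted(cpls)
    idx = bisect_right(ordered, vcl)
    return ordered[idx - 1] if idx else 0


received = {1, 2, 3, 4, 5, 7, 8}  # LSN 6 has not arrived yet
cpls = [2, 4, 8]                  # last record of each mini-transaction
vcl = volume_complete_lsn(received)
vdl = volume_durable_lsn(vcl, cpls)
print(vcl, vdl)  # prints "5 4"
```

Here the gap at LSN 6 caps the VCL at 5, and the VDL falls back to 4 because LSN 5 is in the middle of a mini-transaction.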

Aurora Write Path

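A write ships only redo log records across the network; data pages are never sent from the database tier to storage. On each storage node the work proceeds in eight steps (see Figure 3), of which only the first two sit on the latency-sensitive foreground path; the rest run asynchronously: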

Primary instance streams redo logs to storage nodes.

Storage nodes batch the logs, persist them to local disks, and acknowledge persistence (ACK) back to the primary.

Each storage node scans its log queue to detect missing entries.

Missing entries are fetched via a gossip‑based peer‑to‑peer protocol.

Log records are coalesced into new versions of the affected data pages.

The log and new page versions are periodically staged to S3, which provides the basis for backup and point-in-time restore.

Garbage‑collection removes obsolete blocks and log files.

Periodic integrity checks validate page checksums (CRC), and damaged or missing data is repaired from peers.
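
Steps 3 and 4 (gap detection and gossip) can be sketched as follows. This is a toy model under stated assumptions: LSNs are dense integers starting at 1, and `SegmentReplica` and its methods are hypothetical names, not part of any Aurora API.

```python
class SegmentReplica:
    """Toy storage-node replica that holds log records keyed by LSN."""

    def __init__(self, name: str):
        self.name = name
        self.log: dict[int, str] = {}

    def receive(self, lsn: int, payload: str) -> None:
        self.log[lsn] = payload

    def scl(self) -> int:
        """Segment Complete LSN: end of the gap-free prefix of the log."""
        lsn = 0
        while lsn + 1 in self.log:
            lsn += 1
        return lsn

    def missing_up_to(self, upper: int) -> list[int]:
        """LSNs in 1..upper that this replica has not received."""
        return [lsn for lsn in range(1, upper + 1) if lsn not in self.log]

    def gossip_with(self, peer: "SegmentReplica") -> int:
        """Copy records the peer has and we lack; return how many were filled."""
        upper = max(peer.log, default=0)
        filled = 0
        for lsn in self.missing_up_to(upper):
            if lsn in peer.log:
                self.log[lsn] = peer.log[lsn]
                filled += 1
        return filled


a, b = SegmentReplica("a"), SegmentReplica("b")
for lsn in (1, 2, 4):
    a.receive(lsn, f"redo-{lsn}")
for lsn in (1, 2, 3, 4):
    b.receive(lsn, f"redo-{lsn}")

print(a.scl())           # 2: the record at LSN 3 is missing
print(a.gossip_with(b))  # 1 record fetched from the peer
print(a.scl())           # 4: the gap is closed
```

Because gap filling happens between storage nodes, the database instance does not have to resend a batch that one replica happened to miss.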

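Each 10 GB segment is replicated six times across three AZs, two copies per AZ. A write is considered durable once at least four of the six replicas have persisted the redo log. With V = 6 copies, the quorums are Vw = 4 (write) and Vr = 3 (read), which satisfies Vr + Vw > V, so every read quorum overlaps every write quorum, and Vw > V/2, so two conflicting writes cannot both reach quorum.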
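
The quorum arithmetic can be checked with a few lines of code. The sketch below assumes the layout described above (six copies, two per AZ); the replica labels and function names are hypothetical.

```python
V, VW, VR = 6, 4, 3  # copies, write quorum, read quorum (from the text)

# Hypothetical replica labels: two copies in each of three AZs.
REPLICAS = [(az, copy) for az in ("az1", "az2", "az3") for copy in (0, 1)]


def quorum_rules_hold(v: int, vw: int, vr: int) -> bool:
    """Reads must overlap writes, and two writes must overlap each other."""
    return vr + vw > v and 2 * vw > v


def availability(failed: set) -> dict:
    """Which operations can still reach quorum given a set of failed replicas."""
    alive = sum(1 for replica in REPLICAS if replica not in failed)
    return {"write": alive >= VW, "read": alive >= VR}


whole_az = {r for r in REPLICAS if r[0] == "az1"}
az_plus_one = whole_az | {("az2", 0)}

print(quorum_rules_hold(V, VW, VR))  # True
print(availability(whole_az))        # {'write': True, 'read': True}
print(availability(az_plus_one))     # {'write': False, 'read': True}
```

Under these numbers, losing a whole AZ still leaves four replicas, enough to write; losing an AZ plus one more node leaves three, enough to read and therefore to rebuild the lost copies.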

Aurora Read Path

Read requests first check the in-memory buffer cache. If the required page is absent, the request is forwarded to the storage layer. Aurora's eviction rule compares each buffered page's LSN with the current VDL, so that on a cache miss it is sufficient to request the page as of the current VDL to obtain its latest durable version. Two mechanisms can present a consistent view to the client:

Replay all logs with LSN ≤ VDL onto the buffered page.

Merge the buffered page with delta logs on‑the‑fly (exact method is proprietary).

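Each read is assigned a read point (the VDL at the time the request is issued). Because the database tracks the SCL of every segment, it can route the read to a storage node whose SCL is at or beyond the read point, so a quorum read is not needed in normal operation.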
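
A minimal sketch of read routing by SCL follows. The `SegmentState` type, the latency field, and the choice to prefer the lowest-latency eligible node are all illustrative assumptions; the source only says that the SCL is used to locate a node that can satisfy the read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentState:
    node: str          # hypothetical storage-node name
    scl: int           # Segment Complete LSN last reported by the node
    latency_ms: float  # assumed to be tracked by the database tier


def pick_read_node(segments: list[SegmentState], read_point: int) -> Optional[SegmentState]:
    """Return the fastest node whose log is complete up to the read point."""
    eligible = [s for s in segments if s.scl >= read_point]
    if not eligible:
        return None  # caller would have to wait or fall back to a quorum read
    return min(eligible, key=lambda s: s.latency_ms)


segments = [
    SegmentState("node-a", scl=90, latency_ms=0.4),   # fast but behind
    SegmentState("node-b", scl=100, latency_ms=1.2),
    SegmentState("node-c", scl=120, latency_ms=0.9),
]
chosen = pick_read_node(segments, read_point=100)
print(chosen.node)  # node-c
```

The fastest node, node-a, is skipped because its SCL of 90 is behind the read point of 100.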

Transaction Commit

Aurora uses asynchronous group commit:

Each transaction writes its commit LSN to a pending queue and continues processing.

A background thread batches pending commits and sends the logs to storage.

When the primary receives ACKs from at least four replicas for a batch, the VDL advances.

The commit queue is scanned; transactions whose commit LSN ≤ VDL are marked committed and a response is returned to the client.
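
The four steps above can be sketched as a small tracker. This is a simplified model, not Aurora's code: it assumes batches are acknowledged in LSN order and that the highest LSN in each batch is a consistency point, and `CommitTracker` and its method names are hypothetical.

```python
import heapq


class CommitTracker:
    """Toy model of asynchronous group commit driven by the VDL."""

    def __init__(self, write_quorum: int = 4):
        self.write_quorum = write_quorum
        self.pending: list[tuple[int, str]] = []  # min-heap of (commit_lsn, txn_id)
        self.vdl = 0

    def request_commit(self, txn_id: str, commit_lsn: int) -> None:
        """Queue the commit and return immediately; the worker is not blocked."""
        heapq.heappush(self.pending, (commit_lsn, txn_id))

    def on_batch_acks(self, batch_max_lsn: int, ack_count: int) -> list[str]:
        """Advance the VDL once a write quorum has acknowledged the batch."""
        if ack_count >= self.write_quorum:
            self.vdl = max(self.vdl, batch_max_lsn)
        return self._drain()

    def _drain(self) -> list[str]:
        """Pop every transaction whose commit LSN is now <= VDL."""
        done = []
        while self.pending and self.pending[0][0] <= self.vdl:
            _, txn_id = heapq.heappop(self.pending)
            done.append(txn_id)
        return done


tracker = CommitTracker()
tracker.request_commit("t1", 10)
tracker.request_commit("t2", 20)
tracker.request_commit("t3", 35)

print(tracker.on_batch_acks(20, ack_count=3))  # []: only three ACKs, no quorum
print(tracker.on_batch_acks(20, ack_count=4))  # ['t1', 't2']
print(tracker.on_batch_acks(40, ack_count=5))  # ['t3']
```

Keeping the pending commits in a min-heap means each VDL advance only inspects the transactions it actually releases.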

Recovery Mechanism

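Traditional DBMSs rely on ARIES-style recovery: after a crash, the log is replayed from the last checkpoint (redo) and incomplete transactions are rolled back (undo), all before the database can serve traffic. Aurora moves redo application into the storage layer, where it runs continuously, in parallel, and asynchronously without impacting the compute tier. Undo of in-flight transactions still happens on the database instance, but it can run after the instance is back online.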

During crash recovery:

On restart, the database contacts a read quorum of the replicas to establish the VCL (the highest LSN up to which the log is complete) and truncates log records beyond it.

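Only log records with LSN ≤ the greatest CPL at or below the VCL are kept and applied; a partially received mini-transaction is discarded.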

VDL is set to the greatest CPL, establishing a safe recovery point.
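
The recovery steps above can be sketched for a single group of six replicas. This is a simplification under stated assumptions: LSNs are dense integers starting at 1, the set of CPLs is known, and the VCL is taken as the gap-free prefix of the records held by the responding replicas. The function name and inputs are hypothetical.

```python
def recover(responses: list[set[int]], cpls: list[int], read_quorum: int = 3):
    """Return (vcl, vdl, truncated_lsns) from the logs of the responding replicas."""
    if len(responses) < read_quorum:
        raise RuntimeError("read quorum not reached")

    # Any record that reached a write quorum (4 of 6) is held by at least
    # one member of any read quorum (3 of 6), so the union cannot miss it.
    seen = set().union(*responses)

    vcl = 0
    while vcl + 1 in seen:
        vcl += 1

    vdl = max((c for c in cpls if c <= vcl), default=0)
    truncated = sorted(lsn for lsn in seen if lsn > vdl)
    return vcl, vdl, truncated


responses = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 4, 6},
    {1, 2, 3, 8},
]
print(recover(responses, cpls=[3, 5, 8]))  # (6, 5, [6, 8])
```

In this run LSN 7 never arrived, so the VCL is 6; the VDL falls back to the consistency point at 5, and the records at 6 and 8 are truncated.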

Empirical measurements show Aurora can restore a failed instance in roughly 60–120 seconds, significantly faster than MySQL on comparable hardware.

Performance and Fault‑Tolerance Summary

Key quantitative characteristics:

Segments: 10 GB each, replicated six times (two replicas per AZ across three AZs).

Write quorum: Vw = 4 (four ACKs required for durability).

Read quorum: Vr = 3 (three of six replicas; since Vr + Vw > 6, every read quorum includes at least one replica holding the latest durable write).

Restoring a failed instance after a crash: roughly 60–120 seconds.

Benchmark on r3.8xlarge instances (sysbench) shows >10× throughput compared with MySQL 5.6/5.7.

By isolating the storage service, writing only redo logs, and delegating redo processing to a parallel distributed storage layer, Aurora achieves cloud-scale performance, recovery times on the order of a minute or two, and strong multi-AZ fault tolerance while remaining compatible with MySQL 5.6.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
