Inside Amazon Aurora: Cloud‑Native Architecture, Write/Read Path, and Recovery
This article provides an in‑depth technical analysis of Amazon Aurora’s cloud‑native relational database design, covering its architectural models, log‑is‑database philosophy, optimized write and read paths, fault‑tolerant replication, group commit, and fast recovery mechanisms, and compares them with those of traditional DBMSs.
Introduction
Amazon Aurora is a cloud‑native relational database service that separates compute from storage, adopts a “log‑is‑database” model, and is built for high availability, low‑latency writes, and elastic scalability.
Database Architecture Models
Five representative clustering architectures are compared to illustrate Aurora’s design choice:
Shared Disk Failover (SDF) : a single active DB instance accesses a shared SAN; failover moves the service to another node. Replication occurs at the storage layer.
Shared Disk Parallel (SDP) : multiple instances concurrently access shared storage with distributed concurrency control (e.g., Oracle RAC).
Shared Nothing Active (SNA) : a middleware layer forwards client requests to independent replicas; reads are load‑balanced, writes are broadcast to all replicas.
Shared Nothing Certification‑Based (SNCB) : each replica executes transactions locally; a total‑order broadcast is used only at commit time.
Hybrid (Aurora‑style) : similar to SDP but all updates are performed by a single DB instance, eliminating distributed concurrency protocols.
Aurora System Design
The design provides three core advantages over traditional DBMSs:
Independent fault‑tolerant storage : a distributed storage service isolates the database from performance jitter and network/storage failures, preserving the Service Level Agreement (SLA).
Log‑is‑database philosophy : only redo logs are written to storage, reducing network IOPS by roughly an order of magnitude.
Offloaded complex functions : redo, backup, and recovery are handled asynchronously by the storage layer, enabling parallel processing and fast crash recovery.
Terminology
AZ (Availability Zone) : an isolated location, made up of one or more data centers, within a single AWS region.
LSN (Log Sequence Number) : a unique, monotonically increasing identifier that the database assigns to each log record.
VCL (Volume Complete LSN) : the highest LSN up to which the storage service holds every preceding log record without gaps; records at or below it are available but not necessarily committed.
SCL (Segment Complete LSN) : the highest continuous LSN for a data segment, used for inter‑node synchronization.
CPL (Consistency Point LSN) : the LSN of the last log record of a mini‑transaction; each mini‑transaction produces one CPL.
VDL (Volume Durable LSN) : the greatest CPL that is durable, that is, the greatest CPL not exceeding the VCL; during recovery the system reads a majority of VDLs to determine the safe recovery point.
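The relationship between these markers can be illustrated with a small sketch. It assumes, purely for illustration, that LSNs are consecutive integers starting at 1; the function names are hypothetical and are not part of any Aurora API.

```python
from typing import Iterable, Set


def highest_contiguous_lsn(received: Iterable[int], start: int = 1) -> int:
    """Largest LSN n such that start..n were all received (start - 1 if none)."""
    seen = set(received)
    lsn = start - 1
    while lsn + 1 in seen:
        lsn += 1
    return lsn


def volume_durable_lsn(vcl: int, cpls: Set[int]) -> int:
    """Greatest CPL that does not exceed the VCL (0 if there is none)."""
    return max((c for c in cpls if c <= vcl), default=0)


# LSN 6 never arrived, so the gap-free prefix stops at 5.
received = [1, 2, 3, 4, 5, 7, 8]
# Assumed mini-transaction boundaries: one ends at LSN 3, another at LSN 8.
cpls = {3, 8}

vcl = highest_contiguous_lsn(received)
vdl = volume_durable_lsn(vcl, cpls)
print(vcl, vdl)  # 5 3
```

In this example records 4 and 5 are present but belong to a mini‑transaction that is not yet complete, so the durable point stays at the earlier CPL.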
Aurora Write Path
Writes are performed by constructing redo logs only; the database tier never writes data pages across the network. The process consists of eight steps (see Figure 3):
Primary instance streams redo logs to storage nodes.
Storage nodes batch the logs, persist them to local disks, and acknowledge persistence (ACK) back to the primary.
Each storage node scans its log queue to detect missing entries.
Missing entries are fetched via a gossip‑based peer‑to‑peer protocol.
Logs are merged to produce the latest version of the data pages within each 10 GB data block.
Data blocks are continuously backed up to S3 using point‑in‑time snapshots.
Garbage‑collection removes obsolete blocks and log files.
Periodic integrity checks verify block health and retrieve missing replicas from peers.
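Steps 3 and 4 (gap detection and peer‑to‑peer filling) can be sketched roughly as follows. This is a simplified model, not Aurora's actual protocol: it assumes LSNs are consecutive integers starting at 1, represents each node's log as a dictionary, and uses hypothetical function names.

```python
from typing import Dict, List, Set


def missing_lsns(log: Dict[int, str]) -> List[int]:
    """LSNs absent between 1 and the highest LSN this node holds."""
    if not log:
        return []
    return [lsn for lsn in range(1, max(log) + 1) if lsn not in log]


def fill_from_peers(log: Dict[int, str], peers: List[Dict[int, str]]) -> Set[int]:
    """Copy each missing record from the first peer that has it.

    Returns the LSNs that no peer could supply.
    """
    unresolved: Set[int] = set()
    for lsn in missing_lsns(log):
        for peer in peers:
            if lsn in peer:
                log[lsn] = peer[lsn]
                break
        else:
            unresolved.add(lsn)
    return unresolved


node = {1: "rec-1", 2: "rec-2", 5: "rec-5"}
peers = [{1: "rec-1", 3: "rec-3"}, {4: "rec-4", 5: "rec-5"}]

print(missing_lsns(node))            # [3, 4]
print(fill_from_peers(node, peers))  # set()
print(sorted(node))                  # [1, 2, 3, 4, 5]
```

A node in this model only notices gaps below the highest LSN it already holds; records it has never heard of would have to be discovered by other means.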
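Each 10 GB block is replicated six times across three AZs. A write is considered durable once at least four of the six replicas have persisted the redo log, and a read must consult at least three replicas. This gives a write quorum of Vw = 4 and a read quorum of Vr = 3, so that Vr + Vw > 6 and every read set overlaps every write set.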
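The quorum arithmetic can be checked with a short sketch. The replica counts come from the text (six copies, two per AZ); the function names are illustrative only.

```python
def quorum_is_consistent(v: int, vr: int, vw: int) -> bool:
    """True if reads overlap writes (vr + vw > v) and any two writes overlap (vw > v / 2)."""
    return vr + vw > v and 2 * vw > v


def can_write(alive: int, vw: int) -> bool:
    """A write quorum is reachable if at least vw replicas are alive."""
    return alive >= vw


def can_read(alive: int, vr: int) -> bool:
    """A read quorum is reachable if at least vr replicas are alive."""
    return alive >= vr


V, VR, VW = 6, 3, 4
REPLICAS_PER_AZ = 2

print(quorum_is_consistent(V, VR, VW))  # True

# Losing one whole AZ leaves 4 replicas: reads and writes both proceed.
after_az_loss = V - REPLICAS_PER_AZ
print(can_write(after_az_loss, VW), can_read(after_az_loss, VR))  # True True

# Losing one AZ plus one more replica leaves 3: reads proceed, writes do not.
after_az_plus_one = after_az_loss - 1
print(can_write(after_az_plus_one, VW), can_read(after_az_plus_one, VR))  # False True
```

The second case shows why the read quorum matters: with three replicas left the volume can still be read, which is what makes it possible to rebuild the lost copies.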
Aurora Read Path
Read requests first check the in‑memory buffer. If the required block is absent, the request is forwarded to the storage layer. To keep cached pages current, Aurora evicts a page from the buffer only if its page LSN is greater than or equal to the VDL, which ensures that all changes to the page are already durable in the log and that the latest version can be fetched from storage on a later miss. Two mechanisms can present a consistent view to the client:
Replay all logs with LSN ≤ VDL onto the buffered page.
Merge the buffered page with delta logs on‑the‑fly (exact method is proprietary).
A read point (the VDL at request time) is assigned, and the system uses the corresponding SCL to locate storage nodes that can satisfy the read.
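A minimal sketch of this routing decision follows, assuming the database tier keeps a map from storage node to the SCL that node has reported. The node names and the function are hypothetical.

```python
from typing import Dict, List


def nodes_able_to_serve(read_point: int, scl_by_node: Dict[str, int]) -> List[str]:
    """Nodes whose segment is complete at least up to the read point."""
    return sorted(node for node, scl in scl_by_node.items() if scl >= read_point)


# The read point is the VDL observed when the request was issued.
read_point = 120
scl_by_node = {"node-a": 130, "node-b": 118, "node-c": 120}

print(nodes_able_to_serve(read_point, scl_by_node))  # ['node-a', 'node-c']
```

Because a single sufficiently complete node can answer the read, a normal read in this model does not need to gather a full read quorum.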
Transaction Commit
Aurora uses asynchronous group commit:
Each transaction writes its commit LSN to a pending queue and continues processing.
A background thread batches pending commits and sends the logs to storage.
When the primary receives ACKs from at least four replicas for a batch, the VDL advances.
The commit queue is scanned; transactions whose commit LSN ≤ VDL are marked committed and a response is returned to the client.
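The commit queue described above might look like the following sketch. It models only the bookkeeping (pending commits and VDL advancement), not the networking or threading; the class and method names are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


class CommitQueue:
    """Tracks transactions waiting for the VDL to reach their commit LSN."""

    def __init__(self) -> None:
        self._pending: List[Tuple[int, str]] = []  # min-heap keyed by commit LSN
        self.vdl = 0

    def request_commit(self, txn_id: str, commit_lsn: int) -> None:
        """Record the commit LSN and return immediately; the caller is not blocked."""
        heapq.heappush(self._pending, (commit_lsn, txn_id))

    def advance_vdl(self, new_vdl: int) -> List[str]:
        """Move the VDL forward and return the transactions that can now be acknowledged."""
        self.vdl = max(self.vdl, new_vdl)  # the VDL never moves backwards
        acknowledged = []
        while self._pending and self._pending[0][0] <= self.vdl:
            _, txn_id = heapq.heappop(self._pending)
            acknowledged.append(txn_id)
        return acknowledged


queue = CommitQueue()
queue.request_commit("t1", 10)
queue.request_commit("t2", 25)
queue.request_commit("t3", 18)

print(queue.advance_vdl(20))  # ['t1', 't3']
print(queue.advance_vdl(15))  # []
print(queue.advance_vdl(30))  # ['t2']
```

Keeping the pending commits in a heap means each VDL advance only touches the transactions it actually releases, rather than rescanning the whole queue.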
Recovery Mechanism
Traditional DBMSs rely on ARIES‑style recovery (redo + undo) after a checkpoint. Aurora offloads the entire recovery subsystem to the storage layer, which performs parallel, asynchronous redo without impacting the compute tier.
During crash recovery:
The storage layer determines the VCL (the largest LSN below which no log record is missing).
The VDL is set to the greatest CPL that does not exceed the VCL, establishing a safe recovery point.
Log records with an LSN above this point are truncated, so only records up to the recovery point are applied.
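One way to read these steps is sketched below. It assumes LSNs are consecutive integers starting at 1, that the recovering instance has gathered the logs of the replicas it could reach, and that everything above the recovery point is discarded; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def recovery_point(replica_logs: List[Set[int]], cpls: Set[int]) -> Tuple[int, int]:
    """Return (vcl, vdl) computed from the union of the logs held by the replicas read."""
    available: Set[int] = set().union(*replica_logs)
    vcl = 0
    while vcl + 1 in available:  # extend the gap-free prefix
        vcl += 1
    vdl = max((c for c in cpls if c <= vcl), default=0)
    return vcl, vdl


def truncate(replica_log: Set[int], vdl: int) -> Set[int]:
    """Drop every record above the recovery point."""
    return {lsn for lsn in replica_log if lsn <= vdl}


# Three reachable replicas, each holding a different subset of the log.
replicas = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 3, 6, 8}]
# Assumed mini-transaction boundaries.
cpls = {2, 6, 8}

vcl, vdl = recovery_point(replicas, cpls)
print(vcl, vdl)                            # 6 6
print(sorted(truncate(replicas[2], vdl)))  # [1, 2, 3, 6]
```

Record 8 is dropped here because record 7 is missing everywhere, so the mini‑transaction ending at 8 can never be completed.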
Empirical measurements show Aurora can restore a failed instance in roughly 60–120 seconds, significantly faster than MySQL on comparable hardware.
Performance and Fault‑Tolerance Summary
Key quantitative characteristics:
Data blocks: 10 GB each, replicated six times (two replicas per AZ across three AZs).
Write quorum: Vw = 4 (four ACKs required for durability).
Read quorum: Vr = 3 (three of six replicas, enough to overlap any write quorum of four and so guarantee that the latest version is seen).
Recovery of a failed instance: roughly 60–120 seconds.
Benchmark on r3.8xlarge instances (sysbench) shows >10× throughput compared with MySQL 5.6/5.7.
By isolating the storage service, writing only redo logs, and delegating recovery to a parallel distributed layer, Aurora achieves cloud‑scale performance, recovery times on the order of one to two minutes, and strong multi‑AZ fault tolerance while remaining fully compatible with MySQL 5.6.
dbaplus Community
