Recommended Industrial Distributed System Design Papers

This article curates a selection of seminal industrial distributed system design papers, including Google’s GFS, MapReduce, Bigtable, Percolator, Megastore, Spanner, and F1, and Amazon’s Dynamo and Aurora. It provides brief insights and links for readers interested in foundational storage and database technologies.

Architecture Digest

Below is a curated list of influential industrial distributed system design papers, grouped by system family. Each group comes with a brief description, and each paper with a link to the original PDF.

1. Google’s “Three Pillars”

The Google File System (2003) – PDF

MapReduce: Simplified Data Processing on Large Clusters (2004) – PDF

Bigtable: A Distributed Storage System for Structured Data (2006) – PDF

These three papers inspired the open‑source Hadoop ecosystem: GFS led to HDFS, MapReduce to Hadoop MapReduce, and Bigtable to HBase.
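To give a flavor of the programming model the MapReduce paper describes, here is a minimal single‑process sketch of word count in Python. It only illustrates the map, group‑by‑key, and reduce phases; it does not attempt distribution, fault tolerance, or anything else a real cluster framework provides. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and are not taken from the paper or from Hadoop.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce phase: merge all values that share the same key."""
    return word, sum(counts)


def run_mapreduce(inputs: dict[str, str]) -> dict[str, int]:
    """Toy driver (hypothetical): map, group by key, then reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in inputs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)  # stands in for the shuffle step
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_mapreduce({"a.txt": "the quick fox", "b.txt": "the lazy dog the end"})
```

In a real deployment the grouping step is a distributed shuffle across machines; here it is just an in‑memory dictionary.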

2. Amazon’s Dynamo (Highly Available Key‑Value Store)

Dynamo: Amazon’s Highly Available Key‑value Store (2007) – PDF

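Dynamo’s design, built on consistent hashing, quorum‑style replication, and eventual consistency, influenced many later systems, including Apache Cassandra, Riak, and Voldemort.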
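Two partitioning ideas from the Dynamo paper are consistent hashing with virtual nodes and a per‑key preference list of N distinct nodes. The following Python sketch shows one way these could fit together. The class name `HashRing`, its parameters, and the `"node#i"` token scheme are assumptions made for illustration, not Dynamo's actual implementation; membership changes, hinted handoff, and quorum reads and writes are all omitted.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Place a string on the ring using MD5, read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 8) -> None:
        # Each physical node owns several positions (tokens) on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        """Walk clockwise from the key's position, collecting up to n distinct nodes."""
        start = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        owners: list[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:  # skip extra tokens of the same physical node
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["A", "B", "C", "D"])
replicas = ring.preference_list("user:42")  # three distinct nodes for this key
```

Because each key maps to a position on the ring rather than to `hash(key) % node_count`, adding or removing a node only moves the keys adjacent to that node's tokens.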

3. Google’s Percolator and Megastore

Large‑scale Incremental Processing Using Distributed Transactions and Notifications (2010) – PDF

Megastore: Providing Scalable, Highly Available Storage for Interactive Services (2011) – PDF

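These works show how Google layered transactions on top of Bigtable before the advent of Spanner: Percolator adds cross‑row ACID transactions with snapshot isolation, while Megastore provides ACID semantics within entity groups and replicates data across datacenters using Paxos.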

4. Google’s Spanner and F1

Spanner: Google’s Globally‑Distributed Database (2012) – PDF

F1: A Distributed SQL Database That Scales (2013) – PDF

Online, Asynchronous Schema Change in F1 (2013) – PDF

Spanner: Becoming a SQL System (2017) – PDF

F1 Query: Declarative Querying at Scale (2018) – PDF

Spanner focuses on the storage layer with distributed transactions, while F1 provides a distributed SQL engine on top of it. Together they helped open the NewSQL era and inspired open‑source projects such as TiDB and CockroachDB.

5. Amazon’s Aurora

Amazon Aurora: Design Considerations for High Throughput Cloud‑Native Relational Databases (2017) – PDF

Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes (2018) – PDF

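Aurora represents a modern cloud‑native relational database design: it separates compute from a shared, replicated storage service and sends only redo log records to storage, an approach the first paper sums up as “the log is the database.”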

These papers collectively form a solid foundation for understanding modern distributed storage and database systems in industry.
