Recommended Industrial Distributed System Design Papers
This article curates a selection of seminal industrial distributed system design papers, including Google’s GFS, MapReduce, Bigtable, Percolator, Megastore, Spanner, and F1, and Amazon’s Dynamo and Aurora, with brief notes and links for readers interested in foundational storage and database technologies.
Below is a curated list of influential industrial distributed system design papers, each accompanied by a brief description and a link to the original PDF.
1. Google’s “Three Pillars”
The Google File System (2003) – PDF
MapReduce: Simplified Data Processing on Large Clusters (2004) – PDF
Bigtable: A Distributed Storage System for Structured Data (2006) – PDF
These three papers introduced concepts that later inspired Hadoop’s HDFS, MapReduce, and HBase.
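For readers new to the MapReduce programming model, the following is a minimal single-process sketch of the word-count example used in the MapReduce paper. The function names (map_fn, reduce_fn, run_mapreduce) are illustrative assumptions, not an API from Google or Hadoop, and the in-memory grouping step only stands in for the distributed shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_name: str, text: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce phase: sum all counts emitted for a single word."""
    return sum(counts)


def run_mapreduce(documents: dict[str, str]) -> dict[str, int]:
    """Toy driver (hypothetical): map, group by key, then reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            # Stand-in for the shuffle: collect values by intermediate key.
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    print(run_mapreduce({"a.txt": "the cat", "b.txt": "The dog"}))
```

In a real deployment the map and reduce functions run on many machines and the framework handles partitioning, scheduling, and fault tolerance; this sketch shows only the shape of the user-supplied functions.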
2. Amazon’s Dynamo (Highly Available Key‑Value Store)
Dynamo: Amazon’s Highly Available Key‑value Store (2007) – PDF
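Dynamo’s design, built on consistent hashing, quorum-based replication, and vector clocks, influenced many later systems such as Apache Cassandra, Riak, and Voldemort; quorum-based replication also appears in Amazon Aurora’s storage layer.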
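One idea from the Dynamo paper that recurs in later systems is consistent hashing with virtual nodes. The sketch below is a simplified illustration under stated assumptions: the HashRing class, its method names, and the default of 8 virtual nodes per server are hypothetical choices for this example, not Dynamo's actual implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring using MD5 (128-bit)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 8) -> None:
        # Each physical node owns several points (virtual nodes) on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        """Walk clockwise from the key and collect up to n distinct nodes."""
        start = bisect.bisect_right(self._points, _hash(key))
        owners: list[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.preference_list("user:42"))
```

Because a key maps to the first nodes found clockwise from its hash, adding or removing one server only moves the keys adjacent to that server's points on the ring rather than reshuffling everything.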
3. Google’s Percolator and Megastore
Large‑scale Incremental Processing Using Distributed Transactions and Notifications (2010) – PDF
Megastore: Providing Scalable, Highly Available Storage for Interactive Services (2011) – PDF
These works show how Google extended Bigtable with transactional capabilities before the advent of Spanner.
4. Google’s Spanner and F1
Spanner: Google’s Globally‑Distributed Database (2012) – PDF
F1: A Distributed SQL Database That Scales (2013) – PDF
Online, Asynchronous Schema Change in F1 (2013) – PDF
Spanner: Becoming a SQL System (2017) – PDF
F1 Query: Declarative Querying at Scale (2018) – PDF
Spanner provides the storage layer with externally consistent distributed transactions, while F1 builds a distributed SQL engine on top of it; together they helped define the NewSQL era and inspired open‑source projects such as TiDB and CockroachDB.
5. Amazon’s Aurora
Amazon Aurora: Design Considerations for High Throughput Cloud‑Native Relational Databases (2017) – PDF
Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes (2018) – PDF
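Aurora represents a modern cloud‑native relational database design: it decouples compute from storage and pushes redo log processing down to a replicated storage service.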
These papers collectively form a solid foundation for understanding modern distributed storage and database systems in industry.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.