Fundamentals 13 min read

Mastering Distributed System Design: Core Principles Every Engineer Should Know

This article outlines essential distributed system concepts—including system decomposition, concurrency, caching strategies, online vs. offline processing, push/pull communication, load limiting, service degradation, CAP theorem, and eventual consistency—to help engineers design scalable, reliable architectures for high‑traffic applications.

ITFLY8 Architecture Home

Apr 11, 2018

Mastering Distributed System Design: Core Principles Every Engineer Should Know

In today's internet‑driven world, distributed systems are ubiquitous, powering search engines, e‑commerce sites, social platforms, and any service with massive users and high concurrency.

There is no single best architecture; each business faces unique challenges that require tailored designs.

Nevertheless, several fundamental ideas are common across distributed systems, and this article summarizes them.

Decomposition

System Decomposition

As an architect once said, "Make big systems small." Large, complex systems should be split into multiple subsystems, each with its own storage, services, and interfaces, allowing independent development, testing, deployment, and operation. Teams can use familiar languages and collaborate via well‑defined interfaces.

Subsystem Decomposition

Within a subsystem, further layering and modularization are possible. Concepts like "system," "subsystem," "layer," and "module" are relative; a complex module may become its own system, while simple modules may stay combined.

Storage Decomposition

NoSQL databases such as MongoDB are inherently distributed and support sharding easily. Relational databases like MySQL require database‑and‑table sharding, raising issues of split dimensions, join handling, and distributed transactions.

Computation Decomposition

Two approaches exist: data partitioning—splitting a large dataset into smaller chunks for parallel processing (e.g., large‑scale merge sort); and task partitioning—breaking a long task into stages that run in parallel. Frameworks like Java's Fork/Join, Hadoop's Map/Reduce, and distributed search indexing follow this pattern.

Concurrency

Increasing concurrency, often via multithreading, boosts performance—e.g., converting sequential RPC calls to asynchronous parallel calls or partitioning data to process large tables faster.

Cache

Caching is a primary performance technique; the key consideration is cache granularity. Finer granularity improves reuse but may require more lookups, while coarser granularity simplifies access but risks higher invalidation rates.

Online vs. Offline / Synchronous vs. Asynchronous

Not all workloads need real‑time results. Scenarios like internal reporting, social media feed propagation, search indexing, or payment processing can tolerate delays, allowing asynchronous processing via message queues or background jobs, and enabling read‑write separation such as MySQL master‑slave replication.

Full + Incremental

Full and incremental processing mirrors online/offline strategies—for example, full and incremental indexes in search engines, or periodic merges of small tables in OceanBase.

Push vs. Pull

State notification between nodes can be push (active notification) or pull (periodic polling). This choice impacts module responsibilities and coupling, similar to bidirectional associations in object‑oriented design.

Batch Processing

Transforming real‑time tasks into batch jobs reduces system pressure, as seen in Kafka's batch message sending or aggregating ad clicks before billing.

Write‑Heavy vs. Read‑Heavy

"Write‑light, read‑heavy" (e.g., MySQL) favors strong consistency but may suffer performance issues, whereas "write‑heavy, read‑light" (e.g., feed pre‑generation) stores results for fast reads at the cost of eventual consistency.

Read‑Write Separation

Traditional single‑node MySQL synchronously serves reads and writes, but many scenarios allow asynchronous replication, enabling master‑slave setups or OLTP/OLAP separation where online data syncs periodically to analytical systems.

Static‑Dynamic Separation

Web front‑ends often separate dynamic pages (served by web servers) from static assets (served via CDN), improving performance and reducing server load.

Hot‑Cold Separation

Historical data can be periodically moved from MySQL to Hive for long‑term storage and analysis.

Rate Limiting

During traffic spikes like flash sales, limiting request rates prevents overload, similar to controlling crowd size at popular venues.

Service Circuit Breaker and Degradation

When downstream services become unavailable, degrading them—returning default responses—keeps the primary workflow functional and maintains overall system availability.

CAP Theory

The discussed principles can be unified under the CAP theorem: Consistency (data replicas agree), Availability (service remains responsive), and Partition tolerance (network partitions are inevitable). In practice, systems balance Consistency and Availability while accepting Partition tolerance.

Examples: MySQL emphasizes Consistency, sacrificing Availability; NoSQL systems favor Partition tolerance and Availability, weakening Consistency; social feeds may relax Consistency for higher Availability.

Eventual Consistency

Distributed systems often adopt eventual consistency to mitigate the difficulty of strong consistency across partitions, typically using reliable message queues to propagate updates.

Source: https://blog.csdn.net/chunlongyu/article/details/52525604

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Distributed Systems Microservices Scalability CAP theorem System Design

Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.