
Comprehensive Overview of Distributed Systems and Microservice Architecture

This article provides a thorough introduction to distributed systems: fundamental theory (nodes, networks, time and ordering, and consistency models such as CAP, ACID, and BASE), design patterns, scenario classifications, engineering practices, and the full technology stack needed to build and operate microservice‑based distributed applications.

Java Captain

1. Questions

What are distributed systems and microservices?

Why do we need distributed systems?

What are the core theoretical foundations of distributed systems – nodes, network, time, ordering, consistency?

What design patterns exist for distributed systems?

What types of distributed systems are there?

How to implement a distributed system?

2. Keywords

Node, time, consistency, CAP, ACID, BASE, P2P, scaling, network change, load balancing, rate limiting, authentication, service discovery, orchestration, degradation, circuit breaking, idempotence, sharding, partitioning, auto‑ops, fault tolerance, full‑stack monitoring, disaster recovery, performance tuning

3. Full Summary

With the rise of mobile Internet, computing has shifted from single‑machine workloads to collaborative multi‑machine clusters. Building large, complex applications on top of distributed theory has become commonplace.

This article outlines distributed‑systems knowledge through the lens of MSA (Microservice Architecture), covering foundational theory, architectural design patterns, engineering practices, deployment, operations, and industry solutions. It helps readers understand the evolution from SOA to MSA and walks through the process of building a complete microservice system.

4. Basic Theory

4.1 Evolution from SOA to MSA

SOA (Service‑Oriented Architecture)

As a business grows, services need to be decoupled and split into logical subsystems that communicate via interfaces. Early implementations shared a service bus and database, creating single points of failure; this motivated more independent designs.

MSA (Microservice Architecture)

Microservices are truly independent services from entry to persistence layer, eliminating the need for a service bus but increasing the complexity of building and managing the system. They require orchestration and a complete ecosystem of tools to support governance.

4.2 Nodes and Network

Node

Traditional nodes were single physical machines hosting all services and databases. With virtualization, a physical machine can host multiple VMs, and with containers, a node becomes a lightweight container service—essentially a logical compute resource that provides a unit of service.

Network

The foundation of distributed architecture is the network. Different network models affect message ordering, loss, and latency.

Synchronous Network

Nodes execute synchronously

Message delay is bounded

Global lock is efficient

Partially Synchronous Network

Lock scope is relaxed

Asynchronous Network

Nodes execute independently

Message delay is unbounded

No global lock

Some algorithms become infeasible

Common Transport‑Layer Protocols

TCP

Reliable despite being slower than alternatives

Handles duplication and out‑of‑order delivery

UDP

Constant data stream

Packet loss is tolerable

4.3 Time and Ordering

Time

In a distributed world, coordinating the order of events across nodes is painful because each node has its own clock. Protocols like NTP attempt to synchronize clocks but have limitations, leading to logical clocks and vector clocks.

NTP alone cannot fully coordinate concurrent distributed tasks, for several reasons:

Node clocks are unsynchronized

Hardware clock drift

Thread sleep

OS sleep

Hardware sleep

Logical Clock

Defines event ordering

t’ = max(t, t_msg + 1)

Vector Clock

For each component i: t_i’ = max(t_i, t_msg_i); the node then increments its own component.

Atomic Clock
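The logical‑clock update rule above (t’ = max(t, t_msg + 1)) can be sketched in a few lines of Java. This is a minimal illustrative implementation, not code from any particular library:

```java
// Minimal sketch of a Lamport logical clock.
// tick() advances the clock on a local event; receive() applies
// the article's rule t' = max(t, t_msg + 1) when a message arrives.
public class LamportClock {
    private long time = 0;

    // Local event: advance the clock by one.
    public synchronized long tick() {
        return ++time;
    }

    // Message received carrying the sender's timestamp.
    public synchronized long receive(long senderTime) {
        time = Math.max(time, senderTime + 1);
        return time;
    }

    public synchronized long current() {
        return time;
    }
}
```

Because every receive jumps past the sender's timestamp, any event that causally follows another always carries a larger clock value, which is exactly the ordering guarantee logical clocks exist to provide.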

Ordering

With these clock mechanisms in place, events can be ordered across nodes; ordering is a core concept underlying consistency theory.

4.4 Consistency Theory

Comparison of how consistency strength affects system design (diagram omitted): transaction performance, error rates, and latency differ under different consistency algorithms.

Strong Consistency (ACID)

Atomicity

Consistency

Isolation

Durability

Distributed Consistency (CAP)

In distributed environments, you cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance.

CAP

FLP

DLS (partial sync, Byzantine tolerance, etc.)

Weak Consistency (BASE)

Many applications tolerate eventual consistency, leading to the BASE model: Basically Available, Soft State, Eventual Consistency.

Basically Available

Soft State

Eventual Consistency

Consistency Algorithms

Algorithms such as Paxos, Raft, and Gossip are essential for achieving agreement across unreliable networks.

Paxos

Raft

Gossip

CRDT (Conflict‑Free Replicated Data Types)

Two approaches: state‑based (merge states) and operation‑based (propagate operations).

State‑based

Operation‑based
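The state‑based approach can be illustrated with the classic grow‑only counter (G‑Counter). The sketch below is illustrative, not taken from any CRDT library: each replica increments only its own slot, and merge takes the per‑node maximum, so replicas converge no matter how often or in what order states are exchanged:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a state-based grow-only counter (G-Counter) CRDT.
// Each node increments only its own entry; merge is an element-wise max,
// which is commutative, associative, and idempotent.
public class GCounter {
    private final Map<String, Long> counts = new HashMap<>();
    private final String nodeId;

    public GCounter(String nodeId) {
        this.nodeId = nodeId;
    }

    // Local increment: bump this node's own slot.
    public void increment() {
        counts.merge(nodeId, 1L, Long::sum);
    }

    // State-based merge: take the maximum of each node's entry.
    public void merge(GCounter other) {
        other.counts.forEach((node, c) -> counts.merge(node, c, Math::max));
    }

    // The counter's value is the sum over all nodes.
    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Because merge is idempotent, duplicated or re‑delivered states do no harm — which is why CRDTs tolerate the unreliable networks described in section 4.2.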

Other Protocols

HATs (Highly Available Transactions), ZAB (Zookeeper Atomic Broadcast) are also important.

5. Scenario Classification

5.1 File Systems

From early NFS to modern systems like GFS, HDFS, FastDFS, Ceph, MooseFS.

HDFS

FastDFS

Ceph

MooseFS

5.2 Databases

Relational databases struggle with distributed transactions; NoSQL provides eventual consistency.

Column store: HBase

Document store: Elasticsearch, MongoDB

KV store: Redis

Distributed RDBMS: Spanner

5.3 Compute

Distributed computing builds on distributed storage for offline, real‑time, and stream processing.

Offline: Hadoop

Real‑time: Spark

Streaming: Storm, Flink/Blink

5.4 Cache

Distributed caches like Redis improve performance but raise consistency challenges.

Persistent: Redis

Non‑persistent: Memcached

5.5 Messaging

Message queues (Kafka, RabbitMQ, RocketMQ, ActiveMQ) decouple asynchronous workloads.

Kafka

RabbitMQ

RocketMQ

ActiveMQ

5.6 Monitoring

Zookeeper is often used for health checks and coordination.

Zookeeper

5.7 Application Protocols

Services communicate via RPC or HTTP (e.g., HSF, Dubbo).

HSF

Dubbo

5.8 Logging

Log collection (Flume), storage (Elasticsearch/Solr, SLS), and tracing (Zipkin) are essential for fault diagnosis.

Flume

Elasticsearch/Solr, SLS

Zipkin

5.9 Ledger

Blockchain provides truly decentralized systems without a central node (e.g., Bitcoin, Ethereum).

Bitcoin

Ethereum

6. Design Patterns

6.1 Availability

Health checks

Load balancing

Rate limiting

6.2 Data Management

Cache

CQRS

Event sourcing

Indexing

Materialized views

Sharding/partitioning

6.3 Design & Implementation

Reverse proxy

Adapter layer

Frontend‑backend separation

Compute resource consolidation

Configuration separation

Gateway aggregation, offloading, routing

Leader election

Pipelines & filters

Sidecar pattern

Static content CDN

6.4 Messaging

Competing consumers

Priority queues

6.5 Management & Monitoring

Expose runtime metrics for external monitoring and dynamic scaling.

6.6 Performance & Scaling

Design for horizontal scaling of compute, storage, and messaging.

6.7 Resilience

Isolation

Circuit breaking

Compensating transactions

Health checks

Retry

6.8 Security

Federated identity

Gateway (proxy) authentication

Customer‑provided keys/tokens

7. Engineering Application

7.1 Resource Scheduling

From physical servers to VMs to containers, DevOps enables elastic resource allocation.

Elastic Scaling

Automatic scaling up/down based on traffic

Machine decommissioning

Machine replacement for failures

Network Management

Domain name acquisition and changes

Load management

Security outbound controls

Unified access platform

Fault Snapshot

Capture system state (memory, threads, Java dumps) for post‑mortem analysis.

Debug Injection

Bytecode‑level non‑intrusive debugging in production.

7.2 Traffic Scheduling

Gateways handle high‑volume traffic; load balancing, request validation, caching, and rate limiting protect the system.

Load Balancing

Hardware (switches, F5)

Software (LVS/ALI‑LVS, Nginx/Tengine, VIPServer/ConfigServer)

Gateway Design

High performance (million+ QPS)

Distributed for resilience

Business filtering to drop malicious traffic

Traffic Management

Request authentication

Data caching via CDN

Flow Control

Algorithms: counter, queue, leaky bucket, token bucket, dynamic control

Limit dimensions: QPS, thread count, RT thresholds (enforced by tools such as Sentinel)
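Of the algorithms listed, the token bucket is the most common because it allows bounded bursts. A minimal sketch (illustrative parameters, not a production limiter such as Sentinel or Guava's RateLimiter):

```java
// Sketch of a token-bucket rate limiter: tokens refill at a fixed rate
// up to a burst capacity; each request consumes one token or is rejected.
public class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerNano;   // tokens added per nanosecond
    private double tokens;
    private long lastRefill;

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;           // start full to permit an initial burst
        this.lastRefill = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Lazily add the tokens accrued since the last call, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;                  // admit the request
        }
        return false;                     // over the limit: reject or queue
    }
}
```

A leaky bucket differs in that it smooths output to a constant rate with no burst allowance; a plain counter resets per window and admits edge bursts of up to twice the limit.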

7.3 Service Scheduling

Service Registry

Registers services for health checking and discovery.

State types (up/down)

Lifecycle management

Version Management

Cluster versioning for coordinated rollbacks

Service Orchestration

Kubernetes or Spring Cloud coordinates service dependencies.

K8s

Spring Cloud (HSF, ZK+Dubbo)

Service Control

Discovery, health checks, and gateway registration.

Degradation

When traffic spikes, limit non‑critical features, relax consistency, or simplify functionality.

Circuit Breaking

Closed, half‑open, open states

Hystrix
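The three states above form a small state machine, sketched below. Thresholds and timeouts are illustrative assumptions, not Hystrix defaults:

```java
// Sketch of the closed -> open -> half-open circuit-breaker state machine.
// CLOSED: requests flow; consecutive failures past a threshold trip the breaker.
// OPEN: requests are rejected until a cool-down elapses.
// HALF_OPEN: one probe is allowed; success closes the circuit, failure reopens it.
public class CircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;
    private final int failureThreshold;
    private final long openTimeoutMillis;

    public CircuitBreaker(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    public synchronized boolean allowRequest() {
        if (state == State.OPEN
                && System.currentTimeMillis() - openedAt >= openTimeoutMillis) {
            state = State.HALF_OPEN;      // cool-down over: allow one probe
        }
        return state != State.OPEN;
    }

    public synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }

    public synchronized State state() {
        return state;
    }
}
```

Callers wrap each remote call: check allowRequest() first, then record the outcome; when the breaker is open they fall back to the degraded behavior described above.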

Idempotence

Globally unique IDs (e.g., Snowflake) let repeated requests be detected and deduplicated, so an operation applied more than once has the same effect as applying it once.

Global ID

Snowflake
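The common Snowflake layout packs a 41‑bit timestamp, a 10‑bit worker ID, and a 12‑bit per‑millisecond sequence into one 64‑bit long. The sketch below follows that common layout; the custom epoch is an assumption chosen for illustration:

```java
// Sketch of a Snowflake-style 64-bit ID generator:
// [41 bits timestamp since epoch][10 bits worker ID][12 bits sequence].
// IDs from one worker are strictly increasing and roughly time-ordered.
public class SnowflakeId {
    private static final long EPOCH = 1700000000000L; // custom epoch (assumption)
    private final long workerId;                      // 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeId(long workerId) {
        if (workerId < 0 || workerId > 1023) {
            throw new IllegalArgumentException("workerId must be in 0..1023");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;        // 12-bit sequence wraps at 4096
            if (sequence == 0) {
                // Sequence exhausted within this millisecond: spin to the next one.
                while (ts <= lastTimestamp) {
                    ts = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

A real deployment must also assign worker IDs uniquely (often via the registry) and decide how to handle clock rollback, which this sketch does not address.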

7.4 Data Scheduling

State Transfer

Store session state centrally (e.g., Redis) to make requests stateless.

Sharding & Partitioning

Horizontal data scaling and redundancy.

Database Splitting

Horizontal partitioning for scalability.
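The simplest way to route a key to its shard is hash‑modulo, sketched below as an illustration (consistent hashing is usually preferred in practice, since modulo routing remaps most keys whenever the shard count changes):

```java
// Sketch of hash-modulo shard routing: the same key always lands on the
// same shard, spreading keys roughly evenly across shardCount partitions.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    public int shardFor(String key) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

The router would typically sit in the data-access layer, translating a logical table name plus key into a concrete database and table, e.g. `order_` + shard index.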

7.5 Automated Operations

Configuration Center

Centralized configuration per environment (e.g., Switch, Diamond).

Deployment Strategies

Stop‑the‑world

Rolling

Blue‑Green

Canary

A/B testing

Job Scheduling

SchedulerX

Spring scheduled tasks

Application Management

Restart

Shutdown

Log cleanup

7.6 Fault Tolerance

Retry Design

Define retry count and interval; use Spring‑retry for implementation.
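The same policy can be expressed in plain Java; Spring‑retry offers the equivalent declaratively via annotations such as @Retryable. The count and interval below are illustrative:

```java
import java.util.function.Supplier;

// Plain-Java sketch of a bounded retry with a fixed interval between attempts.
// The last exception is rethrown once all attempts are exhausted.
public class Retry {
    public static <T> T withRetry(Supplier<T> task, int maxAttempts, long intervalMillis) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(intervalMillis);   // back off before retrying
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }
}
```

Retries only make sense for transient failures and for operations that are idempotent (see section 7.3); retrying a non‑idempotent write can duplicate its effect.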

Compensating Transactions

Follow eventual consistency; lock resources with timeouts, execute only after acquiring all locks.

7.7 Full‑Stack Monitoring

Infrastructure Layer

Monitor CPU, I/O, memory, threads, throughput.

Middleware Layer

The health of middleware components (message queues, caches, registries) must also be observed.

Application Layer

Performance metrics (QPS, RT) and upstream/downstream dependencies

Business monitoring with alerts

Tracing

Zipkin / EagleEye

SLS

GOC

Alimonitor

7.8 Fault Recovery

Application Rollback

Preserve fault snapshots before rollback.

Baseline Revert

Revert code to previous version.

Version Rollback

Rollback whole cluster via version number.

7.9 Performance Tuning

Distributed Locks

Distributed locks coordinate concurrent access across nodes, for example to keep caches consistent with their backing stores.

High Concurrency

Multi‑threading increases throughput but adds complexity.

Asynchronous Programming

Event‑driven async models improve responsiveness.

8. Conclusion

Whenever possible, prefer a single‑node solution over a distributed one because distributed systems introduce failure modes, require redundancy, and demand extensive engineering effort. In the microservice era, most foundational work is provided by Docker, Kubernetes, and Spring Cloud, enabling rapid construction of distributed architectures.

(Diagrams omitted: distributed‑architecture core technologies, the middleware stack used in distributed systems, and a final knowledge map of distributed systems.)

Written by Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.