Comprehensive Overview of Distributed Systems and Microservice Architecture
This article provides a thorough introduction to distributed systems: fundamental concepts such as nodes, networks, time, and ordering; consistency models (ACID, CAP, BASE); design patterns; scenario classifications; engineering practices; and the full technology stack needed to build and operate microservice‑based distributed applications.
1. Questions
What are distributed systems and microservices?
Why do we need distributed systems?
What are the core theoretical foundations of distributed systems – nodes, network, time, ordering, consistency?
What design patterns exist for distributed systems?
What types of distributed systems are there?
How is a distributed system implemented?
2. Keywords
Node, time, consistency, CAP, ACID, BASE, P2P, scaling, network change, load balancing, rate limiting, authentication, service discovery, orchestration, degradation, circuit breaking, idempotence, sharding, partitioning, auto‑ops, fault tolerance, full‑stack monitoring, disaster recovery, performance tuning
3. Full Summary
With the rise of mobile Internet, computing has shifted from single‑machine workloads to collaborative multi‑machine clusters. Building large, complex applications on top of distributed theory has become commonplace.
This article outlines the knowledge outline of distributed systems based on MSA (Microservice Architecture), covering foundational theory, architectural design patterns, engineering practices, deployment, operations, and industry solutions, helping readers understand the evolution from SOA to MSA and experience the process of building a complete microservice system.
4. Basic Theory
4.1 Evolution from SOA to MSA
SOA (Service‑Oriented Architecture)
When business grows, services need to be decoupled and split into logical subsystems communicating via interfaces. Early implementations used a shared bus and database, leading to single‑point failures; thus more independent designs emerged.
MSA (Microservice Architecture)
Microservices are truly independent services from entry to persistence layer, eliminating the need for a service bus but increasing the complexity of building and managing the system. They require orchestration and a complete ecosystem of tools to support governance.
4.2 Nodes and Network
Node
Traditional nodes were single physical machines hosting all services and databases. With virtualization, a physical machine can host multiple VMs, and with containers, a node becomes a lightweight container service—essentially a logical compute resource that provides a unit of service.
Network
The foundation of distributed architecture is the network. Different network models affect message ordering, loss, and latency.
Synchronous Network
Nodes execute synchronously
Message delay is bounded
Global lock is efficient
Partially Synchronous Network
Delay bounds hold most of the time, so the scope of global locking can be relaxed
Asynchronous Network
Nodes execute independently
Message delay is unbounded
No global lock
Some algorithms become infeasible
Common Transport‑Layer Protocols
TCP
Reliable despite being slower than alternatives
Handles duplication and out‑of‑order delivery
UDP
Suited to continuous data streams (e.g., audio/video)
Occasional packet loss is tolerable
4.3 Time and Ordering
Time
In a distributed world, coordinating the order of events across nodes is painful because each node has its own clock. Protocols like NTP attempt to synchronize clocks but have limitations, leading to logical clocks and vector clocks.
NTP has shortcomings that prevent it from fully coordinating concurrent distributed tasks:
Node clocks are unsynchronized
Hardware clock drift
Thread sleep
OS sleep
Hardware sleep
Logical Clock
Defines event ordering
Receive rule: t' = max(t, t_msg + 1)
Vector Clock
Per‑component merge rule: t_i' = max(t_i, t_msg_i)
Atomic Clock
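The logical‑clock receive rule above can be sketched in a few lines of Java. This is a minimal illustration, not a production library; the class and method names are our own.

```java
// Minimal Lamport-style logical clock sketch (names are illustrative).
import java.util.concurrent.atomic.AtomicLong;

public class LogicalClock {
    private final AtomicLong time = new AtomicLong(0);

    // Local event or message send: advance the counter by one.
    public long tick() {
        return time.incrementAndGet();
    }

    // Message receive rule from the text: t' = max(t, t_msg + 1).
    public long receive(long msgTime) {
        return time.updateAndGet(t -> Math.max(t, msgTime + 1));
    }

    public long current() {
        return time.get();
    }

    public static void main(String[] args) {
        LogicalClock clock = new LogicalClock();
        clock.tick();                       // local event -> 1
        long after = clock.receive(5);      // message stamped 5 -> max(1, 6) = 6
        System.out.println(after);          // prints 6
    }
}
```

The key property is that every message receipt pushes the receiver's clock past the sender's timestamp, so causally related events always get increasing timestamps.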
Ordering
With logical time in place, events across nodes can be ordered; ordering is a core concept of consistency theory.
4.4 Consistency Theory
Consistency strength shapes system design. (Diagram: transaction performance, error rates, and latency under different consistency algorithms.)
Strong Consistency (ACID)
Atomicity
Consistency
Isolation
Durability
Distributed Consistency (CAP)
In a distributed environment, Consistency, Availability, and Partition tolerance cannot all be guaranteed at once; since network partitions are unavoidable, the practical trade‑off is between consistency and availability.
CAP
FLP
DLS (partial sync, Byzantine tolerance, etc.)
Weak Consistency (BASE)
Many applications tolerate eventual consistency, leading to the BASE model: Basically Available, Soft State, Eventual Consistency.
Basically Available
Soft State
Eventual Consistency
Consistency Algorithms
Algorithms such as Paxos, Raft, and Gossip are essential for achieving agreement across unreliable networks.
Paxos
Raft
Gossip
CRDT (Conflict‑Free Replicated Data Types)
Two approaches: state‑based (merge states) and operation‑based (propagate operations).
State‑based
Operation‑based
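A grow‑only counter (G‑Counter) is the classic example of the state‑based approach: each replica increments only its own slot, and merging takes the element‑wise maximum, so merges commute and all replicas converge. A minimal sketch (class name is ours):

```java
// State-based G-Counter CRDT sketch: replicas converge by element-wise max.
import java.util.HashMap;
import java.util.Map;

public class GCounter {
    private final String replicaId;
    private final Map<String, Long> counts = new HashMap<>();

    public GCounter(String replicaId) { this.replicaId = replicaId; }

    // Each replica only ever increments its own slot.
    public void increment() {
        counts.merge(replicaId, 1L, Long::sum);
    }

    // State-based merge: take the per-replica maximum of the two vectors.
    public void merge(GCounter other) {
        other.counts.forEach((id, n) -> counts.merge(id, n, Math::max));
    }

    // The counter's value is the sum over all replica slots.
    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        GCounter a = new GCounter("A"), b = new GCounter("B");
        a.increment(); a.increment();
        b.increment();
        a.merge(b); b.merge(a);
        System.out.println(a.value() + " " + b.value()); // both converge to 3
    }
}
```

Because max is commutative, associative, and idempotent, replicas can exchange state in any order, any number of times, and still agree.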
Other Protocols
HATs (Highly Available Transactions) and ZAB (ZooKeeper Atomic Broadcast) are also important.
5. Scenario Classification
5.1 File Systems
From early NFS to modern systems like GFS, HDFS, FastDFS, Ceph, MooseFS.
HDFS
FastDFS
Ceph
MooseFS
5.2 Databases
Relational databases struggle with distributed transactions; NoSQL provides eventual consistency.
Column store: HBase
Document store: Elasticsearch, MongoDB
KV store: Redis
Distributed RDBMS: Spanner
5.3 Compute
Distributed computing builds on distributed storage for offline, real‑time, and stream processing.
Offline: Hadoop
Real‑time: Spark
Streaming: Storm, Flink/Blink
5.4 Cache
Distributed caches like Redis improve performance but raise consistency challenges.
Persistent: Redis
Non‑persistent: Memcached
5.5 Messaging
Message queues (Kafka, RabbitMQ, RocketMQ, ActiveMQ) decouple asynchronous workloads.
Kafka
RabbitMQ
RocketMQ
ActiveMQ
5.6 Monitoring
Zookeeper is often used for health checks and coordination.
Zookeeper
5.7 Application Protocols
Services communicate via RPC or HTTP (e.g., HSF, Dubbo).
HSF
Dubbo
5.8 Logging
Log collection (Flume), storage (Elasticsearch/Solr, SLS), and tracing (Zipkin) are essential for fault diagnosis.
Flume
Elasticsearch/Solr, SLS
Zipkin
5.9 Ledger
Blockchain provides truly decentralized systems without a central node (e.g., Bitcoin, Ethereum).
Bitcoin
Ethereum
6. Design Patterns
6.1 Availability
Health checks
Load balancing
Rate limiting
6.2 Data Management
Cache
CQRS
Event sourcing
Indexing
Materialized views
Sharding/partitioning
6.3 Design & Implementation
Reverse proxy
Adapter layer
Frontend‑backend separation
Compute resource consolidation
Configuration separation
Gateway aggregation, offloading, routing
Leader election
Pipelines & filters
Sidecar pattern
Static content CDN
6.4 Messaging
Competitive consumers
Priority queues
6.5 Management & Monitoring
Expose runtime metrics for external monitoring and dynamic scaling.
6.6 Performance & Scaling
Design for horizontal scaling of compute, storage, and messaging.
6.7 Resilience
Isolation
Circuit breaking
Compensating transactions
Health checks
Retry
6.8 Security
Federated identity
Gateway (proxy) authentication
Customer‑provided keys/tokens
7. Engineering Application
7.1 Resource Scheduling
From physical servers to VMs to containers, DevOps enables elastic resource allocation.
Elastic Scaling
Automatic scaling up/down based on traffic
Machine decommissioning
Machine replacement for failures
Network Management
Domain name acquisition and changes
Load management
Security outbound controls
Unified access platform
Fault Snapshot
Capture system state (memory, threads, Java dumps) for post‑mortem analysis.
Debug Injection
Bytecode‑level non‑intrusive debugging in production.
7.2 Traffic Scheduling
Gateways handle high‑volume traffic; load balancing, request validation, caching, and rate limiting protect the system.
Load Balancing
Hardware (switches, F5)
Software (LVS/ALI‑LVS, Nginx/Tengine, VIPServer/ConfigServer)
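The simplest software load‑balancing strategy, round robin, can be sketched in a few lines. This is an illustration of the idea, not how LVS or Nginx implement it; all names are our own.

```java
// Minimal round-robin load balancer sketch (illustrative, thread-safe).
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> backends) {
        this.backends = backends;
    }

    // Pick the next backend in rotation; floorMod guards counter wrap-around.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
                new RoundRobinBalancer(List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick()); // .1, .2, .3, then back to .1
        }
    }
}
```

Production balancers layer health checks and weights on top of this; a dead backend must be removed from the rotation rather than receiving its share of traffic.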
Gateway Design
High performance (million+ QPS)
Distributed for resilience
Business filtering to drop malicious traffic
Traffic Management
Request authentication
Data caching via CDN
Flow Control
Algorithms: counter, queue, leaky bucket, token bucket, dynamic control
Dimensions: QPS, thread count, RT thresholds (enforced with tools such as Sentinel)
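The token‑bucket algorithm listed above can be sketched directly: tokens refill at a fixed rate up to a burst capacity, and a request passes only if it can take a token. A minimal, hand‑rolled version (not Sentinel's implementation; names are ours):

```java
// Token-bucket rate limiter sketch: fixed refill rate, bounded burst.
public class TokenBucket {
    private final double capacity;       // maximum burst size
    private final double refillPerNano;  // tokens added per nanosecond
    private double tokens;
    private long lastRefill;

    public TokenBucket(double capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Refill lazily based on elapsed time, then try to take one token.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 1); // burst of 2, refill 1/s
        System.out.println(bucket.tryAcquire());    // true
        System.out.println(bucket.tryAcquire());    // true
        System.out.println(bucket.tryAcquire());    // false (bucket drained)
    }
}
```

Compared with the leaky bucket, the token bucket allows short bursts up to the capacity while still bounding the long‑run average rate.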
7.3 Service Scheduling
Service Registry
Registers services for health checking and discovery.
State types (up/down)
Lifecycle management
Version Management
Cluster versioning for coordinated rollbacks
Service Orchestration
Kubernetes or Spring Cloud coordinates service dependencies.
K8s
Spring Cloud (HSF, ZK + Dubbo)
Service Control
Discovery, health checks, and gateway registration.
Degradation
When traffic spikes, limit non‑critical features, relax consistency, or simplify functionality.
Circuit Breaking
Closed, half‑open, open states
Hystrix
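The closed/open/half‑open state machine can be sketched in plain Java. This is a bare‑bones illustration of the pattern, not Hystrix itself, which adds sliding‑window metrics, thread‑pool isolation, and fallbacks; all names here are our own.

```java
// Minimal circuit-breaker state machine: closed -> open -> half-open.
public class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // failures before tripping open
    private final long openTimeoutMillis; // cool-down before a trial request
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    public CircuitBreaker(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    public synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            // After the cool-down, let one trial request through (half-open).
            if (System.currentTimeMillis() - openedAt >= openTimeoutMillis) {
                state = State.HALF_OPEN;
                return true;
            }
            return false;
        }
        return true;
    }

    public synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }

    public synchronized State state() { return state; }

    public static void main(String[] args) {
        CircuitBreaker cb = new CircuitBreaker(2, 1000);
        cb.recordFailure();
        cb.recordFailure();                    // threshold hit -> OPEN
        System.out.println(cb.allowRequest()); // false while open
    }
}
```

Failing fast while open protects the caller's thread pool and gives the downstream service time to recover; the half‑open trial probes whether it is healthy again.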
Idempotence
Globally unique IDs (e.g., Snowflake) make operations safely repeatable: a duplicate request carrying the same ID can be detected and ignored.
Global ID
Snowflake
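A Snowflake‑style generator packs a timestamp, a worker ID, and a per‑millisecond sequence into one 64‑bit long. The sketch below follows the commonly used 41/10/12 bit layout; the custom epoch and class name are our own assumptions.

```java
// Snowflake-style 64-bit ID sketch: [timestamp 41 | worker 10 | sequence 12].
public class SnowflakeId {
    private static final long EPOCH = 1600000000000L; // custom epoch (assumed)
    private final long workerId;      // 0..1023, unique per node
    private long lastTimestamp = -1L;
    private long sequence = 0L;       // 0..4095 within one millisecond

    public SnowflakeId(long workerId) {
        if (workerId < 0 || workerId > 1023) {
            throw new IllegalArgumentException("workerId must be 0..1023");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 4095;   // wrap within the same ms
            if (sequence == 0) {
                // Sequence exhausted: spin until the next millisecond.
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs are roughly time‑ordered, which keeps database index inserts mostly sequential; the trade‑off is the dependency on well‑behaved node clocks.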
7.4 Data Scheduling
State Transfer
Store session state centrally (e.g., Redis) to make requests stateless.
Sharding & Partitioning
Horizontal data scaling and redundancy.
Database Splitting
Horizontal partitioning for scalability.
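The core of horizontal splitting is a stable mapping from a key to a shard. A minimal hash‑based router might look like this; the `orders_` table naming and shard count are purely illustrative.

```java
// Hash-based shard routing sketch: a key maps to one of N database shards.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) { this.shardCount = shardCount; }

    // Stable mapping from key to shard index; floorMod guards negative hashes.
    public int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Illustrative physical table name, e.g., orders_0 .. orders_3.
    public String tableFor(String key) {
        return "orders_" + shardFor(key);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        System.out.println(router.tableFor("user-42")); // same key -> same shard
    }
}
```

Plain modulo routing forces a near‑total data reshuffle when the shard count changes, which is why production systems typically prefer consistent hashing or a lookup table of key ranges.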
7.5 Automated Operations
Configuration Center
Centralized config per environment (e.g., Switch, Diamond).
Deployment Strategies
Stop‑the‑world
Rolling
Blue‑Green
Canary
A/B testing
Job Scheduling
SchedulerX
Spring scheduled tasks
Application Management
Restart
Shutdown
Log cleanup
7.6 Fault Tolerance
Retry Design
Define the retry count and interval; Spring Retry is a common implementation.
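The mechanics behind such a policy, bounded attempts with an exponentially growing delay, can be written by hand in a few lines. This sketch is our own; Spring Retry expresses the same idea declaratively with annotations.

```java
// Retry-with-exponential-backoff sketch: bounded attempts, growing delay.
import java.util.concurrent.Callable;

public class Retry {
    public static <T> T withBackoff(Callable<T> task, int maxAttempts,
                                    long initialDelayMillis) throws Exception {
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last; // all attempts exhausted: surface the final failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withBackoff(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient");
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retries are only safe when the retried operation is idempotent, which is exactly why the idempotence techniques from section 7.3 matter; production policies also add jitter so that many clients do not retry in lockstep.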
Compensating Transactions
Follow eventual consistency; lock resources with timeouts, execute only after acquiring all locks.
7.7 Full‑Stack Monitoring
Infrastructure Layer
Monitor CPU, I/O, memory, threads, throughput.
Middleware Layer
The health of middleware components must also be monitored.
Application Layer
Performance metrics (QPS, RT) and upstream/downstream dependencies
Business monitoring with alerts
Tracing
Zipkin / EagleEye
SLS
GOC
Alimonitor
7.8 Fault Recovery
Application Rollback
Preserve fault snapshots before rollback.
Baseline Revert
Revert code to previous version.
Version Rollback
Rollback whole cluster via version number.
7.9 Performance Tuning
Distributed Locks
Distributed locks are required to keep caches consistent under concurrent updates.
High Concurrency
Multi‑threading increases throughput but adds complexity.
Asynchronous Programming
Event‑driven async models improve responsiveness.
8. Conclusion
Whenever possible, prefer a single‑node solution over a distributed one because distributed systems introduce failure modes, require redundancy, and demand extensive engineering effort. In the microservice era, most foundational work is provided by Docker, Kubernetes, and Spring Cloud, enabling rapid construction of distributed architectures.
Distributed architecture core technology diagram:
Middleware stack used in distributed systems:
Final knowledge‑map of distributed systems:
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.