
Master the Distributed Systems Knowledge Map: From SOA to MSA and Beyond

This comprehensive guide walks you through the fundamentals, design patterns, consistency models, core components, and engineering practices of modern distributed systems, helping you understand micro‑service architecture, network protocols, data management, fault tolerance, and performance optimization in cloud‑native environments.

Alibaba Cloud Native

Introduction

This article outlines the essential knowledge for building distributed systems based on a micro‑service architecture (MSA). It covers theoretical foundations, design patterns, engineering practices, deployment, and operations.

Fundamental Theory

The evolution from Service‑Oriented Architecture (SOA) to Micro‑Service Architecture (MSA) is driven by the need to decouple services and enable independent deployment. SOA typically relies on a central service bus and shared databases, which create single points of failure. MSA eliminates the bus by making each service fully independent from entry to persistence, at the cost of increased orchestration complexity.

Node and Network

Nodes have progressed from physical machines to virtual machines and finally to lightweight containers that host services.

Three network models are defined:

Synchronous network – nodes execute in lockstep, latency is bounded, and a global lock can be used.

Semi‑synchronous network – lock scope is relaxed, allowing limited asynchrony.

Asynchronous network – nodes run independently, latency is unbounded, and no global lock exists.

Time and Order

Physical clocks drift, so even when synchronized with a protocol such as NTP they cannot guarantee a consistent ordering of events across nodes. Distributed systems therefore order events with logical clocks and vector clocks.

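Logical (Lamport) clock update when a node receives a message: t' = max(t, t_msg) + 1

Vector clock update when node j receives a message: t_i' = max(t_i, t_msg_i) for every component i, after which the receiver increments its own component t_j by 1.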
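The update rules can be sketched in a few lines of Python. This is a minimal illustration, not a production library: the class names `LamportClock` and `VectorClock` and the helper `happened_before` are hypothetical, and the receive step uses the common "take the maximum, then increment" convention.

```python
from dataclasses import dataclass, field


@dataclass
class LamportClock:
    """Scalar logical clock for one node."""
    time: int = 0

    def tick(self) -> int:
        # Local event or send: advance the counter by one.
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        # Receive: move past both the local time and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


@dataclass
class VectorClock:
    """Vector clock for node `node_id` in a cluster of `size` nodes."""
    node_id: int
    size: int
    clock: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.clock:
            self.clock = [0] * self.size

    def tick(self) -> list[int]:
        # Local event or send: increment only this node's component.
        self.clock[self.node_id] += 1
        return list(self.clock)

    def receive(self, msg_clock: list[int]) -> list[int]:
        # Component-wise maximum, then count the receive as a local event.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.node_id] += 1
        return list(self.clock)


def happened_before(a: list[int], b: list[int]) -> bool:
    """True if vector timestamp `a` causally precedes `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b
```

Two vector timestamps for which `happened_before` is false in both directions are concurrent, which is exactly the information a scalar Lamport clock cannot express.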

Consistency Theory

Strong consistency (ACID)

Atomicity

Consistency

Isolation

Durability

CAP theorem – In a distributed system it is impossible to simultaneously guarantee Consistency, Availability, and Partition tolerance.

FLP impossibility – In an asynchronous network with unbounded message delay, no deterministic consensus protocol can guarantee termination in finite time if even a single node may crash.

BASE – Basically Available, Soft State, Eventual Consistency, which relaxes ACID for higher availability.

CALM principle – Consistency As Logical Monotonicity: a program whose logic is monotonic (outputs only grow as inputs grow) can reach eventual consistency without coordination.

CRDT (Conflict‑Free Replicated Data Types)

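State‑based CRDT – replicas exchange their full state and merge it with a function that is commutative, associative, and idempotent.

Operation‑based CRDT – replicas broadcast operations to all nodes; concurrent operations must commute.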
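A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and shows why merging converges. The sketch below is illustrative only; the class name `GCounter` and its methods are hypothetical, and replica IDs are assumed to be unique strings.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative, and idempotent, so replicas
        # converge regardless of merge order or repeated delivery.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)
```

Because each replica writes only to its own slot, merging the same state twice or in a different order yields the same result.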

Related work includes Highly Available Transactions (HATs), a class of transactional guarantees that remain available during partitions, and the ZooKeeper Atomic Broadcast protocol (ZAB).

Core Distributed Systems

File systems

HDFS

FastDFS

Ceph

MooseFS

Databases

Column store: HBase

Document store: Elasticsearch, MongoDB

Key‑Value store: Redis

Distributed relational: Spanner

Computing frameworks

Offline batch: Hadoop

Real‑time analytics: Spark

Streaming: Storm, Flink/Blink

Cache

Persistent: Redis

Non‑persistent: Memcached

Message queues

Kafka

RabbitMQ

RocketMQ

ActiveMQ

Monitoring

Zookeeper (used for health checks and coordination)

Security mechanisms

Federated identity

Gateway‑proxy

Token‑based access control

Engineering Practices

Design Patterns

Typical patterns for distributed systems include reverse proxy, adapters, front‑back separation, resource aggregation, configuration separation, gateway aggregation, leader election, pipeline‑filter, sidecar, and static‑content CDN.

Availability

Health checks

Load balancing

Rate limiting (throttling)
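Health checks and load balancing usually work together: the balancer only routes to backends that the health checker currently considers alive. A minimal round-robin sketch, assuming a hypothetical `RoundRobinBalancer` class whose `mark_down` and `mark_up` methods would be driven by an external health checker:

```python
import itertools


class RoundRobinBalancer:
    """Round-robin selection that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend) -> None:
        # Called when a health check fails.
        self.healthy.discard(backend)

    def mark_up(self, backend) -> None:
        # Called when a health check succeeds again.
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy backends available")
        # At most one full lap is needed to find a healthy backend.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")
```

Real balancers typically add weights, connection counts, or latency awareness; plain round-robin is used here only to keep the example short.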

Data Management

Cache

CQRS (Command Query Responsibility Segregation)

Event sourcing

Indexing

Materialized views

Sharding and partitioning
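Event sourcing, CQRS, and materialized views fit together naturally: commands append events to a log, and queries read a view derived from that log. The following in-memory sketch assumes a hypothetical `AccountStore` with two event kinds; a real system would persist the log and update views asynchronously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str      # "deposited" or "withdrawn"
    amount: int


class AccountStore:
    def __init__(self) -> None:
        self._log: list[Event] = []            # append-only write model
        self._balances: dict[str, int] = {}    # materialized read view

    # Command side: validate, then append an event.
    def deposit(self, account: str, amount: int) -> None:
        self._append(Event(account, "deposited", amount))

    def withdraw(self, account: str, amount: int) -> None:
        if self._balances.get(account, 0) < amount:
            raise ValueError("insufficient funds")
        self._append(Event(account, "withdrawn", amount))

    def _append(self, event: Event) -> None:
        self._log.append(event)
        self._apply(self._balances, event)

    @staticmethod
    def _apply(view: dict[str, int], event: Event) -> None:
        delta = event.amount if event.kind == "deposited" else -event.amount
        view[event.account] = view.get(event.account, 0) + delta

    # Query side: read only from the materialized view.
    def balance(self, account: str) -> int:
        return self._balances.get(account, 0)

    def rebuild_view(self) -> dict[str, int]:
        # Replaying the full log must reproduce the current view.
        view: dict[str, int] = {}
        for event in self._log:
            self._apply(view, event)
        return view
```

The ability to rebuild the view from the log is what makes it safe to add new read models later or to repair a corrupted one.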

Implementation Details

Reverse proxy

Adapter layer

Front‑back separation

Resource aggregation

Configuration center

Gateway aggregation, offload, routing

Leader election

Pipeline‑filter

Sidecar deployment

Static‑content CDN

Resource Scheduling

Elastic scaling replaces manual provisioning. Automatic scaling, instance termination, and replacement of faulty nodes are essential.

Network Management

Domain name registration and updates

Load management

Outbound security filtering

Unified access control

Fault Snapshot

Capture memory distribution and thread counts (e.g., Java heap and thread dumps)

Non‑intrusive bytecode instrumentation for adding diagnostic logging in production

Traffic Scheduling

Traffic passes through gateways. Strategies include:

Load balancers: hardware switches, F5, LVS/ALI‑LVS, Nginx/Tengine, VIPServer/ConfigServer

Gateway design: high‑performance, distributed, business filtering

Traffic management: request validation, CDN caching

Flow control: counters, queues, leaky bucket, token bucket, dynamic control

Rate‑limiting tools: Sentinel
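Of the flow-control algorithms listed above, the token bucket is a common choice because it permits short bursts while capping the long-run rate. The sketch below is a single-process illustration and is not Sentinel's API; the class name `TokenBucket` and the injectable `clock` parameter are assumptions made so the behaviour can be tested deterministically.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A leaky bucket differs mainly in that it drains requests at a fixed rate and therefore smooths bursts instead of admitting them.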

Service Scheduling

Service registry for state detection and lifecycle management

Version management (cluster version, rollback)

Orchestration: Kubernetes, Spring Cloud, HSF, Zookeeper + Dubbo

Service control: registration, health check, degradation, circuit breaker (Hystrix), idempotency (global ID, Snowflake)
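The circuit-breaker idea can be shown without any framework. This is a simplified sketch of the pattern, not Hystrix's implementation or API; `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of hitting the struggling service.
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Degradation usually hooks in where `CircuitOpenError` is caught: the caller returns a cached or default response instead of propagating the failure.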

Data Scheduling

State transfer to global storage (e.g., login info in Redis)

Horizontal scaling via sharding, partitioning, replication
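One common way to shard keys across nodes is consistent hashing, which keeps most keys in place when a node joins or leaves. The sketch below is one possible approach and not something the article prescribes; `HashRing`, the `vnodes` parameter, and the use of SHA-256 are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes for smoother balance."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self._ring: list[tuple[int, str]] = []   # sorted (position, node) pairs
        for node in nodes:
            self.add(node, vnodes)

    def add(self, node: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With simple `hash(key) % n` sharding, changing `n` remaps almost every key; on the ring, only the keys owned by the removed or added node move.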

Automation & Operations

Configuration center (e.g., Switch, Diamond)

Deployment strategies: stop‑the‑world, rolling, blue‑green, canary, A/B testing

Job scheduling: SchedulerX, Spring scheduled tasks

Application management: restart, offline, log cleanup

Fault Tolerance

Active handling: retries (spring‑retry)

Passive handling: transaction compensation, idempotent operations
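Retries are only safe when the operation being retried is idempotent, so the two techniques are usually paired. The sketch below illustrates both in plain Python; it is not spring-retry's API, and `retry`, `IdempotentHandler`, and the request-ID scheme are hypothetical names.

```python
import time


def retry(func, attempts: int = 3, base_delay: float = 0.1,
          retry_on=(Exception,), sleep=time.sleep):
    """Call `func`, retrying with exponential backoff on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise                      # out of attempts: surface the error
            sleep(base_delay * 2 ** (attempt - 1))


class IdempotentHandler:
    """Run a handler at most once per request ID and replay the stored result."""

    def __init__(self, handler) -> None:
        self._handler = handler
        self._results: dict[str, object] = {}

    def handle(self, request_id: str, payload):
        # A retried request with the same ID does not repeat the side effect.
        if request_id not in self._results:
            self._results[request_id] = self._handler(payload)
        return self._results[request_id]
```

In a distributed deployment the result table would live in shared storage rather than in process memory, and the request ID could come from a global ID generator.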

Performance Tuning

Performance optimization spans distributed lock design, high‑concurrency programming, and asynchronous event‑driven models.

Distributed lock for cache consistency

High‑concurrency patterns

Asynchronous event‑driven programming
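As a small illustration of the asynchronous event-driven style, the sketch below fans events out to a pool of workers through a queue using Python's standard `asyncio` module. The function names `worker` and `process_events` are hypothetical, and `asyncio.sleep(0)` stands in for real non-blocking I/O.

```python
import asyncio


async def worker(name: str, queue: asyncio.Queue, results: list) -> None:
    while True:
        event = await queue.get()
        try:
            await asyncio.sleep(0)          # placeholder for non-blocking I/O
            results.append((name, event))
        finally:
            queue.task_done()               # lets queue.join() make progress


async def process_events(events, concurrency: int = 3) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    workers = [
        asyncio.create_task(worker(f"w{i}", queue, results))
        for i in range(concurrency)
    ]
    for event in events:
        queue.put_nowait(event)
    await queue.join()                      # wait until every event is handled
    for task in workers:
        task.cancel()                       # workers loop forever; stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

A caller would run it with `asyncio.run(process_events(range(10)))`; the order in which workers pick up events is not guaranteed.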

Conclusion

Distributed systems provide scalability but introduce complexity and new failure modes. When possible, a single‑node solution should be considered first. If a distributed approach is required, the combination of Docker, Kubernetes, and Spring Cloud offers a practical foundation.

Figure: Distributed System Knowledge Map
Figure: Distributed Technology Stack