Mastering Distributed Systems: Common Pitfalls and How to Avoid Them
This article explains the core concepts of distributed systems—including the CAP theorem, BASE theory, message‑queue challenges, Redis sentinel issues, sharding strategies, unique ID generation, and distributed transaction patterns—while offering practical guidance to prevent common pitfalls and improve reliability.
Preface
When interviewers ask about distributed systems, candidates often wonder what they really are and why they matter.
Analogy with Naruto
Just as Naruto uses multiple shadow clones that share experiences, a distributed system consists of many nodes that cooperate and share state, but also consume more resources.
Simple Understanding of Distributed Systems
It is a way of working: tasks are split across multiple machines that coordinate over a network.
A collection of independent computers that appear as a single system to users.
Business logic is spread across different locations.
Advantages (Macro and Micro)
Macro: Service decomposition reduces coupling between modules.
Micro: Deploying services on different machines or containers increases capacity.
Problems of Distributed Systems
Higher talent cost.
Complex architecture and steep learning curve.
Increased operation, deployment, and maintenance costs.
Longer service chains make debugging harder.
Reliability, idempotency, and message-ordering issues.
CAP Theorem
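In a distributed system it is impossible to simultaneously guarantee all three of Consistency, Availability, and Partition tolerance.
Consistency: Every read returns the most recent write, so all nodes see the same latest data.
Availability: Every request to a non-failing node receives a response, though not necessarily the latest data.
Partition tolerance: The system keeps operating even when the network drops or delays messages between nodes. Because partitions cannot be ruled out in practice, when one occurs the system must choose between consistency and availability.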
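The consistency-versus-availability choice during a partition can be illustrated with a toy model. This is a minimal sketch, not a real replication protocol: the Replica class, its "CP"/"AP" modes, and the partitioned flag are hypothetical names that stand for "this node cannot reach its peers".

```python
class Unavailable(Exception):
    """Raised when a CP-mode replica refuses a request during a partition."""


class Replica:
    """Toy replica choosing consistency (CP) or availability (AP) under partition.

    Hypothetical model: `partitioned` means this replica cannot reach its peers,
    so it cannot confirm that its local value is the latest write.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Give up availability: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm latest value during partition")
        # No partition, or AP mode giving up consistency: answer from local state.
        return self.value


def replicate(value, replicas) -> None:
    """Deliver a write to every replica that the partition has not cut off."""
    for r in replicas:
        if not r.partitioned:
            r.value = value
```

If both replicas hold "v1" and are then cut off while "v2" is written elsewhere, the AP replica still answers with the stale "v1", while the CP replica raises Unavailable. Neither is wrong; they are the two sides of the trade-off.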
BASE Theory
BASE (Basically Available, Soft state, Eventually consistent) relaxes the strict consistency of CAP by allowing temporary inconsistency in exchange for higher availability.
Basically Available: Core functions remain usable even when some parts fail.
Soft state: Intermediate states (e.g., "payment pending") are allowed.
Eventually consistent: Once updates stop, all replicas converge to the same data after some delay.
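One common way replicas reach eventual consistency is a last-writer-wins (LWW) merge: every write carries a timestamp, and replicas periodically exchange entries and keep the newest one. The sketch below assumes timestamps are supplied by the caller (real systems use synchronized or hybrid clocks, or version vectors); LWWReplica is a hypothetical name, and LWW is only one convergence strategy, one that silently discards the losing concurrent update.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class LWWReplica:
    """Replica storing key -> (timestamp, value) with last-writer-wins merging."""

    data: Dict[str, Tuple[int, Any]] = field(default_factory=dict)

    def write(self, key: str, value: Any, ts: int) -> None:
        self._apply(key, ts, value)

    def _apply(self, key: str, ts: int, value: Any) -> None:
        current = self.data.get(key)
        # Keep the larger (timestamp, value) pair; comparing the value as well
        # breaks timestamp ties the same way on every replica.
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

    def merge(self, other: "LWWReplica") -> None:
        """Anti-entropy step: fold in every entry held by another replica."""
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)
```

Between writes and merges the replicas disagree (the soft state); after they exchange entries in both directions, both hold the same newest value.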
1. Distributed Message Queue Pitfalls
Non‑Idempotent Consumption
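Most brokers guarantee at-least-once delivery, so the same message can be consumed more than once; if processing is not idempotent, repeats cause data inconsistency (e.g., an account credited twice). Ensure idempotency by attaching a unique identifier to each message and recording processed identifiers, or by checking existing records before processing.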
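A minimal sketch of an idempotent consumer, assuming a relational store (an in-memory sqlite3 database stands in here). The table names and the handle() function are hypothetical; the key idea is that a primary key on the message ID rejects duplicates, and the dedup insert shares one local transaction with the business update.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('alice', 0)")
    conn.commit()
    return conn


def handle(conn: sqlite3.Connection, msg_id: str, account_id: str, amount: int) -> bool:
    """Credit an account at most once per message ID; return False for duplicates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Fails with IntegrityError if this msg_id was already recorded.
            conn.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate delivery: the earlier attempt already applied the credit.
        return False
```

Delivering message "m-1" twice credits the account once; the second call returns False and leaves the balance unchanged.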
Message Loss
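Three main scenarios cause loss: the producer fails to deliver a message, the broker loses a message it accepted, or the consumer crashes mid-processing. Corresponding fixes: producer-side transactions or confirm mode (publisher acknowledgments), persistent queues and messages on the broker, and consumer acknowledgments sent only after processing has completed.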
Pitfall: If a consumer crashes before committing its offset, the same messages may be re‑consumed after restart.
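The ordering "process first, then commit the offset" is what turns a crash into a duplicate instead of a loss. The sketch below is a toy model, not a Kafka or RabbitMQ client: PartitionLog, consume(), and the crash_before_commit_at test hook are hypothetical names used only to simulate a crash between processing and committing.

```python
from typing import Callable, List


class PartitionLog:
    """Toy append-only log with a committed consumer offset."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed = 0  # next offset a restarted consumer will read


def consume(
    log: PartitionLog,
    process: Callable[[str], None],
    crash_before_commit_at: int = -1,  # hypothetical hook to simulate a crash
) -> None:
    """Process messages, committing each offset only after processing succeeds."""
    offset = log.committed
    while offset < len(log.messages):
        process(log.messages[offset])
        if offset == crash_before_commit_at:
            # Processed but not committed: this message will be delivered again.
            raise RuntimeError("consumer crashed before committing offset")
        offset += 1
        log.committed = offset  # commit after processing, never before
```

Crashing after processing "b" but before committing means "b" is processed again after restart (at-least-once delivery), which is exactly why the consumer must be idempotent. Committing before processing would instead skip "b" entirely (at-most-once).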
Message Reordering
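When multiple consumers process related messages concurrently, the messages can be applied out of order and the final state can be incorrect (e.g., "order shipped" handled before "order paid"). Solutions involve routing related messages, such as all events for one order ID, to the same queue/partition and consuming each queue/partition sequentially.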
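Key-based routing can be sketched as a stable hash of the message key modulo the partition count. This is an illustration under assumptions: partition_for() and route() are hypothetical helpers, and SHA-256 is an arbitrary choice (Kafka's default partitioner, for instance, uses a murmur2 hash of the key). What matters is that the same key always maps to the same partition, so one sequential consumer per partition sees that key's events in order.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping.

    Uses hashlib rather than built-in hash(), because str hashes are randomized
    per process and the same key could land on different partitions after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(events: List[Tuple[str, str]], num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (order_id, event) to its partition, preserving arrival order."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for order_id, event in events:
        partitions[partition_for(order_id, num_partitions)].append((order_id, event))
    return partitions
```

Order is only guaranteed within a partition, so changing the partition count later remaps keys and can break ordering during the transition.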
Message Backlog
Backlog occurs when consumers are slow or unavailable. Mitigation strategies include scaling consumers, increasing queue instances, and temporarily redirecting traffic.
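Whether scaling consumers will actually clear a backlog is simple arithmetic: the backlog shrinks only at the rate consumers exceed producers. A small sketch, assuming steady rates and consumers of equal throughput (both function names are hypothetical):

```python
import math


def drain_time_seconds(backlog: int, produce_rate: float,
                       per_consumer_rate: float, consumers: int) -> float:
    """Seconds to clear a backlog, or infinity if consumers cannot keep up."""
    net_rate = per_consumer_rate * consumers - produce_rate
    return math.inf if net_rate <= 0 else backlog / net_rate


def consumers_needed(backlog: int, produce_rate: float,
                     per_consumer_rate: float, target_seconds: float) -> int:
    """Smallest consumer count that clears the backlog within target_seconds."""
    required_rate = produce_rate + backlog / target_seconds
    return math.ceil(required_rate / per_consumer_rate)
```

For example, with 1,000,000 queued messages, producers at 2,000 msg/s, and consumers at 500 msg/s each, 5 consumers need 2,000 s, 4 consumers never catch up, and clearing it within 10 minutes needs 8 consumers. Adding consumers only helps if there are enough queue instances or partitions for them to read from; in Kafka, consumers in a group beyond the partition count sit idle.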
Message Expiration
Expired messages are discarded by the broker. Prevent loss by re‑publishing expired messages or handling them in a dead‑letter queue.
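The dead-letter idea can be sketched as a queue that moves expired messages aside instead of discarding them, so they can be inspected or re-published later. This is a toy model with hypothetical names (Message, QueueWithDLQ); the current time is passed in explicitly rather than read from a clock so the behavior is deterministic.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Message:
    body: str
    enqueued_at: float  # seconds
    ttl: float          # seconds the message stays valid


class QueueWithDLQ:
    """Toy queue that dead-letters expired messages instead of dropping them."""

    def __init__(self) -> None:
        self.main: Deque[Message] = deque()
        self.dead_letters: Deque[Message] = deque()

    def publish(self, msg: Message) -> None:
        self.main.append(msg)

    def poll(self, now: float) -> Optional[Message]:
        """Return the next live message, moving any expired ones to the DLQ."""
        while self.main:
            msg = self.main.popleft()
            if now - msg.enqueued_at > msg.ttl:
                self.dead_letters.append(msg)  # kept for inspection or retry
                continue
            return msg
        return None

    def republish_dead_letters(self, now: float) -> None:
        """Re-enqueue dead-lettered messages with a fresh enqueue time."""
        while self.dead_letters:
            msg = self.dead_letters.popleft()
            self.publish(Message(msg.body, now, msg.ttl))
```

Re-publishing blindly can loop forever if a message keeps expiring; a retry counter or an alert on DLQ depth is a sensible addition.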
Queue Saturation
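When a queue reaches its length or memory limit, the broker drops or rejects messages according to its overflow policy (for example, discarding the oldest messages or refusing new publishes). Clean up useless messages or increase consumer throughput to relieve pressure.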
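Brokers typically offer a choice of what to sacrifice once a queue is full: the incoming message or the oldest one. A toy bounded queue makes the two behaviors concrete; BoundedQueue and the policy names "reject-new" and "drop-oldest" are hypothetical, not any broker's configuration values.

```python
from collections import deque
from typing import Any, Deque, List


class BoundedQueue:
    """Toy bounded queue with two overflow policies.

    "reject-new":  refuse the incoming message when full.
    "drop-oldest": evict the head to make room for the incoming message.
    """

    def __init__(self, max_len: int, policy: str = "reject-new") -> None:
        if policy not in ("reject-new", "drop-oldest"):
            raise ValueError(f"unknown policy: {policy}")
        self.items: Deque[Any] = deque()
        self.max_len = max_len
        self.policy = policy
        self.dropped: List[Any] = []  # what was lost, for monitoring

    def offer(self, item: Any) -> bool:
        """Try to enqueue; return False only if the incoming item was rejected."""
        if len(self.items) < self.max_len:
            self.items.append(item)
            return True
        if self.policy == "reject-new":
            self.dropped.append(item)
            return False
        self.dropped.append(self.items.popleft())
        self.items.append(item)
        return True
```

Either way a full queue loses data unless the producer reacts (for example, by retrying a rejected publish), so queue depth and drop counts are worth alerting on.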
2. Distributed Cache Pitfalls (Redis)
Sentinel Mechanism
Sentinel provides high availability, but master‑node failure can lead to data loss if failover occurs before replication completes.
Asynchronous Replication
Data may be lost if the master crashes before syncing to replicas.
Split‑brain
Network partitions can cause two masters to operate simultaneously, leading to divergent data.
Mitigation
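Configure min-slaves-to-write (min-replicas-to-write since Redis 5.0) so the master refuses writes unless at least that many replicas are connected; a value of 1 requires at least one replica.
Set min-slaves-max-lag (min-replicas-max-lag) so that only replicas that acknowledged within that many seconds count. An isolated old master then stops accepting writes once the lag limit passes, which bounds, but does not eliminate, data loss from asynchronous replication and split-brain.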
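The rule these two settings enforce can be modeled in a few lines. This is a Python model for illustration, not Redis source code; ReplicaLink and master_accepts_writes() are hypothetical names, and in redis.conf the corresponding directives would be, for example, min-replicas-to-write 1 and min-replicas-max-lag 10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaLink:
    name: str
    seconds_since_last_ack: float  # replicas normally ack about once per second


def master_accepts_writes(
    replicas: List[ReplicaLink],
    min_replicas_to_write: int,   # older name: min-slaves-to-write
    min_replicas_max_lag: float,  # older name: min-slaves-max-lag
) -> bool:
    """Accept writes only if enough replicas acked within the lag limit."""
    if min_replicas_to_write == 0:
        return True  # 0 disables the check (the Redis default)
    good = sum(1 for r in replicas if r.seconds_since_last_ack <= min_replicas_max_lag)
    return good >= min_replicas_to_write
```

In a split-brain, the old master cut off from its replicas sees their last acknowledgments age; once they exceed the lag limit it rejects writes, so clients writing to the wrong master lose roughly at most that many seconds of data.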
3. Sharding (Database Partitioning) Pitfalls
Horizontal vs. Vertical Splitting
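Horizontal splits distribute a table's rows across databases or tables by a sharding key; vertical splits move columns, or whole tables grouped by business domain, into separate tables or databases. Both improve concurrency but introduce new challenges.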
Expansion Issues
Adding new shards may require large-scale data migration and cause downtime, because with simple modulo routing (shard = key mod N) changing N remaps most keys.
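Why expansion forces migration is easy to quantify with modulo routing. The sketch assumes rows are routed by an integer user_id modulo the shard count; shard_for() and fraction_moved() are hypothetical helpers.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Horizontal split: route a row to a shard by user_id modulo shard count."""
    return user_id % num_shards


def fraction_moved(old_shards: int, new_shards: int, sample: int = 100_000) -> float:
    """Fraction of keys 0..sample-1 whose shard changes when resharding."""
    moved = sum(
        1 for k in range(sample)
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / sample
```

Going from 4 to 5 shards moves 80% of the keys, while doubling from 4 to 8 moves only 50%, and each moved row goes to exactly one new shard. That is why doubling is a common expansion strategy for modulo sharding; consistent hashing is an alternative that moves only about 1/N of the keys when a node is added.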
Unique ID Generation
Global uniqueness is essential. Options include:
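Database auto-increment (not suitable for sharding: each shard counts independently, so IDs collide across shards).
UUID (128 bits, large; random UUIDs are unordered, which hurts index locality).
Timestamp-based IDs (risk of collisions under high concurrency).
Snowflake-style algorithms (roughly increasing over time, high performance, generated locally once each node has a unique worker ID).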
Improved Snowflake variants from Baidu (UIDGenerator) and Meituan (Leaf‑Snowflake).
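A Snowflake-style generator packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one 64-bit integer. The sketch below follows the classic bit layout (41-bit timestamp, 10-bit worker ID, 12-bit sequence); the epoch is an assumed value and the class is illustrative, not the Baidu or Meituan implementation. It simply raises if the wall clock moves backwards, a case production variants handle more gracefully.

```python
import threading
import time


class Snowflake:
    """Snowflake-style 64-bit ID generator (illustrative sketch).

    Bit layout: 1 unused sign bit | 41-bit milliseconds since EPOCH_MS |
    10-bit worker ID | 12-bit sequence within the same millisecond.
    """

    EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER_ID = (1 << WORKER_BITS) - 1
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= self.MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Wall clock went backwards: refuse instead of risking duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.SEQUENCE_MASK
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )
```

IDs from one generator are strictly increasing; across workers they are only roughly time-ordered, which is the "trend-increasing" property. Uniqueness depends on every node getting a distinct worker ID.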
4. Distributed Transaction Pitfalls
Understanding Transactions
A transaction must either complete fully or have no effect.
Common Patterns
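XA (two-phase commit) – strong consistency for a single application that spans several databases; participants hold locks until the second phase completes, so it scales poorly under high concurrency.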
TCC (try‑confirm‑cancel) – good for payment‑related scenarios.
SAGA – compensating actions for long workflows.
Reliable message + eventual consistency.
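Maximum-effort (best-effort) notification – retry the notification at increasing intervals up to a limit; the receiver can also query the sender for the result.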
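The SAGA pattern can be sketched as an orchestrator that runs each local transaction in order and, if one fails, runs the compensations of the completed steps in reverse. This is a minimal in-memory sketch: run_saga() and the step names are hypothetical, and a real orchestrator would persist its progress and retry compensations, which therefore must be idempotent.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); each action is one local transaction.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed {name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated {done_name}")
            return False
        log.append(f"done {name}")
        completed.append(step)
    return True
```

If a hypothetical reserve_stock step fails after create_order and charge_payment succeeded, the log reads: done create_order, done charge_payment, failed reserve_stock, compensated charge_payment, compensated create_order. Between the failure and the last compensation the system is in a soft state, which is the BASE trade-off SAGA accepts.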
Choosing a Solution
Payments → TCC.
Large systems with relaxed strictness → SAGA or reliable message.
A single application spanning multiple databases → XA.
Always add maximum‑effort retries as a safety net.
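A maximum-effort retry loop usually backs off exponentially and gives up after a fixed number of attempts, leaving the receiver to query for the final result. The sketch below assumes a send() callable that returns True on success; notify_with_backoff() and its defaults are hypothetical, and the sleep function is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def notify_with_backoff(
    send: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry send() with exponential backoff; return False after max_attempts failures."""
    for attempt in range(max_attempts):
        if send():
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... between attempts
    return False
```

In production the retry schedule is usually persisted (for example, in a notification table) so retries survive restarts, and random jitter is often added so many senders do not retry in lockstep.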
Conclusion
Distributed systems bring both benefits and drawbacks; whether to adopt them depends on business needs, timeline, cost, and team capability. Continued learning and careful design are essential to avoid the many pitfalls discussed.
This article has been distilled and summarized from source material, then republished for learning and reference.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
