Mastering Distributed Systems: Common Pitfalls and How to Avoid Them

This article explains the core concepts of distributed systems—including the CAP theorem, BASE theory, message‑queue challenges, Redis sentinel issues, sharding strategies, unique ID generation, and distributed transaction patterns—while offering practical guidance to prevent common pitfalls and improve reliability.

Su San Talks Tech

Aug 5, 2024

Preface

When interviewers ask about distributed systems, candidates often wonder what they really are and why they matter.

Analogy with Naruto

Just as Naruto uses multiple shadow clones that share experiences, a distributed system consists of many nodes that cooperate and share state, but also consume more resources.

Simple Understanding of Distributed Systems

It is a way of working: one job is divided among several machines that cooperate to finish it.

A collection of independent computers that appear as a single system to users.

Business logic is spread across different locations.

Advantages (Macro and Micro)

Macro: Service decomposition reduces coupling between modules.

Micro: Deploying services on different machines or containers increases capacity.

Problems of Distributed Systems

Higher talent cost.

Complex architecture and steep learning curve.

Increased operation, deployment, and maintenance costs.

Longer service chains make debugging harder.

Reliability, data idempotency, and ordering issues.

CAP Theorem

A distributed system cannot guarantee Consistency, Availability, and Partition Tolerance all at once; at most two of the three can hold at the same time.

Consistency : All nodes see the same latest data.

Availability : Every request receives a response, though not necessarily the latest data.

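Partition Tolerance : The system keeps working even when a network failure splits the nodes into groups that cannot reach each other. Since partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.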
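
To make the trade-off concrete, here is a minimal sketch of one replica that has been cut off from its peers. The Replica class and its "CP"/"AP" mode labels are hypothetical names invented for this illustration; real systems are far more nuanced.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (illustrative labels only)
        self.value = "v1"         # last value this replica received
        self.partitioned = False  # True when cut off from the other nodes

    def read(self):
        if self.partitioned and self.mode == "CP":
            # The replica cannot prove its value is the latest, so it refuses.
            raise Unavailable("cannot confirm latest value")
        # AP mode (or no partition): answer with whatever is stored locally.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True  # the rest of the cluster may move on

try:
    cp_result = cp.read()
except Unavailable:
    cp_result = "unavailable"   # consistency kept, availability given up
ap_result = ap.read()           # availability kept, value may be stale
```

The CP replica gives up availability to avoid serving stale data, while the AP replica answers but may return an outdated value.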

BASE Theory

BASE (Basically Available, Soft state, Eventually consistent) relaxes the strict consistency of CAP by allowing temporary inconsistency in exchange for higher availability.

Basically Available : Core functions remain usable even when some parts fail.

Soft state : Intermediate states (e.g., "payment pending") are allowed.

Eventually consistent : Data will become consistent after a delay.

1. Distributed Message Queue Pitfalls

Non‑Idempotent Consumption

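A broker that guarantees at-least-once delivery may hand the same message to a consumer more than once. If consumption is not idempotent, the same change is applied twice (for example, a balance is deducted twice) and data becomes inconsistent. Ensure idempotency by giving each message a unique identifier and checking whether that identifier has already been processed before applying the change.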
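
The idea can be sketched as follows. This is a minimal in-memory illustration: the IdempotentConsumer class, the message fields, and the processed_ids set are assumptions made for the example. A real system would typically keep the processed identifiers in a database table with a unique index, or in a similar shared store.

```python
class IdempotentConsumer:
    def __init__(self):
        # Stand-in for a durable store of processed message IDs (assumption).
        self.processed_ids = set()
        self.balance = 0

    def handle(self, message):
        """Apply the message once; return False for a duplicate delivery."""
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False  # already applied, skip the side effect
        self.balance += message["amount"]
        self.processed_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "m-1", "amount": 100},
    {"id": "m-1", "amount": 100},  # the broker redelivers the same message
    {"id": "m-2", "amount": 50},
]
results = [consumer.handle(m) for m in deliveries]
```

In a durable implementation the duplicate check and the business update should happen in the same transaction, otherwise a crash between the two steps reopens the gap.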

Message Loss

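A message can be lost at three points: the producer fails before the broker receives the message, the broker fails before persisting it, or the consumer crashes after the message was acknowledged but before it was processed. The matching safeguards are transactions or confirm mode on the producer side, persistent queues on the broker, and acknowledging only after processing succeeds on the consumer.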

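Pitfall: Acknowledging late prevents loss but allows duplicates. If a consumer crashes after processing a message but before committing its offset, the same message is delivered again after restart, so the consumer must also be idempotent.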
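
A small simulation shows why acknowledging after processing avoids loss. TinyBroker is a hypothetical in-memory stand-in written for this example, not the API of any real broker.

```python
from collections import deque


class TinyBroker:
    """In-memory stand-in for a broker with manual acknowledgments."""

    def __init__(self, messages):
        self.ready = deque(messages)  # waiting to be delivered
        self.unacked = {}             # delivered but not yet acknowledged

    def deliver(self):
        msg = self.ready.popleft()
        self.unacked[msg["id"]] = msg
        return msg

    def ack(self, msg_id):
        self.unacked.pop(msg_id)

    def consumer_crashed(self):
        # Unacknowledged messages return to the front of the queue in order.
        for msg in reversed(list(self.unacked.values())):
            self.ready.appendleft(msg)
        self.unacked.clear()


broker = TinyBroker([{"id": 1}, {"id": 2}])
processed = []

msg = broker.deliver()
processed.append(msg["id"])   # process first ...
broker.ack(msg["id"])         # ... acknowledge afterwards

msg = broker.deliver()        # message 2 is delivered,
broker.consumer_crashed()     # but the consumer dies before processing it

msg = broker.deliver()        # after restart the broker redelivers message 2
processed.append(msg["id"])
broker.ack(msg["id"])
```

Had the consumer acknowledged message 2 on receipt, the crash would have lost it.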

Message Reordering

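When related messages (for example the created, paid, and shipped events of one order) are spread across several queues or handled by several consumers in parallel, they can be applied out of order and leave the final state incorrect. Route related messages to the same queue or partition, for example by hashing the business key, and consume each queue with a single ordered consumer.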
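
Routing by key can be sketched like this. The partition_for function, the order IDs, and the partition count of 3 are illustrative assumptions. zlib.crc32 is used because it gives the same result in every process, unlike Python's built-in hash() for strings.

```python
import zlib

NUM_PARTITIONS = 3  # illustrative value


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a business key to a partition with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

# Each partition is an ordered list, consumed by exactly one consumer.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for order_id, event in events:
    partitions[partition_for(order_id)].append((order_id, event))


def events_for(order_id):
    """Events of one order, in the order its partition's consumer sees them."""
    own_partition = partitions[partition_for(order_id)]
    return [event for oid, event in own_partition if oid == order_id]
```

All events of one order land in the same partition, so their relative order is preserved even though different orders are processed in parallel.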

Message Backlog

Backlog occurs when consumers are slow or unavailable. Mitigation strategies include scaling consumers, increasing queue instances, and temporarily redirecting traffic.

Message Expiration

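Messages that carry a time-to-live (TTL) are discarded by the broker once they expire, so a long backlog can silently turn into data loss. Prevent this by routing expired messages to a dead-letter queue, where they can be inspected and re-published.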
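
The dead-letter idea can be sketched as follows. The drain function and the expires_at field are hypothetical, and timestamps are plain numbers to keep the example deterministic. Real brokers implement TTL and dead-lettering through their own configuration.

```python
def drain(queue, dead_letters, now):
    """Return the messages that are still live; park expired ones instead of dropping them."""
    live = []
    for msg in queue:
        if now >= msg["expires_at"]:
            dead_letters.append(msg)  # keep for inspection or re-publishing
        else:
            live.append(msg)
    return live


queue = [
    {"id": "a", "expires_at": 10},
    {"id": "b", "expires_at": 30},
]
dead_letters = []
live = drain(queue, dead_letters, now=20)

live_ids = [m["id"] for m in live]
dead_ids = [m["id"] for m in dead_letters]
```

Nothing is thrown away: message "a" has expired, but it remains available in dead_letters for a later decision.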

Queue Saturation

When a queue reaches its capacity limit, the broker may reject or drop new messages, depending on its overflow policy. Clean up useless messages or increase consumer throughput to relieve pressure.
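
The sketch below uses Python's standard queue.Queue as a stand-in for a bounded broker queue (an assumption for illustration). It shows the safer behavior of surfacing a rejection to the producer instead of dropping the message silently.

```python
import queue

buffer = queue.Queue(maxsize=2)  # tiny capacity so the limit is easy to hit
rejected = []

for msg in ["m1", "m2", "m3"]:
    try:
        buffer.put_nowait(msg)
    except queue.Full:
        # The producer learns about the overflow and can retry or alert.
        rejected.append(msg)
```

A producer that sees the rejection can back off, retry later, or raise an alarm, none of which is possible when overflow is silent.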

2. Distributed Cache Pitfalls (Redis)

Sentinel Mechanism

Sentinel provides high availability, but master‑node failure can lead to data loss if failover occurs before replication completes.

Asynchronous Replication

Data may be lost if the master crashes before syncing to replicas.

Split‑brain

Network partitions can cause two masters to operate simultaneously, leading to divergent data.

Mitigation

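Configure min-slaves-to-write (named min-replicas-to-write since Redis 5) so that the master accepts writes only while at least one replica is connected.

Set min-slaves-max-lag (min-replicas-max-lag since Redis 5) to bound replication delay: a replica whose last acknowledgment is older than this many seconds no longer counts. An isolated old master then stops accepting writes, which limits the data lost to asynchronous replication and split-brain.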
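
The rule these two settings express can be modeled in a few lines. This is a sketch of the decision logic only, not Redis source code; the function name and the replica_lags list are assumptions made for the example.

```python
def master_accepts_writes(replica_lags, min_replicas_to_write, min_replicas_max_lag):
    """Decide whether a master should accept a write.

    replica_lags: seconds since each connected replica last acknowledged.
    """
    if min_replicas_to_write == 0:
        return True  # the safeguard is disabled
    healthy = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return healthy >= min_replicas_to_write


normal = master_accepts_writes([1, 2], min_replicas_to_write=1, min_replicas_max_lag=10)
isolated = master_accepts_writes([], min_replicas_to_write=1, min_replicas_max_lag=10)
lagging = master_accepts_writes([15], min_replicas_to_write=1, min_replicas_max_lag=10)
```

An isolated master (no reachable replicas) or one whose only replica lags too far behind refuses writes, so clients cannot keep writing data that a failover would discard.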

3. Sharding (Database Partitioning) Pitfalls

Horizontal vs. Vertical Splitting

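Horizontal splitting spreads the rows of one table across several databases or tables; vertical splitting separates columns or whole tables, usually by business domain. Both reduce the load on any single database, but they introduce new challenges such as cross-shard queries, data migration, and globally unique IDs.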

Expansion Issues

Adding new shards changes where existing keys are routed, so data usually has to be migrated, and a poorly planned migration can require downtime.
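
How much data moves depends on the routing rule. The sketch below assumes simple modulo routing (shard = key % shard_count), which the article does not specify, and counts how many keys change shards when the cluster grows.

```python
def moved_fraction(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes under modulo routing."""
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)


keys = range(1000)  # illustrative numeric IDs
grow_by_one = moved_fraction(keys, 4, 5)  # 4 shards -> 5 shards
double = moved_fraction(keys, 4, 8)       # 4 shards -> 8 shards
```

Under this assumption, going from 4 to 5 shards relocates 80% of the keys, while doubling to 8 relocates 50%. This is one reason teams often expand by doubling or switch to schemes such as consistent hashing.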

Unique ID Generation

Global uniqueness is essential. Options include:

Database auto-increment (each shard counts independently, so IDs collide across shards unless every shard is given its own offset and step).

UUID (large, unordered).

Timestamp‑based IDs (risk of collisions under high concurrency).

Snowflake‑style algorithms (trend‑increasing, high performance).

Improved Snowflake variants from Baidu (UIDGenerator) and Meituan (Leaf‑Snowflake).
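
A Snowflake-style generator can be sketched as follows. This is a simplified illustration using the commonly cited layout of a millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence. The class name and the custom epoch are assumptions, and it is not the implementation used by UIDGenerator or Leaf.

```python
import threading
import time


class SnowflakeSketch:
    EPOCH_MS = 1_704_067_200_000  # 2024-01-01 UTC, an arbitrary custom epoch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Clock rollback would risk duplicate IDs, so fail loudly.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS)
            worker_part = self.worker_id << self.SEQUENCE_BITS
            return timestamp_part | worker_part | self.sequence


generator = SnowflakeSketch(worker_id=1)
ids = [generator.next_id() for _ in range(1000)]
```

IDs from one worker are unique and strictly increasing; uniqueness across workers depends on every worker having a distinct worker_id. Clock rollback is the classic weak point, which is why this sketch raises an error rather than risk a duplicate.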

4. Distributed Transaction Pitfalls

Understanding Transactions

A transaction must either complete fully or have no effect.

Common Patterns

XA (two-phase commit) – suitable for a monolithic application that writes to several databases.

TCC (try‑confirm‑cancel) – good for payment‑related scenarios.

SAGA – compensating actions for long workflows.

Reliable message + eventual consistency.

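Maximum-effort notification, also called best-effort notification (retry a bounded number of times at increasing intervals; if every attempt fails, the receiver queries the result itself).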
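
Of the patterns above, SAGA is the easiest to show in a few lines: each step is paired with a compensating action, and a failure triggers the compensations of the completed steps in reverse order. The run_saga function and the order workflow are hypothetical examples, not the API of any particular framework.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps on failure.

    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order of completion.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def fail_shipping():
    raise RuntimeError("no courier available")


steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_shipping, lambda: log.append("cancel shipping")),
]
ok = run_saga(steps)
```

The failed step itself is not compensated because it never completed. In a real system the compensations can fail too, so they need to be idempotent and retried.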

Choosing a Solution

Payments → TCC.

Large systems with relaxed strictness → SAGA or reliable message.

Single application spanning multiple databases → XA.

Always add maximum‑effort retries as a safety net.
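
Such a retry safety net might look like the sketch below. The notify_with_retries function, its parameters, and the doubling delay are assumptions chosen for illustration. The sleep function is injectable so the example runs without actually waiting.

```python
import time


def notify_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns True or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return None


outcomes = iter([False, False, True])  # the first two attempts fail
delays = []                            # record delays instead of sleeping
attempts_used = notify_with_retries(lambda: next(outcomes), sleep=delays.append)
```

When the function returns None, the notification should be recorded for manual follow-up or left for the receiver to query, rather than retried forever.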

Conclusion

Distributed systems bring both benefits and drawbacks; whether to adopt them depends on business needs, timeline, cost, and team capability. Continued learning and careful design are essential to avoid the many pitfalls discussed.


Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
