How a 0.01‑Yuan Mistake Exposed Distributed Lock Flaws in Alibaba’s Backend
A 0.01‑yuan discrepancy in an Alibaba product was traced to duplicate settlement records: a distributed lock timed out while a transaction was still pending, so a retry ran concurrently and both transactions committed. This article walks through the root‑cause analysis, a reverse‑engineered timeline of the process, and two remediation strategies: timeout adjustments and idempotency controls.
Background
Alibaba’s e‑commerce platform faces strict quality and continuity requirements. In product X’s daily earnings, a user was credited 0.04 yuan instead of the expected 0.03 yuan; the extra 0.01 yuan was a loss for the company.
Database Record Analysis
Investigation of the product X earnings table showed two transaction records for the same user on the same day: one created at 08:00:23 and the other at 08:00:29, both modified at 08:00:29, indicating duplicate settlement.
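A check of this kind can be sketched as a grouping query. The example below is a minimal illustration, assuming a hypothetical `earnings` table with `user_id`, `settle_date`, `gmt_create`, and `gmt_modified` columns; the real schema is not given in the source, and the date is a placeholder. It uses an in-memory SQLite database only so that it runs on its own.

```python
import sqlite3


def find_duplicate_settlements(conn):
    """Return (user_id, settle_date, count) for users settled more than once per day."""
    return conn.execute(
        """
        SELECT user_id, settle_date, COUNT(*) AS n
        FROM earnings
        GROUP BY user_id, settle_date
        HAVING COUNT(*) > 1
        ORDER BY user_id, settle_date
        """
    ).fetchall()


# Hypothetical schema and sample rows; only the clock times come from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE earnings ("
    "id INTEGER PRIMARY KEY, user_id TEXT, settle_date TEXT, "
    "gmt_create TEXT, gmt_modified TEXT)"
)
conn.executemany(
    "INSERT INTO earnings (user_id, settle_date, gmt_create, gmt_modified) "
    "VALUES (?, ?, ?, ?)",
    [
        ("u1", "2024-01-01", "08:00:23", "08:00:29"),  # first record
        ("u1", "2024-01-01", "08:00:29", "08:00:29"),  # duplicate record
        ("u2", "2024-01-01", "08:00:24", "08:00:24"),  # unaffected user
    ],
)

duplicates = find_duplicate_settlements(conn)
print(duplicates)
```

Only the user with two records on the same day is reported, which matches the pattern described above.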
Process Reverse Engineering
The reverse‑engineered timeline shows:
Database connections were exhausted, causing the first transaction to wait.
The distributed lock timed out after 5 seconds, but the first transaction took 6 seconds to complete because it obtained a DB connection only after the lock had already expired.
With the lock expired, the retry was able to acquire it and run a second transaction. Both transactions eventually committed, leading to duplicate earnings.
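The timeline can be replayed with a small deterministic simulation. This is a sketch under stated assumptions, not the system's actual code: `TtlLock` and `replay` are hypothetical names, time is a plain number rather than a real clock, the retry is assumed to arrive at 5.5 seconds (the source does not give the retry time), and the retry is assumed to commit immediately. The key behaviour modelled is that the first attempt never rechecks the lock before committing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtlLock:
    """A lock that silently expires after `ttl` seconds (simulated time)."""
    ttl: float
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, owner: str, now: float) -> bool:
        # The lock is free if nobody holds it or the previous hold has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = owner
            self.expires_at = now + self.ttl
            return True
        return False


def replay(lock_ttl: float, txn_duration: float, retry_at: float):
    """Return the list of (attempt, commit_time) for one settlement and one retry."""
    lock = TtlLock(ttl=lock_ttl)
    commits = []

    # t=0: the first attempt takes the lock, then stalls waiting for a DB connection.
    lock.try_acquire("attempt-1", now=0.0)

    # The retry runs only if it can take the lock (assumed to commit at once).
    if lock.try_acquire("attempt-2", now=retry_at):
        commits.append(("attempt-2", retry_at))

    # The first attempt commits when it finishes, without rechecking the lock.
    commits.append(("attempt-1", txn_duration))
    return sorted(commits, key=lambda c: c[1])


incident = replay(lock_ttl=5.0, txn_duration=6.0, retry_at=5.5)
longer_ttl = replay(lock_ttl=30.0, txn_duration=6.0, retry_at=5.5)
print(len(incident), len(longer_ttl))
```

With a 5-second lock and a 6-second transaction, both attempts commit; with a lock that outlives the transaction, the retry is rejected and only one commit happens.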
Deep Analysis
The root cause lies in system A’s transaction waiting longer than the business retry interval; the retry occurs while the transaction is still pending, and the distributed lock expires before the transaction completes, allowing both transactions to commit.
Proposed Solutions
Solution 1: Adjust Timeout Settings – Increase the transaction retry timeout to 10 seconds and the distributed‑lock timeout to 30 seconds, ensuring the lock remains valid throughout the transaction.
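The invariant behind this solution is that the lock must outlive the longest time a transaction can take. A minimal sketch of a configuration check follows; the `Timeouts` class and its field names are hypothetical, and it assumes the 10-second figure acts as an upper bound on how long a transaction may run, which is one reading of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    txn_timeout_s: float  # assumed upper bound on transaction duration
    lock_ttl_s: float     # distributed-lock expiry

    def validate(self) -> None:
        # The lock must still be held when the slowest allowed transaction commits.
        if self.lock_ttl_s <= self.txn_timeout_s:
            raise ValueError(
                f"lock TTL {self.lock_ttl_s}s must exceed "
                f"transaction timeout {self.txn_timeout_s}s"
            )


proposed = Timeouts(txn_timeout_s=10, lock_ttl_s=30)
proposed.validate()  # passes: 30 s lock covers a 10 s transaction

try:
    Timeouts(txn_timeout_s=6, lock_ttl_s=5).validate()  # the incident's values
    incident_rejected = False
except ValueError:
    incident_rejected = True
print(incident_rejected)
```

A check like this only holds if the transaction bound is actually enforced; it narrows the window rather than removing the need for idempotency.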
Solution 2: Add Idempotency Controls (Recommended) – Enforce strict idempotency at the database level when inserting records, guaranteeing that duplicate submissions cannot succeed regardless of timeout settings.
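One common way to get database-level idempotency is a unique key on the business identity of the settlement. The sketch below assumes that identity is (user, settlement date) and uses a hypothetical `earnings` table and `settle_once` helper; the source does not specify the actual key or schema. SQLite stands in for the production database so the example is self-contained, and amounts are stored in fen (0.01 yuan) as integers.

```python
import sqlite3


def settle_once(conn, user_id, settle_date, amount_fen):
    """Insert the settlement; return True if this call created it, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO earnings (user_id, settle_date, amount_fen) "
                "VALUES (?, ?, ?)",
                (user_id, settle_date, amount_fen),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique key rejected a second settlement for the same user and day.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE earnings (
        user_id     TEXT    NOT NULL,
        settle_date TEXT    NOT NULL,
        amount_fen  INTEGER NOT NULL,
        UNIQUE (user_id, settle_date)  -- assumed idempotency key
    )
    """
)

first = settle_once(conn, "u1", "2024-01-01", 3)  # 0.03 yuan
retry = settle_once(conn, "u1", "2024-01-01", 3)  # duplicate submission
row_count = conn.execute("SELECT COUNT(*) FROM earnings").fetchone()[0]
print(first, retry, row_count)
```

Because the constraint is enforced by the database itself, the duplicate is rejected even when the lock has expired and both attempts reach the insert.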
Conclusion
Fund safety hinges on robust idempotency; a distributed lock alone cannot guarantee concurrency control. Implementing a unique‑key or idempotency layer as a fallback ensures that even with aggressive retries, duplicate financial transactions are prevented.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.