How a 0.01‑Yuan Mistake Exposed Distributed Lock Flaws in Alibaba’s Backend

A 0.01‑yuan discrepancy in an Alibaba product led to the discovery of duplicate settlement records. The cause was a distributed lock that timed out while the first transaction was still running, which allowed a retry to commit a second time. This article walks through the root‑cause analysis, the reverse engineering of the process, and two remediation strategies: timeout adjustments and idempotency controls.

Alibaba Cloud Developer

Background

Alibaba’s e‑commerce platform has strict quality and continuity requirements. In product X’s daily earnings, one user received 0.04 yuan instead of the expected 0.03 yuan. The 0.01‑yuan overpayment is a direct loss for the company.

Database Record Analysis

Investigation of the product X earnings table showed two transaction records for the same user on the same day: one created at 08:00:23 and the other at 08:00:29, both modified at 08:00:29, indicating duplicate settlement.
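
A check like the one described above can be sketched as a grouping query. This is a minimal illustration using Python's built‑in sqlite3 module. The table name `earnings`, its columns, the user IDs, and the date are all hypothetical, since the article does not give the real schema. Only the two timestamps for the duplicated user come from the incident.

```python
import sqlite3


def find_duplicate_settlements(conn):
    """Return (user_id, settle_date, record_count) for keys with more than one record."""
    return conn.execute(
        """
        SELECT user_id, settle_date, COUNT(*) AS record_count
        FROM earnings
        GROUP BY user_id, settle_date
        HAVING COUNT(*) > 1
        ORDER BY user_id, settle_date
        """
    ).fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE earnings ("
    "id INTEGER PRIMARY KEY, user_id TEXT, settle_date TEXT, "
    "created_at TEXT, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO earnings (user_id, settle_date, created_at, modified_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("user-1", "2024-01-01", "08:00:23", "08:00:29"),  # first settlement
        ("user-1", "2024-01-01", "08:00:29", "08:00:29"),  # duplicate settlement
        ("user-2", "2024-01-01", "08:00:24", "08:00:24"),  # unaffected user
    ],
)

duplicates = find_duplicate_settlements(conn)
```

Any row returned by the query marks a user and day that were settled more than once.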

Process Reverse Engineering

The reverse‑engineered timeline shows:

Database connections were exhausted, causing the first transaction to wait.

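The distributed lock timed out after 5 seconds, but the first transaction needed 6 seconds to complete because it only obtained a DB connection after the lock had already expired.

With the lock expired, the retried request was able to acquire it and start a second transaction. Both transactions eventually committed, leading to duplicate earnings.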
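
The timeline above can be replayed with a toy lock that expires on its own. This is a sketch on a logical clock, not the lock implementation used in the incident. The `TtlLock` class and `replay` helper are hypothetical, and the retry time of 5.5 seconds is an assumption; the article only implies that the retry arrived after the 5‑second expiry and before the first transaction finished at 6 seconds.

```python
class TtlLock:
    """Toy lock that silently expires ttl_seconds after it is acquired."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.owner = None
        self.expires_at = 0.0

    def try_acquire(self, owner, now):
        # The lock is refused only while a previous holder's TTL is still running.
        if self.owner is not None and now < self.expires_at:
            return False
        self.owner = owner
        self.expires_at = now + self.ttl_seconds
        return True


def replay(lock_ttl_s, attempts):
    """Return the owners that acquired the lock, given (owner, time) attempts in time order.

    No holder releases the lock here: the first transaction is still waiting
    for a DB connection during the whole replayed window.
    """
    lock = TtlLock(lock_ttl_s)
    return [owner for owner, at in attempts if lock.try_acquire(owner, at)]


# First attempt at t=0 (it commits at t=6); retry at t=5.5 (assumed).
attempts = [("first", 0.0), ("retry", 5.5)]
holders = replay(5.0, attempts)
```

With a 5‑second TTL both attempts end up holding the lock, so both go on to commit. Replaying the same attempts with a 30‑second TTL admits only the first one.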

Deep Analysis

The root cause is that the transaction in system A waited longer than the business retry interval. The retry fired while the first transaction was still pending, and the distributed lock expired before that transaction completed. The retry therefore acquired the lock, and both transactions committed.

Proposed Solutions

Solution 1: Adjust Timeout Settings – Increase the transaction retry timeout to 10 seconds and the distributed‑lock timeout to 30 seconds, ensuring the lock remains valid throughout the transaction.
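
The relationship between the two settings can be written down as a simple check. This is a sketch under assumptions: the `SettlementTimeouts` class and its field names are hypothetical, and it reads the "retry timeout" as the longest a single attempt may run before a retry fires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettlementTimeouts:
    """Hypothetical settings holder; the field names are illustrative."""

    retry_timeout_s: float  # assumed: longest an attempt may run before a retry fires
    lock_ttl_s: float       # how long the distributed lock stays valid

    def lock_covers_transaction(self) -> bool:
        # The lock must outlive the longest attempt, otherwise a retry can
        # acquire it while the first transaction is still in flight.
        return self.lock_ttl_s > self.retry_timeout_s


# Values proposed in Solution 1.
proposed = SettlementTimeouts(retry_timeout_s=10, lock_ttl_s=30)

# Incident values: 5-second lock, with the observed 6-second transaction
# used as a lower bound for how long an attempt could run.
incident = SettlementTimeouts(retry_timeout_s=6, lock_ttl_s=5)
```

Here `proposed.lock_covers_transaction()` holds and `incident.lock_covers_transaction()` does not. A check of this kind only validates configuration; it does not bound how long a transaction can actually stall.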

Solution 2: Add Idempotency Controls (Recommended) – Enforce strict idempotency at the database level when inserting records, guaranteeing that duplicate submissions cannot succeed regardless of timeout settings.
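
One common way to enforce idempotency at the database level is a unique constraint on the business key. The sketch below uses Python's built‑in sqlite3 module. The table, the choice of `(user_id, settle_date)` as the key, and the `settle_once` helper are assumptions for illustration, not the schema from the incident. The amount is stored in fen (0.01 yuan), and 3 fen matches the expected 0.03 yuan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE earnings (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        settle_date TEXT NOT NULL,
        amount_fen INTEGER NOT NULL,
        UNIQUE (user_id, settle_date)  -- assumed idempotency key: one settlement per user per day
    )
    """
)


def settle_once(conn, user_id, settle_date, amount_fen):
    """Insert a settlement; return False if this user and day were already settled."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO earnings (user_id, settle_date, amount_fen) "
                "VALUES (?, ?, ?)",
                (user_id, settle_date, amount_fen),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a duplicate submission.
        return False


first = settle_once(conn, "user-1", "2024-01-01", 3)
retry = settle_once(conn, "user-1", "2024-01-01", 3)
record_count = conn.execute("SELECT COUNT(*) FROM earnings").fetchone()[0]
```

The second call is rejected by the database itself, so the outcome does not depend on whether a lock was still valid when the retry ran.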

Conclusion

Fund safety hinges on robust idempotency; a distributed lock alone cannot guarantee concurrency control. Implementing a unique‑key or idempotency layer as a fallback ensures that even with aggressive retries, duplicate financial transactions are prevented.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
