
Case Study: Resolving a One‑Cent Discrepancy Caused by Distributed‑Lock Timeout and Concurrency Issues in Alibaba’s Financial System

This article analyzes a real incident in an Alibaba internal financial system, in which a one‑cent accounting error arose from concurrent database writes after a distributed‑lock timeout. It details the root‑cause investigation and presents two remediation strategies to prevent similar issues: adjusting timeout settings and strengthening idempotency controls.

Alibaba Cloud Infrastructure

Alibaba’s technology platform supports massive events like Double 11, but occasional bugs reveal valuable lessons. This case study describes a one‑cent discrepancy reported by a client of product X, where a user’s daily earnings were recorded as 0.04 CNY instead of the expected 0.03 CNY.

Investigation of the product X database showed two settlement records for the same user on the same day, created six seconds apart. From the caller's perspective, the first transaction timed out. It was stuck waiting for a connection because the database connection pool was exhausted. That wait outlasted the distributed lock's 5 s timeout, so the lock expired while the transaction was still pending. After the lock was released, a retry produced a second transaction. Both it and the still‑pending first transaction then obtained a connection and committed at essentially the same time.
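The race can be reproduced in miniature. The sketch below is a hypothetical, single‑process model of a lease‑based lock whose TTL lapses while its holder is stalled. The `FakeClock`, `LeaseLock` and `simulate` names and the lock key are illustrative assumptions, not Alibaba's implementation. Only the 5 s lock timeout comes from the incident; the 6 s stall is an assumed value that matches the six‑second gap between the records.

```python
class FakeClock:
    """Manually advanced clock so the timeline is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


class LeaseLock:
    """Minimal lease lock (release omitted): a key stays held until its TTL expires."""

    def __init__(self, clock: FakeClock, ttl_s: float) -> None:
        self.clock = clock
        self.ttl_s = ttl_s
        self.leases = {}  # key -> (owner, expires_at)

    def try_acquire(self, key: str, owner: str) -> bool:
        lease = self.leases.get(key)
        if lease is not None and lease[1] > self.clock.now:
            return False  # another owner still holds a live lease
        self.leases[key] = (owner, self.clock.now + self.ttl_s)
        return True


def simulate(lock_ttl_s: float, stall_s: float) -> list:
    """Return the attempts whose settlement write ends up committed."""
    clock = FakeClock()
    lock = LeaseLock(clock, ttl_s=lock_ttl_s)
    key = "settle:user-1:day-1"  # hypothetical lock key: one user, one day
    committed = []

    # Attempt 1 takes the lock, then stalls waiting for a pooled DB connection.
    if not lock.try_acquire(key, "attempt-1"):
        raise RuntimeError("lock unexpectedly held")
    clock.advance(stall_s)

    # The upstream retry only gets the lock if attempt 1's lease has lapsed.
    retry_got_lock = lock.try_acquire(key, "attempt-2")

    # Connections free up. Attempt 1 commits anyway: nothing re-checks lease
    # ownership at commit time, so an expired lease does not stop the write.
    committed.append("attempt-1")
    if retry_got_lock:
        committed.append("attempt-2")
    return committed


if __name__ == "__main__":
    print(simulate(lock_ttl_s=5, stall_s=6))   # ['attempt-1', 'attempt-2']
    print(simulate(lock_ttl_s=30, stall_s=6))  # ['attempt-1']
```

In this model a longer lease blocks the retry only because it exceeds the assumed stall. The lease TTL protects the write only as long as it reliably exceeds the longest time a holder can remain pending.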

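Because the settlement table lacked a unique index, nothing stopped the second insert. Each transaction generated its own TXID, and the downstream payment service performed no idempotency check of its own. Both postings were therefore accepted, which produced duplicate accounting and the extra cent.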
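The distinct TXIDs matter for any downstream deduplication, because a check keyed on TXID sees two unrelated postings. The sketch below contrasts an idempotency key derived from the TXID with one derived from the business identity of the settlement (user and day). The `Ledger` class, the key functions, the identifiers and the 0.01 amounts are hypothetical illustrations, not the payment service's actual logic.

```python
import uuid
from decimal import Decimal


class Ledger:
    """Hypothetical downstream ledger that skips postings whose key was seen."""

    def __init__(self, key_fn) -> None:
        self.key_fn = key_fn  # derives the idempotency key from a posting
        self.seen = set()
        self.postings = []

    def post(self, posting: dict) -> bool:
        key = self.key_fn(posting)
        if key in self.seen:
            return False  # duplicate: skip instead of booking twice
        self.seen.add(key)
        self.postings.append(posting)
        return True

    def total(self) -> Decimal:
        return sum((p["amount"] for p in self.postings), Decimal("0"))


def settlement(user_id: str, day: str, amount: str) -> dict:
    """Build one settlement posting; every attempt mints a fresh TXID."""
    return {"txid": uuid.uuid4().hex, "user_id": user_id, "day": day,
            "amount": Decimal(amount)}


def replay(key_fn) -> Decimal:
    """Feed two racing attempts for the same user and day into a ledger."""
    ledger = Ledger(key_fn)
    for _ in range(2):
        ledger.post(settlement("user-1", "day-1", "0.01"))
    return ledger.total()


if __name__ == "__main__":
    print(replay(lambda p: p["txid"]))                 # 0.02: both booked
    print(replay(lambda p: (p["user_id"], p["day"])))  # 0.01: retry dropped
```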

Further analysis highlighted three conditions that break distributed‑lock concurrency control: (1) upper‑level business logic includes retries, (2) requests may succeed after a delay (e.g., after the lock expires), and (3) downstream systems provide no additional idempotency safeguards.
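Condition (2) can be stated as a simple timeline check: a commit is unprotected if it lands after the lease acquired for it has expired. This is a minimal sketch with hypothetical names and timestamps. Only the 5 s lock timeout comes from the incident; the 6 s stall and the 0.1 s commit delay for the retry are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    name: str
    locked_at: float     # seconds: when this attempt acquired the lock
    committed_at: float  # seconds: when its transaction actually committed


def committed_after_lease(attempts, lock_ttl_s: float) -> list:
    """Names of attempts whose commit landed after their own lease expired.

    Such a commit is no longer protected by the lock: another attempt may
    have legitimately acquired the same key in the meantime.
    """
    return [a.name for a in attempts if a.committed_at > a.locked_at + lock_ttl_s]


# Hypothetical incident-like timeline: attempt 1 stalls on the connection
# pool, the retry takes the lapsed lock at t=6 s, and both commit around t=6 s.
timeline = [
    Attempt("attempt-1", locked_at=0.0, committed_at=6.0),
    Attempt("attempt-2", locked_at=6.0, committed_at=6.1),
]

if __name__ == "__main__":
    print(committed_after_lease(timeline, lock_ttl_s=5.0))  # ['attempt-1']
```

The retry itself is not flagged, since it held a valid lease. The flagged commit becomes a duplicate only when condition (1) supplies a second attempt and condition (3) lets both through downstream.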

Two mitigation approaches were evaluated. Solution 1 extends the transaction timeout to 10 s and the distributed‑lock timeout to 30 s. The lock then outlives any transaction that can still commit and remains valid throughout the retry window. Solution 2 adds strict idempotency checks at the database layer, guaranteeing that duplicate writes are rejected even if the lock fails. Both solutions were validated as effective.
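Solution 2 can be sketched with a unique constraint on the business key. The database itself then rejects a second settlement row for the same user and day, even when the lock has failed. SQLite stands in for the production database here. The table, the column names and the `settle_once` helper are hypothetical. Amounts are stored as integer fen (cents) to avoid floating‑point rounding.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """Create an in-memory settlement table with a (user_id, settle_date) unique key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE settlement (
            txid        TEXT PRIMARY KEY,
            user_id     TEXT NOT NULL,
            settle_date TEXT NOT NULL,
            amount_fen  INTEGER NOT NULL,
            UNIQUE (user_id, settle_date)
        )
        """
    )
    return conn


def settle_once(conn: sqlite3.Connection, user_id: str, day: str, amount_fen: int) -> bool:
    """Record one day's settlement; return False if that day was already settled."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO settlement (txid, user_id, settle_date, amount_fen) "
                "VALUES (?, ?, ?, ?)",
                (uuid.uuid4().hex, user_id, day, amount_fen),
            )
    except sqlite3.IntegrityError:
        # In this sketch the expected cause is a duplicate (user_id, settle_date).
        return False
    return True


if __name__ == "__main__":
    conn = open_db()
    print(settle_once(conn, "user-1", "day-1", 3))  # True: first attempt
    print(settle_once(conn, "user-1", "day-1", 3))  # False: retry rejected
    total = conn.execute("SELECT SUM(amount_fen) FROM settlement").fetchone()[0]
    print(total)  # 3
```

Catching `IntegrityError` also covers other constraint violations, such as a `NOT NULL` failure. A production version would distinguish the duplicate case explicitly.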

The conclusion emphasizes that, for financial‑critical systems, relying solely on distributed locks is insufficient; robust idempotent mechanisms and proper timeout configurations are essential to maintain data integrity.
