Why Did My Payment Service Lose Data? Uncovering Hidden Transaction Bugs in Spring
Payments were reported as successful, yet the orders were never persisted. The cause was a missing transaction commit on one special code path: it left a connection in an open-transaction state, and that polluted connection silently broke later transactions. This article covers the root cause, the debugging steps, the fix, and preventive measures.
Incident Overview
The online payment service stopped persisting orders: users received a successful payment response, but the order table remained empty and occasional lock‑timeout errors were observed when modifying orders.
Root Cause
A newly deployed business method SomeService.handleSpecialCase() opened a transaction, executed an INSERT, and returned early when a special condition was met. The early return skipped the sqlSession.commit(), leaving the underlying Connection in an active (uncommitted) state.
Spring’s DataSourceTransactionManager obtains a ConnectionHolder from TransactionSynchronizationManager in doGetTransaction(). Because the previous request left the connection marked as active (isTransactionActive() = true), the same ConnectionHolder was reused for the next request (e.g., PaymentService.createOrder()). The manager therefore considered the request to be part of an existing transaction (isExistingTransaction() returned true) and did not create a new transaction. During commit, processCommit() saw status.isNewTransaction() = false and skipped the real connection.commit(), so the order data never reached the database.
The bug manifested intermittently because TransactionSynchronizationManager stores resources in a ThreadLocal. Requests handled by a clean thread (without a polluted connection) succeeded, while those on a thread that reused the polluted connection failed.
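The thread-affinity effect can be reproduced without Spring. The sketch below is a toy model only: BOUND is a hypothetical stand-in for the thread-bound resource map, and the class and method names are invented for illustration. It shows that state left in a ThreadLocal by one task is visible to the next task on the same pooled worker thread, but not to a task on a different thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalLeakDemo {
    // Hypothetical stand-in for the thread-bound resource map; not Spring's class.
    static final ThreadLocal<String> BOUND = new ThreadLocal<>();

    // Runs one task on the pool and waits for its result, wrapping checked exceptions.
    static <T> T runOn(ExecutorService pool, Callable<T> task) {
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Request A binds state and never unbinds it; request B later runs on the same worker.
    static String sameThreadSees() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            runOn(pool, () -> { BOUND.set("conn-with-open-tx"); return null; });
            return runOn(pool, BOUND::get);
        } finally {
            pool.shutdown();
        }
    }

    // Request A pollutes one worker; request B runs on a different, clean worker.
    static String otherThreadSees() {
        ExecutorService polluted = Executors.newSingleThreadExecutor();
        ExecutorService clean = Executors.newSingleThreadExecutor();
        try {
            runOn(polluted, () -> { BOUND.set("conn-with-open-tx"); return null; });
            return runOn(clean, BOUND::get);
        } finally {
            polluted.shutdown();
            clean.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("same worker thread sees:  " + sameThreadSees());
        System.out.println("different worker sees:    " + otherThreadSees());
    }
}
```

In a servlet container the worker threads are pooled in the same way, which is consistent with failures that depend on which thread happens to pick up a request.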
Faulty Code
@Service
public class SomeService {
    public void handleSpecialCase() throws SQLException {
        // open a transaction manually
        sqlSession.getConnection().setAutoCommit(false);
        // execute SQL
        mapper.insert(data);
        // special case: returns without committing!
        if (specialCondition) {
            return; // commit missed
        }
        sqlSession.commit();
    }
}
Fixed Code
@Service
public class SomeService {
    public void handleSpecialCase() throws SQLException {
        try {
            sqlSession.getConnection().setAutoCommit(false);
            mapper.insert(data);
            if (specialCondition) {
                sqlSession.commit(); // ensure commit even on the special path
                return;
            }
            sqlSession.commit();
        } catch (Exception e) {
            sqlSession.rollback(); // undo partial work before propagating the error
            throw e;
        }
    }
}
Spring Transaction Flow (Key Points)
getTransaction() calls doGetTransaction(), which retrieves a ConnectionHolder from TransactionSynchronizationManager. isExistingTransaction() returns true when the holder’s isTransactionActive() flag is set.
If an existing transaction is detected, Spring joins it instead of creating a new one.
During processCommit(), the actual connection.commit() is executed only when status.isNewTransaction() is true. A joined transaction therefore skips the commit.
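The three points above can be condensed into a small model. This is a deliberately simplified sketch, not Spring's source: Holder, Status, getTransaction, commit, and runRequest are hypothetical names that mirror the prose, and the model ignores propagation modes, suspension, and synchronization callbacks. It only shows how a leftover active holder turns the next request into a "joined" transaction whose commit is skipped.

```java
import java.util.ArrayList;
import java.util.List;

class TxFlowModel {
    // Simplified stand-in for a connection holder; records what happened on it.
    static final class Holder {
        boolean transactionActive;
        final List<String> log = new ArrayList<>();
    }

    // Simplified transaction status: which holder is used and whether we started it.
    static final class Status {
        final Holder holder;
        final boolean newTransaction;

        Status(Holder holder, boolean newTransaction) {
            this.holder = holder;
            this.newTransaction = newTransaction;
        }
    }

    // Thread-bound holder, modeling the ThreadLocal resource lookup.
    static final ThreadLocal<Holder> BOUND = new ThreadLocal<>();

    static Status getTransaction() {
        Holder holder = BOUND.get();
        if (holder != null && holder.transactionActive) {
            // Existing transaction detected: join it instead of starting a new one.
            return new Status(holder, false);
        }
        holder = new Holder();
        holder.transactionActive = true;
        holder.log.add("begin");
        BOUND.set(holder);
        return new Status(holder, true);
    }

    static void commit(Status status) {
        if (status.newTransaction) {
            status.holder.log.add("commit");
            status.holder.transactionActive = false;
            BOUND.remove();
        } else {
            // A joined transaction leaves the real commit to whoever started it.
            status.holder.log.add("commit skipped (joined)");
        }
    }

    // Simulates one request; 'polluted' means an earlier request left an active holder behind.
    static List<String> runRequest(boolean polluted) {
        BOUND.remove();
        if (polluted) {
            Holder leaked = new Holder();
            leaked.transactionActive = true;
            BOUND.set(leaked);
        }
        Status status = getTransaction();
        status.holder.log.add("insert order");
        commit(status);
        BOUND.remove(); // keep the demo thread clean for the next call
        return status.holder.log;
    }

    public static void main(String[] args) {
        System.out.println("clean thread:    " + runRequest(false));
        System.out.println("polluted thread: " + runRequest(true));
    }
}
```

In the polluted case nobody ever started the transaction within the current request, so no later code performs the real commit, which matches the symptom of a successful response with no persisted row.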
Why It Occasionally Succeeded
TransactionSynchronizationManager keeps its bound resources in ThreadLocal variables, so each thread sees only its own ConnectionHolder. When a request was processed on a thread that had not been polluted, a fresh connection was obtained and the transaction committed normally.
Prevention Measures
1. Connection‑Pool Health Checks
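spring:
  datasource:
    hikari:
      connection-test-query: SELECT 1
      validation-timeout: 3000
      connection-init-sql: SET autocommit=1
The connection-init-sql statement runs when the pool creates a new physical connection, so every connection starts with autocommit enabled. It is not re-run each time a connection is borrowed, so on its own it does not clear state left behind by an earlier request.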
2. Database‑Level Monitoring
-- Find transactions running longer than 30 seconds
SELECT *
FROM information_schema.innodb_trx
WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 30;
Alert on long-running transactions, lock waits, and abnormal connection counts.
3. Explicit Transaction Management
Make commit() the last statement of the try block, so that every successful path reaches it.
Place rollback() in the catch block.
Close resources in a finally block.
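One way to make these rules hard to violate is to route all manual transactions through a single helper, so that business code cannot return early past the commit. The following is a minimal sketch using plain JDBC; ManualTx, SqlWork, inTransaction, and recording are hypothetical names, and the recording connection is only a test double that logs which methods were called.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ManualTx {
    // Unit of work executed inside the transaction.
    @FunctionalInterface
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    static void inTransaction(Connection conn, SqlWork work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            work.run(conn);
            conn.commit(); // single commit point, reached by every successful path
        } catch (SQLException | RuntimeException e) {
            try {
                conn.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // keep the original error primary
            }
            throw e;
        } finally {
            // Restore the previous mode so the connection is not handed on in a changed state.
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    // Test double: a Connection that records method names and reports autocommit as true.
    static Connection recording(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
                ManualTx.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    return method.getReturnType() == boolean.class ? Boolean.TRUE : null;
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        inTransaction(recording(calls), conn -> { /* the INSERT would run here */ });
        System.out.println(calls);
    }
}
```

The helper does not close the connection because it does not own it; in this sketch the caller is assumed to obtain it in a try-with-resources block. An early return inside the work lambda still falls through to the commit, which is the property the faulty code lacked.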
4. Source‑Code Debugging
When encountering obscure behavior, set breakpoints in getTransaction and isExistingTransaction to verify whether a connection is being incorrectly reused.
Takeaways
Connection pools can propagate transaction state bugs across unrelated services.
Missing explicit commit/rollback in manually managed transactions can silently corrupt data.
Application logs may appear normal; database‑level metrics are essential for detecting hidden issues.
Debugging the framework’s transaction code often reveals hidden assumptions about connection reuse.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
