Why Did a Forgotten Transaction Block Payments? A Deep Dive into Spring’s Transaction Management
An online payment service suffered invisible data loss and lock timeouts because a newly deployed business branch failed to commit its transaction, leading to polluted connections that were reused by other services, and the article explains the root cause, debugging steps, code fixes, and preventive measures.
Incident Overview
During a production release the payment service stopped persisting orders. Users received successful payment responses, but no rows were inserted into the order table and occasional lock‑timeout errors appeared. The DBA observed several uncommitted transactions holding locks on the order rows.
Root Cause
A newly deployed business interface opened a transaction, executed an INSERT, and returned early without calling commit(). The ConnectionHolder inside Spring’s TransactionSynchronizationManager remained marked as isTransactionActive() = true. When the method finished the connection was returned to the pool still polluted with an active transaction flag.
Later, PaymentService.createOrder obtained the same physical connection from the pool. Spring’s DataSourceTransactionManager.getTransaction called isExistingTransaction, saw the active flag and treated the request as joining an existing transaction. Because the request was not the transaction initiator, processCommit skipped the real connection.commit(), so the order data never reached the database.
Faulty Code Example
@Service
public class SomeService {
public void handleSpecialCase() {
// open transaction
sqlSession.connection.setAutoCommit(false);
// execute SQL
mapper.insert(data);
// special branch – missing commit!
if (specialCondition) {
return; // commit never executed
}
sqlSession.commit();
}
}Corrected Implementation
@Service
public class SomeService {
public void handleSpecialCase() {
try {
sqlSession.connection.setAutoCommit(false);
mapper.insert(data);
if (specialCondition) {
sqlSession.commit(); // ensure commit before return
return;
}
sqlSession.commit();
} catch (Exception e) {
sqlSession.rollback();
throw e;
}
}
}Spring Transaction Mechanics
Every request passes through DataSourceTransactionManager.getTransaction. The method obtains a DataSourceTransactionObject via doGetTransaction(), which retrieves the current ConnectionHolder from TransactionSynchronizationManager. If the holder already contains a connection with isTransactionActive() true, isExistingTransaction returns true and the manager reuses the same connection instead of creating a new one.
protected Object doGetTransaction() {
DataSourceTransactionObject txObject = new DataSourceTransactionObject();
ConnectionHolder conHolder = (ConnectionHolder) TransactionSynchronizationManager.getResource(this.obtainDataSource());
txObject.setConnectionHolder(conHolder, false);
return txObject;
}
protected boolean isExistingTransaction(Object transaction) {
DataSourceTransactionObject txObject = (DataSourceTransactionObject) transaction;
return txObject.hasConnectionHolder() && txObject.getConnectionHolder().isTransactionActive();
}During commit, processCommit only calls doCommit when status.isNewTransaction() is true. If the transaction was merely joined, the actual connection.commit() is omitted.
if (status.isNewTransaction()) {
// real DB commit
doCommit(status);
} else {
// joined transaction – nothing to commit
}Why the Bug Was Intermittent
TransactionSynchronizationManageris backed by a ThreadLocal. Threads that pick a clean connection (no active flag) commit normally. Threads that reuse a polluted connection inherit the stale transaction state, causing the commit to be skipped. This explains the non‑deterministic success rate.
Prevention Measures
Reset connection state on checkout – configure the pool to test and reinitialize connections. Example for HikariCP:
spring:
datasource:
hikari:
connection-test-query: SELECT 1
validation-timeout: 3000
connection-init-sql: SET autocommit=1Monitor long‑running transactions – alert when a transaction exceeds a threshold (e.g., 30 s). Example query for MySQL InnoDB:
SELECT * FROM information_schema.innodb_trx
WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 30;Database‑level health checks – regularly collect metrics such as slow queries, lock waits, long‑running transactions and connection count.
Key Takeaways
Connection pools are not only performance optimizations; they can propagate transaction state if a connection is returned while still marked as active.
Always define transaction boundaries explicitly: commit in the normal flow, rollback in the error path, and clean up resources in a finally block.
Application logs may show a successful commit, but the underlying DB may still hold an uncommitted transaction. Monitor DB metrics directly.
When facing obscure bugs, set breakpoints inside Spring’s transaction manager ( getTransaction, isExistingTransaction) to inspect the actual connection state.
Illustration
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
