Why a Missing Commit Crashed Our Payment System – Deep Dive into Spring Transaction Bugs
An online payment service failed when newly added business logic returned without committing its transaction. The polluted connection silently blocked subsequent orders. This post‑mortem explains the root cause, debugging steps, code fixes, and preventive measures.
Accident Scene
Last week the online payment service broke: payments succeeded but no rows appeared in the order table, and occasional lock‑timeout errors were observed when modifying orders.
The DBA noticed that several transactions remained uncommitted and were holding row locks on the order table.
Emergency Handling
The quickest fix was to restart the application, which forced the connections to close and temporarily restored the payment function.
After the restart the team identified a newly deployed business interface as the culprit and examined its code.
@Service
public class SomeService {
    public void handleSpecialCase() throws SQLException {
        // open transaction
        sqlSession.getConnection().setAutoCommit(false);
        // execute SQL
        mapper.insert(data);
        // special branch – forgot commit!
        if (specialCondition) {
            return; // missing commit
        }
        sqlSession.commit();
    }
}
The special branch returns before the commit call, leaving the transaction open.
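One way to make an early return harmless is to move commit and rollback out of the business branch entirely. The following is a minimal sketch of that idea, not the team's actual fix: `ManualTx`, `Work`, and `inTransaction` are hypothetical names, and it uses plain JDBC rather than the `sqlSession` from the snippet above.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper: owns commit/rollback so business code cannot skip them. */
final class ManualTx {

    /** Business logic to run inside the transaction (stand-in for mapper.insert(data)). */
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    static void inTransaction(Connection conn, Work work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                  // open the transaction
        try {
            work.run(conn);                         // an early return in here still reaches commit
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback();                        // undo partial work on any failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // hand the connection back in its original mode
        }
    }
}
```

With this shape, the special branch can return from inside `work` at any point, and the commit still runs because it sits after the callback rather than after the branch.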
Post‑mortem
Even after fixing the bug, the team wondered why a single uncommitted transaction could affect unrelated payment logic.
Root Cause Analysis
Debugging the Spring transaction manager revealed that getTransaction reuses connections from TransactionSynchronizationManager. The method doGetTransaction obtains a ConnectionHolder from the manager, which may already hold an active transaction flag.
protected Object doGetTransaction() {
    DataSourceTransactionObject txObject = new DataSourceTransactionObject();
    txObject.setSavepointAllowed(this.isNestedTransactionAllowed());
    // key: get connection from TransactionSynchronizationManager
    ConnectionHolder conHolder = (ConnectionHolder) TransactionSynchronizationManager.getResource(this.obtainDataSource());
    txObject.setConnectionHolder(conHolder, false);
    return txObject;
}
If a previous business flow returned without committing, the ConnectionHolder still has isTransactionActive() == true. The next business call sees this as an existing transaction (isExistingTransaction returns true) and joins it instead of starting a new one.
protected boolean isExistingTransaction(Object transaction) {
    DataSourceTransactionObject txObject = (DataSourceTransactionObject) transaction;
    return txObject.hasConnectionHolder() && txObject.getConnectionHolder().isTransactionActive();
}
Consequently, the later PaymentService.createOrder obtains the polluted connection and treats itself as part of an existing transaction. When processCommit runs, it skips the actual connection.commit() because status.isNewTransaction() is false.
private void processCommit(DefaultTransactionStatus status) throws TransactionException {
    try {
        // ...
        if (status.isNewTransaction()) {
            // real DB commit
            doCommit(status);
        }
        // else: nothing happens
    } finally {
        cleanupAfterCompletion(status);
    }
}
The result: data never reaches the database, and the stale ConnectionHolder stays in the manager, contaminating subsequent requests.
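The join-then-skip behaviour described above can be reproduced in a few lines without Spring. The sketch below is a deliberately simplified toy model, not Spring's implementation: `ToyTxManager`, `Holder`, and every method name are invented for illustration, and a list of strings stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the join logic. NOT Spring's code; all names are made up. */
final class ToyTxManager {

    /** Plays the role of ConnectionHolder: per-thread connection state. */
    static final class Holder {
        boolean transactionActive;
        final List<String> pendingRows = new ArrayList<>();
    }

    private final ThreadLocal<Holder> bound = new ThreadLocal<>();
    private final List<String> committedRows = new ArrayList<>();

    /** Returns true when a new transaction was started, false when an existing one was joined. */
    boolean begin() {
        Holder holder = bound.get();
        if (holder != null && holder.transactionActive) {
            return false;                 // looks like an outer transaction: join it
        }
        holder = new Holder();
        holder.transactionActive = true;
        bound.set(holder);
        return true;
    }

    void insert(String row) {
        bound.get().pendingRows.add(row);
    }

    /** Mirrors the isNewTransaction() check: only the owner really commits. */
    void commit(boolean isNewTransaction) {
        if (!isNewTransaction) {
            return;                       // participant: leaves the commit to the owner
        }
        committedRows.addAll(bound.get().pendingRows);
        bound.remove();                   // unbind the holder once the owner has committed
    }

    List<String> committedRows() {
        return committedRows;
    }
}
```

Calling `begin()` and `insert("special-case")` without a commit, then running `begin()`, `insert("order-1")`, and `commit(...)` on the same thread, leaves `committedRows()` empty: the second `begin()` returns false, so the commit is treated as someone else's job and nobody ever performs it.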
Why It Occasionally Succeeds
TransactionSynchronizationManager is thread‑local, and application servers reuse worker threads from a pool, so state bound to a thread survives from one request to the next. If a request is handled by a clean thread with a fresh connection, the transaction proceeds normally. If it runs on a thread that still carries the polluted ConnectionHolder, the bug manifests. This explains the non‑deterministic failures.
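The effect of thread reuse is easy to see in isolation. The sketch below uses only the JDK; `ThreadReuseDemo`, `LEAKED`, and `handleRequest` are hypothetical names, and a plain `ThreadLocal<String>` stands in for the per-thread resource binding.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadReuseDemo {
    // Stand-in for the per-thread resource binding; hypothetical, not a Spring class.
    static final ThreadLocal<String> LEAKED = new ThreadLocal<>();

    /** Runs one "request" on the pool and reports what it found bound to its thread. */
    static String handleRequest(ExecutorService pool, String leaveBehind) throws Exception {
        return pool.submit(() -> {
            String found = LEAKED.get();
            if (leaveBehind != null) {
                LEAKED.set(leaveBehind);      // simulate returning without cleanup
            }
            return found == null ? "clean" : "polluted by " + found;
        }).get();
    }

    static List<String> demo() throws Exception {
        ExecutorService reused = Executors.newSingleThreadExecutor();
        ExecutorService fresh = Executors.newSingleThreadExecutor();
        try {
            return List.of(
                handleRequest(reused, "request-1"), // first request leaks state
                handleRequest(reused, null),        // same worker thread, sees the leak
                handleRequest(fresh, null));        // different thread, unaffected
        } finally {
            reused.shutdown();
            fresh.shutdown();
        }
    }
}
```

`ThreadReuseDemo.demo()` returns `[clean, polluted by request-1, clean]`: whether a request fails depends only on which worker thread happens to pick it up.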
Preventive Measures
Configure the connection pool to validate and reset connections (e.g., connection-test-query: SELECT 1, connection-init-sql: SET autocommit=1).
Monitor long‑running transactions in the database (e.g., query information_schema.innodb_trx for transactions longer than 30 seconds).
Set up alerts for long transactions, lock waits, and abnormal connection counts.
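To make "reset" concrete, the sketch below shows the kind of check that could run before a connection is handed to the next caller. It is an illustration under assumptions, not any pool's built-in API: `ConnectionHygiene` and `resetIfDirty` are hypothetical names, and it assumes, in line with the `SET autocommit=1` setting above, that the application expects connections in autocommit mode.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class ConnectionHygiene {

    /**
     * Hypothetical hook to call before a connection is reused.
     * Returns true when leftover transaction state had to be discarded.
     */
    static boolean resetIfDirty(Connection conn) throws SQLException {
        if (conn.getAutoCommit()) {
            return false;             // no manual transaction was left open
        }
        conn.rollback();              // discard uncommitted work from the previous user
        conn.setAutoCommit(true);     // restore the mode the next caller expects
        return true;
    }
}
```

A `true` return value is worth logging or counting, since it means some code path returned a connection without finishing its transaction.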
Lessons Learned
Connection pools are not just performance tools; they can propagate transaction state bugs.
When managing transactions manually, always place commit at the end of the try block and rollback in the catch block, with resource cleanup in finally.
Database‑level monitoring is essential because application logs may appear normal while the DB is stuck.
Debugging complex issues often requires stepping through the framework code (e.g., setting breakpoints in getTransaction and doGetTransaction) rather than only reading documentation.
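The database-level check mentioned above can be scripted as a small JDBC probe. The sketch below is one possible shape, assuming MySQL with InnoDB: `LongTransactionCheck` and `findOlderThan` are hypothetical names, while `trx_id`, `trx_mysql_thread_id`, and `trx_started` are columns of `information_schema.innodb_trx`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class LongTransactionCheck {

    // Selects transactions that have been open longer than the bound threshold.
    static final String SQL =
        "SELECT trx_id, trx_mysql_thread_id, "
        + "TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds "
        + "FROM information_schema.innodb_trx "
        + "WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > ?";

    /** Returns one human-readable line per transaction older than thresholdSeconds. */
    static List<String> findOlderThan(Connection conn, int thresholdSeconds) throws SQLException {
        List<String> findings = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, thresholdSeconds);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    findings.add("trx " + rs.getString("trx_id")
                        + " on thread " + rs.getLong("trx_mysql_thread_id")
                        + " open for " + rs.getLong("age_seconds") + "s");
                }
            }
        }
        return findings;
    }
}
```

Running `findOlderThan(conn, 30)` on a schedule and alerting on a non-empty result would match the 30-second threshold suggested under Preventive Measures. The account used needs permission to read `information_schema.innodb_trx`.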
Conclusion
The incident showed how a single missing commit can cascade through a connection pool, causing silent data loss in unrelated services. By fixing the code, tightening connection‑pool hygiene, and adding proper monitoring, the team prevented recurrence.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
