Why a Missing Commit Crashed Our Payment System – A Deep Dive into Spring Transaction Pitfalls

A payment service reported successful orders but wrote no data to the database, and lock timeouts followed. The root cause was a missing commit on a special code path: it left an uncommitted transaction on a pooled connection, so other services inherited that transaction and failed intermittently.

Top Architect

Incident Overview

Last week the online payment system started failing: users completed payments, yet the order table remained empty and occasional lock‑timeout errors appeared. The DBA noticed several transactions that never committed, locking rows in the order table.

Symptoms

Business code executed without errors and logs showed normal operation.

Log entries indicated commit was called, but no data was persisted.

Most of the time the operation failed, but occasionally it succeeded.

Emergency Fix

The immediate response was to restart the application, forcing all connections to be released. After the restart the payment flow worked again, but the underlying bug still needed to be identified.

Root Cause Discovery

Reviewing the recent deployment revealed a new business method that forgot to call commit in a special branch:

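@Service
public class SomeService {
    // sqlSession, mapper, data and specialCondition belong to the service; declarations omitted
    public void handleSpecialCase() throws SQLException {
        // start a manual transaction on the underlying JDBC connection
        sqlSession.getConnection().setAutoCommit(false);
        // execute SQL
        mapper.insert(data);
        // BUG: the special case returns without commit or rollback
        if (specialCondition) {
            return; // commit missed here, transaction stays open
        }
        sqlSession.commit();
    }
}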

The early return exits the method without a commit or rollback, so the transaction never ends and the ConnectionHolder stays marked as having an active transaction.

Quick Fix

The fix switched the method to Spring's programmatic transaction manager and added an explicit commit before the early return, with a rollback on any exception:

@Service
public class SomeService {
    public void handleSpecialCase() {
        TransactionStatus status = transactionManager.getTransaction(new DefaultTransactionDefinition());
        try {
            mapper.insert(data);
            if (specialCondition) {
                transactionManager.commit(status); // ensure commit
                return;
            }
            transactionManager.commit(status);
        } catch (Exception e) {
            transactionManager.rollback(status);
            throw e;
        }
    }
}

Why Did It Affect Unrelated Services?

After the fix, a deeper investigation showed that DataSourceTransactionManager.doGetTransaction does not open a connection itself; it looks up the ConnectionHolder already bound to the current thread in TransactionSynchronizationManager. When the previous method returned without committing, that ConnectionHolder still had isTransactionActive() = true. The next request handled on the same thread (e.g., PaymentService.createOrder) picked up this polluted holder, and isExistingTransaction reported an existing transaction. Consequently, the new service joined the stale transaction instead of starting a fresh one.

The relevant code paths are:

protected Object doGetTransaction() {
    DataSourceTransactionObject txObject = new DataSourceTransactionObject();
    txObject.setSavepointAllowed(this.isNestedTransactionAllowed());
    ConnectionHolder conHolder = (ConnectionHolder) TransactionSynchronizationManager.getResource(this.obtainDataSource());
    txObject.setConnectionHolder(conHolder, false);
    return txObject;
}

protected boolean isExistingTransaction(Object transaction) {
    DataSourceTransactionObject txObject = (DataSourceTransactionObject) transaction;
    return txObject.hasConnectionHolder() && txObject.getConnectionHolder().isTransactionActive();
}

Because the holder was still marked as active, the subsequent service was treated as a participant in an existing transaction. When it committed, processCommit checked status.isNewTransaction(), which was false, so the physical connection.commit() never ran. This matches the symptom listed earlier: the logs showed a commit call, yet nothing was persisted. The uncommitted changes stayed on the polluted connection and contaminated further requests.
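The join behavior can be reproduced without Spring. The following is a minimal sketch, assuming a much simplified model: JoinSemanticsSketch, FakeHolder, Status and BOUND are hypothetical names invented for illustration and are not Spring classes. It only mirrors the two checks quoted above (join when a holder is active, skip the physical commit when the status is not new).

```java
// Simplified model of "join if a transaction is already active"; NOT Spring's code.
class JoinSemanticsSketch {

    // Stand-in for a ConnectionHolder: tracks the active flag and physical commits.
    static final class FakeHolder {
        boolean transactionActive;
        int physicalCommits;
    }

    // Stand-in for the transaction status handed back to the caller.
    static final class Status {
        final FakeHolder holder;
        final boolean newTransaction;

        Status(FakeHolder holder, boolean newTransaction) {
            this.holder = holder;
            this.newTransaction = newTransaction;
        }
    }

    // Per-thread binding, playing the role of the thread-bound resource lookup.
    static final ThreadLocal<FakeHolder> BOUND = new ThreadLocal<>();

    static Status getTransaction() {
        FakeHolder existing = BOUND.get();
        if (existing != null && existing.transactionActive) {
            return new Status(existing, false); // join: caller is only a participant
        }
        FakeHolder fresh = new FakeHolder();
        fresh.transactionActive = true;
        BOUND.set(fresh);
        return new Status(fresh, true);
    }

    static void commit(Status status) {
        if (!status.newTransaction) {
            return; // participants never trigger the physical commit
        }
        status.holder.physicalCommits++;
        status.holder.transactionActive = false;
        BOUND.remove();
    }

    // Physical commits seen by a request that runs after a leaked transaction.
    static int commitsAfterLeak() {
        BOUND.remove();                   // start from a clean thread
        getTransaction();                 // buggy path: begins a transaction, never commits
        Status victim = getTransaction(); // next request on the same thread
        commit(victim);                   // looks like a commit to the caller
        int commits = victim.holder.physicalCommits;
        BOUND.remove();                   // clean up the demo state
        return commits;
    }

    // Physical commits seen by a request on a thread with no leftover state.
    static int commitsOnCleanThread() {
        BOUND.remove();
        Status status = getTransaction();
        commit(status);
        return status.holder.physicalCommits;
    }

    public static void main(String[] args) {
        System.out.println("after leak:   " + commitsAfterLeak());
        System.out.println("clean thread: " + commitsOnCleanThread());
    }
}
```

In this model the request that follows the leak calls commit successfully from its own point of view, yet the commit counter stays at zero, which is the same shape as "commit logged, nothing persisted".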

Why Was It Intermittent?

TransactionSynchronizationManager is thread‑local. If a request was handled by a clean thread (no polluted ConnectionHolder), the transaction succeeded. If it ran on a thread that inherited the polluted connection, the bug manifested. This explains the non‑deterministic behavior.
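The thread dependence is easy to see in isolation. Below is a small sketch, assuming a plain ThreadLocal flag as a stand-in for the thread-bound transaction state; ThreadAffinityDemo, STALE_TX and handleRequest are hypothetical names, and the two single-thread executors stand in for two worker threads of a request pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadAffinityDemo {

    // Hypothetical stand-in for "this thread still holds an active transaction".
    static final ThreadLocal<Boolean> STALE_TX = ThreadLocal.withInitial(() -> Boolean.FALSE);

    // A later request: its outcome depends only on the thread it lands on.
    static String handleRequest() {
        return STALE_TX.get() ? "joined stale transaction" : "started fresh transaction";
    }

    // Returns {result on the polluted thread, result on the clean thread}.
    static String[] demo() {
        ExecutorService threadA = Executors.newSingleThreadExecutor();
        ExecutorService threadB = Executors.newSingleThreadExecutor();
        try {
            // Buggy request: sets the flag and returns without clearing it.
            Runnable buggyRequest = () -> STALE_TX.set(Boolean.TRUE);
            Callable<String> laterRequest = ThreadAffinityDemo::handleRequest;

            threadA.submit(buggyRequest).get();              // pollutes thread A only
            String onA = threadA.submit(laterRequest).get(); // same thread, inherits state
            String onB = threadB.submit(laterRequest).get(); // different thread, clean
            return new String[] {onA, onB};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            threadA.shutdown();
            threadB.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println("thread A: " + results[0]);
        System.out.println("thread B: " + results[1]);
    }
}
```

With a real worker pool the scheduler decides which thread serves each request, so whether a request fails depends on whether it lands on a polluted thread.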

Prevention Measures

1. Connection‑Pool Health Checks

spring:
  datasource:
    hikari:
      connection-test-query: SELECT 1
      validation-timeout: 3000
      connection-init-sql: SET autocommit=1

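With connection-test-query and validation-timeout, the pool checks that a connection is alive before handing it out, and connection-init-sql runs once on each newly created connection so that it starts in autocommit mode. These settings only cover connections as they pass through the pool, so they complement the code fix rather than replace it: a connection that stays bound to a thread with an open transaction is not re-validated.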

2. Monitoring & Alerts

Run a query like the following against MySQL (InnoDB) to list transactions that have been open for more than 30 seconds, and alert whenever it returns rows:

SELECT * FROM information_schema.innodb_trx
WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 30;

3. Explicit Transaction Management

Always place commit at the end of the try block.

Put rollback in the catch block.

Release resources in a finally block.
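The three rules above can be sketched with plain JDBC. This is a minimal illustration, not the team's actual code: ManualTxTemplate, SqlWork, inTransaction and recordingConnection are hypothetical names, and the recording connection is a java.lang.reflect.Proxy fake that only logs method names so the example runs without a database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ManualTxTemplate {

    // Unit of work that may throw SQLException.
    interface SqlWork {
        void run(Connection con) throws SQLException;
    }

    // commit at the end of try, rollback in catch, state restored in finally.
    static void inTransaction(Connection con, SqlWork work) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try {
            work.run(con);
            con.commit();
        } catch (SQLException | RuntimeException e) {
            try {
                con.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // keep the original failure visible
            }
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }

    // Fake connection that records which methods were called (demo only).
    static Connection recordingConnection(List<String> calls) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            calls.add(method.getName());
            return method.getReturnType() == boolean.class ? Boolean.TRUE : null;
        };
        return (Connection) Proxy.newProxyInstance(
                ManualTxTemplate.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }

    // Runs one unit of work and returns the recorded connection calls.
    static List<String> demo(boolean failInsideWork) {
        List<String> calls = new ArrayList<>();
        try {
            inTransaction(recordingConnection(calls), con -> {
                if (failInsideWork) {
                    throw new IllegalStateException("simulated failure");
                }
                // an early return from this lambda would still reach commit above
            });
        } catch (SQLException | RuntimeException expected) {
            // swallowed only so the demo can return the recorded calls
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

Because the business logic runs inside the callback, an early return can no longer skip the commit, which is the failure mode from this incident. Spring's TransactionTemplate offers a similar callback shape for code that already uses a PlatformTransactionManager.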

4. Database‑Level Monitoring

Track slow queries, long transactions, lock waits, and connection counts to catch issues that application logs may miss.

5. Debug Source Code When Issues Appear

Set breakpoints in getTransaction and isExistingTransaction to see how connections are reused. Relying solely on documentation can hide subtle bugs.

Takeaways

Connection pools are not just performance optimizations; they can propagate transaction state bugs.

Manual transaction code must clearly handle commit and rollback.

Monitoring must cover both application and database layers.

Debugging with real breakpoints often reveals problems that static code reading cannot.

By fixing the missing commit, adding pool validation, and improving monitoring, the team eliminated the intermittent payment failures and strengthened the overall reliability of the system.
