Boosting a Payment System from 40TPS to 60TPS: Real-World Backend Performance Hacks
This article walks through the performance evolution of a real‑world payment service: the server environment, a dozen common bottlenecks (database deadlocks, long‑running transactions, CPU saturation, thread‑pool misuse, logging overload, cache problems), and the concrete code‑level optimizations, architectural changes, and monitoring tips that raised throughput and stability.
Introduction
In this post the author shares the performance evolution of a payment‑processing project he is responsible for, focusing on code‑level optimizations rather than high‑level architecture.
Server Environment
Four servers, each with 4‑core CPU and 8 GB RAM. The stack includes RabbitMQ, DB2, an internal Dubbo‑based SOA framework, Redis and Memcached for caching, and a custom configuration‑management system.
Problem Description
Single‑node capacity of 40 TPS; adding three more nodes only reaches 60 TPS, indicating poor scalability.
Frequent database deadlocks causing complete service outage.
Improper use of database transactions leading to excessively long lock times.
Regular memory‑overflow and CPU‑saturation incidents in production.
Poor fault tolerance; a tiny bug can bring the whole service down.
Missing or useless log statements that provide no diagnostic value.
Frequent reads of rarely‑changed configuration data from the database, generating heavy I/O.
Multiple WAR packages deployed in a single Tomcat, causing resource contention.
Underlying platform bugs or feature gaps reducing service availability.
No rate‑limiting on APIs, allowing VIP merchants to stress‑test the production environment.
No degradation strategy; issues lead to long recovery times or blunt rollbacks.
Lack of proper monitoring, preventing real‑time detection of bottlenecks.
Optimization Solutions
1. Database Deadlock Mitigation
The deadlock example shows two sessions waiting on each other because of mixed FOR UPDATE, gap lock, and next‑key lock usage.
Root cause: excessive pessimistic locking for idempotency checks.
Use Redis distributed locks with sharding; a single node failure is tolerable.
Implement idempotency via a primary‑key check table that returns a duplicate‑key error on repeat inserts.
Adopt version‑number based optimistic locking.
All three approaches require an expiration time to release stale locks.
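The version‑number approach can be sketched in plain Java. A minimal in‑memory stand‑in for a row with a version column, mirroring the SQL pattern `UPDATE t SET state = ?, version = version + 1 WHERE id = ? AND version = ?` (table and column names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a DB row with a version column. In SQL the same check
// is: UPDATE t_order SET state = ?, version = version + 1 WHERE id = ? AND version = ?
// which updates 0 rows when another writer bumped the version first.
public class OptimisticLockDemo {
    private final AtomicLong version = new AtomicLong(0);
    private volatile String state = "INIT";

    // Returns true iff our expected version still matched (rowcount == 1 in SQL terms).
    public boolean updateState(String newState, long expectedVersion) {
        if (version.compareAndSet(expectedVersion, expectedVersion + 1)) {
            state = newState;
            return true;
        }
        return false; // stale read: caller should re-read and retry, or give up
    }

    public long currentVersion() { return version.get(); }
    public String state() { return state; }

    public static void main(String[] args) {
        OptimisticLockDemo row = new OptimisticLockDemo();
        long v = row.currentVersion();                    // both writers read version 0
        boolean first = row.updateState("PAID", v);       // succeeds, version becomes 1
        boolean second = row.updateState("CANCELLED", v); // loses the race, no change
        System.out.println(first + " " + second + " " + row.state());
    }
}
```

The loser of the race gets a clean "0 rows updated" signal instead of blocking on a lock, which is why this scales better than `FOR UPDATE` for idempotency checks.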
2. Reducing Transaction Duration
Long‑running transactions often mix HTTP client calls or other blocking I/O inside the transaction scope.
Guideline: keep transactions short—extract HTTP calls out of the transactional block.
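A minimal sketch of the guideline, with hypothetical method names standing in for a real transaction manager and HTTP client: the anti‑pattern holds row locks across a slow remote call, while the fix commits first and calls out afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: txBegin/txCommit stand in for a transaction manager and
// callGateway for an HTTP client call that can take seconds.
public class TxScopeDemo {
    final List<String> trace = new ArrayList<>();

    void txBegin()     { trace.add("BEGIN"); }
    void txCommit()    { trace.add("COMMIT"); }
    void updateOrder() { trace.add("UPDATE order"); }  // takes row locks
    void callGateway() { trace.add("HTTP call"); }     // slow remote I/O

    // Bad: locks from updateOrder() are held for the whole HTTP round-trip.
    void payBad() { txBegin(); updateOrder(); callGateway(); txCommit(); }

    // Good: the transaction covers only the DB work; the remote call happens
    // after commit, and a failed call is reconciled asynchronously.
    void payGood() { txBegin(); updateOrder(); txCommit(); callGateway(); }

    public static void main(String[] args) {
        TxScopeDemo d = new TxScopeDemo();
        d.payGood();
        System.out.println(d.trace);
    }
}
```

In Spring terms this usually means moving the HTTP call out of the `@Transactional` method into the caller, or into an after‑commit hook.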
3. CPU Saturation Analysis
During load testing, CPU usage remained high. Investigation revealed that the default C3P0 connection pool and an unbounded thread pool created thousands of threads, exhausting resources.
Fixes:
Replace C3P0 with a more scalable pool.
Bound the number of worker threads (around 50 per service in this case) and avoid unbounded queues. Note that Executors.newFixedThreadPool caps threads but backs them with an unbounded LinkedBlockingQueue, so prefer constructing a ThreadPoolExecutor with a bounded queue and an explicit rejection policy.
Final thread‑pool design options are shown in the following diagrams:
Because each server has only four CPU cores, excessive threads degrade performance through context switching. The solution moves asynchronous tasks to a dedicated task processor, with a retry mechanism backed by a task table.
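A bounded‑pool sketch along these lines (sizes are illustrative, not tuned values): core equals max for a predictable thread count, a bounded queue caps the backlog, and CallerRunsPolicy applies back‑pressure on overflow instead of growing threads or queueing without limit.

```java
import java.util.concurrent.*;

// Bounded pool sized for a small 4-core box. Unlike Executors.newFixedThreadPool,
// which uses an unbounded LinkedBlockingQueue, this caps both threads AND queue.
public class BoundedPoolDemo {
    public static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(
                8, 8,                                        // core == max: fixed thread count
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(200),               // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());  // overflow runs on the caller
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newBoundedPool();
        Future<Integer> f = pool.submit(() -> 21 * 2);
        System.out.println(f.get()); // prints 42
        pool.shutdown();
    }
}
```

CallerRunsPolicy is a deliberate choice here: when the queue fills, the submitting thread does the work itself, which naturally slows producers down instead of dropping tasks.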
4. Logging Improvements
Current logging mixes logger.error and logger.warn with noisy, low‑value messages, causing disk I/O pressure and thread blocking.
Recommended format:
[System] Error description [KeyInfo] – include cause and effect, and optionally input/output parameters.

After reconfiguring Log4j 1.2.14, thread blocking due to logging dropped dramatically, as shown by the before/after charts.
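The article does not show the Log4j settings used; a minimal log4j.xml fragment in this spirit (file path, buffer size, and rotation limits are assumptions) routes logging through an AsyncAppender so request threads do not block on disk I/O:

```xml
<appender name="FILE" class="org.apache.log4j.RollingFileAppender">
  <param name="File" value="/var/log/pay/pay.log"/>
  <param name="MaxFileSize" value="100MB"/>
  <param name="MaxBackupIndex" value="10"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d{ISO8601} [%t] %-5p %c - %m%n"/>
  </layout>
</appender>

<!-- Decouple request threads from disk I/O; drop on overflow rather than block. -->
<appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
  <param name="BufferSize" value="8192"/>
  <param name="Blocking" value="false"/>
  <appender-ref ref="FILE"/>
</appender>

<root>
  <priority value="warn"/>
  <appender-ref ref="ASYNC"/>
</root>
```

Raising the root level to `warn` also cuts the noisy, low‑value messages the article complains about.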
5. Cache Optimization
Three typical cache problems are identified:
Cache penetration – queries for non‑existent keys repeatedly hit the DB.
Cache concurrency – many threads query DB simultaneously when a cache entry expires.
Cache avalanche – many keys expire at the same moment, flooding the DB.
Solutions:
Store a placeholder (e.g., "&&") for missing keys to prevent DB hits.
Apply a lock around cache‑miss handling so only one thread populates the cache.
Randomize cache TTL (add 1‑5 minutes) to spread expirations.
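The three fixes above can be combined in one read path. A sketch with a ConcurrentHashMap standing in for Redis: a "&&" placeholder caches DB misses, a per‑key lock lets only one thread rebuild an expired entry, and the TTL gets 1–5 minutes of random jitter.

```java
import java.util.concurrent.*;
import java.util.function.Function;

// In-memory stand-in for a Redis-backed cache demonstrating penetration,
// concurrency, and avalanche protection in a single get() path.
public class CacheGuardDemo {
    static final String MISS = "&&";              // placeholder for keys absent in the DB
    static final long BASE_TTL_MS = 30 * 60_000L; // 30-minute base TTL

    static class Entry {
        final String v; final long expireAt;
        Entry(String v, long expireAt) { this.v = v; this.expireAt = expireAt; }
    }

    final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, Object> keyLocks = new ConcurrentHashMap<>();

    public String get(String key, Function<String, String> dbLoader) {
        Entry e = cache.get(key);
        if (e != null && e.expireAt > System.currentTimeMillis()) {
            return MISS.equals(e.v) ? null : e.v;     // cached placeholder -> "not found"
        }
        Object lock = keyLocks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {                          // one DB load per key at a time
            e = cache.get(key);                        // re-check after acquiring the lock
            if (e != null && e.expireAt > System.currentTimeMillis()) {
                return MISS.equals(e.v) ? null : e.v;
            }
            String v = dbLoader.apply(key);
            long jitter = ThreadLocalRandom.current().nextLong(60_000L, 300_000L); // 1-5 min
            cache.put(key, new Entry(v == null ? MISS : v,
                    System.currentTimeMillis() + BASE_TTL_MS + jitter));
            return v;
        }
    }
}
```

A real Redis version would use `SETNX` for the rebuild lock and put the jitter into the key's `EXPIRE`, but the control flow is the same.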
6. Fault‑Tolerance Enhancements
Illustrates that swallowing DAO exceptions in the service layer does not constitute fault tolerance.
Proposes a hybrid cache strategy: critical data (e.g., payment limits) are always fetched from Redis, with a local fallback cache for resilience; less‑critical data can rely on asynchronous sync via MQ or Zookeeper.
7. Incomplete Project Splitting
Deploying multiple WARs in a single Tomcat creates resource contention; the fix is to isolate each WAR in its own Tomcat instance.
8. Platform Component Limitations
Wrapping Dubbo calls in a Future to enforce timeouts suggests the framework's built‑in timeout was ineffective; the workaround does function, but it adds a thread hop of overhead to every call.
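A sketch of that workaround, with `slowRpc` as a stand‑in for the Dubbo invocation: the deadline is enforced client‑side via `Future.get(timeout)`, and a timed‑out task is cancelled so the pool thread is freed.

```java
import java.util.concurrent.*;

// Client-side timeout around a remote call that may hang. slowRpc simulates
// the RPC; in the real system this would be the Dubbo stub invocation.
public class FutureTimeoutDemo {
    static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    static String slowRpc(long costMs) throws InterruptedException {
        Thread.sleep(costMs);   // simulated network + server time
        return "OK";
    }

    public static String callWithTimeout(long costMs, long timeoutMs) {
        Future<String> f = POOL.submit(() -> slowRpc(costMs));
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);     // interrupt the hung call; frees the pool thread
            return "TIMEOUT";
        } catch (Exception e) {
            return "ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(10, 500));   // completes in time
        System.out.println(callWithTimeout(500, 50));   // deadline exceeded
        POOL.shutdown();
    }
}
```

The extra submit/get pair per call is exactly the overhead the article refers to.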
9. Quick Bottleneck Identification
Combining top to find high‑CPU processes and pstack to inspect thread stacks quickly isolates slow threads. Example output shows thread LWP 30222 consuming 31.4 ms.
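For a Java service the same drill works with jstack instead of pstack. A sketch using the article's LWP 30222 (the process PID 30217 here is an assumption for illustration):

```shell
# Show per-thread CPU usage for the process; the ID column is the LWP
top -H -p 30217

# jstack prints native thread ids in hex (nid=0x...), so convert the hot LWP
printf '%x\n' 30222        # -> 760e

# Grep that nid in a thread dump to see exactly what the hot thread is doing
jstack 30217 | grep -A 20 'nid=0x760e'
```

This top → hex → thread‑dump loop usually isolates a spinning or blocked thread in under a minute.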
10. Index Optimization Tips
Follow the left‑most principle for composite indexes.
Avoid excessive indexes; distinguish between clustered and secondary indexes.
Be careful with nullable columns: some engines exclude NULL entries from indexes or cannot use the index for certain predicates on them, so prefer NOT NULL with a default for indexed columns.
MySQL generally uses only one index per table in a query (index merge aside); avoid ORDER BY on non‑indexed columns, which forces a filesort.
Applicable operators: >=, BETWEEN, IN, LIKE (without leading %).
Non‑applicable operators: NOT IN, LIKE with leading %.
Prefer numeric columns over strings for indexing to save space and improve I/O.
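The left‑most principle from the first tip can be made concrete (table and column names here are hypothetical): a composite index serves queries that constrain its leading column(s), but not queries that skip them.

```sql
-- Composite index on (merchant_id, trade_time)
CREATE INDEX idx_merchant_time ON t_payment (merchant_id, trade_time);

SELECT * FROM t_payment WHERE merchant_id = 42;                 -- uses the index
SELECT * FROM t_payment
 WHERE merchant_id = 42 AND trade_time >= '2016-01-01';         -- uses the index
SELECT * FROM t_payment WHERE trade_time >= '2016-01-01';       -- cannot use it
```

The third query skips the leading `merchant_id` column, so the optimizer falls back to a full scan unless a separate index on `trade_time` exists.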
11. Redis Usage Recommendations
Set expiration times for keys to prevent memory exhaustion.
Keep key names short and values simple; store objects as JSON or Protobuf.
Always return connections to the pool after use.
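The connection‑return discipline can be enforced structurally with try‑with‑resources. A pure‑Java stand‑in for a Redis client pool such as JedisPool (the `Conn` class is a dummy resource): `close()` returns the connection on every exit path, including exceptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy pool demonstrating the habit: acquire in try-with-resources so the
// connection always goes back to the pool, never leaks on an exception path.
public class PoolReturnDemo {
    static class Conn implements AutoCloseable {
        private final BlockingQueue<Conn> home;
        Conn(BlockingQueue<Conn> home) { this.home = home; }
        String get(String key) { return "value-of-" + key; } // pretend round-trip
        @Override public void close() { home.offer(this); }  // return, don't destroy
    }

    final BlockingQueue<Conn> pool = new ArrayBlockingQueue<>(8);

    public PoolReturnDemo(int size) {
        for (int i = 0; i < size; i++) pool.offer(new Conn(pool));
    }

    public Conn borrow() throws InterruptedException { return pool.take(); }
    public int idle() { return pool.size(); }

    public static void main(String[] args) throws Exception {
        PoolReturnDemo p = new PoolReturnDemo(2);
        try (Conn c = p.borrow()) {               // returned automatically by close()
            System.out.println(c.get("merchant:42:limit"));
        }
        System.out.println(p.idle());             // back to 2
    }
}
```

With Jedis the same shape is `try (Jedis j = jedisPool.getResource()) { ... }`, which returns the connection to the pool on close.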
Conclusion
The systematic analysis and targeted code‑level fixes—ranging from database locking strategies and transaction scoping to thread‑pool sizing, logging hygiene, cache design, and monitoring—raised the service’s throughput, reduced latency, and improved overall stability. The author hints that the next article will cover degradation, rate‑limiting, and monitoring solutions.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
