How to Boost Banking System Performance: A Proven Testing & Optimization Playbook

This article presents a comprehensive methodology for performance testing and optimization of a new corporate banking system, covering evaluation metrics, monitoring tools, test environment setup, common bottlenecks, detailed troubleshooting steps, and real‑world case studies to achieve higher throughput, lower latency, and better scalability.

dbaplus Community

Jan 2, 2019

The XX Bank corporate online banking system combines multiple legacy channels, resulting in a much higher concurrent load and throughput requirement. To ensure fast response times for corporate users, the project team devised a systematic performance testing and optimization methodology.

1. Application System Performance Evaluation Indicators

Response Time: Speed at which the system returns a response to the user.

Throughput (TPS): Number of transactions processed per second.

Concurrency: Ability of the system to remain stable under high concurrent request load.

Scalability: Ability to horizontally scale when a single machine cannot handle the load.

Under a steady load these indicators are related: TPS = Concurrent Users / Average Response Time (in seconds)
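As a quick worked example of the relation above, the following minimal sketch computes the theoretical TPS for a given load. The class name, method name, and sample figures (50 users, 0.5 s response time) are illustrative assumptions, not values from the project.

```java
final class ThroughputSketch {
    /** Theoretical TPS under steady load: concurrent users / response time in seconds. */
    static double tps(int concurrentUsers, double responseTimeSeconds) {
        if (responseTimeSeconds <= 0) {
            throw new IllegalArgumentException("response time must be positive");
        }
        return concurrentUsers / responseTimeSeconds;
    }

    public static void main(String[] args) {
        // Assumed figures: 50 concurrent users, each request answered in 0.5 s.
        System.out.println(tps(50, 0.5)); // prints 100.0
    }
}
```

Read the other way round, the same relation shows why latency matters: at a fixed number of concurrent users, halving the response time doubles the achievable TPS.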

2. Common Performance Monitoring Metrics and Tools

Operating System Metrics

top -H -p pid – real‑time CPU load per thread.

vmstat – overall system load.

pidstat – context‑switch and lock contention monitoring.

iostat – disk I/O utilization.

nmon -f -t -s 2 -c 100 – collect CPU, memory, and I/O data every 2 seconds, 100 samples.

netstat -anp | grep ESTABLISHED | wc -l – count established connections.

JVM Metrics

JConsole – CPU and garbage‑collection monitoring.

JVM start‑up parameters:

-Dcom.sun.management.jmxremote.port=1088
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

jvisualvm – CPU sampling, method‑level analysis, thread snapshots.

jca457.jar – Javacore thread‑snapshot analysis.

ga456.jar – verbose GC log analysis.

JProfiler, Oracle Developer Studio – CPU sampling and method‑level profiling.

3. Performance Test Preconditions

Accurate DB data volume: Test data should be at least the same order of magnitude as production data.

Test environment identical to production: Same hardware, OS, middleware versions (IHS 8.5.5.9, WAS 8.5, IBM JDK 1.7, DB2 V10, Redis 3.2.3).

Reasonable concurrent user estimate:

C = nL / T   // n = login count, L = average online time, T = observation period
Peak = C + 3*sqrt(C)

or, based on 1 million users with 80% active during peak hours: 1000000*0.8/4/3600*1.5 ≈ 83.3 users

Prioritize test functions: Rank business transactions by volume and test the most critical first.

Define performance problem thresholds: e.g., response time > 3 s, TPS < 10, CPU > 70 %, JVM heap 100 % used, frequent GC, I/O bottlenecks.
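The concurrent-user estimate from the preconditions (C = nL / T and Peak = C + 3*sqrt(C)) can be sketched as follows. The input figures are hypothetical and chosen only so the arithmetic is easy to follow: 10,000 logins, 1,800 s average online time, and an 8-hour (28,800 s) observation period.

```java
final class ConcurrencyEstimateSketch {
    /** C = n * L / T, with L and T expressed in the same time unit. */
    static double averageConcurrent(double logins, double avgOnlineTime, double observationPeriod) {
        return logins * avgOnlineTime / observationPeriod;
    }

    /** Peak = C + 3 * sqrt(C). */
    static double peakConcurrent(double averageConcurrent) {
        return averageConcurrent + 3 * Math.sqrt(averageConcurrent);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: n = 10000 logins, L = 1800 s, T = 28800 s.
        double c = averageConcurrent(10_000, 1_800, 28_800);
        System.out.println(c);                 // prints 625.0
        System.out.println(peakConcurrent(c)); // prints 700.0
    }
}
```

With these assumed inputs the load test would target about 700 concurrent users rather than the average of 625.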

4. General Performance Optimization Approach

Identify bottlenecks: Locate modules causing low TPS, high latency, or excessive resource usage.

Typical bottleneck causes: Slow DB queries, excessive logging, large XML parsing, complex business logic, lock contention.

Optimization principles:

Fix the bottleneck first.

Keep solutions simple; avoid adding new complexity or degrading user experience.

Ensure the system meets performance goals without introducing new bugs.

5. Common Issues and Optimization Methods

5.1 SQL Execution Time Too Long

Symptoms: Slow response, high DB CPU.

Causes: Full table scans, ineffective indexes, sort overflow.

Solutions:

Use DB2 snapshots to find slow SQL, then add appropriate indexes so that the estimated cost reported by db2expln falls below 100.

Implement table‑clean‑up policies, periodic REORG and RUNSTATS.

Investigation: db2 get snapshot for all on corpdb, db2expln -d corpdb -t -g -q "SQL".

5.2 Database Deadlock

Symptoms: Deadlock detected in DB2 snapshots or db2evmon -db corpdb -evm DB2DETAILDEADLOCK logs.

Causes: Full table scans, large transactions, interleaved lock acquisition.

Solutions: Reduce transaction size, ensure consistent lock order, add indexes on lock‑contended columns.

Investigation: Analyze deadlock log dlock.txt to locate offending SQL and locks.

5.3 Log‑Blocking Threads

Symptoms: Threads blocked on log4j1.x output, high response time.

Solutions: Reduce unnecessary logging, upgrade to log4j2 with proper buffering, avoid asynchronous logging for very large messages.

5.4 Multi‑Thread Concurrency Issues

Symptoms: Logic errors or transaction failures under load due to shared mutable state.

Solutions: Convert global variables to method‑local, add synchronized blocks or java.util.concurrent locks.
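A minimal sketch of the two locking options mentioned above, assuming Java 8 or later for the lambda syntax. The counters are hypothetical stand-ins for whatever shared state the application holds: one is protected by a java.util.concurrent atomic, the other by synchronized methods.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SharedStateSketch {
    // Hypothetical shared counters standing in for the application's global variables.
    private final AtomicLong atomicCount = new AtomicLong(); // lock-free updates
    private long lockedCount;                                // guarded by "this"

    private synchronized void incrementLocked() {
        lockedCount++;
    }

    private synchronized long lockedCount() {
        return lockedCount;
    }

    /** Starts the given number of workers; each increments both counters perThread times. */
    static long[] run(int threads, int perThread) {
        SharedStateSketch state = new SharedStateSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    state.atomicCount.incrementAndGet();
                    state.incrementLocked();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new long[] { state.atomicCount.get(), state.lockedCount() };
    }
}
```

Calling SharedStateSketch.run(8, 10000) yields 80000 for both counters on every run. A plain unsynchronized long field incremented the same way can lose updates under load, which is the kind of logic error described in the symptoms.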

5.5 Too Many Open Files

Symptoms: "Too many open files" errors during load test.

Solutions: Ensure file streams are closed, increase nofile limits in /etc/security/limits.conf.
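One common way to make sure streams are always closed is try-with-resources (available since Java 7). The sketch below is illustrative only: the class and method names are assumptions, and it wraps IOException in UncheckedIOException, which requires Java 8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class FileHandleSketch {
    /** Counts lines; the reader and its file descriptor are closed even if reading fails. */
    static long countLines(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a three-line temporary file, counts its lines, then deletes it. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("handle-sketch", ".txt");
            try {
                Files.write(tmp, Arrays.asList("a", "b", "c"), StandardCharsets.UTF_8);
                return countLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Raising the nofile limit only postpones the error if descriptors leak, so fixing the close path is usually the first step and the limit change the second.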

5.6 Memory Leak

Symptoms: JVM OutOfMemoryError.

Solutions: Enable -XX:+HeapDumpOnOutOfMemoryError, analyze heap dump with ha456.jar, identify and limit growth of global collections.
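One way to limit the growth of a global collection is to give it a hard size cap with eviction. The sketch below uses LinkedHashMap.removeEldestEntry for a least-recently-used policy; the class name and the cap of 100 entries are assumptions for illustration, and this is not necessarily the fix applied in the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU-style map that never holds more than maxEntries entries. Not thread-safe. */
final class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<Integer, String> cache = new BoundedCacheSketch<>(100);
        for (int i = 0; i < 1000; i++) {
            cache.put(i, "value-" + i);
        }
        System.out.println(cache.size()); // prints 100
    }
}
```

For shared use across request threads the map would need external synchronization, for example by wrapping it with Collections.synchronizedMap.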

5.7 Frequent GC

Symptoms: High GC CPU usage, -verbose:gc -Xverbosegclog:/usr/ebank/logs/gcdetail.log shows frequent collections.

Solution: Increase heap size, e.g., -Xmx2048m -Xms2048m.

5.8 High CPU

Symptoms: CPU > 90 % during 50‑user load.

Solutions: Optimize algorithms, replace XML with JSON, use Map instead of List for lookups, profile with jvisualvm, tprofiler, JProfiler.
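The "Map instead of List for lookups" advice can be illustrated with a small sketch. The Account type and the "acct-N" ids are hypothetical; the point is only the difference between scanning a list on every lookup and indexing it once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupSketch {
    /** Hypothetical record type used only for this illustration. */
    static final class Account {
        final String id;
        final String name;

        Account(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** O(n) per lookup: scans the list until the id matches. */
    static Account findInList(List<Account> accounts, String id) {
        for (Account account : accounts) {
            if (account.id.equals(id)) {
                return account;
            }
        }
        return null;
    }

    /** Built once; afterwards each lookup by id is O(1) on average. */
    static Map<String, Account> indexById(List<Account> accounts) {
        Map<String, Account> byId = new HashMap<>(accounts.size() * 2);
        for (Account account : accounts) {
            byId.put(account.id, account);
        }
        return byId;
    }

    /** Generates n sample accounts with ids "acct-0" .. "acct-(n-1)". */
    static List<Account> sampleAccounts(int n) {
        List<Account> accounts = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            accounts.add(new Account("acct-" + i, "Account " + i));
        }
        return accounts;
    }
}
```

When the same collection is searched many times per transaction, as with repeated permission checks, the one-off cost of building the index is typically recovered after a handful of lookups.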

5.9 Batch Insert Slow

Symptoms: Large data loads (> 100 k rows) take excessive time.

Solutions: Use JDBC batching (addBatch/executeBatch, or batchUpdate in Spring's JdbcTemplate) with a commit every 1000 rows, employ thread pools, add buffering.
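A minimal sketch of batched inserts with a commit every N rows, using only the standard java.sql interfaces. The class, the method names, and the idea of passing rows as Object[] are assumptions for illustration; the caller supplies the Connection and the INSERT statement, and error handling is reduced to a rollback of the current batch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    /** Number of commits issued for rowCount rows at the given batch size (ceiling division). */
    static int commitsNeeded(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    /** Inserts all rows with addBatch/executeBatch and commits once per full or final batch. */
    static int insertInBatches(Connection conn, String insertSql,
                               List<Object[]> rows, int batchSize) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit explicitly, once per batch
        int commits = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                conn.commit();
                commits++;
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted batch
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return commits;
    }
}
```

With a batch size of 1000, a 2,500-row load issues three commits instead of 2,500, which is where most of the time saving comes from. Rows committed in earlier batches stay committed if a later batch fails, so a restartable load needs to track how far it got.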

5.10 Database CPU High

Symptoms: Backend command‑sending threads cause DB CPU spikes.

Solutions: Remove unnecessary ORDER BY, redesign command‑sending thread pool (producer‑consumer model).

5.11 TPS Curve Instability

Symptoms: Sudden TPS drop or jitter during load test.

Causes: GC pauses, log files filling the disk.

Investigation: Monitor CPU with top, capture javacore (kill -3 pid) to analyze thread stacks.

6. Optimization Cases

6.1 Banking System Overview

Architecture: 1 Web, 1 PRE, 1 BP, 1 DB2, 1 Redis.

Request chain: Client → Web → PRE → BP → DB2 (or mock backend).

Characteristics: many permission checks, fine‑grained APIs (15‑19 per transaction), strong consistency, high availability.

6.2 Database Message‑Queue (Instruction Sending) Case

Before Optimization: Single backend task polls the instruction table, leading to insufficient throughput or high DB CPU when multiple tasks run.

After Optimization: Each backend server runs one producer and 20 consumer tasks. Producer fetches 100 instructions at a time, distributes them to idle consumers; if none are idle, it waits 15 s before retrying. This reduces DB CPU load and improves throughput.
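The producer-consumer arrangement described above can be sketched with a bounded BlockingQueue, assuming Java 8 or later. This is an illustration under simplifying assumptions, not the project's implementation: instructions are plain strings held in memory instead of rows fetched from the instruction table, "sending" is reduced to incrementing a counter, and the 15 s wait when no consumer is idle is approximated by the producer blocking on a full queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class InstructionDispatchSketch {
    // Unique sentinel object telling a consumer to stop; compared by identity on purpose.
    private static final String STOP = new String("STOP");

    /** One producer hands instructions to consumerCount consumers; returns how many were processed. */
    static int dispatch(List<String> pending, int consumerCount, int fetchSize) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(fetchSize);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> consumers = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            Thread consumer = new Thread(() -> {
                try {
                    while (true) {
                        String instruction = queue.take(); // blocks while the consumer is idle
                        if (instruction == STOP) {
                            return;
                        }
                        processed.incrementAndGet(); // stand-in for sending the instruction
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.start();
            consumers.add(consumer);
        }
        try {
            // Producer: take fetchSize instructions at a time, as one table query would.
            for (int from = 0; from < pending.size(); from += fetchSize) {
                int to = Math.min(from + fetchSize, pending.size());
                for (String instruction : pending.subList(from, to)) {
                    queue.put(instruction); // blocks when all consumers are busy and the queue is full
                }
            }
            for (int i = 0; i < consumerCount; i++) {
                queue.put(STOP); // one stop marker per consumer
            }
            for (Thread consumer : consumers) {
                consumer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

The design point carried over from the case is that only the single producer touches the instruction table, so the database sees one polling query per batch of 100 instead of one per worker, while the 20 consumers provide the parallelism for sending.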

6.3 Database Deadlock Case

Phenomenon: Multiple producer threads updating the instruction table cause deadlocks.

Root Cause: Two transactions lock different rows and wait for each other's locks (full table scans, large transactions).

Resolution Steps:

Analyze DB snapshots and deadlock logs to locate offending SQL.

Test with and without indexes; adding an index on PRTY eliminated deadlocks.

Use the RS (Read Stability) isolation level so that non‑conflicting rows can still be processed.

Adjust transaction size and lock strategy accordingly.

7. Future Performance Improvement Options

Increase use of distributed cache for read‑heavy data.

Trim BP logs by removing transaction access‑record tables.

Consolidate and cache front‑end interfaces.

These measures aim to further reduce database pressure and improve overall system responsiveness.
