How to Diagnose and Optimize Business System Performance After Launch
This article outlines a comprehensive process for analyzing, diagnosing, and optimizing performance issues in production business systems, covering hardware, OS, database, middleware, JVM settings, code inefficiencies, and the role of monitoring tools like APM to pinpoint bottlenecks.
| System Performance Analysis Process
When a business system shows serious performance problems after going live, the root causes usually fall into three categories: high concurrent access causing bottlenecks, growing data volume in the database, and changes in critical environment factors such as network bandwidth.
First, determine whether the issue appears under single‑user (non‑concurrent) conditions or only under load. Single‑user problems usually stem from inefficient code or SQL, while problems that appear only under concurrency require load testing to identify resource contention.
During load testing, monitor CPU, memory, and JVM to detect issues like memory leaks that can also cause performance degradation.
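The single-user versus concurrent distinction can be checked with a small timing probe. The following is a minimal sketch, not a replacement for a real load-testing tool: the class name LoadProbe and the synchronized "request" are hypothetical stand-ins for a handler that touches a contended resource.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Compares single-user latency with latency under concurrent load (illustrative only). */
final class LoadProbe {

    /** Runs the task once per simulated user, all at the same time, and returns the mean latency in nanoseconds. */
    static long meanLatencyNanos(Runnable task, int users) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Callable<Long>> calls = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                calls.add(() -> {
                    long start = System.nanoTime();
                    task.run();
                    return System.nanoTime() - start;
                });
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(calls)) {
                total += result.get();
            }
            return total / users;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Object lock = new Object();
        // Hypothetical request: the synchronized block stands in for a contended resource.
        Runnable request = () -> {
            synchronized (lock) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        long single = meanLatencyNanos(request, 1);
        long loaded = meanLatencyNanos(request, 8);
        System.out.printf("single-user: %d ms, 8 users: %d ms%n",
                TimeUnit.NANOSECONDS.toMillis(single), TimeUnit.NANOSECONDS.toMillis(loaded));
    }
}
```

If the single-user figure is already slow, the code or SQL path is the first suspect; if latency rises sharply only as the user count grows, contention for a shared resource is more likely.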
| Factors Influencing Performance Issues
Performance is affected by three main layers: hardware environment, software runtime environment, and the application code itself.
Hardware Environment
Includes compute, storage, and network resources. Server CPU capability is often expressed as a tpmC rating (TPC‑C transactions per minute), but real‑world performance can differ from the benchmark figure. Storage I/O is a common bottleneck, and looking only at CPU and memory utilization can hide an underlying I/O limit.
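Linux provides tools such as iostat, ps, sar, top, and vmstat for monitoring CPU, memory, and disk I/O. JVM internals such as heap usage and garbage collection are not visible to these tools and are inspected with JDK utilities such as jstat, jmap, and jstack.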
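The same kind of snapshot can also be taken from inside the JVM through the standard java.lang.management API. This is a minimal sketch, and the class name JvmSnapshot is hypothetical. The system load average is reported as a negative value on platforms where it is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

/** Collects a one-line snapshot of heap, CPU load, and thread count from inside the JVM. */
final class JvmSnapshot {

    static String snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Shifting by 20 bits converts bytes to mebibytes.
        return String.format("heapUsedMB=%d heapCommittedMB=%d cpus=%d loadAvg=%.2f liveThreads=%d",
                heap.getUsed() >> 20,
                heap.getCommitted() >> 20,
                os.getAvailableProcessors(),
                os.getSystemLoadAverage(),
                threads.getThreadCount());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Logging such a line at a fixed interval during a load test makes it easier to line up JVM behaviour with the figures reported by iostat or vmstat.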
| Runtime Environment – Database and Application Middleware
Database and middleware tuning are frequent sources of performance problems.
Database Tuning
For Oracle, performance factors include system, database, and network. Optimization targets include disk I/O, rollback segments, redo logs, SGA, and database objects. Continuous monitoring and analysis of high‑memory alerts, excessive redo generation, and inefficient SQL are essential.
Application Middleware Tuning
Middleware containers such as WebLogic or Tomcat require configuration tuning (JVM parameters, thread pools, connection pool sizes) and, in clustered setups, cluster‑specific settings.
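Containers expose their thread-pool settings through their own configuration files, but the underlying knobs can be illustrated with the JDK's ThreadPoolExecutor. The sketch below is an assumption-laden illustration rather than any container's actual implementation: the class name BoundedPool and the sizes 4, 8, and 100 are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** A bounded worker pool: core size, maximum size, and queue capacity are the main tuning knobs. */
final class BoundedPool {

    static ThreadPoolExecutor create(int core, int max, int queueCapacity) {
        return new ThreadPoolExecutor(
                core, max,
                60L, TimeUnit.SECONDS,                       // idle threads above core size are retired after 60 s
                new ArrayBlockingQueue<>(queueCapacity),     // bounded queue: backlog cannot grow without limit
                new ThreadPoolExecutor.CallerRunsPolicy());  // when saturated, the submitting thread runs the task
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 8, 100);
        for (int i = 0; i < 20; i++) {
            pool.execute(() -> { /* hypothetical request handling */ });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("completed=" + pool.getCompletedTaskCount()
                + " largestPool=" + pool.getLargestPoolSize());
    }
}
```

A bounded queue with a back-pressure policy is one design choice among several; the point is that an undersized pool queues requests while an oversized one shifts the contention to the database or the CPU.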
Key JVM parameters:
-Xmx # maximum heap size
-Xms # initial (minimum) heap size
-XX:MaxNewSize # maximum young generation size
-XX:NewSize # initial young generation size
-XX:MaxPermSize # maximum permanent generation size (Java 7 and earlier; Java 8+ uses -XX:MaxMetaspaceSize)
-XX:PermSize # initial permanent generation size (Java 7 and earlier; Java 8+ uses -XX:MetaspaceSize)
-Xss # thread stack size
Recommended sizing, taking the live data size (old‑generation occupancy after a Full GC) as the baseline: set -Xms and -Xmx to 3‑4 times the live data size; size the permanent generation or Metaspace at 1.2‑1.5 times its own occupancy after a Full GC; set the young generation (-Xmn) to 1‑1.5 times the live data size; and allow the old generation 2‑3 times the live data size.
From Java 8 onward, the permanent generation is replaced by Metaspace, which is allocated from native memory rather than the heap, so heap size, Metaspace size, and the chosen garbage collector must be considered together.
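The sizing multipliers can be turned into concrete numbers once the occupancy after a Full GC has been measured. The sketch below is a rough calculator under stated assumptions: the class name HeapSizing is hypothetical, and the Metaspace factor is assumed to apply to the measured Metaspace occupancy rather than to the heap figure.

```java
/** Turns measured post-Full-GC occupancy into rule-of-thumb sizing ranges (in MB). */
final class HeapSizing {

    /** Returns {low, high} in MB for a multiplier range applied to a base figure. */
    static long[] range(long baseMb, double lowFactor, double highFactor) {
        return new long[] { Math.round(baseMb * lowFactor), Math.round(baseMb * highFactor) };
    }

    /** liveDataMb: old-generation occupancy after a Full GC; metaspaceUsedMb: Metaspace occupancy (assumed baseline). */
    static String recommend(long liveDataMb, long metaspaceUsedMb) {
        long[] heap = range(liveDataMb, 3.0, 4.0);       // -Xms / -Xmx
        long[] young = range(liveDataMb, 1.0, 1.5);      // -Xmn
        long[] old = range(liveDataMb, 2.0, 3.0);        // heap minus young generation
        long[] meta = range(metaspaceUsedMb, 1.2, 1.5);  // -XX:MaxMetaspaceSize
        return String.format(
                "-Xms/-Xmx %d-%dm, -Xmn %d-%dm, old gen %d-%dm, metaspace %d-%dm",
                heap[0], heap[1], young[0], young[1], old[0], old[1], meta[0], meta[1]);
    }

    public static void main(String[] args) {
        // Hypothetical measurements: 500 MB live data, 100 MB Metaspace in use.
        System.out.println(recommend(500, 100));
    }
}
```

These ranges are starting points only; GC logs from a representative load test should confirm or correct them.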
| Business System Performance Expansion Considerations
Beyond the standard analysis flow, consider whether pre‑deployment performance testing truly reflects production conditions. Simulating real hardware, data volume, and concurrency is difficult, leading to post‑launch surprises.
Horizontal scaling (clusters) can alleviate some bottlenecks, but database scaling is often limited (e.g., an Oracle RAC cluster often delivers only about 2‑3× the performance of a single node rather than scaling linearly). Application clusters scale more readily, yet performance issues may still arise from accumulated data or from inefficiencies that exist on every single node.
| Classification of Performance Diagnosis
Operating system and storage layer
Middleware layer (databases, application servers)
Software layer (SQL, business logic, front‑end)
Dynamic analysis traces a request through code and infrastructure to pinpoint slow components, such as inefficient SQL or front‑end rendering delays.
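The idea of tracing a request stage by stage can be sketched in a few lines. This is a hand-rolled illustration, not how any particular APM product works: the class name RequestTrace, the helper pause, and the stage names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Records elapsed time per named stage of one request and reports the slowest stage. */
final class RequestTrace {
    private final Map<String, Long> stageNanos = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named stage, and returns the work's result. */
    <T> T timed(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            stageNanos.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    String slowestStage() {
        return stageNanos.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("none");
    }

    Map<String, Long> stages() {
        return stageNanos;
    }

    /** Helper that simulates work by sleeping. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RequestTrace trace = new RequestTrace();
        trace.timed("business-logic", () -> { pause(5); return "ok"; });
        trace.timed("sql", () -> { pause(40); return 1; });   // hypothetical slow query
        trace.timed("render", () -> { pause(2); return "<html/>"; });
        System.out.println("slowest stage: " + trace.slowestStage());
    }
}
```

Real tracing additionally propagates a request identifier across processes, which this single-JVM sketch leaves out.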
| Common Software Code Performance Pitfalls
Creating large objects or opening DB connections inside loops
Memory leaks due to unreleased resources
Missing caching where appropriate
Long‑running transactions consuming resources
Choosing sub‑optimal data structures or algorithms for a given scenario
Code reviews and static analysis are essential to uncover these issues.
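Two of the pitfalls above, repeated expensive work inside a loop and missing caching, can be illustrated together. The sketch below is a minimal example under assumptions: the class name LookupCache is hypothetical, and the loader function stands in for a database or remote call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Caches the result of an expensive lookup so a loop does not repeat it for the same key. */
final class LookupCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    LookupCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only the first time a key is seen. */
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    /** Number of times the expensive loader actually ran. */
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // Hypothetical expensive lookup, e.g. a query by customer type.
        LookupCache<Integer, String> cache = new LookupCache<>(type -> "rate-for-" + type);
        for (int order = 0; order < 1000; order++) {
            cache.get(order % 3);
        }
        System.out.println("loader invocations: " + cache.loadCount());
    }
}
```

An unbounded map like this one can itself become a memory leak if the key space is large, so production caches usually need a size limit or expiry policy.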
| Detecting Performance Issues via Monitoring and APM
Performance problems can be discovered through IT resource monitoring/APM alerts or user feedback. APM tracks key business applications, improves reliability, and reduces total cost of ownership.
Traditional monitoring focuses on CPU, memory, and network metrics, but linking these to specific services, processes, or SQL statements requires deeper analysis. Modern APM provides end‑to‑end tracing, allowing rapid identification of slow services or queries.
Integrating resource → application → business‑function analysis enables proactive detection and faster resolution of performance bottlenecks.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
