System Performance Issue Analysis and Optimization Process
This article outlines a practical process for diagnosing and optimizing performance problems in production business systems, covering the hardware environment, the OS, databases, middleware, JVM tuning, code-level inefficiencies, monitoring tools, and the limits of pre-release performance testing.
System Performance Issue Analysis Process
When a business system that performed well before launch suddenly experiences serious performance degradation after going live, the root causes usually fall into three categories.
High concurrent access leading to bottlenecks
Growing database volume causing slowdown
Changes in critical environment factors such as network bandwidth
First, determine whether the problem occurs under a single‑user (non‑concurrent) scenario or only under load. Single‑user issues often stem from code or SQL inefficiencies, while concurrent issues require analysis of the database and middleware.
During load testing, monitor CPU, memory, and the JVM to catch problems such as memory leaks, which may themselves originate in application code.
Performance Issue Influencing Factors
Performance factors can be grouped into three main areas: hardware environment, software runtime environment, and the software program itself.
Hardware Environment
Includes compute, storage, and network resources. CPU performance is often quoted as a tpmC figure (TPC-C transactions per minute), but a real-world x86 server may underperform a mainframe-class machine with the same nominal tpmC. Storage I/O is a frequent bottleneck; apparently high CPU or memory usage may in fact be caused by slow disk I/O.
Linux provides tools such as iostat, ps, sar, top, and vmstat for monitoring CPU, memory, and disk I/O; JVM internals are observed with JDK tools such as jstat.
Runtime Environment – Database and Application Middleware
Database and middleware performance tuning are common sources of issues.
Database Performance Tuning
For Oracle databases, performance is affected by system, database, and network factors. Optimization includes improving disk I/O, rollback segments, redo logs, the system global area (SGA), and database objects.
In init.ora set TIMED_STATISTICS=TRUE, or enable it per session with ALTER SESSION SET TIMED_STATISTICS = TRUE. On older releases, run svrmgrl and connect internal (on modern releases, sqlplus / as sysdba); then, during normal activity, execute utlbstat.sql to begin the statistics window and utlestat.sql to end it. The delta report is written to report.txt.
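As a sketch, the classic BSTAT/ESTAT snapshot workflow looks like the following (Oracle 8i-era scripts shipped under $ORACLE_HOME/rdbms/admin; on current releases, Statspack and AWR supersede them):

```sql
-- Enable timed statistics for this session (or set it in init.ora)
ALTER SESSION SET TIMED_STATISTICS = TRUE;

-- Begin the measurement window: snapshot the current V$ statistics
@?/rdbms/admin/utlbstat.sql

-- ... let normal business activity run for the period of interest ...

-- End the window: snapshot again, compute deltas, spool to report.txt
@?/rdbms/admin/utlestat.sql
```

Because the report is a delta between two snapshots, it only reflects activity inside the window, so the window should cover a representative busy period.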
Database performance monitoring is an ongoing task; DBAs regularly extract high‑cost SQL statements for developers to review and watch KPI alerts such as excessive redo generation.
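One hedged example of how a DBA might extract high-cost statements from the shared pool, using the V$SQL dynamic performance view (FETCH FIRST requires Oracle 12c or later):

```sql
-- Top 10 statements by total buffer gets, with per-execution cost
SELECT sql_id,
       executions,
       buffer_gets,
       buffer_gets / GREATEST(executions, 1) AS gets_per_exec
  FROM v$sql
 ORDER BY buffer_gets DESC
 FETCH FIRST 10 ROWS ONLY;
```

Statements that dominate this list are handed to developers for review before any hardware is added.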
Application Middleware Performance Analysis and Tuning
Middleware containers (WebLogic, Tomcat, etc.) require configuration parameter optimization and JVM tuning.
Key JVM parameters include:
-Xmx – maximum heap size
-Xms – initial heap size
-XX:MaxNewSize – maximum young generation size
-XX:NewSize – initial young generation size
-XX:MaxPermSize – maximum permanent generation size (removed in JDK 8; use -XX:MaxMetaspaceSize)
-XX:PermSize – initial permanent generation size (removed in JDK 8; use -XX:MetaspaceSize)
-Xss – thread stack size
Recommended sizing: set -Xmx/-Xms to 3–4 times the live data set (old-generation occupancy after a Full GC); Metaspace to 1.2–1.5 times its stable usage; the young generation (-Xmn) to 1–1.5 times the live data set, which leaves the old generation at 2–3 times the live data set.
Note: JDK 8 and later replaced the permanent generation with Metaspace, so Metaspace sizing and the choice of garbage collector must both be factored into tuning.
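For illustration only, suppose a Full GC leaves roughly 1 GB of live data in the old generation; the sizing rules above would yield a startup line like this (the service name and exact values are hypothetical):

```shell
# Heap = 3-4x live data; fixing Xms = Xmx avoids resize pauses.
# Young gen = 1-1.5x live data; Metaspace ~1.2-1.5x its stable usage.
java -Xms4g -Xmx4g \
     -Xmn1g \
     -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m \
     -Xss512k \
     -jar order-service.jar   # hypothetical application
```

These numbers are starting points; GC logs from a load test should confirm or adjust them.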
Software Program Performance Issue Analysis
Often the first instinct is to add hardware resources, but many performance problems are caused by code defects such as inefficient loops, unreleased resources, lack of caching, long‑running transactions, or sub‑optimal data structures and algorithms.
These issues are best discovered through static code analysis, code reviews, and establishing coding standards.
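As a small illustration of the "inefficient loop" category that a code review should flag, the sketch below contrasts repeated String concatenation (quadratic copying) with a StringBuilder (linear); the class and method names are invented for this example:

```java
// Sketch: a loop inefficiency that static analysis and reviews catch.
public class JoinExample {

    // Anti-pattern: each += allocates and copies a brand-new String,
    // so joining n items costs O(n^2) character copies.
    static String joinSlow(String[] items, String sep) {
        String out = "";
        for (int i = 0; i < items.length; i++) {
            if (i > 0) out += sep;
            out += items[i];
        }
        return out;
    }

    // Fix: StringBuilder appends into a growable buffer, amortized O(n).
    static String joinFast(String[] items, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) sb.append(sep);
            sb.append(items[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] items = {"a", "b", "c"};
        System.out.println(joinSlow(items, ","));  // a,b,c
        System.out.println(joinFast(items, ","));  // a,b,c
    }
}
```

The slow version behaves fine in a unit test with three items, which is exactly why such defects tend to surface only under production data volumes.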
Business System Performance Issue Expansion Thoughts
Beyond the standard analysis flow, consider whether pre‑release performance testing truly reflects production conditions. Factors such as hardware fidelity, realistic data volume, and genuine concurrency are hard to replicate.
Horizontal scaling (clusters) can mitigate concurrency but does not solve inherent single‑node performance flaws.
Is Pre‑Release Performance Testing Useful?
Challenges include:
Can the test hardware fully emulate production?
Can the data volume reflect real‑world accumulation?
Can concurrency be simulated accurately with recorded scenarios and multiple load generators?
Because these are difficult, many issues surface only after go‑live.
Business System Performance Diagnosis Classification
Static classification can be divided into:
Operating system and storage layer
Middleware layer (databases, application servers)
Software layer (SQL, business logic, front‑end)
Dynamic analysis follows the request path to pinpoint the exact component (SQL, code, or infrastructure) causing slowdown.
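One minimal way to support this follow-the-request style of analysis is to time each stage of a request and log the breakdown; the StageTimer below is a hypothetical sketch, not a real APM API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: per-request stage timing to locate which layer is slow.
public class StageTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();
    private long last = System.nanoTime();

    // Record the time spent since the previous mark under this stage name.
    public void mark(String stage) {
        long now = System.nanoTime();
        elapsedNanos.put(stage, now - last);
        last = now;
    }

    public Map<String, Long> breakdown() {
        return elapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        StageTimer t = new StageTimer();
        Thread.sleep(5);          // stand-in for SQL execution
        t.mark("sql");
        Thread.sleep(2);          // stand-in for business logic
        t.mark("logic");
        // The largest entry points at the layer to investigate first.
        t.breakdown().forEach((stage, ns) ->
                System.out.printf("%s: %.1f ms%n", stage, ns / 1e6));
    }
}
```

A real deployment would use distributed tracing instead, but even this crude breakdown distinguishes "the SQL is slow" from "the code around it is slow".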
Detecting Performance Problems
Two main detection paths:
Monitoring tools and APM alerts
User feedback during operation
APM (Application Performance Management) monitors critical business applications, improves reliability, reduces total cost of ownership, and links resources → applications → business functions.
Traditional monitoring often shows only resource saturation, making it hard to identify the offending service or SQL. Modern APM combined with service‑chain tracing can quickly locate the problematic call or query.
With DevOps and automated operations, proactive APM monitoring enables full‑stack performance analysis, dramatically improving diagnosis efficiency.
Architecture Digest
Focused on Java backend development: application architecture at top-tier internet companies (high availability, high performance, high stability), big data, machine learning, and other popular fields.