Mastering Software Performance: From Axioms to Capacity Planning
This article explains fundamental performance concepts—defining response time and throughput, using axiomatic methods, analyzing bottlenecks with sequence diagrams and profiling, applying Amdahl’s Law, and guiding capacity planning to build reliable, high‑performance applications.
Thinking Clearly About Performance – This article is a translation of an essay on performance problems that the translator first read three years ago; it impressed him enough that he chose it as his first translation effort.
Whenever the author encounters a performance issue, he recalls this article because it does not focus on specific tools (“the technique” layer) but instead builds a high‑level understanding (“the principle” layer) that can be applied across any technology stack.
Abstract
For developers, technical managers, architects, system analysts and project managers, building high‑performance complex software is extremely difficult. However, by understanding a few basic principles, solving and preventing performance problems becomes simpler and more reliable. This article presents those principles, covering goals, terminology, tools and decisions, and shows how to combine them to create long‑lasting high‑performance applications. Some examples come from Oracle experience, but the scope is not limited to Oracle products.
Table of Contents
Axiomatic Method
What Is Performance?
Response Time vs. Throughput
Percentage Metrics
Problem Diagnosis
Sequence Diagrams
Performance Profiling
Amdahl’s Law
Skewness
Minimizing Risk
Efficiency
Load
Queue Delay
Turning Point
Turning‑Point Correlation
Capacity Planning
Random Arrival
Correlation Delay
Performance Testing
Measurement
Performance as a Feature
Conclusion: Public Debate on Turning Points
References
1. Axiomatic Method
When the author joined Oracle in 1989, performance tuning (often called Oracle tuning) was difficult. Only a few claimed expertise, and many consulted them. The author was unprepared for the field. Later, while tuning MySQL, the experience felt similar to the work done 20 years earlier.
The author compares learning performance tuning to learning algebra at age 13, relying on “mathematical intuition” to solve equations like 3x + 4 = 13. Most people lack that intuition, often resorting to trial‑and‑error.
Trial‑and‑error works for simple equations but is slow and fails when the equation changes slightly. The author did not think deeply about a better method until age 15, when James R. Harkey introduced an axiomatic approach.
Harkey taught the author a step‑by‑step axiomatic method for solving algebraic equations, emphasizing recording both the steps and the thought process. The author’s homework looked like the following:
<code>3.1x + 4 = 13
Subtract 4 from both sides
3.1x = 9
Divide both sides by 3.1
x = 9 ÷ 3.1 ≈ 2.903
</code>
This method consists of a series of logical, provable, auditable small steps and applies to algebra, geometry, trigonometry and calculus.
The author later created a similar rigorous axiomatic method for Oracle performance tuning and eventually extended it to all software performance optimization.
Our goal is to help you think clearly about how to optimize the performance of your software system.
2. What Is Performance?
Searching “Performance” on Google yields billions of results ranging from bike races to employee review processes. In the context of software, performance is “the amount of time a computer program takes to execute a given task.”
A “task” is a business‑oriented unit of work that can be nested. When a user talks about performance, they usually refer to the time the system takes to execute a series of tasks.
Response time is the duration of a task, measured per task, e.g., the time a Google search takes (0.24 s).
Another metric is throughput, the number of tasks completed in a given time interval, e.g., requests per second. Different stakeholders care about different metrics.
3. Response Time vs. Throughput
Generally, response time and throughput are inversely related, but the relationship is not exact. Consider a benchmark that reports 1000 tasks per second; the average response time is not simply 1/1000 s because parallelism and queuing affect the result.
Example 1 shows why: a system with 1000 parallel service channels, each taking 1 s per task, also completes 1000 tasks per second, so the same throughput figure is compatible with response times anywhere from 0.001 s to 1 s. Therefore, you must measure response time directly.
Example 2 considers a client requiring 100 tasks per second on a single‑core CPU where each task takes 0.001 s. Although the CPU can in principle sustain 1000 tasks per second, tasks arriving randomly from many users queue behind one another, so individual response times can still violate expectations. Response time and throughput must therefore be specified and measured independently.
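To make the arithmetic concrete, here is a minimal sketch (the numbers are illustrative, not from the article's benchmarks) showing that a single throughput figure is consistent with very different response times:

```python
# Throughput alone does not determine response time (illustrative numbers).
# System A: 1 service channel, 0.001 s per task -> 1000 tasks/s, user waits 0.001 s
# System B: 1000 parallel channels, 1 s per task -> 1000 tasks/s, user waits 1 s

def throughput(channels: int, service_time: float) -> float:
    """Tasks completed per second when every channel is kept busy."""
    return channels / service_time

assert throughput(1, 0.001) == throughput(1000, 1.0) == 1000.0  # same throughput
# ...yet a user of System B waits 1000x longer per task than a user of System A.
```

Both systems satisfy a "1000 tasks per second" requirement, which is exactly why throughput alone cannot stand in for a response‑time measurement.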
4. Percentage Metrics
Instead of stating “average response time < r seconds,” it is often better to use percentile‑based statements. Example 3 compares two lists with the same average response time (1 s) but different 90th‑percentile values (0.987 s vs. 1.273 s). The list with a higher 90th‑percentile indicates a larger proportion of dissatisfied users.
Using percentages aligns with customer expectations, e.g., “99.9 % of tracking shipments must complete within 0.5 s.”
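A nearest‑rank percentile takes only a few lines to compute; the response‑time data below is hypothetical, standing in for the lists of Example 3:

```python
import math

def percentile(values, p):
    """p-th percentile of a list of response times (nearest-rank method)."""
    s = sorted(values)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)  # nearest-rank index
    return s[k]

# Hypothetical response times in seconds; the mean hides the slow tail,
# while the 90th percentile exposes it.
times = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 3.1]
p90 = percentile(times, 90)
```

Reporting "90 % of executions complete within `p90` seconds" matches how users actually experience the system far better than the arithmetic mean does.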
5. Problem Diagnosis
Performance problems are often described in terms of response time, e.g., “Task X used to finish in < 1 s but now takes 20 s.” A good diagnosis starts by clearly defining the desired goal and quantifying it, such as “95 % of executions should be under 1 s.”
6. Sequence Diagrams
Sequence diagrams (UML) visualize the order of interactions between objects and are useful for illustrating response time. The article includes several diagrams (omitted here for brevity).
7. Performance Profiling
When many calls are involved, a table‑based performance profile is more practical than a sequence diagram. Example data shows that 70.8 % of response time is spent in DB:Fetch(), highlighting where optimization effort should focus.
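A profile is just an accounting of where the time went, sorted by contribution. The sketch below uses made‑up durations and a hypothetical second subroutine name, chosen so that DB:Fetch() accounts for roughly 70.8 % of the total as in the article's example:

```python
def profile(calls):
    """calls: list of (subroutine, seconds). Returns (name, seconds, pct) by cost."""
    total = sum(t for _, t in calls)
    return [(name, t, 100 * t / total)
            for name, t in sorted(calls, key=lambda c: c[1], reverse=True)]

# Illustrative durations; "App:process" is a hypothetical subroutine name.
rows = profile([("DB:Fetch", 1.748), ("App:process", 0.490), ("Other", 0.230)])
for name, t, pct in rows:
    print(f"{pct:5.1f}%  {t:6.3f}s  {name}")
```

The top row of such a table tells you, before touching any code, where an optimization effort can possibly pay off.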
8. Amdahl’s Law
Amdahl’s Law states that the performance gain from speeding up a component depends on how often that component is used. If a component accounts for only 5 % of total response time, the maximum possible improvement is 5 %.
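Amdahl's Law is commonly written as overall_speedup = 1 / ((1 − p) + p / s), where p is the fraction of response time the improved component accounts for and s is how much faster that component becomes. A quick sketch of the arithmetic:

```python
def overall_speedup(p, s):
    """Amdahl's Law: p = fraction of total response time improved, s = local speedup."""
    return 1.0 / ((1.0 - p) + p / s)

# A component that is only 5% of response time caps the total gain at ~5%,
# no matter how enormous the local speedup s becomes (limit is 1 / (1 - p)).
cap = overall_speedup(0.05, 1e12)
```

Even an infinite speedup of a 5 % component yields at most a 1/0.95 ≈ 1.053× overall improvement, which is why the profile, not intuition, should pick the target.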
9. Skewness
Skewness measures the non‑uniformity of values. Example 6 shows that halving the number of DB:fetch() calls does not halve the response time because the distribution of call costs matters.
<code> A = {1, 1, 1, 1}
B = {3.7, .1, .1, .1}
</code>
In list B, removing the two longest calls reduces response time dramatically, whereas removing two short calls has little effect.
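The effect of skew in lists A and B can be checked directly:

```python
A = [1.0, 1.0, 1.0, 1.0]   # uniform: total 4.0 s
B = [3.7, 0.1, 0.1, 0.1]   # skewed:  total 4.0 s

def remaining_time(calls, n_removed):
    """Total duration left after eliminating the n longest calls."""
    return sum(sorted(calls)[:len(calls) - n_removed])

# Removing any two calls from A saves 2.0 s (50%).
# Removing the two longest calls from B saves 3.8 s (95%);
# removing the two shortest saves only 0.2 s (5%).
```

Both lists total 4.0 s, so "eliminate half the calls" sounds like the same fix in each case, yet the payoff ranges from 5 % to 95 % depending entirely on which calls the skew puts in your hands.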
10. Minimizing Risk
Changing one part of a system can break another. The author shares an anecdote about adjusting Oracle network packet size only for problematic Java applications to avoid global impact.
11. Efficiency
Improving efficiency means reducing wasted work, such as issuing a single prepared statement for bulk inserts instead of thousands of individual statements, or filtering results early to avoid unnecessary buffer accesses.
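As a sketch of the bulk‑insert point, using Python's sqlite3 module (an assumption for illustration; the article's own examples are Oracle‑based): one prepared statement driven through executemany() replaces thousands of individually parsed and executed statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
rows = [(i, i * 1.5) for i in range(1000)]

# Wasteful: 1000 separately executed statements
# for r in rows:
#     conn.execute("INSERT INTO orders VALUES (?, ?)", r)

# Efficient: one prepared statement, bound and executed for all 1000 rows
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
conn.commit()

(count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
```

The same principle, doing the work once instead of N times, applies equally to early filtering: discard rows as close to the data as possible so later stages never touch them.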
12. Load
Load is the resource competition caused by concurrent tasks. Higher load increases queue delay and correlation delay, leading to longer response times, similar to traffic congestion.
13. Queue Delay
Queue delay is the time a task waits for a service opportunity. The article presents the M/M/m queuing model, where response time R = Service time S + Queue delay Q.
<code>R = S + Q
</code>
14. Turning Point
The turning point is the load level where throughput is maximized while response time degradation remains small. It is the point where the line from the origin is tangent to the response‑time curve.
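Building on R = S + Q from the previous section, the queue delay of an M/M/m system can be computed with the standard Erlang C formula. This is a textbook implementation, not code from the article; it shows response time climbing steeply as utilization approaches 1, which is what creates the turning point:

```python
from math import factorial

def erlang_c(m, a):
    """Probability an arriving task must queue in an M/M/m system.
    m = service channels, a = offered load (arrival rate x service time), a < m."""
    rho = a / m                                      # utilization, must be < 1
    top = (a ** m / factorial(m)) / (1 - rho)
    bottom = sum(a ** k / factorial(k) for k in range(m)) + top
    return top / bottom

def response_time(m, lam, S):
    """R = S + Q for M/M/m with arrival rate lam and service time S."""
    a = lam * S
    Q = erlang_c(m, a) * S / (m * (1 - a / m))       # mean queue delay
    return S + Q
```

For m = 1 this reduces to R = S / (1 − ρ): response time has already doubled at 50 % utilization and grows without bound as utilization nears 100 %.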
15. Turning‑Point Correlation
Every resource (CPU, disk, network) has its own turning point, typically lower than theoretical values due to imperfect scalability. Staying below the turning point for random‑arrival workloads prevents severe performance swings.
16. Capacity Planning
Capacity planning uses the turning point to define how much resource capacity is needed to handle peak load without exceeding the turning point.
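The planning rule reduces to simple arithmetic: choose a target utilization at or below the turning point, then provision enough service channels that peak load stays under it. The utilization threshold in this sketch is a placeholder, not a figure recommended by the article:

```python
import math

def channels_needed(peak_rate, service_time, turning_point_util):
    """Smallest channel count keeping utilization under the chosen turning point."""
    offered_load = peak_rate * service_time          # average number of busy channels
    return math.ceil(offered_load / turning_point_util)

# e.g. 500 tasks/s at 0.01 s each, capped at 50% utilization -> 10 channels
m = channels_needed(500, 0.01, 0.5)
```

Planning against the turning point rather than against 100 % utilization is what leaves headroom for the random bursts described in the next section.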
17. Random Arrival
Random arrival of tasks creates bursts that can exceed the turning point, causing queue delay spikes. Short bursts (e.g., less than 8 seconds) are usually tolerable.
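Random (Poisson) arrivals can be simulated with exponential inter‑arrival times; the sketch below (illustrative parameters, fixed seed for reproducibility) shows that even when the average rate is 10 tasks per second, individual one‑second windows routinely exceed it:

```python
import random

random.seed(42)                      # reproducible run
rate = 10.0                          # average arrivals per second
t, arrivals = 0.0, []
while t < 60.0:
    t += random.expovariate(rate)    # exponential gaps -> Poisson arrival process
    arrivals.append(t)

counts = [0] * 60                    # arrivals per one-second window
for a in arrivals:
    if a < 60.0:
        counts[int(a)] += 1

peak = max(counts)                   # bursts well above the 10/s average
```

Those bursts are exactly what pushes a system sized for its average load past the turning point; the question is only whether each excursion ends before users notice.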
18. Correlation Delay
Correlation delay arises from contention on shared resources (e.g., enqueue, buffer busy waits, latch release). It cannot be modeled by the ideal M/M/m model because service channels are not truly independent.
19. Performance Testing
Testing must balance effort and coverage; insufficient testing leaves hidden problems, while excessive testing is wasteful. A moderate testing level is recommended.
20. Measurement
Throughput is easy to measure; response time is harder. Relying on surrogate metrics (e.g., call counts) can lead to false positives or negatives.
21. Performance as a Feature
Performance is a functional feature that must be designed and built, not an afterthought. Measuring performance in production is essential for ongoing improvement.
Conclusion: Public Debate on Turning Points
The article recounts a 20‑year‑old debate about the usefulness of defining a turning point, citing differing opinions from Stephen Samson, Neil Gunther, and others.
References
CMG (Computer Measurement Group)…
Eight‑second rule…
Garvin, D. 1993…
General Electric Company…
Gunther, N. 1993…
Knuth, D. 1974…
Kyte, T. 2009…
Millsap, C. 2009…
Millsap, C. 2009…
Millsap, C. 2009…
Millsap, C., Holt, J. 2003…
Oak Table Network…
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.