Memory Allocation vs. Pooling: Performance Analysis Across Java, G1, ZGC, and C++
This article investigates whether to allocate new memory for each incoming message or to reuse memory from a pool. It analyzes the impact on throughput and latency in batch and soft‑real‑time applications, and presents extensive benchmark results for various garbage collectors, JVM options, and native C++ implementations.
Memory Allocation vs. Pooling as a Measurement Tool
Today’s question is whether to allocate new memory for each new message or to use a memory pool. In online discussions, traditional C programmers usually avoid allocation, while Java programmers tend to allocate new memory. This article analyzes both approaches in detail.
About Real‑Time Programs
Some readers may wonder why anyone tries to write real‑time programs in Java. Everyone knows Java is not a real‑time platform, and ordinary Windows or Linux are not real‑time operating systems. No one writes true real‑time programs (e.g., autopilots) in Java. In this article, a "real‑time program" refers to a near‑real‑time program (soft real‑time) that tolerates a small amount of event loss, such as a network traffic analyzer that can drop a few hundred packets out of a million without serious impact. Such programs can be written in any language, including Java, and run on conventional OSes. We will use a very simplified model of such an analyzer as the example program.
Impact of GC
Why is the choice between allocation and pooling important? For Java, the most critical factor is the garbage collector (GC), which can pause the entire program ("stop‑the‑world").
The simplest form of a garbage collector has two defining properties:
It is invoked when a memory allocation request fails, so it runs at a rate proportional to the allocation rate.
Its runtime is proportional to the number of live objects.
Real collectors use many techniques to improve performance, reduce long pauses, and lessen sensitivity to the number of live objects, such as generational spaces, young‑generation collectors, and concurrent execution. These optimizations often require generated code (write barriers, read barriers) that can slow execution but reduce GC pause times.
The two basic GC rules remain: allocating more memory increases GC frequency, and having more live objects lengthens GC runtime.
Allocating new memory usually keeps the number of live objects low but causes frequent GC; pooling reduces allocations but keeps many buffer objects alive, leading to fewer GC cycles but longer pauses.
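The two regimes can be illustrated with a toy sketch. This is purely illustrative and not part of the original benchmark: neither loop does useful work; they only shape the allocation rate and the live-object count that the GC sees.

```java
import java.util.ArrayList;
import java.util.List;

public class GcRegimes {
    // High allocation rate, tiny live set: frequent but cheap young-generation GCs.
    static long churn(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] b = new byte[1024]; // dies immediately after this iteration
            sum += b.length;
        }
        return sum;
    }

    // Low allocation rate, large live set: rare but longer collections.
    static List<byte[]> retain(int buffers) {
        List<byte[]> pool = new ArrayList<>(buffers);
        for (int i = 0; i < buffers; i++) pool.add(new byte[1024]);
        return pool; // everything stays reachable, so every GC must trace it
    }
}
```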
Other Issues
Separate allocation also incurs costs such as zero‑initialization and constructor calls. Pooling introduces overhead for tracking buffer usage, especially in multithreaded scenarios, and can lead to buffer leaks similar to classic C memory leaks.
Mixed Version
A common approach is to keep a pool of fixed capacity and allocate new buffers when demand exceeds it. When a buffer is released, it is returned to the pool if the pool is not yet full; otherwise it is discarded. This provides a good trade‑off between pooling and fresh allocation and is worth testing.
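A minimal sketch of this mixed strategy might look as follows. The class name, buffer size, and capacity handling are assumptions for illustration, not the original benchmark code:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Mixed strategy: reuse buffers from a bounded pool, allocate fresh ones
// when the pool is empty, and drop returned buffers once the pool is full.
public class MixedPool {
    static final int BUFFER_SIZE = 1536; // assumed packet buffer size

    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
    private final int capacity;

    public MixedPool(int capacity) {
        this.capacity = capacity;
    }

    /** Reuse a pooled buffer if one is available, otherwise allocate. */
    public ByteBuffer acquire() {
        ByteBuffer b = pool.pollFirst();
        return b != null ? b : ByteBuffer.allocate(BUFFER_SIZE);
    }

    /** Return the buffer to the pool, or drop it if the pool is full. */
    public void release(ByteBuffer b) {
        if (pool.size() < capacity) {
            b.clear();
            pool.addFirst(b);
        } // else: let the GC reclaim it
    }
}
```

Under low load this behaves like a pure pool (few allocations); under a burst it degrades gracefully into fresh allocation instead of blocking or failing.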
Testing
We simulate a network analyzer that captures packets, decodes protocols, and gathers statistics. The simplified model includes:
Packet class consisting of a byte buffer and its parse result (wrapping the bytes in a ByteBuffer rather than using a bare byte array, to increase the number of allocated objects).
Data source that obtains buffers and fills random IP‑related information; it does not allocate memory besides the buffers.
Queue (FIFO) of size INTERNAL_QUEUE_SIZE (implemented with ArrayDeque) that stores active buffers.
Handler that parses buffers, allocates temporary objects, and stores some results (e.g., TCP packets) in a structure of size STORED_COUNT.
We first study the single‑threaded case where the handler and data source run in the same thread; multithreaded scenarios will be considered later. The pool sizes used are MIX_POOL_SIZE for mixed mode and POOL_SIZE for full pooling.
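The model above can be sketched roughly as follows. All class names, field names, and sizes here are assumptions for illustration; the original code (linked at the end of the article) differs in detail:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Random;

public class AnalyzerModel {
    static final int BUFFER_SIZE = 1536;         // assumed packet buffer size
    static final int INTERNAL_QUEUE_SIZE = 1000; // assumed queue capacity

    /** A packet: raw bytes plus the result of parsing them. */
    static class Packet {
        final ByteBuffer buf = ByteBuffer.allocate(BUFFER_SIZE);
        Object parseResult; // filled in by the handler
    }

    private final ArrayDeque<Packet> queue = new ArrayDeque<>(INTERNAL_QUEUE_SIZE);
    private final Random random = new Random(0);

    /** Data source: fill a buffer with pseudo-random bytes; allocates nothing. */
    void fill(Packet p) {
        p.buf.clear();
        while (p.buf.hasRemaining()) p.buf.put((byte) random.nextInt(256));
        p.buf.flip();
    }

    /** Enqueue a packet if the FIFO is not full; report success. */
    boolean offer(Packet p) {
        if (queue.size() >= INTERNAL_QUEUE_SIZE) return false; // would drop
        queue.addLast(p);
        return true;
    }

    Packet poll() { return queue.pollFirst(); }
}
```

The allocation, mix, and pooling strategies then differ only in where the Packet passed to `fill` comes from.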
Batch Strategies
To measure the cost of the test framework, we introduce a dummy strategy that uses a single packet everywhere.
The batch tests were run as follows:

# java -Xloggc:gclog -Xms2g -Xmx2g -server Main alloc batch

Test results (nanoseconds) for the Dummy, Allocation, Mix, and Pooling strategies across scenarios A (low load), B (moderate load), and C (high load):

Strategy     A     B     C
Dummy        59    57    66
Allocation   400   685   4042
Mix          108   315   466
Pooling      346   470   415
Allocation is the worst strategy, especially in scenario C. Mix performs best in scenarios A and B, while Pooling is best under high load (scenario C).
GC Analysis
GC statistics (max pause, average pause, GC count per second, GC fraction, object count, GC time per object) for each strategy and scenario are presented in detailed tables. The data show that allocation leads to high GC frequency and long pauses, while pooling reduces GC frequency but can cause long pauses in scenario A due to many live buffers.
Real‑Time Tests
Real‑time tests (starting at a source interval of 1000 ns) show that the allocation strategy loses packets quickly as the interval decreases; even pooling cannot handle scenario A without some packet loss. Increasing the heap size reduces loss but is impractical.
G1 Garbage Collector
Switching from CMS to G1 (with -XX:+UseG1GC -XX:MaxGCPauseMillis=80) yields mixed results: in most cases performance degrades, but in scenarios A and B pooling becomes faster. GC logs become more complex because not every line represents a pause.
ZGC
Running on Java 11 with ZGC (-XX:+UnlockExperimentalVMOptions -XX:+UseZGC) shows overall improvement, especially for allocation, though G1 still outperforms ZGC in many batch cases.
Native Buffers with CMS
Moving buffers off‑heap using DirectByteBuffer reduces heap pressure. Tests with reduced heap size (1 GB) show much better results for both allocation and pooling, especially in real‑time mode where loss percentages become negligible.
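The off-heap idea is just `ByteBuffer.allocateDirect`: the payload bytes live outside the Java heap, so the GC only tracks the small wrapper object, not the data. A minimal sketch (sizes illustrative):

```java
import java.nio.ByteBuffer;

public class NativeBufferDemo {
    /** Allocate a packet buffer whose bytes live off the Java heap. */
    public static ByteBuffer newPacketBuffer(int size) {
        return ByteBuffer.allocateDirect(size); // payload is off-heap
    }

    public static void main(String[] args) {
        ByteBuffer b = newPacketBuffer(1536);
        b.putInt(0xCAFEBABE);
        b.flip();
        System.out.println(b.isDirect() + " " + Integer.toHexString(b.getInt()));
        // prints: true cafebabe
    }
}
```

Direct buffers are more expensive to allocate and free than heap buffers, which is part of why pooling them pays off even when GC pressure is no longer the issue.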
Native Buffers with G1
Batch results with native buffers and G1 are slightly worse than CMS but still better than pure Java heap allocations.
Native Buffers with ZGC
ZGC combined with native buffers yields significant improvements across all scenarios.
Trying C++
Implementing the same benchmark in C++ removes GC entirely. However, memory allocation in C++ is still expensive: the pooled version outperforms the allocating one, and plain C++ allocation remains slower than Java’s best configurations.
C++ Without Allocation (Flat Version)
By reserving space for the most common objects inside the buffer (zero allocation), the flat C++ version achieves the best performance across all scenarios, processing up to 4 M packets/s in scenario C.
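The flat version in the article is C++, but the idea translates directly: instead of allocating a fresh result object per message, reserve mutable fields for the common case and overwrite them in place. A rough Java analogue (all names hypothetical):

```java
// "Flat" packet: parse results are reserved fields inside the packet
// itself, reused on every parse, so the hot path allocates nothing.
public class FlatPacket {
    int srcAddr, dstAddr;
    int srcPort, dstPort;
    boolean isTcp;

    /** Overwrite the reserved fields in place; allocates no objects. */
    void parse(int src, int dst, int sport, int dport, boolean tcp) {
        srcAddr = src;
        dstAddr = dst;
        srcPort = sport;
        dstPort = dport;
        isTcp = tcp;
    }
}
```

The cost is wasted space when the reserved fields go unused and awkward handling of rare packet types, which is why this layout suits a fixed, well-known protocol mix.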
Summary Table
All results are consolidated in a large table that highlights the best (green) and second‑best (yellow) outcomes. The flat C++ version consistently yields the best numbers, while the best Java results come from the native CMS configuration, with ZGC being a strong contender.
Conclusion
For true real‑time systems, use C or C++ and avoid allocation. Java can achieve near‑real‑time performance with careful reduction of allocations and the use of modern collectors (G1, ZGC). Pooling (or the mixed strategy) outperformed fresh allocation in every test, especially under high load. Increasing the input queue size or using off‑heap buffers can further mitigate GC pauses.
Original article: https://pzemtsov.github.io/2019/01/17/allocate-or-pool.html
High Availability Architecture
Official account for High Availability Architecture.