
Memory Allocation vs. Pooling: Performance Analysis Across Java, G1, ZGC, and C++

This article investigates whether to allocate new memory for each incoming message or to reuse memory from a pool, analyzing the impact on throughput and latency in batch and soft‑real‑time applications, and presenting extensive benchmark results for various garbage collectors, JVM options, and native C++ implementations.


Memory Allocation vs. Pooling as a Measurement Tool

Today’s question is whether to allocate new memory for each new message or to use a memory pool. In online discussions, traditional C programmers usually avoid allocation, while Java programmers tend to allocate new memory. This article analyzes both approaches in detail.

About Real‑Time Programs

Some readers may wonder why anyone tries to write real‑time programs in Java. Everyone knows Java is not a real‑time platform, and ordinary Windows or Linux are not real‑time operating systems. No one writes true real‑time programs (e.g., autopilots) in Java. In this article, a "real‑time program" refers to a near‑real‑time program (soft real‑time) that tolerates a small amount of event loss, such as a network traffic analyzer that can drop a few hundred packets out of a million without serious impact. Such programs can be written in any language, including Java, and run on conventional OSes. We will use a very simplified model of such an analyzer as the example program.

Impact of GC

Why is the choice between allocation and pooling important? For Java, the most critical factor is the garbage collector (GC), which can pause the entire program ("stop‑the‑world").

The simplest form of a garbage collector:

Invoked when a memory allocation request fails, occurring at a rate proportional to the allocation rate.

Runtime proportional to the number of live objects.

Real collectors use many techniques to improve performance, reduce long pauses, and lessen sensitivity to the number of live objects, such as generational spaces, young‑generation collectors, and concurrent execution. These optimizations often require generated code (write barriers, read barriers) that can slow execution but reduce GC pause times.

The two basic GC rules remain: allocating more memory increases GC frequency, and having more live objects lengthens GC runtime.

Allocating new memory usually keeps the number of live objects low but causes frequent GC; pooling reduces allocations but keeps many buffer objects alive, leading to fewer GC cycles but longer pauses.

Other Issues

Separate allocation also incurs costs such as zero‑initialization and constructor calls. Pooling introduces overhead for tracking buffer usage, especially in multithreaded scenarios, and can lead to buffer leaks similar to classic C memory leaks.

Mixed Version

A common approach is to keep a pool of a fixed capacity and allocate new buffers when demand exceeds it. When a buffer is released, it is returned to the pool if the pool is not yet full; otherwise it is discarded and left to the GC. This provides a good trade‑off between pooling and fresh allocation and is worth testing.
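A minimal sketch of this mixed strategy might look like the following (class and method names are illustrative, not taken from the article's code): at most `maxPooled` buffers are kept; acquisition falls back to a fresh allocation when the pool is empty, and release discards the buffer when the pool is already full.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class MixedPool {
    private final int maxPooled;
    private final int bufferSize;
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();

    MixedPool(int maxPooled, int bufferSize) {
        this.maxPooled = maxPooled;
        this.bufferSize = bufferSize;
    }

    ByteBuffer acquire() {
        ByteBuffer b = pool.pollFirst();
        // Demand exceeded pool capacity: fall back to a fresh allocation.
        return (b != null) ? b : ByteBuffer.allocate(bufferSize);
    }

    void release(ByteBuffer b) {
        if (pool.size() < maxPooled) {
            b.clear();          // reset position/limit before reuse
            pool.addFirst(b);   // return to the pool
        }
        // Otherwise drop the reference; the GC reclaims the buffer.
    }
}
```

Under steady load the pool absorbs all traffic; only bursts above the pool capacity touch the allocator and the GC.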

Testing

We simulate a network analyzer that captures packets, decodes protocols, and gathers statistics. The simplified model includes:

Packet class consisting of a byte buffer and its parse result (using a ByteBuffer instead of a plain byte array, which adds one extra object per allocation).

Data source that obtains buffers and fills random IP‑related information; it does not allocate memory besides the buffers.

Queue (FIFO) of size INTERNAL_QUEUE_SIZE (implemented with ArrayDeque) that stores active buffers.

Handler that parses buffers, allocates temporary objects, and stores some results (e.g., TCP packets) in a structure of size STORED_COUNT.

We first study the single‑threaded case where the handler and data source run in the same thread; multithreaded scenarios will be considered later. The pool sizes used are MIX_POOL_SIZE for mixed mode and POOL_SIZE for full pooling.
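The model above can be sketched roughly as follows (class names, the `offer`/`poll` API, and the drop-on-full policy are assumptions for illustration, not the article's actual code): a Packet couples a buffer with its parse result, and a bounded ArrayDeque plays the role of the internal FIFO queue, losing packets when it is full.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class Packet {
    final ByteBuffer buf;     // payload
    Object parseResult;       // filled in later by the handler
    Packet(int size) { buf = ByteBuffer.allocate(size); }
}

final class PacketQueue {
    private final int capacity;                            // INTERNAL_QUEUE_SIZE
    private final ArrayDeque<Packet> queue = new ArrayDeque<>();

    PacketQueue(int capacity) { this.capacity = capacity; }

    // Soft real-time behaviour: when the queue is full, the new packet is lost.
    boolean offer(Packet p) {
        if (queue.size() >= capacity) return false;  // packet dropped
        queue.addLast(p);
        return true;
    }

    Packet poll() { return queue.pollFirst(); }
    int size()    { return queue.size(); }
}
```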

Batch Strategies

To measure the cost of the test framework, we introduce a dummy strategy that uses a single packet everywhere.
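The strategies under test plausibly share one acquire/release interface (the interface and class names below are assumptions). The dummy variant hands out a single shared buffer forever, so a run with it measures only the framework's own overhead; the allocating variant creates a fresh buffer on every call.

```java
import java.nio.ByteBuffer;

interface BufferStrategy {
    ByteBuffer acquire();
    void release(ByteBuffer b);
}

final class AllocStrategy implements BufferStrategy {
    private final int size;
    AllocStrategy(int size) { this.size = size; }
    public ByteBuffer acquire() { return ByteBuffer.allocate(size); } // fresh every time
    public void release(ByteBuffer b) { /* left for the GC */ }
}

final class DummyStrategy implements BufferStrategy {
    private final ByteBuffer single;
    DummyStrategy(int size) { single = ByteBuffer.allocate(size); }
    public ByteBuffer acquire() { return single; }   // always the same buffer
    public void release(ByteBuffer b) { }
}
```

Subtracting the dummy figures from the others isolates the cost attributable to the memory-management strategy itself.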

# java -Xloggc:gclog -Xms2g -Xmx2g -server Main alloc batch

Test results (nanoseconds) for Dummy, Allocation, Mix, and Pooling strategies across scenarios A (low load), B (moderate load), and C (high load) are shown in the following table:

| Strategy   | A   | B   | C    |
|------------|-----|-----|------|
| Dummy      | 59  | 57  | 66   |
| Allocation | 400 | 685 | 4042 |
| Mix        | 108 | 315 | 466  |
| Pooling    | 346 | 470 | 415  |

Allocation is the worst strategy, especially in scenario C. Mix performs best in scenarios A and B, while Pooling wins in scenario C.

GC Analysis

GC statistics (max pause, average pause, GC count per second, GC fraction, object count, GC time per object) for each strategy and scenario are presented in detailed tables. The data show that allocation leads to high GC frequency and long pauses, while pooling reduces GC frequency but can cause long pauses in scenario A due to many live buffers.

Real‑Time Tests

Real‑time tests with source interval 1000 ns show that the allocation strategy loses packets quickly as the interval decreases, while pooling cannot handle even scenario A without packet loss. Increasing heap size reduces loss but is impractical.

G1 Garbage Collector

Switching from CMS to G1 (with -XX:+UseG1GC -XX:MaxGCPauseMillis=80) yields mixed results: in most cases performance degrades, but in scenarios A and B pooling becomes faster. GC logs become more complex because not every line represents a pause.

ZGC

Running on Java 11 with ZGC (-XX:+UnlockExperimentalVMOptions -XX:+UseZGC) shows overall improvement, especially for allocation, though G1 still outperforms ZGC in many batch cases.

Native Buffers with CMS

Moving buffers off‑heap using DirectByteBuffer reduces heap pressure. Tests with reduced heap size (1 GB) show much better results for both allocation and pooling, especially in real‑time mode where loss percentages become negligible.
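The mechanics of the off‑heap move can be illustrated with two factory methods (the class name and the 1536‑byte size are illustrative): a direct buffer keeps its payload in native memory, so the collector only ever sees the small ByteBuffer wrapper object, not the packet bytes themselves.

```java
import java.nio.ByteBuffer;

final class Buffers {
    // Backed by a byte[] on the Java heap: the GC scans and copies the payload.
    static ByteBuffer onHeap()  { return ByteBuffer.allocate(1536); }

    // Backed by native memory: only the small wrapper object lives on the heap.
    static ByteBuffer offHeap() { return ByteBuffer.allocateDirect(1536); }
}
```

With a million pooled packets, this difference is exactly what shrinks the live set the collector must traverse, which is why the heap could be reduced to 1 GB in these tests.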

Native Buffers with G1

Batch results with native buffers and G1 are slightly worse than CMS but still better than pure Java heap allocations.

Native Buffers with ZGC

ZGC combined with native buffers yields significant improvements across all scenarios.

Trying C++

Implementing the same benchmark in C++ removes GC entirely. However, memory allocation in C++ is still expensive: the pooled version performs best, while the allocating version remains slower than the best Java configurations.

C++ Without Allocation (Flat Version)

By reserving space for the most common objects inside the buffer (zero allocation), the flat C++ version achieves the best performance across all scenarios, processing up to 4 M packets/s in scenario C.

Summary Table

All results are consolidated in a large table that highlights the best (green) and second‑best (yellow) outcomes. The flat C++ version consistently yields the best numbers, while the best Java results come from the native CMS configuration, with ZGC being a strong contender.

Conclusion

For true real‑time systems, use C or C++ and avoid allocation. Java can achieve near‑real‑time performance with careful reduction of allocations and the use of modern collectors (G1, ZGC). Pooling consistently outperforms fresh allocation in every test, especially under high load. Increasing input queue size or using off‑heap buffers can further mitigate GC pauses.

Original article: https://pzemtsov.github.io/2019/01/17/allocate-or-pool.html

Tags: Java, Memory Management, Performance Testing, Garbage Collection, C++, allocation, pooling
Written by High Availability Architecture (official account).