Golang Object Pool for Reducing GC Pressure, FFmpeg Concurrency Control, and Paddle Static vs. Dynamic Graphs
The article explains how Go's lock‑free sync.Pool can cut garbage‑collection overhead, shows practical FFmpeg thread‑parameter tuning that balances CPU use and latency for video filtering versus encoding, and compares PaddlePaddle's static and dynamic graph modes, including debugging tips and conversion to static.
The article continues a technical series and presents three independent topics: (1) using Go's sync.Pool to alleviate garbage‑collection pressure, (2) controlling concurrency in FFmpeg video processing, and (3) the differences between static and dynamic graphs in PaddlePaddle.
1. Golang object pool – sync.Pool is a built‑in lock‑free object pool that caches temporary objects to avoid frequent allocations and GC overhead. Each logical processor (P) has its own poolLocal, which contains a private slot and a poolChain of ring buffers. The 64‑bit headTail field packs the head and tail indices, enabling atomic CAS updates without mutexes. Typical usage supplies a constructor function New, then calls Get to acquire an object and Put to return it.
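The Get/Put pattern described above can be sketched in a few lines. The bytes.Buffer payload and the render helper are illustrative choices of mine, not from the article; the point is the New/Get/Put cycle:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values. New is called only
// when the pool has no cached object available.
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer) // acquire (possibly recycled)
	defer func() {
		buf.Reset()      // scrub state before handing it back
		bufPool.Put(buf) // return for reuse instead of leaving it to the GC
	}()
	fmt.Fprintf(buf, "hello, %s", name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // prints "hello, gopher"
}
```

Resetting the object before Put is essential: the pool caches objects as-is, so stale state would leak into the next Get.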
2. FFmpeg concurrency control – The author needed to concatenate and transcode video clips of varying formats, sizes, and bitrates. Direct execution caused memory exhaustion and CPU saturation. FFmpeg offers three thread‑control parameters:
-filter_threads nb_threads: threads for simple filter pipelines.
-filter_complex_threads nb_threads: threads for complex filter graphs.
-threads integer: threads for the codec (decoding/encoding), when supported.
Experiments on a 4‑core machine with different thread settings produced the following timings:
-i -filter_complex -threads 1 -y  4.54s user 0.17s system 110% cpu 4.278 total
-i -filter_complex -threads 2 -y  4.61s user 0.29s system 189% cpu 2.581 total
-i -filter_complex -threads 4 -y  4.92s user 0.22s system 257% cpu 1.993 total
-i -filter_complex -threads 6 -y  4.73s user 0.21s system 302% cpu 1.634 total
-i -filter_complex -threads 8 -y  4.72s user 0.19s system 315% cpu 1.552 total
-i -filter_complex -y  4.72s user 0.22s system 306% cpu 1.614 total
-i -filter_complex -y -filter_complex_threads 1 -y  4.63s user 0.13s system 316% cpu 1.504 total
-i -filter_complex -y -filter_complex_threads 2 -y  4.62s user 0.20s system 304% cpu 1.583 total
-i -filter_complex -y -filter_complex_threads 4 -y  4.58s user 0.27s system 303% cpu 1.599 total

The results show that for the author’s crop‑scale‑gblur pipeline there is little parallelism in the filter stage; increasing filter_complex_threads adds overhead without benefit. For the codec stage, higher thread counts improve CPU utilization and reduce elapsed time, with threads=2 offering the best trade‑off.
3. Paddle static vs. dynamic graphs – In static graph mode (similar to C++), the model is compiled into a ProgramDesc at compile time and executed by an Executor at runtime. In dynamic graph mode (similar to Python), operations are executed immediately without a separate compilation step. Static graphs generally deliver higher performance but are harder to debug; dynamic graphs are easier to debug but slower.
To detect the current mode, look for usage of the static module or an Executor (static) versus the dygraph module (dynamic). Debugging static graphs can be done with fluid.layers.Print(). Converting a dynamic‑graph model to static can be achieved with the decorator @paddle.jit.to_static or the function paddle.jit.to_static(). Example code:
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.jit import to_static

# static graph example: build the program first, then run it with an Executor
paddle.enable_static()
main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program=main_program, startup_program=startup_program):
    x = fluid.layers.data(name='x', shape=[2], dtype='float32')
    y = fluid.layers.data(name='y', shape=[2], dtype='float32')
    out = fluid.layers.elementwise_add(x, y)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(startup_program)  # initialize parameters before running the main program
result = exe.run(main_program,
                 feed={'x': np.ones([2, 2], np.float32), 'y': np.ones([2, 2], np.float32)},
                 fetch_list=[out])
print(result)

# dynamic graph example: operations execute immediately inside the guard
with fluid.dygraph.guard():
    x = np.ones([2, 2], np.float32)
    y = np.ones([2, 2], np.float32)
    x = fluid.dygraph.to_variable(x)
    y = fluid.dygraph.to_variable(y)
    out = fluid.layers.elementwise_add(x, y)
    print(out.numpy())

# converting a dynamic-graph model to static with the @to_static decorator
class MyNet(paddle.nn.Layer):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc = fluid.dygraph.Linear(input_dim=4, output_dim=2, act='relu')

    @to_static
    def forward(self, x, y):
        x = fluid.dygraph.to_variable(x)
        x = self.fc(x)
        y = fluid.dygraph.to_variable(y)
        loss = fluid.layers.cross_entropy(input=x, label=y)
        return loss

net = MyNet()
net.eval()
out = net(np.ones([16, 4], np.float32), np.ones([16, 1], np.int64))

The article concludes with recommendations for further reading on mobile development, frontend development, R&D efficiency, cloud‑native technologies, and AI testing.
Baidu Geek Talk