
Optimizing Data Transfer in Swoole 4.5: Reducing Memory Copies Between Master and Worker Processes

This article analyzes a performance bottleneck in Swoole 4.5: client data is copied redundantly on its way from the master process to the PHP layer of a worker process. It explains the underlying C functions and data flow, then presents two code-level optimizations that consolidate buffers and eliminate the redundant copies, yielding roughly a four-fold speedup of the onMessage callback.

Xueersi Online School Tech Team

The Swoole 4.5 server has a performance problem: the master process copies incoming client data into a swPipeBuffer, and the worker process then copies it again into a PHP zend_string, so every request pays for two full memory copies of its payload.
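
To make that cost concrete, here is a minimal C sketch of the two copies. The types pipe_header, pipe_buffer and php_string and the functions master_pack, worker_expose and baseline_path are hypothetical stand-ins, not Swoole's real swPipeBuffer or zend_string definitions. Only the two user-space memcpy calls are counted; the kernel copies made when writing to and reading from the pipe are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not Swoole's real swPipeBuffer or zend_string. */
typedef struct {
    uint32_t len;            /* payload bytes in data[] */
} pipe_header;

typedef struct {
    pipe_header info;
    char data[8192];
} pipe_buffer;

typedef struct {
    size_t len;
    char *val;
} php_string;

/* Total bytes moved by memcpy in user space along the path. */
static size_t bytes_copied;

/* Master side, copy #1: client payload -> pipe buffer. */
static void master_pack(pipe_buffer *buf, const char *payload, uint32_t len) {
    assert(len <= sizeof buf->data);
    buf->info.len = len;
    memcpy(buf->data, payload, len);
    bytes_copied += len;
}

/* Worker side, copy #2: received bytes -> a fresh string for PHP. */
static php_string worker_expose(const pipe_buffer *buf) {
    php_string s = {buf->info.len, malloc(buf->info.len)};
    if (s.val != NULL) {
        memcpy(s.val, buf->data, s.len);
        bytes_copied += s.len;
    }
    return s;
}

/* Runs both steps for one payload; returns the bytes copied, or 0 on failure. */
static size_t baseline_path(const char *payload, uint32_t len) {
    static pipe_buffer buf;
    bytes_copied = 0;
    master_pack(&buf, payload, len);
    php_string s = worker_expose(&buf);
    int intact = s.val != NULL && memcmp(s.val, payload, len) == 0;
    free(s.val);
    return intact ? bytes_copied : 0;
}
```

For a 5-byte payload, baseline_path returns 10: every byte is copied twice before PHP sees it, so the copy cost grows linearly with message size.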

The key functions are process_send_packet (master to worker), swWorker_onPipeReceive and swWorker_onTask (worker-side reception), and php_swoole_get_recv_data (which exposes the data to PHP). The master packs the data into buf->data and sends it; the worker merges the chunks via swServer_worker_merge_chunk and then creates a new zend_string for the PHP layer.
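
The baseline chunked flow can be sketched in C over a Unix datagram socket pair, which stands in for the master-worker pipe (an assumption made for the sketch). Each chunk carries a hypothetical chunk_header whose len is the size of that chunk only, so the worker must append every chunk to a growing buffer (the role swServer_worker_merge_chunk plays) and then copy the merged packet once more for the PHP layer. CHUNK_SIZE, FLAG_END and all function names are illustrative; sender and receiver run in one process and alternate chunk by chunk so the socket queue never fills.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 1024   /* hypothetical per-chunk payload limit */
#define FLAG_END 1u       /* hypothetical "last chunk" flag */

typedef struct { uint32_t len; uint32_t flags; } chunk_header;
typedef struct { chunk_header info; char data[CHUNK_SIZE]; } pipe_buffer;

/* Master: pack one chunk into buf.data (a copy) and send it as one datagram. */
static int send_chunk(int fd, const char *p, uint32_t n, uint32_t flags) {
    pipe_buffer buf;
    assert(n <= CHUNK_SIZE);
    buf.info.len = n;                 /* baseline: size of THIS chunk only */
    buf.info.flags = flags;
    memcpy(buf.data, p, n);
    size_t sz = sizeof buf.info + n;
    return write(fd, &buf, sz) == (ssize_t)sz ? 0 : -1;
}

/* Worker: receive one chunk and append it to the packet (the merge copy).
   Returns 1 after the last chunk, 0 if more are expected, -1 on error. */
static int recv_and_merge(int fd, char **packet, size_t *plen) {
    pipe_buffer buf;
    ssize_t r = read(fd, &buf, sizeof buf);
    if (r < (ssize_t)sizeof buf.info || (size_t)r != sizeof buf.info + buf.info.len)
        return -1;
    char *grown = realloc(*packet, *plen + buf.info.len + 1);
    if (grown == NULL) return -1;
    memcpy(grown + *plen, buf.data, buf.info.len);
    *packet = grown;
    *plen += buf.info.len;
    return (buf.info.flags & FLAG_END) ? 1 : 0;
}

/* Moves len bytes master -> worker, then makes the final copy for PHP. */
static char *transfer_merge(const char *payload, size_t len) {
    int sv[2];
    if (len == 0 || socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) return NULL;
    char *packet = NULL, *result = NULL;
    size_t plen = 0;
    int done = 0;
    for (size_t off = 0; off < len && done == 0; off += CHUNK_SIZE) {
        uint32_t n = (uint32_t)(len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE);
        uint32_t flags = off + n == len ? FLAG_END : 0;
        done = send_chunk(sv[0], payload + off, n, flags) == 0
                   ? recv_and_merge(sv[1], &packet, &plen) : -1;
    }
    close(sv[0]);
    close(sv[1]);
    if (done == 1 && (result = malloc(plen)) != NULL)
        memcpy(result, packet, plen);  /* the zend_string-style copy */
    free(packet);
    return result;
}

/* Sends a patterned payload through the pipe and checks it arrives intact. */
static int selftest_merge(size_t len) {
    char *payload = malloc(len);
    if (payload == NULL) return 0;
    for (size_t i = 0; i < len; i++) payload[i] = (char)('a' + i % 26);
    char *got = transfer_merge(payload, len);
    int ok = got != NULL && memcmp(got, payload, len) == 0;
    free(got);
    free(payload);
    return ok;
}
```

Each payload byte here is copied three times in user space: into the pipe buffer, into the merged packet, and into the final string.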

Optimization 1 changes the chunk handling in process_send_packet so that buf->info.len carries the total packet length instead of the size of each chunk. The worker can then allocate one buffer of the full size up front and receive every chunk directly into it with readv, eliminating the intermediate merge copy.
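
Optimization 1 can be sketched by changing the meaning of one field: the hypothetical chunk_header.len now holds the total packet length. The worker allocates the full buffer when the first chunk arrives and calls readv with two iovec entries, one for the header and one pointing at the current offset in the packet buffer, so the payload lands in its final position without a merge copy. Peeking at the first header with recv and MSG_PEEK is an assumption of this sketch, not necessarily how Swoole does it; the per-chunk size is taken from the return value of readv. The remaining copy into a PHP string is what Optimization 2 removes, so this sketch simply returns the packet buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK_SIZE 1024   /* hypothetical per-chunk payload limit */
#define FLAG_END 1u       /* hypothetical "last chunk" flag */

/* Optimization 1: len carries the TOTAL packet length, not the chunk size. */
typedef struct { uint32_t len; uint32_t flags; } chunk_header;

/* Master side is unchanged apart from the meaning of info.len. */
static int send_chunk(int fd, const char *p, uint32_t n, uint32_t total, uint32_t flags) {
    struct { chunk_header info; char data[CHUNK_SIZE]; } buf;
    assert(n <= CHUNK_SIZE);
    buf.info.len = total;
    buf.info.flags = flags;
    memcpy(buf.data, p, n);
    size_t sz = sizeof buf.info + n;
    return write(fd, &buf, sz) == (ssize_t)sz ? 0 : -1;
}

typedef struct {
    char *buf;      /* allocated once, sized from the first header */
    size_t total;
    size_t offset;  /* payload bytes received so far */
} worker_packet;

/* Worker: scatter each datagram straight into its final position.
   Returns 1 after the last chunk, 0 if more are expected, -1 on error. */
static int recv_direct(int fd, worker_packet *pkt) {
    chunk_header hdr;
    if (pkt->buf == NULL) {
        /* Assumption: peek at the first header to learn the total length. */
        if (recv(fd, &hdr, sizeof hdr, MSG_PEEK) != (ssize_t)sizeof hdr) return -1;
        pkt->buf = malloc((size_t)hdr.len + 1);
        if (pkt->buf == NULL) return -1;
        pkt->total = hdr.len;
        pkt->offset = 0;
    }
    struct iovec iov[2] = {
        {.iov_base = &hdr, .iov_len = sizeof hdr},
        {.iov_base = pkt->buf + pkt->offset, .iov_len = pkt->total - pkt->offset},
    };
    ssize_t r = readv(fd, iov, 2);
    if (r < (ssize_t)sizeof hdr) return -1;
    pkt->offset += (size_t)r - sizeof hdr;   /* chunk size = readv result - header */
    return (hdr.flags & FLAG_END) ? 1 : 0;
}

static char *transfer_direct(const char *payload, size_t len) {
    int sv[2];
    if (len == 0 || socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) return NULL;
    worker_packet pkt = {NULL, 0, 0};
    int done = 0;
    for (size_t off = 0; off < len && done == 0; off += CHUNK_SIZE) {
        uint32_t n = (uint32_t)(len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE);
        uint32_t flags = off + n == len ? FLAG_END : 0;
        done = send_chunk(sv[0], payload + off, n, (uint32_t)len, flags) == 0
                   ? recv_direct(sv[1], &pkt) : -1;
    }
    close(sv[0]);
    close(sv[1]);
    if (done != 1 || pkt.offset != len) {
        free(pkt.buf);
        return NULL;
    }
    return pkt.buf;   /* no merge copy: chunks were received in place */
}

static int selftest_direct(size_t len) {
    char *payload = malloc(len);
    if (payload == NULL) return 0;
    for (size_t i = 0; i < len; i++) payload[i] = (char)('a' + i % 26);
    char *got = transfer_direct(payload, len);
    int ok = got != NULL && memcmp(got, payload, len) == 0;
    free(got);
    free(payload);
    return ok;
}
```

The design choice is that the header no longer needs the per-chunk size at all: because each datagram is read in one readv call, the byte count it returns already says how far to advance the offset.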

Optimization 2 replaces the generic buffer callbacks with PHP-specific implementations: php_swoole_server_worker_get_buffer allocates a zend_string sized to the incoming packet, php_swoole_server_worker_add_buffer_len advances the write offset, and php_swoole_server_worker_copy_buffer_addr stores the buffer's address. When the final chunk arrives, the worker calls ZVAL_STR(zdata, worker_buffer), so the PHP variable points directly at the memory the data was received into, with no further copy.
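
The callback arrangement can be modeled without PHP. The sketch below uses a hypothetical buffer_ops table with get_buffer, add_buffer_len and copy_buffer_addr hooks; its string-backed implementation allocates a zend_string-like object (fake_zend_string, a stand-in for the real struct) sized to the whole packet and hands out pointers into its payload area. These are not the real signatures of the php_swoole_server_worker_* functions, only the string-backed variant is shown, and memcpy stands in for the readv that fills the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for zend_string: a length followed by the bytes. */
typedef struct {
    size_t len;
    char val[];
} fake_zend_string;

/* Hypothetical callback table modeled on the three hooks in the article. */
typedef struct {
    char *(*get_buffer)(void *ctx, size_t total);   /* where the next bytes go */
    void (*add_buffer_len)(void *ctx, size_t n);    /* advance the write offset */
    void *(*copy_buffer_addr)(void *ctx);           /* hand over the finished buffer */
} buffer_ops;

typedef struct {
    fake_zend_string *str;  /* the string the PHP layer will see */
    size_t offset;
} string_ctx;

static char *str_get_buffer(void *ctx, size_t total) {
    string_ctx *c = ctx;
    if (c->str == NULL) {   /* first chunk: allocate the whole packet once */
        c->str = malloc(sizeof(fake_zend_string) + total + 1);
        if (c->str == NULL) return NULL;
        c->str->len = total;
        c->str->val[total] = '\0';   /* PHP strings are NUL-terminated */
        c->offset = 0;
    }
    return c->str->val + c->offset;
}

static void str_add_buffer_len(void *ctx, size_t n) {
    ((string_ctx *)ctx)->offset += n;
}

static void *str_copy_buffer_addr(void *ctx) {
    string_ctx *c = ctx;
    fake_zend_string *s = c->str;
    c->str = NULL;   /* ownership moves to the caller (the PHP variable) */
    return s;
}

static const buffer_ops string_ops = {str_get_buffer, str_add_buffer_len,
                                      str_copy_buffer_addr};

/* Worker loop: every chunk lands directly in the string's payload area. */
static fake_zend_string *receive(const buffer_ops *ops, void *ctx,
                                 const char *payload, size_t len, size_t chunk) {
    assert(chunk > 0);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;
        char *dst = ops->get_buffer(ctx, len);
        if (dst == NULL) return NULL;
        memcpy(dst, payload + off, n);   /* simulated kernel copy from the pipe */
        ops->add_buffer_len(ctx, n);
    }
    return ops->copy_buffer_addr(ctx);
}

static int selftest_zero_copy(void) {
    const char msg[] = "hello from the master process";
    size_t len = sizeof msg - 1;
    string_ctx ctx = {NULL, 0};
    fake_zend_string *s = receive(&string_ops, &ctx, msg, len, 4);
    int ok = s != NULL && s->len == len && memcmp(s->val, msg, len) == 0 &&
             s->val[len] == '\0';
    free(s);
    return ok;
}
```

Because the receiver writes straight into the string's payload, handing the result to PHP is only a pointer transfer; in the extension that corresponds to the ZVAL_STR(zdata, worker_buffer) step.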

Benchmarks using a WebSocket server and a coroutine client sending 2 GB of data show that the number of memory copies drops from four to one, yielding roughly a 4× improvement in onMessage latency and lower CPU usage.

In summary, by redesigning the inter‑process buffer handling and leveraging zero‑copy techniques, the Swoole server’s data path is streamlined, demonstrating how careful C‑level optimizations can dramatically boost backend performance.
