Uncovering Hidden PHP‑CGI Deadlocks: Why Disk Space Stalls and How to Fix Them
A deep dive into a long‑standing PHP‑CGI deadlock that left deleted log files occupying disk space, explaining how signal‑unsafe functions caused the lock, how the issue was diagnosed with lsof, strace and gdb, and the practical steps to eliminate the deadlock.
Problem discovery
Production machines kept raising disk-space alarms even after their log files had been cleared. Running ps aux | grep php-cgi showed many CGI processes that had been alive for days, weeks, or even months, far beyond the normal one-day restart cycle.
Running lsof -p [pid] on these long-running CGI processes revealed that they still held open file handles to log files that had already been deleted. A deleted file keeps occupying disk space until every handle to it is closed, so the space could not be reclaimed. The immediate cause was therefore CGI processes that never closed their file handles.
Further analysis with strace -p [pid] showed that every abnormal process was blocked in a futex wait, which pointed to a deadlock. A deadlocked process never reaches the code that closes its file handles, which explains the disk-space anomaly.
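The effect that lsof exposed can be reproduced in a few lines of C. The following is a minimal sketch, not code from the affected system: the function name size_held_after_unlink and the scratch-file template are hypothetical. It creates a file, deletes it while a descriptor is still open, and checks that the data is still there.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of bytes still held by a file that was deleted while a
 * descriptor to it remained open, or -1 on error.
 * The file name template is a hypothetical choice for this sketch. */
long size_held_after_unlink(void)
{
    char path[] = "/tmp/cgi_log_demo_XXXXXX";
    int fd = mkstemp(path);                 /* create and open a scratch "log" */
    if (fd < 0) {
        return -1;
    }

    static const char msg[] = "log line\n";
    ssize_t written = write(fd, msg, sizeof msg - 1);
    if (written != (ssize_t)(sizeof msg - 1)) {
        close(fd);
        unlink(path);
        return -1;
    }

    if (unlink(path) != 0) {                /* the equivalent of "rm" */
        close(fd);
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    /* The file has no name left (st_nlink == 0), yet its size is unchanged:
     * the blocks are freed only when the last descriptor is closed. */
    long held = (st.st_nlink == 0) ? (long)st.st_size : -1;
    close(fd);                              /* only now is the space released */
    return held;
}
```

Calling size_held_after_unlink() from a small main and printing the result shows a non-zero size for a file that no longer has a name. A deadlocked CGI process is stuck in the state just before the final close.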
Why did the CGI processes deadlock?
CGI processes are single-threaded, so the deadlock could not have come from two threads contending for a lock. Instead, it occurred during PHP's shutdown phase, when code running from a signal handler called a function that is not signal-safe.
Function reentrancy and signal safety
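Re-entrant (async-signal-safe) functions can be called safely from signal handlers. Many thread-safe functions do not qualify: they achieve thread safety with a global lock, and that lock offers no protection against signals. If a signal interrupts a thread while it holds such a lock and the handler calls the same function, the handler waits for a lock that can only be released by the code it interrupted, and the process deadlocks.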
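A common way to stay on the safe side is to do almost nothing inside the handler. The sketch below is an illustration under assumptions, not PHP's code: the names on_signal, got_signal and deliver_and_check are hypothetical, and SIGUSR1 is used only because it is easy to raise. The handler touches a volatile sig_atomic_t flag and calls write(), which POSIX lists as async-signal-safe; anything that might take a lock is left to the normal control flow.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag type that is safe to write from a handler and read from main code. */
static volatile sig_atomic_t got_signal = 0;

/* Handler restricted to async-signal-safe operations. */
static void on_signal(int signo)
{
    (void)signo;
    static const char note[] = "signal received\n";
    ssize_t n = write(STDERR_FILENO, note, sizeof note - 1);
    (void)n;                 /* nothing sensible to do on failure here */
    got_signal = 1;          /* defer real work to the interrupted code */
}

/* Installs the handler, sends SIGUSR1 to the current process and returns
 * the flag value afterwards (1 if the handler ran), or -1 on error. */
int deliver_and_check(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        return -1;
    }

    got_signal = 0;
    if (raise(SIGUSR1) != 0) {
        return -1;
    }
    return got_signal;
}
```

After deliver_and_check() returns, the main code can inspect the flag and call time or logging functions from a context where no lock can already be held by an interrupted frame.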
PHP‑CGI execution flow
The glibc time functions use a global lock for thread safety but lack signal safety. When a PHP‑CGI process receives a signal (e.g., SIGPROF) during execution of a time function, the lock remains held. The signal handler then calls the same time function, causing a deadlock.
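The sequence described above can be modeled without hanging the process. This is a sketch under stated assumptions: time_lock is a stand-in for the internal glibc lock (glibc's real lock is not accessible like this), the function names are hypothetical, and the handler only probes the lock instead of waiting on it, so the program reports the condition that would deadlock rather than deadlocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Stand-in for the non-recursive lock inside a time function (assumption). */
static atomic_flag time_lock = ATOMIC_FLAG_INIT;
static volatile sig_atomic_t handler_found_lock_held = 0;

/* Plays the role of the handler that calls the same time function again. */
static void on_sigprof(int signo)
{
    (void)signo;
    if (atomic_flag_test_and_set(&time_lock)) {
        /* Already held by the code we interrupted. A blocking lock would
         * wait here forever, because that code cannot resume until this
         * handler returns. */
        handler_found_lock_held = 1;
    } else {
        atomic_flag_clear(&time_lock);   /* lock was free: release our probe */
    }
}

/* Takes the lock, lets SIGPROF arrive while it is held, then releases it.
 * Returns 1 if the handler saw the lock held, 0 if not, -1 on error. */
int simulate_interrupted_time_call(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0) {
        return -1;
    }

    handler_found_lock_held = 0;
    while (atomic_flag_test_and_set(&time_lock)) {
        /* spin until acquired; uncontended in this sketch */
    }
    if (raise(SIGPROF) != 0) {           /* signal lands mid "time function" */
        atomic_flag_clear(&time_lock);
        return -1;
    }
    atomic_flag_clear(&time_lock);
    return handler_found_lock_held;
}
```

A return value of 1 from simulate_interrupted_time_call() corresponds to the state the stuck CGI processes were in: one thread, one lock, and the only code able to release it suspended beneath the handler.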
All deadlocked CGI processes had logged the error "Maximum execution time of 60 seconds exceeded". PHP enforces this limit with an interval timer that delivers SIGPROF when the 60 seconds are used up; in these processes the signal arrived while a glibc time function was executing and holding its lock. The timeout handler then entered the shutdown routine, which ran user-defined shutdown functions that called a time function again, and that second call waited forever for a lock its own process already held.
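The timeout mechanism itself is easy to observe in isolation. The following sketch assumes a POSIX system; the names on_timer and burn_cpu_until_timeout are hypothetical and the code is not taken from PHP. It arms ITIMER_PROF, which counts CPU time consumed by the process, and spins until SIGPROF arrives.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t timed_out = 0;

/* Minimal handler: record that the profiling timer expired. */
static void on_timer(int signo)
{
    (void)signo;
    timed_out = 1;
}

/* Arms a one-shot ITIMER_PROF timer for the given number of microseconds of
 * CPU time and burns CPU until SIGPROF is delivered.
 * Returns 1 once the timeout fired, -1 on error or invalid input. */
int burn_cpu_until_timeout(long usec)
{
    if (usec <= 0) {
        return -1;                      /* a zero value would disarm the timer */
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0) {
        return -1;
    }

    struct itimerval t;
    memset(&t, 0, sizeof t);            /* it_interval stays 0: fire once */
    t.it_value.tv_sec = usec / 1000000;
    t.it_value.tv_usec = usec % 1000000;

    timed_out = 0;
    if (setitimer(ITIMER_PROF, &t, NULL) != 0) {
        return -1;
    }

    volatile unsigned long spins = 0;
    while (!timed_out) {
        spins++;                        /* consume CPU so the timer advances */
    }
    return 1;
}
```

Because the signal can arrive at any instruction of the busy loop, the same is true for real request code: whatever library call happens to be running when the limit is reached is the one that gets interrupted.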
Relevant code snippet
void zend_set_timeout(long seconds)
{
    TSRMLS_FETCH();

    EG(timeout_seconds) = seconds;
    if (!seconds) {
        return;
    }
    // ...
    setitimer(ITIMER_PROF, &t_r, NULL);
    signal(SIGPROF, zend_timeout); // install zend_timeout as the SIGPROF handler
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGPROF);
    // ...
}
Debugging with gdb showed that all PHP-CGI processes were blocked in zend_request_shutdown, which calls user-registered shutdown functions. If a shutdown function calls a non-signal-safe time function while the global lock is still held by the interrupted code, a deadlock occurs.
In this case, a custom shutdown hook registered via register_shutdown_function('SimpleWebSvc::shutdown') used a qalarm system that, during shutdown, called a time function, creating the deadlock scenario.
Conclusion
The deadlock was caused by invoking non‑signal‑safe functions (glibc time functions) inside a signal handler during PHP‑CGI shutdown.
Solution
Remove or simplify the qalarm hook registered in the shutdown function to avoid unsafe function calls.
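For code that owns both the handler and the critical section, a general alternative is to hold the signal back while the lock is taken. This is not the fix applied in this case, only a sketch of that technique under assumptions: demo_lock again stands in for a library lock, and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;   /* stand-in lock (assumption) */
static volatile sig_atomic_t handler_runs = 0;
static volatile sig_atomic_t lock_was_held = 0;

/* Handler that needs the same lock as the interrupted code. */
static void on_sigprof(int signo)
{
    (void)signo;
    handler_runs++;
    if (atomic_flag_test_and_set(&demo_lock)) {
        lock_was_held = 1;               /* would have deadlocked */
    } else {
        atomic_flag_clear(&demo_lock);
    }
}

/* Blocks SIGPROF around the critical section so the handler runs only after
 * the lock is released. Returns 1 if the handler ran exactly once and found
 * the lock free, 0 otherwise, -1 on error. */
int guarded_critical_section(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0) {
        return -1;
    }

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGPROF);

    handler_runs = 0;
    lock_was_held = 0;

    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) {
        return -1;
    }
    while (atomic_flag_test_and_set(&demo_lock)) {
        /* spin until acquired; uncontended in this sketch */
    }
    raise(SIGPROF);                      /* stays pending while blocked */
    atomic_flag_clear(&demo_lock);

    /* Restoring the mask delivers the pending signal with the lock free. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0) {
        return -1;
    }
    return (handler_runs == 1 && !lock_was_held) ? 1 : 0;
}
```

This only helps where the locking code can be wrapped. A lock taken inside glibc on behalf of PHP userland cannot be guarded this way, which is why trimming the shutdown hook is the practical remedy here.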
21CTO
