How a SIGPIPE Signal Crashed Our Service and the Fix We Applied
During a gray release, a Go service that calls into Rust through cgo crashed every time a dependent process was hot-upgraded. The root cause was an unhandled SIGPIPE. The kernel raised it when the service wrote to a TCP connection that the upgrade had broken, and the signal's default action terminated the process without leaving a core dump. This article explains the kernel mechanics behind the crash and the fix, which is to ignore SIGPIPE.
Fault Background
We partially rewrote a core Go service in Rust, using cgo for communication between the Go and Rust code. After we rolled the new service out as a gray release, it crashed whenever a dependent business process was hot-upgraded. The crashes left no log entries and no core dump, which made diagnosis difficult.
After extensive debugging, we found that the process was being killed by SIGPIPE, a signal the kernel raises when a program writes to a broken connection. Setting the process's SIGPIPE disposition to ignore (SIG_IGN) eliminated the crashes.
1. How SIGPIPE Is Generated
When a TCP connection has been broken, for example by a network fault or because the peer restarted, the kernel still lets the user-space program call send or write on the socket. On the send path, the kernel detects that the socket can no longer send, fails the call with EPIPE, and raises SIGPIPE for the calling process unless the caller passed MSG_NOSIGNAL.
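The user-visible side of this path can be reproduced with a short C sketch. It is an illustration under stated assumptions, not code from our service: a Unix-domain socketpair stands in for the TCP connection (its send path applies the same SEND_SHUTDOWN check and the same MSG_NOSIGNAL rule), shutdown(SHUT_WR) stands in for the broken connection, and the function name send_after_shutdown_errno is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno seen when sending on a socket
 * whose write side has been shut down, 0 if the send succeeded, or -1
 * if the socketpair could not be created. */
int send_after_shutdown_errno(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Mark our end as SEND_SHUTDOWN, the condition the kernel checks. */
    shutdown(fds[0], SHUT_WR);

    /* MSG_NOSIGNAL: the kernel still fails the call with EPIPE,
     * but skips send_sig(SIGPIPE, ...). */
    ssize_t n = send(fds[0], "x", 1, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On Linux the function returns EPIPE, and because MSG_NOSIGNAL is set, no SIGPIPE is raised and the caller keeps running.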
One place this check lives is do_tcp_sendpages in net/ipv4/tcp.c; tcp_sendmsg_locked, which serves ordinary send and write calls, performs the same check and also ends in sk_stream_error:

```c
//file:net/ipv4/tcp.c
ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
                         size_t size, int flags)
{
    ......
    err = -EPIPE;
    if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
        goto out_err;
    ......
out_err:
    return sk_stream_error(sk, flags, err);
}
```

The helper sk_stream_error, in net/core/stream.c, is what actually sends the signal:
```c
//file:net/core/stream.c
int sk_stream_error(struct sock *sk, int flags, int err)
{
    ......
    if (err == -EPIPE && !(flags & MSG_NOSIGNAL))
        send_sig(SIGPIPE, current, 0);
    return err;
}
```

2. Kernel SIGPIPE Handling Process
When the process returns from kernel mode to user mode, the kernel checks for pending signals. If a signal is pending, it enters the signal‑delivery path via do_signal:
```c
//file:arch/x86/kernel/signal.c
static void do_signal(struct pt_regs *regs)
{
    struct ksignal ksig;
    ...
    if (get_signal(&ksig)) {
        /* Actually deliver the signal. */
        handle_signal(&ksig, regs);
        return;
    }
    ...
}
```

get_signal dequeues a pending signal and decides how to handle it:
```c
//file:kernel/signal.c
bool get_signal(struct ksignal *ksig)
{
    ...
    for (;;) {
        // 1. dequeue a signal
        signr = dequeue_synchronous_signal(&ksig->info);
        if (!signr)
            signr = dequeue_signal(current, &current->blocked,
                                   &ksig->info, &type);
        // 2. check the user-space disposition
        if (ka->sa.sa_handler == SIG_IGN)
            continue; // ignore
        if (ka->sa.sa_handler != SIG_DFL) {
            ksig->ka = *ka;
            ...
            break; // user handler will run later
        }
        // 3. default kernel behavior
        ...
    }
out:
    ksig->sig = signr;
    return ksig->sig > 0;
}
```

The default behavior in step 3 distinguishes four classes of signals: those the kernel ignores by default, those that stop the process, those that terminate it with a core dump, and everything else. For a signal in the last class, the kernel calls do_group_exit, which terminates all threads of the process **without generating a core dump**. SIGPIPE falls into this class.
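To observe the default action directly, the following C sketch forks a child that writes without MSG_NOSIGNAL while SIGPIPE is at its default disposition, then inspects the child's wait status. Linux is assumed, the helper name child_status_after_sigpipe is hypothetical, and a socketpair with a shut-down write side again stands in for a broken TCP connection.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that writes to a socket whose write
 * side is shut down, with SIGPIPE at its default disposition. Returns the
 * child's raw wait status, or -1 if fork/waitpid fails. */
int child_status_after_sigpipe(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: restore the default action and make sure SIGPIPE is unblocked. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGPIPE);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        signal(SIGPIPE, SIG_DFL);

        int fds[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
            _exit(2);
        shutdown(fds[0], SHUT_WR);

        /* No MSG_NOSIGNAL: the kernel raises SIGPIPE, and its default action
         * terminates the child before send() returns to user code. */
        (void)send(fds[0], "x", 1, 0);
        _exit(0); /* reached only if SIGPIPE did not terminate the child */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

On Linux the returned status satisfies WIFSIGNALED with WTERMSIG equal to SIGPIPE, and WCOREDUMP is false, matching the "terminate without a core dump" class. Running the write in a child keeps the caller alive.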
Relevant macro definitions (excerpted):
```c
//file:include/linux/signal.h
#define sig_kernel_ignore(sig)    siginmask(sig, SIG_KERNEL_IGNORE_MASK)
#define SIG_KERNEL_IGNORE_MASK (\
        rt_sigmask(SIGCONT)   |  rt_sigmask(SIGCHLD)   |\
        rt_sigmask(SIGWINCH)  |  rt_sigmask(SIGURG)    )

#define sig_kernel_stop(sig)      siginmask(sig, SIG_KERNEL_STOP_MASK)
#define SIG_KERNEL_STOP_MASK (\
        rt_sigmask(SIGSTOP)   |  rt_sigmask(SIGTSTP)   |\
        rt_sigmask(SIGTTIN)   |  rt_sigmask(SIGTTOU)   )

#define sig_kernel_coredump(sig)  siginmask(sig, SIG_KERNEL_COREDUMP_MASK)
#define SIG_KERNEL_COREDUMP_MASK (\
        rt_sigmask(SIGQUIT)   |  rt_sigmask(SIGILL)    |\
        rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGABRT)   |\
        rt_sigmask(SIGFPE)    |  rt_sigmask(SIGSEGV)   |\
        rt_sigmask(SIGBUS)    |  rt_sigmask(SIGSYS)    |\
        rt_sigmask(SIGXCPU)   |  rt_sigmask(SIGXFSZ)   |\
        SIGEMT_MASK )
```

SIGPIPE appears in none of these masks, so its default action is plain termination. Because our service had not installed a handler for SIGPIPE, get_signal took the third path and the kernel terminated the process without a core dump.
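The four-way decision can be modeled in user space. This C sketch is not kernel code: classify_default and enum default_action are hypothetical names, and the switch simply mirrors the three masks above, with every other signal falling through to termination. SIGEMT is included only where the platform defines it, just as SIGEMT_MASK is empty on platforms without SIGEMT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>

/* Hypothetical names for the kernel's default-action classes. */
enum default_action { ACT_IGNORE, ACT_STOP, ACT_COREDUMP, ACT_TERMINATE };

/* User-space model of how get_signal() classifies a signal that has
 * no handler installed (SIG_DFL). Mirrors the sig_kernel_* masks. */
enum default_action classify_default(int sig) {
    switch (sig) {
    /* SIG_KERNEL_IGNORE_MASK */
    case SIGCONT: case SIGCHLD: case SIGWINCH: case SIGURG:
        return ACT_IGNORE;
    /* SIG_KERNEL_STOP_MASK */
    case SIGSTOP: case SIGTSTP: case SIGTTIN: case SIGTTOU:
        return ACT_STOP;
    /* SIG_KERNEL_COREDUMP_MASK */
    case SIGQUIT: case SIGILL:  case SIGTRAP: case SIGABRT:
    case SIGFPE:  case SIGSEGV: case SIGBUS:  case SIGSYS:
    case SIGXCPU: case SIGXFSZ:
#ifdef SIGEMT
    case SIGEMT:
#endif
        return ACT_COREDUMP;
    default:
        /* Everything else, SIGPIPE included: do_group_exit, no core dump. */
        return ACT_TERMINATE;
    }
}
```

Under this model, classify_default(SIGPIPE) returns ACT_TERMINATE, the class that ends in do_group_exit.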
3. Application‑Level Mitigation
Understanding the crash flow makes the fix straightforward: set the SIGPIPE disposition to SIG_IGN. In the Rust component we added the following code, which uses the nix crate:
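```rust
use nix::sys::signal::{self, SigAction, SigHandler, SigSet, Signal};

// Set the process-wide SIGPIPE disposition to SIG_IGN.
// Call once during initialization, before any socket I/O.
// (`ignore_sigpipe` is an illustrative wrapper name.)
fn ignore_sigpipe() {
    let ignore_action = SigAction::new(
        SigHandler::SigIgn,       // ignore the signal
        signal::SaFlags::empty(), // no special flags
        SigSet::empty(),          // no extra signals blocked during delivery
    );
    // sigaction is unsafe because it replaces a process-wide signal disposition.
    unsafe {
        signal::sigaction(Signal::SIGPIPE, &ignore_action)
            .expect("Failed to set SIGPIPE handler to ignore");
    }
}
```

With SIG_IGN installed, the default termination never runs. The kernel discards an ignored, unblocked SIGPIPE as soon as it is generated, and if the signal happens to be blocked at that moment, get_signal later skips it through its SIG_IGN check. The failed write simply returns EPIPE, which the code can handle as an ordinary I/O error.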
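For comparison, here is the same fix expressed with the POSIX sigaction API in C. This is a sketch rather than our production code: ignore_sigpipe and plain_send_errno_with_sigpipe_ignored are hypothetical names, and a shut-down socketpair again stands in for the broken TCP connection. The demonstration saves and restores the previous disposition so it leaves the process unchanged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set SIGPIPE to SIG_IGN, saving the previous action in *old (may be NULL). */
int ignore_sigpipe(struct sigaction *old) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;  /* ignore the signal */
    sigemptyset(&sa.sa_mask); /* no extra signals blocked */
    sa.sa_flags = 0;          /* no special flags */
    return sigaction(SIGPIPE, &sa, old);
}

/* With SIGPIPE ignored, a plain send() (no MSG_NOSIGNAL) on a shut-down
 * socket fails with EPIPE instead of terminating the process.
 * Returns the observed errno, 0 if the send succeeded, or -1 on setup failure. */
int plain_send_errno_with_sigpipe_ignored(void) {
    struct sigaction old;
    if (ignore_sigpipe(&old) != 0)
        return -1;

    int fds[2];
    int err = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0) {
        shutdown(fds[0], SHUT_WR);
        err = (send(fds[0], "x", 1, 0) < 0) ? errno : 0;
        close(fds[0]);
        close(fds[1]);
    }

    sigaction(SIGPIPE, &old, NULL); /* restore the previous disposition */
    return err;
}
```

On Linux the function returns EPIPE: the plain send() reports an error and the process keeps running. In a real service, ignore_sigpipe would be called once at startup and left in place.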
4. Why It Usually Doesn’t Appear in Pure Go Programs
Go's runtime already handles SIGPIPE itself: a write to a broken pipe on stdout or stderr makes the process exit, while a write on any other file descriptor merely returns EPIPE. This behavior is documented in os/signal/doc.go:

> If the program has not called Notify to receive SIGPIPE signals, then the behavior depends on the file descriptor number. A write to a broken pipe on file descriptors 1 or 2 (standard output or standard error) will cause the program to exit with a SIGPIPE signal. A write to a broken pipe on some other file descriptor will take no action on the SIGPIPE signal, and the write will fail with an EPIPE error.

In a cgo program, however, the signal may arrive on a non-Go thread. sk_stream_error raises SIGPIPE for `current`, the thread that performed the write, so a failed write made from a thread the Go runtime did not create (for example, one started by the Rust code) delivers the signal to that thread. The same document states:

> If the SIGPIPE is received on a non-Go thread the signal will be forwarded to the non-Go handler, if any; if there is none the default system handler will cause the program to terminate.

Because the Rust side had installed no handler, the signal fell through to the kernel's default action, producing the observed termination.
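Which thread receives the signal can be checked with a C sketch using POSIX threads. No Go runtime is involved: a pthread plays the role of a non-Go thread, the handler only sets a thread-local flag, and the function and variable names are hypothetical. Linux with glibc is assumed; older toolchains need to link with -pthread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-thread flag: set only on the thread where the handler ran. */
static _Thread_local volatile sig_atomic_t saw_sigpipe = 0;

static void record_sigpipe(int sig) {
    (void)sig;
    saw_sigpipe = 1;
}

/* Worker thread, standing in for a non-Go thread: writes to a shut-down socket. */
static void *write_broken(void *arg) {
    int fd = *(int *)arg;
    /* No MSG_NOSIGNAL: the kernel raises SIGPIPE for this thread (current). */
    (void)send(fd, "x", 1, 0);
    return (void *)(intptr_t)saw_sigpipe;
}

/* Returns 1 if SIGPIPE was handled on the writing thread and not on the
 * calling thread, 0 otherwise, or -1 on setup failure. */
int sigpipe_is_thread_directed(void) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_sigpipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, &old) != 0)
        return -1;

    int fds[2];
    int result = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0) {
        shutdown(fds[0], SHUT_WR);
        pthread_t tid;
        void *worker_saw = NULL;
        if (pthread_create(&tid, NULL, write_broken, &fds[0]) == 0 &&
            pthread_join(tid, &worker_saw) == 0) {
            result = ((intptr_t)worker_saw == 1 && saw_sigpipe == 0);
        }
        close(fds[0]);
        close(fds[1]);
    }

    sigaction(SIGPIPE, &old, NULL); /* restore the previous disposition */
    return result;
}
```

The function returns 1: the handler ran on the writing thread, while the flag of the thread that created it stayed unset. In the cgo case, this is why a SIGPIPE triggered on a Rust-created thread takes the non-Go path described in the Go documentation quoted above.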
Conclusion
The crash was caused by a hot upgrade of a dependent service that broke a TCP connection; a subsequent write on that connection made the kernel raise SIGPIPE. With no user-space handler installed, the kernel's default action terminated the process without a core dump. Setting SIGPIPE to SIG_IGN in the Rust component resolved the issue.
