Understanding Linux Kernel likely/unlikely Macros for Performance Optimization
This article explains how the Linux kernel's likely and unlikely macros, which wrap __builtin_expect, guide the compiler's branch prediction to improve cache usage and pipeline efficiency, and demonstrates the effect with sample C code and assembly output.
Hello, I am Fei Ge! Today I would like to share a common kernel performance trick that pays off once you understand it.
In many places within the kernel, the macros likely and unlikely are used, for example in the TCP connection establishment code.
//file: net/ipv4/tcp_ipv4.c
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
if (likely(!do_fastopen))
...
}
//file: net/ipv4/tcp_input.c
int tcp_rcv_established(struct sock *sk, ...)
{
if (unlikely(sk->sk_rx_dst == NULL))
......
}

Initially I did not understand the purpose of these macros, but later realized they help improve performance. Below we explore how they work.
1. likely and unlikely
First, let’s look at their low‑level implementation.
//file: include/linux/compiler.h
#define likely(x) __builtin_expect(!!(x),1)
#define unlikely(x) __builtin_expect(!!(x),0)

These are simple wrappers around GCC's __builtin_expect, which lets programmers tell the compiler which branch is more likely to be taken.
likely(x) expands to __builtin_expect(!!(x), 1), telling the compiler the condition is expected to be true, while unlikely(x) expands to __builtin_expect(!!(x), 0), expecting it to be false. The double negation !! normalizes any nonzero value to exactly 1, so the expectation matches for any truthy expression.
When used correctly, the compiler places the more probable code path close to the preceding instructions, reducing costly jumps.
2. Hands‑on verification
We write two simple examples to see the compiler’s optimization. The test program is:
#include <stdio.h>
#include <stdlib.h>

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int main(int argc, char *argv[])
{
    int n;
    if (argc < 2)
        return 1;
    n = atoi(argv[1]);
    if (likely(n == 10)) {
        n = n + 2;
    } else {
        n = n - 2;
    }
    printf("%d\n", n);
    return 0;
}

Compiling with -O2 and inspecting the assembly (using objdump -S) shows that the branch predicted as likely has its target instructions placed immediately after the jne instruction, while the unlikely path is farther away.
Switching to unlikely reverses the layout, confirming the effect.
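For orientation, here is what the -O2 layout looks like when the branch is annotated with likely (an illustrative AT&T-syntax sketch, not verbatim objdump output; the exact instructions vary by compiler version and flags):

```
main:
    ...
    cmp    $0xa, %eax        # n == 10 ?
    jne    .Lcold            # rarely taken; cold path lives out of line
    add    $0x2, %eax        # likely path falls through immediately
    ...                      # printf and return
.Lcold:
    sub    $0x2, %eax        # unlikely path placed after the hot code
    jmp    ...               # rejoin for printf
```

The key point is the fall-through: the expected path requires no taken jump, so instruction fetch and decode continue sequentially.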
The test code and Makefile are available on GitHub for readers to experiment.
3. Why performance improves
The improvement comes from two CPU mechanisms: cache hierarchy and pipeline execution.
Modern CPUs have multi‑level caches (L1, L2, L3). When the likely branch’s instructions are placed contiguously, they are more likely to be fetched into the L1 cache, avoiding slower accesses to deeper cache levels.
A cache line is typically 64 bytes; fetching one line pulls 64 bytes of adjacent code into cache at once, so the sequentially placed likely path enjoys a higher cache-hit rate.
Additionally, CPUs use pipelines that overlap the execution stages of multiple instructions. When the predicted path is correct, the pipeline stays full; a misprediction forces the pipeline to discard work, wasting cycles.
In the example, the jne instruction is followed immediately by the mov instructions of the likely branch, keeping the pipeline fed. If the branch prediction were wrong, those instructions would be flushed.
4. Summary
In summary, the likely and unlikely macros are a simple yet powerful way to assist the compiler’s branch prediction, leading to better cache utilization and smoother pipeline execution, which together improve overall program performance.
Linux kernel developers use these techniques extensively to squeeze out every ounce of performance.
However, the hints must match reality; annotating the wrong branch makes the compiler optimize the cold path and can degrade performance instead.
Thanks for reading – please like and share!
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.