How Linux Implements High‑Performance Timers with a Hierarchical Timing Wheel
This article explains the various timer implementations—sorted linked list, min‑heap, balanced binary tree, and especially the hierarchical timing wheel used in the Linux kernel—detailing their time complexities, data structures, and the core C code that inserts and executes timers efficiently.
Timer Mechanisms
Linux uses several data‑structure‑based timer implementations, each with different time‑complexity trade‑offs for insertion and expiration retrieval.
Sorted Linked List Approach
Timers are stored in a sorted linked list, allowing O(1) retrieval of the earliest timer but requiring O(n) time to insert a new timer because the whole list must be traversed.
Min‑Heap Approach
Using a min‑heap gives O(1) retrieval of the smallest timer and O(log n) insertion time.
Balanced Binary Tree Approach
Balanced trees (e.g., red‑black trees) provide O(log n) retrieval of the minimum timer; with a cached minimum value, retrieval can be O(1). Insertion remains O(log n).
Timing Wheel
For high‑frequency timer usage in Linux (e.g., TCP retransmission), the above structures are insufficient. Linux therefore employs a hierarchical timing wheel, which works like a clock: each second moves the “second” pointer, and when it wraps, the “minute” pointer advances, and so on.
The wheel divides the 32‑bit timer range (0 ~ 4294967295) into five hierarchical levels. The first level stores timers for 0 ~ 255 seconds, the second for 256 ~ 256·64 seconds, the third for 256·64 ~ 256·64·64 seconds, the fourth for 256·64·64 ~ 256·64·64·64 seconds, and the fifth for 256·64·64·64 ~ 256·64·64·64·64 seconds.
Note: The first slot of levels two through five does not hold any timers.
Each level has a pointer indicating the current slot to execute. Every second, the first‑level pointer advances; when it wraps to zero, the next level pointer advances, and timers in that slot are re‑hashed into the appropriate lower‑level slots.
Both timer expiration and insertion have O(1) complexity because the algorithm only moves pointers and links timers into pre‑computed slots.
Linux Timing Wheel Implementation
The kernel defines five arrays representing the five levels of the wheel.
#define TVN_BITS 6
#define TVR_BITS 8
#define TVN_SIZE (1 << TVN_BITS) // 64
#define TVR_SIZE (1 << TVR_BITS) // 256
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)
struct timer_vec {
int index;
struct list_head vec[TVN_SIZE];
};
struct timer_vec_root {
int index;
struct list_head vec[TVR_SIZE];
};
static struct timer_vec tv5;
static struct timer_vec tv4;
static struct timer_vec tv3;
static struct timer_vec tv2;
static struct timer_vec_root tv1;
void init_timervecs(void) {
int i;
for (i = 0; i < TVN_SIZE; i++) {
INIT_LIST_HEAD(tv5.vec + i);
INIT_LIST_HEAD(tv4.vec + i);
INIT_LIST_HEAD(tv3.vec + i);
INIT_LIST_HEAD(tv2.vec + i);
}
for (i = 0; i < TVR_SIZE; i++)
INIT_LIST_HEAD(tv1.vec + i);
}The internal_add_timer() function determines the appropriate level for a new timer based on its expiration distance and inserts it into the corresponding slot.
static inline void internal_add_timer(struct timer_list *timer) {
unsigned long expires = timer->expires;
unsigned long idx = expires - timer_jiffies;
struct list_head *vec;
if (idx < TVR_SIZE) { // 0 ~ 255
int i = expires & TVR_MASK;
vec = tv1.vec + i;
} else if (idx < 1 << (TVR_BITS + TVN_BITS)) { // 256 ~ 16191
int i = (expires >> TVR_BITS) & TVN_MASK;
vec = tv2.vec + i;
} else if (idx < 1 << (TVR_BITS + 2 * TVN_BITS)) {
int i = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
vec = tv3.vec + i;
} else if (idx < 1 << (TVR_BITS + 3 * TVN_BITS)) {
int i = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
vec = tv4.vec + i;
} else if ((signed long)idx < 0) {
vec = tv1.vec + tv1.index;
} else if (idx <= 0xffffffffUL) {
int i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
vec = tv5.vec + i;
} else {
INIT_LIST_HEAD(&timer->list);
return;
}
list_add(&timer->list, vec->prev);
}Expired timers are processed by run_timer_list(), which advances the first‑level pointer each tick, cascades timers from higher levels when a slot wraps, and invokes each timer’s callback.
static inline void cascade_timers(struct timer_vec *tv) {
struct list_head *head, *curr, *next;
head = tv->vec + tv->index;
curr = head->next;
while (curr != head) {
struct timer_list *tmp = list_entry(curr, struct timer_list, list);
next = curr->next;
list_del(curr);
internal_add_timer(tmp);
curr = next;
}
INIT_LIST_HEAD(head);
tv->index = (tv->index + 1) & TVN_MASK;
}
static inline void run_timer_list(void) {
spin_lock_irq(&timerlist_lock);
while ((long)(jiffies - timer_jiffies) >= 0) {
if (!tv1.index) {
int n = 1;
do {
cascade_timers(tvecs[n]);
} while (tvecs[n]->index == 1 && ++n < NOOF_TVECS);
}
repeat:
struct list_head *head = tv1.vec + tv1.index;
struct list_head *curr = head->next;
if (curr != head) {
struct timer_list *timer = list_entry(curr, struct timer_list, list);
void (*fn)(unsigned long) = timer->function;
unsigned long data = timer->data;
detach_timer(timer);
timer->list.next = timer->list.prev = NULL;
timer_enter(timer);
spin_unlock_irq(&timerlist_lock);
fn(data);
spin_lock_irq(&timerlist_lock);
timer_exit();
goto repeat;
}
++timer_jiffies;
tv1.index = (tv1.index + 1) & TVR_MASK;
}
spin_unlock_irq(&timerlist_lock);
}Thus, Linux achieves O(1) complexity for both inserting a timer and executing an expired timer by leveraging the hierarchical timing wheel.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
