Unveiling Netty and Kafka Time Wheels: High‑Performance Scheduling Explained
This article explores the design and implementation of time wheel algorithms in Netty and Kafka, comparing their single‑layer and multi‑layer approaches, analyzing the performance trade‑offs, and detailing how they achieve O(1) scheduling for massive numbers of delayed tasks and how Kafka's design additionally avoids empty ticks.
All examples are based on Netty 4.1.112.Final and Kafka 3.9.0.
In business development we often encounter scheduled tasks such as report generation, periodic reconciliation, data synchronization, and order payment timeout handling. Because these business-level tasks are usually complex and long-running, many mature scheduling middleware solutions have emerged, such as ElasticJob, XXL-JOB, and PowerJob.
Middleware itself also needs many simple, short-lived scheduled tasks, such as connection heartbeats, request timeout retries, and reconnection attempts. These tasks have low precision requirements; a delay of a few tens of milliseconds is acceptable.
Characteristics of middleware scheduled tasks:
Massive number of tasks
Simple logic
Short execution time
Low timing strictness
Most middleware uses a time wheel to schedule such tasks. This article focuses on the design and implementation of the time wheel, but to understand why the time wheel design arose, we first need to look at the JDK's own scheduling components.
1. JDK Scheduling Components
1.1 Timer
public class Timer {
private final TaskQueue queue = new TaskQueue();
private final TimerThread thread = new TimerThread(queue);
}
Timer uses a priority queue (a binary min‑heap) to order tasks by execution time. The TaskQueue holds the tasks, and TimerThread continuously polls the head of the queue, executing tasks whose executionTime <= currentTime. If the head task is not yet due, the thread waits for executionTime - currentTime.
private void mainLoop() {
while (true) {
try {
TimerTask task;
boolean taskFired;
synchronized (queue) {
long currentTime, executionTime;
task = queue.getMin();
synchronized (task.lock) {
currentTime = System.currentTimeMillis();
executionTime = task.nextExecutionTime;
if (taskFired = (executionTime <= currentTime)) {
if (task.period == 0) {
queue.removeMin();
task.state = TimerTask.EXECUTED;
} else {
queue.rescheduleMin(task.period < 0 ? currentTime - task.period
: executionTime + task.period);
}
}
}
if (!taskFired)
queue.wait(executionTime - currentTime);
}
if (taskFired)
task.run();
} catch (InterruptedException e) {
}
}
}
Four major drawbacks of Timer for middleware scenarios:
Adding or removing a task costs O(log n) because of heap adjustments.
Only a single TimerThread processes tasks, becoming a bottleneck under heavy load.
Exceptions thrown by a task are not caught, so a single failing task terminates the timer thread and prevents the remaining tasks from ever running (see the demo after this list).
Timer relies on the system clock; inaccurate system time leads to scheduling errors.
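To make the third drawback concrete, here is a minimal, hypothetical demo (class name and messages invented for illustration): the first task throws, the shared TimerThread dies, and the second task never runs.
import java.util.Timer;
import java.util.TimerTask;

public class TimerFailureDemo {
    public static void main(String[] args) {
        Timer timer = new Timer();
        // Both tasks are scheduled up front, before either has run.
        timer.schedule(new TimerTask() {
            @Override public void run() { throw new RuntimeException("boom"); }
        }, 100);
        timer.schedule(new TimerTask() {
            @Override public void run() { System.out.println("never printed"); }
        }, 200);
        // The uncaught RuntimeException terminates the single TimerThread,
        // so the second task is silently dropped; any later schedule() call
        // fails with IllegalStateException ("Timer already cancelled").
    }
}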
1.2 DelayQueue
DelayQueue is a blocking queue that orders elements by their remaining delay. Internally it also uses a min‑heap.
public class DelayQueue<E extends Delayed> extends AbstractQueue<E> implements BlockingQueue<E> {
private final PriorityQueue<E> q = new PriorityQueue<>();
}
Elements must implement Delayed, providing getDelay(TimeUnit) and compareTo. When getDelay returns a value ≤ 0, the element is ready to be taken.
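As a minimal illustration (DelayedTask is a hypothetical class, not taken from the JDK), an element and its usage could look like this:
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DelayedTask implements Delayed {
    private final String name;
    private final long triggerTimeMs; // absolute time at which the task becomes due

    DelayedTask(String name, long delayMs) {
        this.name = name;
        this.triggerTimeMs = System.currentTimeMillis() + delayMs;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining delay; <= 0 means the element can be taken from the queue.
        return unit.convert(triggerTimeMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }

    String name() { return name; }
}

// Usage: take() blocks until the head element's delay has elapsed.
// DelayQueue<DelayedTask> queue = new DelayQueue<>();
// queue.put(new DelayedTask("heartbeat", 1000));
// DelayedTask due = queue.take();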
1.3 ScheduledThreadPoolExecutor
ScheduledThreadPoolExecutor builds on DelayQueue, adding a pool of worker threads. It stores tasks in a DelayedWorkQueue (also a heap) and executes them when due. Although it supports concurrency, the underlying heap still gives O(log n) for add/remove, which is sub‑optimal for massive short‑lived tasks.
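A brief usage sketch (pool size and task bodies are placeholders chosen for illustration):
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledPoolDemo {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        // One-shot task after 500 ms.
        scheduler.schedule(() -> System.out.println("runs once"), 500, TimeUnit.MILLISECONDS);
        // Repeats every second; each (re)submission still costs O(log n) in the DelayedWorkQueue heap.
        scheduler.scheduleAtFixedRate(() -> System.out.println("runs every second"), 1, 1, TimeUnit.SECONDS);
    }
}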
2. Netty Time Wheel Design Principles
The idea is inspired by a clock with three hands. A clock’s second hand moves every second, the minute hand every 60 seconds, and the hour hand every 60 minutes. By abstracting the clock’s ticks into a data structure, we can schedule tasks when the corresponding hand points to a tick.
Netty’s time wheel (HashedWheelTimer) uses a circular array called wheel with 512 slots by default. Each slot is a HashedWheelBucket, a doubly‑linked list of HashedWheelTimeout objects.
public class HashedWheelTimer implements Timer {
private final HashedWheelBucket[] wheel;
private final long tickDuration; // default 100 ms, min 1 ms
}
The timer has a single worker thread (workerThread) that advances a tick counter (tick) every tickDuration. When the pointer reaches a bucket, all tasks in that bucket whose remainingRounds == 0 are executed; tasks with a larger remainingRounds are skipped and their counter is decremented.
private static final class HashedWheelTimeout implements Timeout, Runnable {
private final TimerTask task;
private final long deadline; // deadline in nanoseconds, relative to startTime
long remainingRounds;
// ... cancel, expire, etc.
}
Key points:
Adding or removing a task is O(1) because it only manipulates linked‑list pointers.
The worker thread may sleep up to tickDuration, causing a scheduling jitter of roughly one tick (≈100 ms by default).
Tasks added just after the worker goes to sleep can be delayed up to an entire tick, while tasks added just before the next tick incur minimal delay.
For higher precision you can lower tickDuration (not below 1 ms), at the cost of higher CPU usage (a usage sketch follows these points).
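For reference, here is a minimal usage sketch; the tick duration and slot count below simply restate Netty's defaults to make the precision/CPU trade-off explicit, and the printed message is illustrative.
import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;
import io.netty.util.Timer;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WheelTimerDemo {
    public static void main(String[] args) {
        // 100 ms tick, 512 slots: ~51.2 s per full rotation.
        Timer timer = new HashedWheelTimer(
                Executors.defaultThreadFactory(), 100, TimeUnit.MILLISECONDS, 512);
        Timeout timeout = timer.newTimeout(
                t -> System.out.println("fired roughly 5 s (plus up to one tick) later"),
                5, TimeUnit.SECONDS);
        // timeout.cancel();  // O(1): only flips state and enqueues for the worker to unlink
        // timer.stop();      // returns the timeouts that never ran
    }
}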
3. Implementation Details of Netty's Time Wheel
3.1 Core Structures
private static final class HashedWheelBucket {
private HashedWheelTimeout head;
private HashedWheelTimeout tail;
void addTimeout(HashedWheelTimeout timeout) { /* O(1) */ }
void remove(HashedWheelTimeout timeout) { /* O(1) */ }
void expireTimeouts(long deadline) { /* execute due tasks */ }
}
3.2 Adding a Task
When a user calls newTimeout, Netty lazily starts the worker thread, computes the deadline relative to startTime (current nano-time + delay - startTime), wraps the user task in a HashedWheelTimeout, and puts it into an MPSC queue (timeouts).
public Timeout newTimeout(TimerTask task, long delay, TimeUnit unit) {
start();
long deadline = System.nanoTime() + unit.toNanos(delay) - startTime;
HashedWheelTimeout timeout = new HashedWheelTimeout(this, task, deadline);
timeouts.add(timeout);
return timeout;
}
The worker thread, on each tick, transfers pending timeouts from the MPSC queue into the appropriate buckets. The bucket index is derived from deadline / tickDuration, masked with mask = wheel.length - 1 and clamped so it is never behind the current tick. The remainingRounds field records how many full wheel rotations must pass before the task becomes eligible.
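A minimal sketch of this placement step, condensed from the worker's transferTimeoutsToBuckets (variable names follow Netty's):
// For each timeout drained from the MPSC queue on this tick:
long calculated = timeout.deadline / tickDuration;             // tick at which the task is due
timeout.remainingRounds = (calculated - tick) / wheel.length;  // full rotations still to wait
long ticks = Math.max(calculated, tick);                       // never schedule behind the current tick
int stopIndex = (int) (ticks & mask);                          // mask = wheel.length - 1
wheel[stopIndex].addTimeout(timeout);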
3.3 Cancelling a Task
Cancellation also uses an MPSC queue (cancelledTimeouts). On each tick the worker thread drains this queue and unlinks the cancelled timeouts from their buckets.
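A condensed sketch of HashedWheelTimeout.cancel (state constants simplified): cancellation only flips the state and enqueues the timeout, and the worker does the actual unlinking.
public boolean cancel() {
    // Only update the state; the bucket is untouched here, so cancel() is thread-safe and O(1).
    if (!compareAndSetState(ST_INIT, ST_CANCELLED)) {
        return false; // already expired or already cancelled
    }
    // The worker drains this queue on each tick and removes the timeout from its bucket.
    timer.cancelledTimeouts.add(this);
    return true;
}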
3.4 Starting the Timer
The timer has three states: WORKER_STATE_INIT, WORKER_STATE_STARTED, and WORKER_STATE_SHUTDOWN. The first call to newTimeout atomically transitions the state to STARTED and launches the worker thread, which records startTime = System.nanoTime() and counts down a latch so callers can see the exact start moment.
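A condensed sketch of start() (error messages shortened):
public void start() {
    switch (WORKER_STATE_UPDATER.get(this)) {
        case WORKER_STATE_INIT:
            // Only one caller wins the CAS and actually starts the worker thread.
            if (WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_INIT, WORKER_STATE_STARTED)) {
                workerThread.start();
            }
            break;
        case WORKER_STATE_STARTED:
            break;
        case WORKER_STATE_SHUTDOWN:
            throw new IllegalStateException("cannot be started once stopped");
        default:
            throw new Error("invalid worker state");
    }
    // Block until the worker has recorded startTime, so deadlines computed by
    // newTimeout are always relative to a published start time.
    while (startTime == 0) {
        try {
            startTimeInitialized.await();
        } catch (InterruptedException ignore) {
            // keep waiting; the latch will be counted down shortly
        }
    }
}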
3.5 Running Loop
private final class Worker implements Runnable {
public void run() {
startTime = System.nanoTime();
startTimeInitialized.countDown();
do {
long deadline = waitForNextTick();
if (deadline > 0) {
int idx = (int) (tick & mask);
processCancelledTasks();
HashedWheelBucket bucket = wheel[idx];
transferTimeoutsToBuckets();
bucket.expireTimeouts(deadline);
tick++;
}
} while (workerState == WORKER_STATE_STARTED);
// cleanup, move remaining tasks to unprocessed set, etc.
}
}
waitForNextTick computes the next deadline as (tick + 1) * tickDuration and sleeps for the remaining time, rounding the sleep up to the next whole millisecond so the worker does not busy-wait.
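A simplified sketch of waitForNextTick; Netty additionally special-cases Windows sleep granularity and shutdown during interruption, both omitted here:
private long waitForNextTick() {
    long deadline = tickDuration * (tick + 1); // nanoseconds relative to startTime
    for (;;) {
        long currentTime = System.nanoTime() - startTime;
        // Round up to the next millisecond so the worker never spins with sleep(0).
        long sleepTimeMs = (deadline - currentTime + 999999) / 1000000;
        if (sleepTimeMs <= 0) {
            return currentTime; // tick is due; the caller uses this as the expiry deadline
        }
        try {
            Thread.sleep(sleepTimeMs);
        } catch (InterruptedException ignored) {
            // on shutdown Netty returns a negative sentinel here instead of looping
        }
    }
}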
3.6 Why Scheduling Jitter Exists
The jitter originates from two sources:
Heavy or long‑running tasks block the worker thread, delaying the next tick.
Tasks added at arbitrary moments sit in the pending MPSC queue until the next tick, and may then land in a bucket that is only processed one tick later, adding up to roughly one full tick of extra delay.
For example, a 5 ms task added right when the wheel starts sits in the queue until the worker wakes at the 100 ms tick, is transferred into bucket 0, and runs immediately, about 95 ms late. The same 5 ms task added just before the 100 ms tick gets a deadline that maps to bucket 1, which is only processed at the 200 ms tick, again roughly one tick of extra delay.
3.7 Stopping the Timer
Calling stop() atomically moves the state to SHUTDOWN, interrupts the worker thread, waits for it to finish, then collects all tasks that have not yet been executed, cancels them, and returns the set of cancelled tasks to the caller.
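A condensed sketch of stop(); instance-counter bookkeeping and the never-started branch are simplified here:
public Set<Timeout> stop() {
    if (Thread.currentThread() == workerThread) {
        // A task must not stop the timer it is running on.
        throw new IllegalStateException("stop() cannot be called from the timer thread");
    }
    if (!WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_STARTED, WORKER_STATE_SHUTDOWN)) {
        // The timer was never started (or is already shut down); nothing is pending.
        return Collections.emptySet();
    }
    // Interrupt and join the worker until it has finished its cleanup pass.
    while (workerThread.isAlive()) {
        workerThread.interrupt();
        try {
            workerThread.join(100);
        } catch (InterruptedException ignored) {
        }
    }
    // The worker has collected everything that never ran into this set.
    return worker.unprocessedTimeouts();
}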
4. Limitations of a Single‑Layer Time Wheel
Netty’s wheel has 512 slots and a 100 ms tick, giving a 51.2 s full rotation. If tasks are sparse, the worker thread spins empty ticks, wasting CPU. Moreover, tasks whose delay exceeds one full rotation must stay in a bucket with a non‑zero remainingRounds, causing the worker to iterate over many stale entries on each tick.
5. Kafka Multi‑Layer Time Wheel Design
Kafka adopts a hierarchical wheel. The lowest layer has 20 slots with a 1 ms tick (a 20 ms full rotation). When a task's delay exceeds the current layer's interval, Kafka creates an overflow wheel on demand whose tick equals the lower layer's interval, so each layer's interval is its own tickMs * wheelSize and every level covers 20 times the range of the one below (a small routing sketch follows this list). This yields layers such as:
Layer 0: tick = 1 ms, wheelSize = 20, interval = 20 ms
Layer 1: tick = 20 ms, wheelSize = 20, interval = 400 ms
Layer 2: tick = 400 ms, wheelSize = 20, interval = 8 s
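To make the layering concrete, here is a hypothetical helper (layerFor is not part of Kafka and ignores the running currentTimeMs, simply classifying a fresh delay) that shows which layer would absorb a given delay with tickMs = 1 and wheelSize = 20:
static int layerFor(long delayMs) {
    long tickMs = 1;
    final long wheelSize = 20;
    int layer = 0;
    long interval = tickMs * wheelSize;   // 20 ms for layer 0
    while (delayMs >= interval) {         // the delay does not fit into this layer
        tickMs = interval;                // the next layer's tick is this layer's interval
        interval = tickMs * wheelSize;    // 400 ms, 8 s, ...
        layer++;
    }
    return layer;
}
// layerFor(5)    -> 0  (fits in the 20 ms wheel)
// layerFor(350)  -> 1  (needs the 400 ms wheel)
// layerFor(5000) -> 2  (needs the 8 s wheel)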
Each slot is a TimerTaskList, a doubly‑linked circular list with a single root node and an expiration timestamp. When a TimerTaskList becomes non‑empty, its expiration is set to the start time of that slot and the list is offered to a global DelayQueue<TimerTaskList>. The DelayQueue blocks the worker (the "Reaper") until the next list expires, eliminating empty‑tick spinning.
class TimerTaskList implements Delayed {
private final AtomicLong expiration = new AtomicLong(-1L);
private final TimerTaskEntry root = new TimerTaskEntry(null, -1L);
// add, remove, flush, etc.
}
When the Reaper polls a TimerTaskList, it first advances all layers’ currentTimeMs to the list’s expiration. Then it flushes the list: each contained TimerTaskEntry is re‑inserted into the lowest layer that can accommodate its remaining delay. If the remaining delay is already past the current time, the task is executed immediately via a single‑threaded ExecutorService.
public boolean add(TimerTaskEntry entry) {
long expiration = entry.expirationMs;
if (expiration < currentTimeMs + tickMs) return false; // already due
if (expiration < currentTimeMs + interval) {
long virtualId = expiration / tickMs;
int bucketId = (int) (virtualId % wheelSize);
TimerTaskList bucket = buckets[bucketId];
bucket.add(entry);
if (bucket.setExpiration(virtualId * tickMs))
delayQueue.offer(bucket);
return true;
} else {
if (overflowWheel == null) addOverflowWheel();
return overflowWheel.add(entry);
}
}
The Reaper’s loop looks like:
public void doWork() {
try {
timer.advanceClock(WORK_TIMEOUT_MS);
} catch (InterruptedException ignored) {}
}In SystemTimer.advanceClock, the Reaper polls the DelayQueue with a timeout, then, under a write lock, repeatedly:
Advances the current layer (and any overflow layers) to the list’s expiration.
Calls bucket.flush(this::addTimerTaskEntry) to re‑insert tasks into appropriate lower layers or execute them.
This design solves both problems of Netty’s wheel:
No empty‑tick spinning because the Reaper only wakes when a TimerTaskList expires.
Large delays are handled by higher layers, keeping each bucket small and avoiding costly scans.
6. Implementation Details of Kafka’s Multi‑Layer Wheel
6.1 Timer Creation
SystemTimer holds a single‑threaded executor for task execution, a global DelayQueue<TimerTaskList>, a task counter, and a reference to the lowest TimingWheel. The lowest wheel is created with tickMs=1, wheelSize=20, and the current high‑resolution time as startMs.
public SystemTimer(String name) {
this(name, 1L, 20, Time.SYSTEM.hiResClockMs());
}
public SystemTimer(String name, long tickMs, int wheelSize, long startMs) {
this.taskExecutor = Executors.newFixedThreadPool(1,
r -> KafkaThread.nonDaemon(SYSTEM_TIMER_THREAD_PREFIX + name, r));
this.delayQueue = new DelayQueue<>();
this.taskCounter = new AtomicInteger(0);
this.timingWheel = new TimingWheel(tickMs, wheelSize, startMs, taskCounter, delayQueue);
}
6.2 Adding a Task
When add(TimerTask) is called, the task’s absolute expiration is computed as now + delay. A TimerTaskEntry is created and handed to TimingWheel.add. If the entry cannot be placed in the current layer, an overflow wheel is created on demand and the entry is delegated upward.
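A simplified sketch of this path, lightly adapted from SystemTimer (field and accessor names may differ slightly from the actual source):
public void add(TimerTask timerTask) {
    readLock.lock();
    try {
        // Absolute expiration: now + delay, using the same hi-res clock as the wheel.
        addTimerTaskEntry(new TimerTaskEntry(timerTask,
                timerTask.delayMs + Time.SYSTEM.hiResClockMs()));
    } finally {
        readLock.unlock();
    }
}

private void addTimerTaskEntry(TimerTaskEntry entry) {
    if (!timingWheel.add(entry)) {
        // add() returns false when the entry is already due or cancelled;
        // a due (and not cancelled) task is executed right away.
        if (!entry.cancelled()) {
            taskExecutor.submit(entry.timerTask);
        }
    }
}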
6.3 Advancing the Clock
The Reaper calls SystemTimer.advanceClock with a timeout (e.g., 200 ms). The method polls the DelayQueue for the next expired TimerTaskList. For each list, it:
Acquires a write lock.
Calls timingWheel.advanceClock(bucket.getExpiration()) to move all layers forward.
Flushes the bucket, re‑inserting its tasks into lower layers or executing them.
If no bucket expires within the timeout, the method returns false and the Reaper sleeps again.
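Putting these steps together, a simplified sketch of SystemTimer.advanceClock with the lock handling condensed:
public boolean advanceClock(long timeoutMs) throws InterruptedException {
    TimerTaskList bucket = delayQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    if (bucket == null) {
        return false; // nothing expired within the timeout; the Reaper simply loops again
    }
    writeLock.lock();
    try {
        while (bucket != null) {
            timingWheel.advanceClock(bucket.getExpiration()); // move every layer forward
            bucket.flush(this::addTimerTaskEntry);            // re-insert into lower layers or execute
            bucket = delayQueue.poll();                       // drain any other buckets already due
        }
    } finally {
        writeLock.unlock();
    }
    return true;
}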
7. Summary
JDK’s Timer, DelayQueue, and ScheduledThreadPoolExecutor all rely on a binary min‑heap, giving O(log n) add/remove performance. Netty’s HashedWheelTimer improves this to O(1) by using a circular array of buckets and a single worker thread, but suffers from empty‑tick spinning and poor handling of very long delays. Kafka’s hierarchical wheel combines a DelayQueue of bucket expirations with on‑demand overflow wheels, achieving constant‑time scheduling for massive numbers of tasks while eliminating unnecessary wake‑ups and efficiently supporting long‑range delays.