Why Netty Uses FastThreadLocal and How It Outperforms JDK ThreadLocal
This article explains the motivation behind Netty's FastThreadLocal, details its internal design using an indexed array to avoid hash collisions, analyzes the core source code of InternalThreadLocalMap, FastThreadLocalThread, and FastThreadLocal, and discusses performance implications and reclamation strategies.
1 Background and Principle of FastThreadLocal
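Since JDK already provides ThreadLocal, why does Netty implement its own FastThreadLocal? The answer lies in how JDK ThreadLocal is implemented. Each Java thread holds a ThreadLocalMap, created lazily on first use, that maps ThreadLocal instances to their values. The map resolves hash collisions by linear probing, so a lookup may have to step through several slots before it finds its entry, and that cost grows as more thread-local variables share the same thread.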
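The probing cost can be made concrete with a small model of an open-addressing table. This is a sketch written for illustration, not the JDK's ThreadLocalMap source: the names ProbingTable and probesFor are hypothetical, and the table neither resizes nor cleans up stale entries.

```java
// Simplified model of an open-addressing table with linear probing.
// ProbingTable is a hypothetical name; this is not the JDK source.
final class ProbingTable {
    private final Object[] keys;
    private final Object[] values;

    // The capacity must be a power of two so that (hash & mask) selects a slot.
    ProbingTable(int capacityPowerOfTwo) {
        keys = new Object[capacityPowerOfTwo];
        values = new Object[capacityPowerOfTwo];
    }

    // Walks forward from the home slot until a free or matching slot is found.
    // The sketch assumes the table never fills up completely.
    void put(Object key, int hash, Object value) {
        int mask = keys.length - 1;
        int i = hash & mask;
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) & mask;
        }
        keys[i] = key;
        values[i] = value;
    }

    // Returns how many slots a lookup inspects before it finds the key,
    // or -1 if the key is absent.
    int probesFor(Object key, int hash) {
        int mask = keys.length - 1;
        int i = hash & mask;
        for (int probes = 1; probes <= keys.length; probes++) {
            if (keys[i] == key) {
                return probes;
            }
            if (keys[i] == null) {
                return -1;
            }
            i = (i + 1) & mask;
        }
        return -1;
    }
}
```

When three keys share the same home slot, the third one is found only after three probes. An index-based design removes this step entirely.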
FastThreadLocal (ftl) avoids hash collisions by using a simple array. Each FastThreadLocal instance gets a unique index allocated by an AtomicInteger.
When ftl.get() is called, the value is retrieved directly from the array, e.g. return array[index].
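The idea fits in a few lines of plain Java. The following is a minimal sketch under simplifying assumptions, not Netty's code: SlotThread and IndexedLocal are hypothetical names, the slot array has a fixed size of 32, and get() only works on a SlotThread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Thread subclass that carries its own slot array (hypothetical, fixed size for brevity).
class SlotThread extends Thread {
    final Object[] slots = new Object[32];

    SlotThread(Runnable task) {
        super(task);
    }
}

// Each instance claims one array index for its whole lifetime.
final class IndexedLocal<V> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    private final int index = NEXT_INDEX.getAndIncrement();

    int index() {
        return index;
    }

    @SuppressWarnings("unchecked")
    V get() {
        // No hashing and no probing: one cast and one array load.
        return (V) ((SlotThread) Thread.currentThread()).slots[index];
    }

    void set(V value) {
        ((SlotThread) Thread.currentThread()).slots[index] = value;
    }
}

class IndexedLocalDemo {
    public static void main(String[] args) throws InterruptedException {
        IndexedLocal<String> name = new IndexedLocal<>();
        IndexedLocal<Integer> count = new IndexedLocal<>();
        SlotThread worker = new SlotThread(() -> {
            name.set("worker");
            count.set(42);
            System.out.println(name.get() + " " + count.get());
        });
        worker.start();
        worker.join();
    }
}
```

Because indices are never reused in this sketch, two variables can never land in the same slot, which is what makes collision handling unnecessary. The price is that the array must be at least as long as the largest index handed out.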
2 Source Code Analysis
The implementation involves three classes: InternalThreadLocalMap, FastThreadLocalThread, and FastThreadLocal. We start with InternalThreadLocalMap and its parent class UnpaddedInternalThreadLocalMap, which declares the fields:
2.1 UnpaddedInternalThreadLocalMap fields
static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<>();
static final AtomicInteger nextIndex = new AtomicInteger();
Object[] indexedVariables;
The array indexedVariables stores the values of FastThreadLocal. nextIndex provides a unique index for each FastThreadLocal instance.
2.2 InternalThreadLocalMap
Key fields include:
// marker for unused slots
public static final Object UNSET = new Object();
/** BitSet indicating whether a FastThreadLocal has registered a cleaner */
private BitSet cleanerFlags;
The method newIndexedVariableTable() creates a 32-element array filled with UNSET.
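The role of the UNSET marker and the initial table can be sketched as follows. SlotTable is a hypothetical name for a simplified model; growing to the next power of two above the requested index is an assumption of this sketch rather than a quotation of Netty's code.

```java
import java.util.Arrays;

// Simplified model of a per-thread slot table (SlotTable is a hypothetical name).
final class SlotTable {
    // Sentinel for "never set"; distinct from null, which is a legal stored value.
    static final Object UNSET = new Object();
    private static final int INITIAL_SIZE = 32;

    private Object[] slots = newTable(INITIAL_SIZE);

    private static Object[] newTable(int size) {
        Object[] table = new Object[size];
        Arrays.fill(table, UNSET);
        return table;
    }

    // Indices beyond the current capacity simply read as UNSET.
    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    // Stores the value and returns true if the slot was previously UNSET.
    boolean set(int index, Object value) {
        if (index >= slots.length) {
            grow(index);
        }
        Object old = slots[index];
        slots[index] = value;
        return old == UNSET;
    }

    private void grow(int index) {
        // Smallest power of two strictly greater than index (assumption of this sketch).
        int newCapacity = Integer.highestOneBit(index) << 1;
        Object[] bigger = Arrays.copyOf(slots, newCapacity);
        Arrays.fill(bigger, slots.length, newCapacity, UNSET);
        slots = bigger;
    }

    int capacity() {
        return slots.length;
    }
}
```

Using a dedicated sentinel instead of null lets the table distinguish a variable that was never initialized from one that was deliberately set to null.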
2.3 FastThreadLocalThread
FastThreadLocalThread extends Thread and holds its own InternalThreadLocalMap, enabling fast access to FastThreadLocal variables.
public final InternalThreadLocalMap threadLocalMap() { return threadLocalMap; }
public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) { this.threadLocalMap = threadLocalMap; }
2.4 FastThreadLocal implementation
Each FastThreadLocal gets an index in its constructor:
private final int index;
public FastThreadLocal() { index = InternalThreadLocalMap.nextVariableIndex(); }
The get() method:
public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
    Object v = threadLocalMap.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }
    V value = initialize(threadLocalMap);
    registerCleaner(threadLocalMap);
    return value;
}
Initialization stores the value in the array and registers the variable for later removal.
2.5 Degradation on ordinary threads
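If a thread is not a FastThreadLocalThread, Netty falls back to the JDK ThreadLocal: the thread's InternalThreadLocalMap is stored in slowThreadLocalMap. Every access then pays for a regular ThreadLocal lookup first and the array access second, so on ordinary threads FastThreadLocal has no advantage over a plain ThreadLocal.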
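The two paths can be modelled in plain Java. This is a sketch with hypothetical names (FastPathThread, SlotLookup) and a fixed-size array, intended only to show where the extra ThreadLocal lookup comes from.

```java
// Hypothetical names throughout; a simplified model, not Netty's source.
class FastPathThread extends Thread {
    Object[] slots; // created lazily on first access

    FastPathThread(Runnable task) {
        super(task);
    }
}

final class SlotLookup {
    // Fallback storage for threads that are not FastPathThread instances.
    private static final ThreadLocal<Object[]> SLOW_SLOTS = new ThreadLocal<>();

    static Object[] currentSlots() {
        Thread current = Thread.currentThread();
        if (current instanceof FastPathThread) {
            // Fast path: a plain field read on the thread object.
            FastPathThread fast = (FastPathThread) current;
            if (fast.slots == null) {
                fast.slots = new Object[32];
            }
            return fast.slots;
        }
        // Slow path: a ThreadLocal lookup first, then the indexed array.
        Object[] slots = SLOW_SLOTS.get();
        if (slots == null) {
            slots = new Object[32];
            SLOW_SLOTS.set(slots);
        }
        return slots;
    }

    static boolean onFastPath() {
        return Thread.currentThread() instanceof FastPathThread;
    }
}
```

In practice this means the benefit depends on which threads run the code: work executed on an application's own plain Thread instances takes the slow path.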
3 FastThreadLocal Resource Reclamation
Netty provides three reclamation mechanisms: automatic cleanup after a FastThreadLocalRunnable finishes, manual removal via remove(), and a Cleaner‑based automatic cleanup (disabled in Netty 4.1.34).
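The first mechanism amounts to wrapping the task in a try/finally block. The sketch below shows that pattern with hypothetical names (PerThreadValues, CleanupRunnable) and a JDK ThreadLocal standing in for the per-thread storage; it illustrates the technique and does not reproduce Netty's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-thread store used only to demonstrate the cleanup wrapper.
final class PerThreadValues {
    private static final ThreadLocal<List<Object>> VALUES =
            ThreadLocal.withInitial(ArrayList::new);

    static void add(Object value) {
        VALUES.get().add(value);
    }

    static int size() {
        return VALUES.get().size();
    }

    // Drops everything the current thread has stored.
    static void removeAll() {
        VALUES.remove();
    }
}

// Wraps a task so that per-thread values are released even if the task throws.
final class CleanupRunnable implements Runnable {
    private final Runnable task;

    private CleanupRunnable(Runnable task) {
        this.task = task;
    }

    static Runnable wrap(Runnable task) {
        // Avoid wrapping a task twice.
        return task instanceof CleanupRunnable ? task : new CleanupRunnable(task);
    }

    @Override
    public void run() {
        try {
            task.run();
        } finally {
            PerThreadValues.removeAll();
        }
    }
}
```

The finally block guarantees cleanup on both normal and exceptional exit, which is why this mechanism needs no cooperation from the task itself. Manual remove() remains necessary for threads whose tasks are not wrapped this way.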
4 Usage in Netty
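A prominent use of FastThreadLocal inside Netty is PoolThreadLocalCache, which caches a per-thread memory pool for ByteBuf allocation. Giving each thread its own cache reduces contention on the shared pool and improves allocation performance.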
final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
    @Override
    protected synchronized PoolThreadCache initialValue() {
        // allocate or retrieve per-thread cache
    }
}