Understanding Netty's FastThreadLocal: Background, Implementation, and Usage
This article explains why Netty created FastThreadLocal, describes its internal design and source‑code analysis—including UnpaddedInternalThreadLocalMap, InternalThreadLocalMap, FastThreadLocalThread, and FastThreadLocal implementations—covers performance degradation in ordinary threads, outlines its three resource‑recycling mechanisms, and shows how it is used for ByteBuf allocation in Netty.
1. FastThreadLocal Background and Principle Overview
Although the JDK already provides ThreadLocal, Netty implements its own FastThreadLocal (ftl). The JDK's ThreadLocalMap is a hash table that resolves collisions by linear probing; FastThreadLocal avoids that overhead by storing values in a plain array indexed with a unique integer.
Each FastThreadLocal instance receives an index allocated by an AtomicInteger. When ftl.get() is called, the value is retrieved directly from the array, e.g.:
ftl.get()
return array[index]
2. Source Code Analysis
2.1 Main Attributes of UnpaddedInternalThreadLocalMap
static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
static final AtomicInteger nextIndex = new AtomicInteger();
Object[] indexedVariables;
The indexedVariables array stores FastThreadLocal values; nextIndex provides a unique index for each FastThreadLocal instance; slowThreadLocalMap is used when the current thread is not a FastThreadLocalThread.
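The interplay of nextIndex and indexedVariables can be illustrated with a minimal, self-contained sketch. It is not Netty code: the class name IndexedSlotsSketch, the initial table size of 4, and the doubling growth policy are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not Netty code: index-based slot storage. */
public class IndexedSlotsSketch {
    // Global counter: every variable created anywhere gets the next free index.
    static final AtomicInteger nextIndex = new AtomicInteger();

    // In the real design there is one table per thread; one instance is enough here.
    Object[] indexedVariables = new Object[4];

    static int nextVariableIndex() {
        return nextIndex.getAndIncrement();
    }

    void set(int index, Object value) {
        if (index >= indexedVariables.length) {
            // Grow until the index fits (doubling is an assumption of this sketch).
            int newLength = Math.max(index + 1, indexedVariables.length * 2);
            indexedVariables = Arrays.copyOf(indexedVariables, newLength);
        }
        indexedVariables[index] = value;
    }

    Object get(int index) {
        // A plain array read: no hashing and no probing.
        return index < indexedVariables.length ? indexedVariables[index] : null;
    }

    public static void main(String[] args) {
        int a = nextVariableIndex();
        int b = nextVariableIndex();
        IndexedSlotsSketch table = new IndexedSlotsSketch();
        table.set(a, "first");
        table.set(b, "second");
        System.out.println(table.get(a) + ", " + table.get(b));
    }
}
```

Because the index is fixed when the variable is created, a lookup costs one bounds check and one array read, regardless of how many variables exist.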
2.2 InternalThreadLocalMap Analysis
// Marker for an unused slot
public static final Object UNSET = new Object();
/**
* BitSet used to record whether a FastThreadLocal has registered a cleaner.
*/
private BitSet cleanerFlags;
private InternalThreadLocalMap() {
super(newIndexedVariableTable());
}
private static Object[] newIndexedVariableTable() {
Object[] array = new Object[32];
Arrays.fill(array, UNSET);
return array;
}
The newIndexedVariableTable() method creates a 32-element array filled with UNSET. FastThreadLocal values are stored directly in this array, not as map entries, which differs from JDK ThreadLocal.
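One plausible reason for filling the array with a dedicated UNSET marker instead of leaving the slots null is that null can then be a legitimate stored value. The following sketch (a hypothetical class UnsetSentinelSketch, not Netty code) shows the distinction:

```java
import java.util.Arrays;

/** Hypothetical sketch, not Netty code: a sentinel separates "never set" from "set to null". */
public class UnsetSentinelSketch {
    // Unique marker object; identity comparison can never match a user value.
    static final Object UNSET = new Object();

    private final Object[] slots = new Object[32];

    UnsetSentinelSketch() {
        Arrays.fill(slots, UNSET);
    }

    void set(int index, Object value) {
        slots[index] = value;
    }

    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    boolean isSet(int index) {
        return get(index) != UNSET;
    }

    public static void main(String[] args) {
        UnsetSentinelSketch map = new UnsetSentinelSketch();
        System.out.println(map.isSet(3)); // false: the slot still holds UNSET
        map.set(3, null);
        System.out.println(map.isSet(3)); // true: null is a real value here
    }
}
```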
2.3 FastThreadLocalThread (ftlt) Implementation
public class FastThreadLocalThread extends Thread {
private final boolean cleanupFastThreadLocals;
private InternalThreadLocalMap threadLocalMap;
public final InternalThreadLocalMap threadLocalMap() {
return threadLocalMap;
}
public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) {
this.threadLocalMap = threadLocalMap;
}
}
FastThreadLocalThread extends Thread and holds its own InternalThreadLocalMap in a plain field. When a FastThreadLocal is accessed from such a thread, the map is read directly from that field, without any JDK ThreadLocal lookup.
2.4 FastThreadLocal (ftl) Implementation
2.4.1 Attributes and Instantiation
private final int index;
public FastThreadLocal() {
index = InternalThreadLocalMap.nextVariableIndex();
}
The constructor obtains a unique index from InternalThreadLocalMap.nextVariableIndex(), which simply increments the atomic nextIndex.
2.4.2 get() Method
public final V get() {
InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); // 1
Object v = threadLocalMap.indexedVariable(index); // 2
if (v != InternalThreadLocalMap.UNSET) {
return (V) v;
}
V value = initialize(threadLocalMap); // 3
registerCleaner(threadLocalMap); // 4
return value;
}
1. InternalThreadLocalMap.get() returns the map associated with the current thread, delegating to fastGet() for FastThreadLocalThread or slowGet() otherwise.
public static InternalThreadLocalMap get() {
Thread thread = Thread.currentThread();
if (thread instanceof FastThreadLocalThread) {
return fastGet((FastThreadLocalThread) thread);
} else {
return slowGet();
}
}
private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
InternalThreadLocalMap map = thread.threadLocalMap();
if (map == null) {
thread.setThreadLocalMap(map = new InternalThreadLocalMap());
}
return map;
}
private static InternalThreadLocalMap slowGet() {
ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
InternalThreadLocalMap ret = slowThreadLocalMap.get();
if (ret == null) {
ret = new InternalThreadLocalMap();
slowThreadLocalMap.set(ret);
}
return ret;
}
2. indexedVariable(index) fetches the value directly from the array.
public Object indexedVariable(int index) {
Object[] lookup = indexedVariables;
return index < lookup.length ? lookup[index] : UNSET;
}
3. If the slot contains UNSET, initialize() creates the initial value and stores it.
private V initialize(InternalThreadLocalMap threadLocalMap) {
V v = null;
try {
v = initialValue();
} catch (Exception e) {
PlatformDependent.throwException(e);
}
threadLocalMap.setIndexedVariable(index, v); // 3‑1
addToVariablesToRemove(threadLocalMap, this); // 3‑2
return v;
}
4. registerCleaner() would register a cleaner for automatic reclamation, but in Netty 4.1.34 the cleaner registration code is commented out.
private void registerCleaner(final InternalThreadLocalMap threadLocalMap) {
Thread current = Thread.currentThread();
if (FastThreadLocalThread.willCleanupFastThreadLocals(current) || threadLocalMap.isCleanerFlagSet(index)) {
return;
}
threadLocalMap.setCleanerFlag(index);
// Cleaner registration is commented out in this version.
}
2.5 Performance Degradation in Ordinary Threads
If a normal thread (not a FastThreadLocalThread) accesses a FastThreadLocal, InternalThreadLocalMap.get() takes the slowGet() path: the thread's InternalThreadLocalMap is stored in the static JDK ThreadLocal slowThreadLocalMap, so every access first pays a JDK ThreadLocalMap lookup before the array can be indexed. Consequently, FastThreadLocal loses most of its speed advantage on such threads.
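The difference between the two paths can be reproduced in a small, self-contained sketch. It is not Netty code: SlotThread stands in for FastThreadLocalThread, SLOW_SLOTS for slowThreadLocalMap, and currentSlots() for InternalThreadLocalMap.get(); all three names are hypothetical.

```java
/** Hypothetical sketch, not Netty code: fast field access vs. JDK ThreadLocal fallback. */
public class FastSlowPathSketch {
    /** Stand-in for FastThreadLocalThread: the slot table is a plain field. */
    static class SlotThread extends Thread {
        Object[] slots;

        SlotThread(Runnable task) {
            super(task);
        }
    }

    // Fallback for ordinary threads: one shared JDK ThreadLocal holding the table.
    static final ThreadLocal<Object[]> SLOW_SLOTS = new ThreadLocal<>();

    static Object[] currentSlots() {
        Thread thread = Thread.currentThread();
        if (thread instanceof SlotThread) {
            // Fast path: one field read, no hash lookup.
            SlotThread slotThread = (SlotThread) thread;
            if (slotThread.slots == null) {
                slotThread.slots = new Object[32];
            }
            return slotThread.slots;
        }
        // Slow path: a JDK ThreadLocal lookup precedes every array access.
        Object[] slots = SLOW_SLOTS.get();
        if (slots == null) {
            slots = new Object[32];
            SLOW_SLOTS.set(slots);
        }
        return slots;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread fast = new SlotThread(() -> {
            currentSlots()[0] = "value";
            // false: the fast path never touched the JDK ThreadLocal
            System.out.println("SlotThread used JDK ThreadLocal: " + (SLOW_SLOTS.get() != null));
        });
        fast.start();
        fast.join();

        currentSlots()[0] = "value";
        // true: the main thread is an ordinary thread and had to fall back
        System.out.println("main thread used JDK ThreadLocal: " + (SLOW_SLOTS.get() != null));
    }
}
```

In both cases the final read is an array access; the ordinary thread simply pays for a JDK ThreadLocal lookup before it gets there.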
3. FastThreadLocal Resource Reclamation Mechanisms
Netty provides three ways to clean up FastThreadLocal values:
Automatic (via FastThreadLocalRunnable): When a task wrapped by FastThreadLocalRunnable finishes, FastThreadLocal values are cleared automatically.
Manual: Users can explicitly call remove() on FastThreadLocal or on the underlying InternalThreadLocalMap (necessary for thread‑pool scenarios).
Cleaner‑based (deprecated in 4.1.34): A Cleaner is registered for each FastThreadLocal; when the thread becomes unreachable, the cleaner releases the value. This approach creates an extra thread and is therefore discouraged.
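The automatic mechanism boils down to wrapping the task so that cleanup runs in a finally block. The sketch below shows that pattern with a JDK ThreadLocal as a stand-in; it is not Netty code, and the names CleanupRunnableSketch, BUFFER, and wrap are hypothetical. In Netty the cleanup step would clear the FastThreadLocal values instead.

```java
/** Hypothetical sketch, not Netty code: clear thread-local state when a task ends. */
public class CleanupRunnableSketch {
    // Stand-in for per-thread state that would otherwise linger in a pooled thread.
    static final ThreadLocal<StringBuilder> BUFFER = ThreadLocal.withInitial(StringBuilder::new);

    /** Wraps a task so the thread-local value is removed afterwards, even if the task throws. */
    static Runnable wrap(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                BUFFER.remove();
            }
        };
    }

    public static void main(String[] args) {
        Runnable task = wrap(() -> BUFFER.get().append("request data"));
        task.run();
        // The value was removed, so get() creates a fresh, empty builder.
        System.out.println(BUFFER.get().length()); // 0
    }
}
```

Manual removal is the same call made by hand, which is why it matters in thread pools: a pooled thread outlives its tasks, so nothing is reclaimed unless some code explicitly removes the values.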
4. Usage of FastThreadLocal in Netty
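The most important use case is allocating ByteBuf objects. Each thread gets a PoolThreadCache bound to the least-used heap and direct PoolArena; when a ByteBuf is needed, the thread first tries to allocate from its own cache and falls back to the arena it is bound to if necessary. This reduces contention and improves throughput. The per-thread cache is created by PoolThreadLocalCache.initialValue():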
final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
@Override
protected synchronized PoolThreadCache initialValue() {
final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);
Thread current = Thread.currentThread();
if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
return new PoolThreadCache(heapArena, directArena, tinyCacheSize, smallCacheSize,
normalCacheSize, DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
}
return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
}
}
By coupling FastThreadLocal with FastThreadLocalThread, Netty achieves low-overhead thread-local storage that is especially beneficial for high-performance networking components.
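The general idea behind the ByteBuf use case in section 4, a per-thread cache placed in front of shared memory, can be sketched without Netty. Everything in this example is an assumption made for illustration (the class name ThreadCachePoolSketch, fixed-size byte[] buffers, a capacity of 8, and a JDK ThreadLocal in place of FastThreadLocal); Netty's real allocator is considerably more elaborate.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch, not Netty code: a per-thread cache in front of a shared pool. */
public class ThreadCachePoolSketch {
    static final int BUFFER_SIZE = 256; // assumed fixed buffer size
    static final int MAX_CACHED = 8;    // assumed per-thread cache capacity

    // Shared pool: usable from any thread, but every access is contended.
    static final ConcurrentLinkedQueue<byte[]> SHARED = new ConcurrentLinkedQueue<>();

    // Per-thread cache: only the owning thread touches it, so no locking is needed.
    static final ThreadLocal<ArrayDeque<byte[]>> CACHE = ThreadLocal.withInitial(ArrayDeque::new);

    static byte[] allocate() {
        byte[] buffer = CACHE.get().pollFirst(); // try the thread's own cache first
        if (buffer == null) {
            buffer = SHARED.poll();              // then the shared pool
        }
        return buffer != null ? buffer : new byte[BUFFER_SIZE];
    }

    static void release(byte[] buffer) {
        ArrayDeque<byte[]> local = CACHE.get();
        if (local.size() < MAX_CACHED) {
            local.addFirst(buffer);              // keep it close for the next allocation
        } else {
            SHARED.offer(buffer);                // cache full: hand it back to everyone
        }
    }

    public static void main(String[] args) {
        byte[] first = allocate();
        release(first);
        byte[] second = allocate();
        System.out.println(first == second); // true: served from this thread's cache
    }
}
```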