Unlocking Netty’s ByteBuf: A Deep Dive into Java NIO Buffers and Byte Order

This article explores Netty’s low‑level data container ByteBuf by examining Java NIO Buffer design, its core attributes, view mechanisms, heap and direct implementations, byte‑order handling, and how to read and write primitive types, providing a comprehensive foundation for high‑performance network programming.

Bin's Tech Cabin

Aug 11, 2022

Entering the Microscopic World to Re‑Understand Netty

In the previous Netty source‑code analysis series "Chatting About Netty Things", the author guided readers through the macro view of Netty, starting from the kernel‑level packet transmission, introducing the IO thread model, and then detailing the implementation of the Reactor model in Netty.

The macro flow includes creating and starting the Reactor model, its architecture, accepting and closing network connections, receiving and sending network data, arranging IO logic with a pipeline, and graceful shutdown of Netty.

Having completed the macro series, the author now dives into the microscopic world to examine Netty's internals component by component, so that readers can understand Netty thoroughly.

The microscopic series will cover high‑performance components in Netty, including:

The complete design and implementation of Netty's network data container ByteBuf.

The design and implementation of Netty's memory pool, with a walkthrough of related Linux kernel memory‑management source code.

The time‑wheel design used for massive delayed tasks in Netty, compared in detail with Kafka's time‑wheel.

The zero‑copy technique used in Netty and its kernel implementation.

The design and implementation of Netty's MPSC (multiple‑producer single‑consumer) queue and its usage scenarios.

The key component FastThreadLocal that achieves lock‑free concurrency in Netty, with a detailed comparison of its performance against JDK ThreadLocal.

Practical case studies showing how Netty is used in various famous middleware to deepen understanding.

Readers who love details should not miss this series.

Before This Article Starts

This is the first part of the microscopic series, focusing on Netty's network data container ByteBuf. ByteBuf has appeared in previous articles such as "How Netty Efficiently Receives Network Data" and "Understanding Netty's Data Sending Process" where it was used for receiving and sending network data.

ByteBuf is Netty's data container. When Netty receives or sends network data, it first caches the data in a ByteBuf before passing it to the pipeline or socket. On the receive side, draining the socket receive buffer into a ByteBuf promptly keeps that buffer from filling up; a full receive buffer shrinks the advertised TCP window, eventually to zero, and slows the sender down. Caching data in ByteBuf therefore helps TCP throughput.

When sending, Netty also caches data in ByteBuf so that if the socket send buffer becomes non‑writable, the sending thread does not block; once the buffer is writable again, Netty writes the cached data to the socket. This is the core of Netty's asynchronous sending.

ByteBuf is built on top of JDK NIO's ByteBuffer. The JDK NIO ByteBuffer API is complex and not user‑friendly, so Netty provides a simpler, more convenient API.

The author likes to unfold the technical evolution layer by layer, showing the original form, advantages, bottlenecks, and possible optimizations.

Before introducing ByteBuf's design, the author will first explain JDK NIO Buffer design, its shortcomings, and how Netty optimizes it.

1. JDK NIO Buffer

Before NIO, Java's traditional IO used streams (InputStream and OutputStream) for both network and file IO. These stream operations are blocking; for example, calling InputStream.read() blocks the thread if no data is available.

Traditional streams are also consumed sequentially, one byte or one byte array at a time, with no random access within the byte stream, which makes flexible processing of network data awkward.

Therefore, Java 1.4 introduced NIO, which is buffer‑oriented. Data is read from a Channel into a Buffer in bulk, processed, and then written back to a Channel, allowing flexible manipulation of the data within the Buffer.

NIO Buffers also provide off‑heap direct memory and memory‑mapped access to avoid copying between heap and native memory.

Now let's dive into the top‑level abstraction of NIO Buffer.

2. Top‑Level Abstraction of NIO Buffer

In JDK NIO, Buffer is essentially a block of memory, similar to an array. The abstract class java.nio.Buffer defines the core attributes:

capacity : the total number of elements the Buffer can hold.

position : the index of the next element to be read or written.

limit : the upper bound of accessible elements. In write mode, limit equals capacity; in read mode, limit equals the position after the last write.

mark : a saved position used for resetting after partial decoding (e.g., handling TCP packet fragmentation).

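The relationship among these fields is 0 ≤ mark ≤ position ≤ limit ≤ capacity (mark is -1 while it is unset). The class also provides methods to manipulate these pointers, such as limit(int), position(int), flip(), clear(), and rewind(). compact() is declared on the typed subclasses such as ByteBuffer rather than on Buffer itself, but it is covered here because it manipulates the same pointers.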
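The way these pointers move can be observed through the public API. The following is a minimal sketch (the class name BufferPointersDemo and the helper state() are made up for illustration) that records position/limit/capacity after each step:

```java
import java.nio.ByteBuffer;

class BufferPointersDemo {
    // Formats the public pointers of a buffer as "position/limit/capacity".
    static String state(ByteBuffer buf) {
        return buf.position() + "/" + buf.limit() + "/" + buf.capacity();
    }

    static String[] run() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        String fresh = state(buf);        // write mode: limit == capacity
        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        String afterPut = state(buf);     // position advanced by three
        buf.flip();
        String afterFlip = state(buf);    // read mode: limit == old position
        buf.get();
        String afterGet = state(buf);     // one byte consumed
        buf.clear();
        String afterClear = state(buf);   // back to write mode, data not erased
        return new String[] {fresh, afterPut, afterFlip, afterGet, afterClear};
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s);
        }
    }
}
```

Note that clear() only resets the pointers; the bytes written earlier stay in the backing storage until they are overwritten.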

2.1 Buffer Construction

public abstract class Buffer {
    private int mark = -1;
    private int position = 0;
    private int limit;
    private int capacity;
    // ... other fields and methods ...
}

2.2 Core Abstract Operations

2.2.1 Getting the Next Read Index

final int nextGetIndex() {
    if (position >= limit)
        throw new BufferUnderflowException();
    return position++;
}

2.2.2 Getting the Next Read Index with Step

final int nextGetIndex(int nb) {
    if (limit - position < nb)
        throw new BufferUnderflowException();
    int p = position;
    position += nb;
    return p;
}

2.2.3 Getting the Next Write Index

final int nextPutIndex() {
    if (position >= limit)
        throw new BufferOverflowException();
    return position++;
}

2.2.4 Getting the Next Write Index with Step

final int nextPutIndex(int nb) {
    if (limit - position < nb)
        throw new BufferOverflowException();
    int p = position;
    position += nb;
    return p;
}
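These package‑private checks are what callers see as BufferOverflowException and BufferUnderflowException. A small sketch of both cases through the public ByteBuffer API (class and method names are illustrative only):

```java
import java.nio.BufferOverflowException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

class BufferBoundsDemo {
    // Writing one byte when position == limit fails the single-step check.
    static boolean overflowsWhenFull() {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.put((byte) 1).put((byte) 2);
        try {
            buf.put((byte) 3);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    // Reading a 4-byte int when only 2 bytes remain fails the multi-step check.
    static boolean underflowsOnWideRead() {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.put((byte) 1).put((byte) 2);
        buf.flip();
        try {
            buf.getInt();
            return false;
        } catch (BufferUnderflowException e) {
            // The check runs before position is advanced, so nothing was consumed.
            return buf.position() == 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflowsWhenFull());
        System.out.println(underflowsOnWideRead());
    }
}
```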

2.2.5 Switching to Read Mode (flip)

public final Buffer flip() {
    limit = position;
    position = 0;
    mark = -1;
    return this;
}

2.2.6 Switching to Write Mode (clear)

public final Buffer clear() {
    position = 0;
    limit = capacity;
    mark = -1;
    return this;
}

2.2.7 Compacting (compact)

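// Note: compact() is not declared on java.nio.Buffer. It is declared on the
// typed buffers such as ByteBuffer; this is the HeapByteBuffer implementation.
public ByteBuffer compact() {
    // Move the unread bytes [position, limit) to the start of the array.
    System.arraycopy(hb, ix(position()), hb, ix(0), remaining());
    // Continue writing right after the preserved bytes.
    position(remaining());
    limit(capacity());
    discardMark();
    return this;
}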
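compact() is typically used when a buffer has been only partially read and more data must be appended. A minimal sketch of that situation (the class name CompactDemo is an assumption for illustration):

```java
import java.nio.ByteBuffer;

class CompactDemo {
    static ByteBuffer partiallyReadThenCompact() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put(new byte[] {10, 20, 30, 40, 50});
        buf.flip();      // read mode: position 0, limit 5
        buf.get();
        buf.get();       // two bytes consumed, three remain (30, 40, 50)
        buf.compact();   // unread bytes move to index 0, ready for more writes
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = partiallyReadThenCompact();
        System.out.println(buf.position() + "/" + buf.limit() + " first=" + buf.get(0));
    }
}
```

After compact(), position equals the number of preserved bytes, so the next put() appends right behind them.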

2.2.8 Rewinding (rewind)

public final Buffer rewind() {
    position = 0;
    mark = -1;
    return this;
}
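Since rewind() sets mark back to -1, a later reset() has nothing to return to. The sketch below (illustrative names, public API only) contrasts mark()/reset() with rewind():

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

class MarkRewindDemo {
    // mark() remembers a position; reset() jumps back to it.
    static byte readAfterReset() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        buf.get();          // position 1
        buf.mark();         // mark = 1
        buf.get();
        buf.get();          // position 3
        buf.reset();        // position back to 1
        return buf.get();   // reads the element at index 1 again
    }

    // rewind() moves position to 0 and discards the mark.
    static boolean rewindDiscardsMark() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        buf.get();
        buf.mark();
        buf.rewind();
        try {
            buf.reset();
            return false;
        } catch (InvalidMarkException e) {
            return buf.position() == 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterReset());
        System.out.println(rewindDiscardsMark());
    }
}
```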

3. Storage Mechanism Behind NIO Buffer

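NIO distinguishes three storage types: HeapBuffer (backed by a Java array on the heap), DirectBuffer (off‑heap native memory), and MappedBuffer (a memory‑mapped file region, essentially a DirectBuffer). Heap variants exist for every primitive buffer type; MappedByteBuffer exists only for bytes, and direct buffers of the other primitive types are obtained as views of a direct ByteBuffer.

HeapByteBuffer stores data in a byte[] array. DirectByteBuffer and MappedByteBuffer store data in off‑heap memory, referenced by a native address (long address).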

The author will later detail DirectBuffer and MappedBuffer; this section only introduces the concept.

The method hasArray() indicates whether a Buffer has an accessible backing array.

public abstract boolean hasArray();

If hasArray() returns true, array() returns the backing array.

public abstract Object array();
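A short sketch of how hasArray() behaves for different buffers. The results for the direct buffer reflect OpenJDK behavior, which is an assumption here since the specification only promises an answer about an accessible array; the class name is illustrative:

```java
import java.nio.ByteBuffer;

class HasArrayDemo {
    static boolean heapHasArray() {
        return ByteBuffer.allocate(4).hasArray();
    }

    static boolean directHasArray() {
        return ByteBuffer.allocateDirect(4).hasArray();
    }

    // A read-only heap buffer hides its array so callers cannot bypass the protection.
    static boolean readOnlyHasArray() {
        return ByteBuffer.allocate(4).asReadOnlyBuffer().hasArray();
    }

    // Writes to the backing array are visible through the buffer.
    static byte writeThroughArray() {
        ByteBuffer heap = ByteBuffer.allocate(4);
        heap.array()[0] = 7;
        return heap.get(0);
    }

    public static void main(String[] args) {
        System.out.println(heapHasArray() + " " + directHasArray()
                + " " + readOnlyHasArray() + " " + writeThroughArray());
    }
}
```

Checking hasArray() before calling array() avoids the UnsupportedOperationException or ReadOnlyBufferException that array() throws when no accessible array exists.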

4. Buffer Views

A Buffer view shares the same underlying memory but has independent position, limit, mark, and capacity. The view is created via slice() (from current position to limit) or duplicate() (full copy of the state).

public abstract ByteBuffer slice();
public abstract ByteBuffer duplicate();

Both methods return a new ByteBuffer that operates on the same storage.
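The difference between the two views can be sketched as follows (BufferViewDemo and its helpers are names chosen for this example):

```java
import java.nio.ByteBuffer;

class BufferViewDemo {
    // An 8-byte buffer holding 0..7 with position 2 and limit 6.
    static ByteBuffer source() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        for (int i = 0; i < 8; i++) {
            buf.put((byte) i);
        }
        buf.position(2);
        buf.limit(6);
        return buf;
    }

    // slice() covers [position, limit) and starts counting from index 0.
    static boolean sliceSharesStorage() {
        ByteBuffer buf = source();
        ByteBuffer slice = buf.slice();
        slice.put(0, (byte) 99);            // slice index 0 is source index 2
        return buf.get(2) == 99
                && slice.position() == 0
                && slice.capacity() == 4;
    }

    // duplicate() copies the pointers, which then move independently.
    static boolean duplicateHasOwnPointers() {
        ByteBuffer buf = source();
        ByteBuffer dup = buf.duplicate();
        dup.position(5);
        return buf.position() == 2
                && dup.limit() == 6
                && dup.capacity() == 8;
    }

    public static void main(String[] args) {
        System.out.println(sliceSharesStorage());
        System.out.println(duplicateHasOwnPointers());
    }
}
```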

5. Abstract ByteBuffer

ByteBuffer extends Buffer and adds byte‑specific operations. It defines the following fields:

byte[] hb : the backing array for heap buffers.

int offset : the array offset used for creating views.

boolean isReadOnly : indicates if the buffer is read‑only.

public abstract class ByteBuffer extends Buffer implements Comparable<ByteBuffer> {
    final byte[] hb;
    final int offset;
    boolean isReadOnly;
    // ... constructors and methods ...
}

5.1 Creation of Specific Storage ByteBuffer

Factory methods create concrete ByteBuffers:

public static ByteBuffer allocateDirect(int capacity) {
    return new DirectByteBuffer(capacity);
}

public static ByteBuffer allocate(int capacity) {
    if (capacity < 0) throw new IllegalArgumentException();
    return new HeapByteBuffer(capacity, capacity);
}

The wrap method maps a byte array to a ByteBuffer:

public static ByteBuffer wrap(byte[] array, int offset, int length) {
    try {
        return new HeapByteBuffer(array, offset, length);
    } catch (IllegalArgumentException x) {
        throw new IndexOutOfBoundsException();
    }
}
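One detail of wrap(array, offset, length) that is easy to miss: the offset and length set the initial position and limit, while the capacity stays the full array length. A sketch with illustrative names:

```java
import java.nio.ByteBuffer;

class WrapDemo {
    static String pointers() {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        ByteBuffer buf = ByteBuffer.wrap(data, 2, 4);
        // position = offset, limit = offset + length, capacity = data.length
        return buf.position() + "/" + buf.limit() + "/" + buf.capacity();
    }

    // wrap() does not copy: later changes to the array show up in the buffer.
    static byte sharesArray() {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7};
        ByteBuffer buf = ByteBuffer.wrap(data, 2, 4);
        data[3] = 42;
        return buf.get(3);
    }

    public static void main(String[] args) {
        System.out.println(pointers());
        System.out.println(sharesArray());
    }
}
```

To get a buffer whose index 0 corresponds to array index 2, call slice() on the wrapped buffer.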

5.2 Define ByteBuffer View‑Related Operations

5.2.1 slice()

public ByteBuffer slice() {
    return new HeapByteBuffer(hb, -1, 0, remaining(), remaining(), position() + offset);
}

5.2.2 duplicate()

public ByteBuffer duplicate() {
    return new HeapByteBuffer(hb, markValue(), position(), limit(), capacity(), offset);
}

5.2.3 asReadOnlyBuffer()

public ByteBuffer asReadOnlyBuffer() {
    return new HeapByteBufferR(hb, markValue(), position(), limit(), capacity(), offset);
}

5.3 Define ByteBuffer Read/Write Operations

Four core byte operations are abstractly defined:

public abstract byte get();
public abstract ByteBuffer put(byte b);
public abstract byte get(int index);
public abstract ByteBuffer put(int index, byte b);

Bulk get/put methods are provided in the abstract class, using loops. Concrete heap implementations override them with System.arraycopy for efficiency.

6. HeapByteBuffer Implementation

HeapByteBuffer stores data in a heap‑allocated byte[] hb. Its constructors allocate the array based on the requested capacity.

class HeapByteBuffer extends ByteBuffer {
    HeapByteBuffer(int cap, int lim) {
        super(-1, 0, lim, cap, new byte[cap], 0);
    }
    // ... other constructors ...
}

6.1 Construction

The constructor creates a new byte[] of the given capacity and passes it to the superclass.

6.2 Reading Bytes

6.2.1 Read a byte at the current position

public byte get() {
    return hb[ix(nextGetIndex())];
}

protected int ix(int i) { return i + offset; }

6.2.2 Read a byte at a given index

public byte get(int i) {
    return hb[ix(checkIndex(i))];
}

6.2.3 Bulk get into a destination array

public ByteBuffer get(byte[] dst, int offset, int length) {
    checkBounds(offset, length, dst.length);
    if (length > remaining())
        throw new BufferUnderflowException();
    System.arraycopy(hb, ix(position()), dst, offset, length);
    position(position() + length);
    return this;
}

6.3 Writing Bytes

6.3.1 Write a byte at the current position

public ByteBuffer put(byte x) {
    hb[ix(nextPutIndex())] = x;
    return this;
}

6.3.2 Write a byte at a given index

public ByteBuffer put(int i, byte x) {
    hb[ix(checkIndex(i))] = x;
    return this;
}

6.3.3 Bulk put from a source array

public ByteBuffer put(byte[] src, int offset, int length) {
    checkBounds(offset, length, src.length);
    if (length > remaining())
        throw new BufferOverflowException();
    System.arraycopy(src, offset, hb, ix(position()), length);
    position(position() + length);
    return this;
}
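A round trip through the bulk operations, as a minimal sketch (BulkTransferDemo is an illustrative name). In both calls the offset and length refer to the array argument, not to the buffer:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class BulkTransferDemo {
    static byte[] roundTrip() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        byte[] src = {1, 2, 3, 4, 5, 6};
        buf.put(src, 1, 4);      // copies src[1..4] = 2, 3, 4, 5; position becomes 4
        buf.flip();              // position 0, limit 4
        byte[] dst = new byte[6];
        buf.get(dst, 2, 4);      // fills dst[2..5]; dst[0..1] stay zero
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip()));
    }
}
```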

7. Byte Order

The author uses an egg‑peeling analogy to explain big‑endian (most significant byte first) and little‑endian (least significant byte first) ordering. Network byte order, used by the TCP/IP protocol headers, is big‑endian.

In memory, the heap grows from low to high addresses, while the stack grows from high to low addresses.

7.1 Big‑Endian

In big‑endian order, the most significant byte is stored at the lowest address.

7.2 Little‑Endian

In little‑endian order, the least significant byte is stored at the lowest address.

8. Writing Primitive Types to HeapByteBuffer

ByteBuffer defaults to big‑endian order but can be switched via order(ByteOrder). The underlying implementation uses the Bits utility class to handle endianness.

public final ByteBuffer order(ByteOrder bo) {
    bigEndian = (bo == ByteOrder.BIG_ENDIAN);
    nativeByteOrder = (bigEndian == (Bits.byteOrder() == ByteOrder.BIG_ENDIAN));
    return this;
}

8.1 Big‑Endian

public ByteBuffer putInt(int x) {
    Bits.putInt(this, ix(nextPutIndex(4)), x, bigEndian);
    return this;
}

The Bits.putInt method writes the four bytes of the integer according to the selected endianness.

static void putInt(ByteBuffer bb, int bi, int x, boolean bigEndian) {
    if (bigEndian) putIntB(bb, bi, x); else putIntL(bb, bi, x);
}

static void putIntB(ByteBuffer bb, int bi, int x) {
    bb._put(bi, int3(x));
    bb._put(bi+1, int2(x));
    bb._put(bi+2, int1(x));
    bb._put(bi+3, int0(x));
}

8.1.1 int3(x) – highest byte

private static byte int3(int x) { return (byte)(x >> 24); }

8.1.2 int2(x) – second highest byte

private static byte int2(int x) { return (byte)(x >> 16); }

8.1.3 int1(x) – third byte

private static byte int1(int x) { return (byte)(x >> 8); }

8.1.4 int0(x) – lowest byte

private static byte int0(int x) { return (byte)(x); }

After writing the integer 5674 (0x0000162A) in big‑endian order, the four bytes in the underlying array are, from low index to high index: 0x00, 0x00, 0x16, 0x2A.
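This layout can be checked with the public API. The sketch below writes 5674 with the default byte order and compares the result with the same shifts that int3 to int0 perform (the class and method names are made up for this example):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class BigEndianWriteDemo {
    // Default order of a new ByteBuffer is big-endian.
    static byte[] viaByteBuffer(int x) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(x);
        return buf.array();
    }

    // Same decomposition as int3(x), int2(x), int1(x), int0(x).
    static byte[] viaShifts(int x) {
        return new byte[] {
            (byte) (x >> 24),   // most significant byte at the lowest index
            (byte) (x >> 16),
            (byte) (x >> 8),
            (byte) x            // least significant byte at the highest index
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaByteBuffer(5674)));
        System.out.println(Arrays.toString(viaShifts(5674)));
    }
}
```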

8.2 Little‑Endian

static void putIntL(ByteBuffer bb, int bi, int x) {
    bb._put(bi+3, int3(x));
    bb._put(bi+2, int2(x));
    bb._put(bi+1, int1(x));
    bb._put(bi,   int0(x));
}

In little‑endian order the most significant byte is stored at the highest address.
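For comparison, a sketch that writes the same value in little‑endian order and then reads the bytes back under each order (names are illustrative). Reading with the wrong order yields the byte‑reversed value rather than an error:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class LittleEndianWriteDemo {
    static byte[] write(int x) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(x);
        return buf.array();
    }

    // Interprets four bytes as an int under the given byte order.
    static int read(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = write(5674);
        System.out.println(Arrays.toString(bytes));
        System.out.println(read(bytes, ByteOrder.LITTLE_ENDIAN));
        System.out.println(read(bytes, ByteOrder.BIG_ENDIAN));
    }
}
```

Because a mismatch is silent, both peers of a protocol have to agree on the byte order up front.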

9. Reading Primitive Types from HeapByteBuffer

Reading mirrors the writing process. HeapByteBuffer implements the abstract ByteBuffer.getInt() by delegating to Bits.getInt.

public int getInt() {
    return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian);
}

9.1 Big‑Endian

static int getIntB(ByteBuffer bb, int bi) {
    return makeInt(bb._get(bi), bb._get(bi+1), bb._get(bi+2), bb._get(bi+3));
}

private static int makeInt(byte b3, byte b2, byte b1, byte b0) {
    return ((b3) << 24) |
           ((b2 & 0xff) << 16) |
           ((b1 & 0xff) << 8) |
           (b0 & 0xff);
}

9.2 Little‑Endian

static int getIntL(ByteBuffer bb, int bi) {
    return makeInt(bb._get(bi+3), bb._get(bi+2), bb._get(bi+1), bb._get(bi));
}
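The & 0xff masks in makeInt matter because Java bytes are signed: a byte is widened to int with sign extension before it is shifted, so a byte with its top bit set would otherwise smear ones over the higher bits. The sketch below compares a masked and an unmasked variant (both written for this example, not copied from the JDK):

```java
class MakeIntDemo {
    // Mirrors the masking scheme shown above.
    static int masked(byte b3, byte b2, byte b1, byte b0) {
        return (b3 << 24)
                | ((b2 & 0xff) << 16)
                | ((b1 & 0xff) << 8)
                | (b0 & 0xff);
    }

    // Deliberately wrong: sign extension of the lower bytes corrupts the result.
    static int unmasked(byte b3, byte b2, byte b1, byte b0) {
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) {
        byte ff = (byte) 0xFF;   // stored as -1 in a signed byte
        System.out.println(masked((byte) 0, (byte) 0, (byte) 0, ff));
        System.out.println(unmasked((byte) 0, (byte) 0, (byte) 0, ff));
    }
}
```

The highest byte needs no mask because the shift by 24 pushes any sign‑extended bits out of the 32‑bit result.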

10. Converting HeapByteBuffer to Typed Buffers

A ByteBuffer can be viewed as a buffer of any other primitive type except boolean (char, short, int, long, float, or double). For example, creating an IntBuffer view:

public IntBuffer asIntBuffer() {
    int size = this.remaining() >> 2;
    int off = offset + position();
    return bigEndian ?
        (IntBuffer)(new ByteBufferAsIntBufferB(this, -1, 0, size, size, off)) :
        (IntBuffer)(new ByteBufferAsIntBufferL(this, -1, 0, size, size, off));
}

The concrete classes ByteBufferAsIntBufferB and ByteBufferAsIntBufferL delegate reads and writes to the underlying ByteBuffer using the appropriate byte order.

class ByteBufferAsIntBufferB extends IntBuffer {
    protected final ByteBuffer bb;
    public int get() { return Bits.getIntB(bb, ix(nextGetIndex())); }
    // ... other methods ...
}
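A sketch of how such a view behaves from the caller's side (IntViewDemo is an illustrative name). The view starts at the byte buffer's current position, counts in ints rather than bytes, and shares storage with the byte buffer:

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

class IntViewDemo {
    static ByteBuffer threeInts() {
        ByteBuffer bytes = ByteBuffer.allocate(12);
        bytes.putInt(1).putInt(2).putInt(3);
        bytes.flip();
        bytes.getInt();          // consume the first int, position is now 4
        return bytes;
    }

    // 8 bytes remain, so the view holds remaining() >> 2 = 2 ints.
    static int viewCapacity() {
        return threeInts().asIntBuffer().capacity();
    }

    // Writing through the view changes the bytes seen by the original buffer.
    static int writeThroughView() {
        ByteBuffer bytes = threeInts();
        IntBuffer ints = bytes.asIntBuffer();
        ints.put(0, 20);         // view index 0 maps to byte index 4
        return bytes.getInt(4);
    }

    public static void main(String[] args) {
        System.out.println(viewCapacity());
        System.out.println(writeThroughView());
    }
}
```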

Summary

This article used the simplest NIO implementation, HeapByteBuffer, to walk through the overall design of NIO Buffers, from the top‑level abstract concepts to concrete implementations, including view creation, byte‑order handling, and primitive read/write operations.

Understanding these mechanisms is essential for high‑performance network programming with Netty, because ByteBuf and the memory pool that backs it build directly on these Buffer concepts.

Future articles will dive into DirectByteBuffer and MappedByteBuffer, which involve off‑heap memory and memory‑mapped files, respectively, to give a complete picture of NIO Buffer internals.
