High‑Performance Netty RPC with Protostuff Serialization: Implementation and Stress Testing

This article presents a practical Netty‑based RPC solution using Protostuff serialization, detailing large‑scale stress testing, performance bottlenecks, and code‑level optimizations that enable handling hundreds of thousands of objects per second reliably.

Top Architect

In this article the author shares a practical implementation of a high‑throughput RPC system based on Netty and Protostuff serialization, describing the challenges encountered during large‑scale stress testing and the optimizations applied.

During pre‑release testing, 40 client machines each sent 100 k objects per second to two Netty servers, resulting in an average load of 400 k objects per server per second. After code adjustments, the system reliably handled over 350 k objects per second without errors.

Protostuff serialization and deserialization

The required Maven dependencies are:

<properties>
    <protostuff.version>1.7.2</protostuff.version>
</properties>

<dependencies>
    <dependency>
        <groupId>io.protostuff</groupId>
        <artifactId>protostuff-core</artifactId>
        <version>${protostuff.version}</version>
    </dependency>

    <dependency>
        <groupId>io.protostuff</groupId>
        <artifactId>protostuff-runtime</artifactId>
        <version>${protostuff.version}</version>
    </dependency>
</dependencies>

The utility class provides thread‑safe schema caching and uses ProtobufIOUtil for (de)serialization:

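import io.protostuff.LinkedBuffer;
import io.protostuff.ProtobufIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProtostuffUtils {
    // Schema cache keyed by message class; ConcurrentHashMap keeps lookups thread-safe
    private static final Map<Class<?>, Schema<?>> schemaCache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public static <T> byte[] serialize(T obj) {
        Class<T> clazz = (Class<T>) obj.getClass();
        Schema<T> schema = getSchema(clazz);
        // Allocate a fresh buffer per call so concurrent callers never share one
        LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
        try {
            return ProtobufIOUtil.toByteArray(obj, schema, buffer);
        } finally {
            buffer.clear();
        }
    }

    public static <T> T deserialize(byte[] data, Class<T> clazz) {
        Schema<T> schema = getSchema(clazz);
        T obj = schema.newMessage();
        ProtobufIOUtil.mergeFrom(data, obj, schema);
        return obj;
    }

    @SuppressWarnings("unchecked")
    private static <T> Schema<T> getSchema(Class<T> clazz) {
        Schema<T> schema = (Schema<T>) schemaCache.get(clazz);
        if (schema == null) {
            // Build the schema on first use and remember it for later calls
            schema = RuntimeSchema.getSchema(clazz);
            if (schema != null) {
                schemaCache.put(clazz, schema);
            }
        }
        return schema;
    }
}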
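
The check‑then‑put in getSchema is safe, but two threads that miss at the same moment may each build the schema once. As a minimal sketch of one alternative, the standard ConcurrentHashMap.computeIfAbsent builds the value at most once per key. PerClassCache and its factory function are hypothetical stand‑ins used only for illustration (the factory plays the role of RuntimeSchema.getSchema); they are not part of the original utility or of Protostuff.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a per-class cache such as the schema cache above. */
final class PerClassCache<V> {
    private final Map<Class<?>, V> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, V> factory;

    PerClassCache(Function<Class<?>, V> factory) {
        this.factory = factory;
    }

    /** Builds the value at most once per class, even if several threads miss together. */
    V get(Class<?> clazz) {
        return cache.computeIfAbsent(clazz, factory);
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        // The factory stands in for schema creation; here it only returns a label.
        PerClassCache<String> cache = new PerClassCache<>(c -> {
            builds.incrementAndGet();
            return "schema:" + c.getSimpleName();
        });
        System.out.println(cache.get(String.class)); // schema:String
        System.out.println(cache.get(String.class)); // schema:String (served from the cache)
        System.out.println("factory calls: " + builds.get()); // factory calls: 1
    }
}
```

One caveat with this design: the function passed to computeIfAbsent must not modify the same map while it runs, so it suits factories that do not call back into the cache.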

Custom Netty decoder and encoder

The decoder reads the incoming bytes and converts them to HotKeyMsg objects using the utility above:

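import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;

import java.util.List;

public class MsgDecoder extends ByteToMessageDecoder {
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        try {
            // Copy all readable bytes; an upstream frame decoder is expected to deliver one whole message
            byte[] body = new byte[in.readableBytes()];
            in.readBytes(body);
            out.add(ProtostuffUtils.deserialize(body, HotKeyMsg.class));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}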

The encoder serializes a HotKeyMsg and appends a custom delimiter before writing to the channel:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.MessageToByteEncoder;

public class MsgEncoder extends MessageToByteEncoder<Object> {
    @Override
    public void encode(ChannelHandlerContext ctx, Object msg, ByteBuf out) {
        if (msg instanceof HotKeyMsg) {
            byte[] bytes = ProtostuffUtils.serialize(msg);
            byte[] delimiter = Constant.DELIMITER.getBytes();
            // Write the payload, then the delimiter that marks the end of the frame
            out.writeBytes(bytes);
            out.writeBytes(delimiter);
        }
    }
}

To avoid sticky‑packet problems under high concurrency (several messages arriving merged in one read, or one message split across reads), a DelimiterBasedFrameDecoder is placed before MsgDecoder, using the same delimiter string on both client and server.
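
To make the framing idea concrete, here is a small plain‑Java model of delimiter‑based splitting. It is only a sketch of the concept, not Netty's DelimiterBasedFrameDecoder; the class name DelimiterFrameSplitter and the "$$" delimiter are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Accumulates raw reads and emits complete frames, each terminated by the delimiter. */
final class DelimiterFrameSplitter {
    private final byte[] delimiter;
    private byte[] pending = new byte[0]; // bytes received but not yet part of a complete frame

    DelimiterFrameSplitter(byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.delimiter = delimiter.clone();
    }

    /** Feeds one network read; returns every frame it completes, with the delimiter stripped. */
    List<byte[]> feed(byte[] chunk) {
        byte[] data = Arrays.copyOf(pending, pending.length + chunk.length);
        System.arraycopy(chunk, 0, data, pending.length, chunk.length);

        List<byte[]> frames = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = indexOf(data, delimiter, start)) >= 0) {
            frames.add(Arrays.copyOfRange(data, start, hit));
            start = hit + delimiter.length;
        }
        // Keep the unfinished tail for the next read.
        pending = Arrays.copyOfRange(data, start, data.length);
        return frames;
    }

    /** Returns the first index >= from where pattern occurs in data, or -1. */
    private static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = from; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] delim = "$$".getBytes(StandardCharsets.UTF_8); // hypothetical delimiter
        DelimiterFrameSplitter splitter = new DelimiterFrameSplitter(delim);
        // First read: one full message stuck to the start of the next one.
        List<byte[]> first = splitter.feed("msgA$$ms".getBytes(StandardCharsets.UTF_8));
        // Second read: the remainder of the second message.
        List<byte[]> second = splitter.feed("gB$$".getBytes(StandardCharsets.UTF_8));
        System.out.println(first.size() + " " + new String(first.get(0), StandardCharsets.UTF_8));   // 1 msgA
        System.out.println(second.size() + " " + new String(second.get(0), StandardCharsets.UTF_8)); // 1 msgB
    }
}
```

The first read carries one whole message plus the start of the next, yet each message comes out exactly once and intact. This is why the decoder that deserializes "all readable bytes" needs a framing step in front of it.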

Testing on a single machine and on a cluster (40 clients, 2 servers) shows that with the delimiter and proper buffer handling the system can sustain roughly 400 k objects per second per server without crashes.

In summary, the article demonstrates how to build a robust, high‑performance Netty RPC service with Protostuff, highlights pitfalls such as shared static buffers and sticky packets, and provides complete source code for serialization utilities, decoder, encoder, and deployment diagrams.
