High‑Performance Netty RPC with Protostuff Serialization: Implementation and Stress Testing

This article presents a practical Netty‑based RPC solution using Protostuff serialization, detailing large‑scale stress testing, performance bottlenecks, and code‑level optimizations that enable handling hundreds of thousands of objects per second reliably.

Top Architect

Jan 30, 2021

In this article the author shares a practical implementation of a high‑throughput RPC system based on Netty and Protostuff serialization, describing the challenges encountered during large‑scale stress testing and the optimizations applied.

During pre‑release testing, 40 client machines sent objects to two Netty servers; the figures reported are 100 k objects per second per client and an average of 400 k objects per second per server. After code adjustments, the system reliably handled over 350 k objects per second without errors.

Protostuff serialization and deserialization

The required Maven dependencies are:

<properties>
    <protostuff.version>1.7.2</protostuff.version>
</properties>

<dependencies>
    <dependency>
        <groupId>io.protostuff</groupId>
        <artifactId>protostuff-core</artifactId>
        <version>${protostuff.version}</version>
    </dependency>

    <dependency>
        <groupId>io.protostuff</groupId>
        <artifactId>protostuff-runtime</artifactId>
        <version>${protostuff.version}</version>
    </dependency>
</dependencies>

The utility class provides thread‑safe schema caching and uses ProtobufIOUtil for (de)serialization:

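import io.protostuff.LinkedBuffer;
import io.protostuff.ProtobufIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProtostuffUtils {
    // Schema cache keyed by class; ConcurrentHashMap keeps lookups safe across threads.
    private static final Map<Class<?>, Schema<?>> schemaCache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public static <T> byte[] serialize(T obj) {
        Class<T> clazz = (Class<T>) obj.getClass();
        Schema<T> schema = getSchema(clazz);
        // Allocate one buffer per call: a LinkedBuffer is not thread-safe,
        // so a single shared static buffer would be corrupted under concurrent use.
        LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
        try {
            return ProtobufIOUtil.toByteArray(obj, schema, buffer);
        } finally {
            buffer.clear();
        }
    }

    public static <T> T deserialize(byte[] data, Class<T> clazz) {
        Schema<T> schema = getSchema(clazz);
        // Create an empty instance, then merge the serialized fields into it.
        T obj = schema.newMessage();
        ProtobufIOUtil.mergeFrom(data, obj, schema);
        return obj;
    }

    @SuppressWarnings("unchecked")
    private static <T> Schema<T> getSchema(Class<T> clazz) {
        Schema<T> schema = (Schema<T>) schemaCache.get(clazz);
        if (schema == null) {
            // Two threads may both reach this point for the same class; the
            // duplicate put is harmless because both schemas are equivalent.
            schema = RuntimeSchema.getSchema(clazz);
            if (schema != null) {
                schemaCache.put(clazz, schema);
            }
        }
        return schema;
    }
}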
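
The caching idea in getSchema can also be expressed with ConcurrentHashMap.computeIfAbsent, which builds the value at most once per key even when threads race. The following is a minimal standard-library sketch, not the article's code: PerClassCache is a hypothetical name, the type parameter V stands in for Schema<?>, and the factory function stands in for RuntimeSchema.getSchema.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the schema cache: V plays the role of Schema<?>,
// and the factory plays the role of RuntimeSchema.getSchema(clazz).
final class PerClassCache<V> {
    private final Map<Class<?>, V> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, V> factory;

    PerClassCache(Function<Class<?>, V> factory) {
        this.factory = factory;
    }

    // computeIfAbsent on ConcurrentHashMap is atomic per key, so the factory
    // is applied at most once for a given class.
    V get(Class<?> type) {
        return cache.computeIfAbsent(type, factory);
    }

    // Number of distinct classes cached so far.
    int size() {
        return cache.size();
    }
}
```

Compared with a check-then-put sequence, this removes the window in which two threads both build a value for the same class. The factory should stay short and must not modify the same map.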

Custom Netty decoder and encoder

The decoder reads the incoming bytes and converts them to HotKeyMsg objects using the utility above:

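import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;

import java.util.List;

// Expects one complete frame per call, e.g. from an upstream DelimiterBasedFrameDecoder.
public class MsgDecoder extends ByteToMessageDecoder {
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        // Drain the buffer first so a malformed frame is never decoded twice.
        byte[] body = new byte[in.readableBytes()];
        in.readBytes(body);
        try {
            out.add(ProtostuffUtils.deserialize(body, HotKeyMsg.class));
        } catch (Exception e) {
            // The malformed frame is dropped; use the project's logger in production.
            e.printStackTrace();
        }
    }
}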

The encoder serializes a HotKeyMsg and appends a custom delimiter before writing to the channel:

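import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.MessageToByteEncoder;

import java.nio.charset.StandardCharsets;

public class MsgEncoder extends MessageToByteEncoder<Object> {
    @Override
    protected void encode(ChannelHandlerContext ctx, Object msg, ByteBuf out) {
        if (msg instanceof HotKeyMsg) {
            byte[] bytes = ProtostuffUtils.serialize(msg);
            // Use an explicit charset so client and server derive identical delimiter
            // bytes regardless of each JVM's platform default.
            byte[] delimiter = Constant.DELIMITER.getBytes(StandardCharsets.UTF_8);
            // Write payload and delimiter directly; no intermediate array copy is needed.
            out.writeBytes(bytes);
            out.writeBytes(delimiter);
        }
    }
}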
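
Delimiter framing only works if the delimiter bytes never occur inside a serialized payload, because the receiver cannot tell the two apart. As a minimal sketch of the frame layout (payload followed by delimiter) with that check made explicit, using only the standard library: the class name DelimitedFrames and the delimiter value are hypothetical, since the actual value of Constant.DELIMITER is not shown in the article.

```java
import java.nio.charset.StandardCharsets;

final class DelimitedFrames {
    // Hypothetical delimiter; the article's Constant.DELIMITER value is not shown.
    static final byte[] DELIMITER = "$(END)$".getBytes(StandardCharsets.UTF_8);

    // Returns payload + delimiter, rejecting payloads that already contain the delimiter.
    static byte[] frame(byte[] payload) {
        if (indexOf(payload, DELIMITER, 0) >= 0) {
            throw new IllegalArgumentException("payload contains the frame delimiter");
        }
        byte[] out = new byte[payload.length + DELIMITER.length];
        System.arraycopy(payload, 0, out, 0, payload.length);
        System.arraycopy(DELIMITER, 0, out, payload.length, DELIMITER.length);
        return out;
    }

    // Naive byte-pattern search: first match index at or after 'from', or -1 if none.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = from; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

A longer, unusual delimiter makes an accidental match in binary data less likely but does not rule it out; length-prefixed framing avoids the problem entirely at the cost of changing the wire format.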

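TCP delivers a byte stream, not messages, so under high concurrency several messages can arrive merged in one read or a single message can be split across reads (the sticky‑packet problem). To avoid this, a DelimiterBasedFrameDecoder is placed before MsgDecoder, using the same delimiter string on both client and server, so that MsgDecoder always receives exactly one complete message.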
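
To illustrate what delimiter-based framing has to do on the receiving side, here is a minimal standard-library sketch. It is not Netty's implementation: DelimiterSplitter is a hypothetical class that buffers incoming chunks and emits every complete frame, covering both merged messages and messages split across reads. Unlike DelimiterBasedFrameDecoder, it enforces no maximum frame length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DelimiterSplitter {
    private final byte[] delimiter;
    // Bytes received so far that are not yet terminated by a delimiter.
    private byte[] pending = new byte[0];

    DelimiterSplitter(byte[] delimiter) {
        this.delimiter = delimiter.clone();
    }

    // Feed one network read; returns all frames completed by it, delimiter stripped.
    List<byte[]> feed(byte[] chunk) {
        byte[] data = new byte[pending.length + chunk.length];
        System.arraycopy(pending, 0, data, 0, pending.length);
        System.arraycopy(chunk, 0, data, pending.length, chunk.length);

        List<byte[]> frames = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = indexOf(data, start)) >= 0) {
            frames.add(Arrays.copyOfRange(data, start, hit));
            start = hit + delimiter.length;
        }
        // Keep the unterminated tail for the next read.
        pending = Arrays.copyOfRange(data, start, data.length);
        return frames;
    }

    // First index of the delimiter at or after 'from', or -1 if absent.
    private int indexOf(byte[] data, int from) {
        outer:
        for (int i = from; i <= data.length - delimiter.length; i++) {
            for (int j = 0; j < delimiter.length; j++) {
                if (data[i + j] != delimiter[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

Feeding the bytes of "a$$b$$c" with delimiter "$$" yields the frames "a" and "b" and keeps "c" pending; a later read of "d$$" completes the frame "cd".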

Testing on a single machine and on a cluster (40 clients, 2 servers) shows that with the delimiter and proper buffer handling the system can sustain roughly 400 k objects per second per server without crashes.

In summary, the article shows how to build a robust, high‑performance Netty RPC service with Protostuff, highlights pitfalls such as shared static buffers and sticky packets, and provides source code for the serialization utility, decoder, and encoder.
