Scaling Netty RPC with Protostuff: High‑Throughput Serialization Tips
This article details how to build a high‑performance Netty RPC system using Protostuff for serialization, covering dependency setup, custom encoder/decoder implementation, multithreaded buffer handling, and stress‑test results that achieve hundreds of thousands of objects per second per server.
Many Netty + Protostuff RPC demos are available online, but most are simple copies that work only in single‑threaded tests. After extensive modifications and stress testing, the author achieved stable processing of over 350,000 objects per second on a single machine and around 400,000 objects per second per server in a multi‑machine setup.
In a pre‑release environment, the author deployed 40 client machines sending 20,000 objects every 100 ms to two Netty servers, resulting in an average of 400,000 objects received per server per second. Business logic could handle about 350,000 objects per second, so the networking layer needed to be robust.
Protostuff Serialization and Deserialization
Adding Protostuff is straightforward by including the following Maven dependencies:
<protostuff.version>1.7.2</protostuff.version>
<dependency>
<groupId>io.protostuff</groupId>
<artifactId>protostuff-core</artifactId>
<version>${protostuff.version}</version>
</dependency>
<dependency>
<groupId>io.protostuff</groupId>
<artifactId>protostuff-runtime</artifactId>
<version>${protostuff.version}</version>
</dependency>The utility class below provides reusable serialize/deserialize methods and caches schemas to avoid repeated reflection costs:
public class ProtostuffUtils {
/** Avoid allocating a new buffer for each serialization. */
// private static LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
/** Cache schemas */
private static Map<Class<?>, Schema<?>> schemaCache = new ConcurrentHashMap<>();
@SuppressWarnings("unchecked")
public static <T> byte[] serialize(T obj) {
Class<T> clazz = (Class<T>) obj.getClass();
Schema<T> schema = getSchema(clazz);
LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
byte[] data;
try {
data = ProtobufIOUtil.toByteArray(obj, schema, buffer);
} finally {
buffer.clear();
}
return data;
}
public static <T> T deserialize(byte[] data, Class<T> clazz) {
Schema<T> schema = getSchema(clazz);
T obj = schema.newMessage();
ProtobufIOUtil.mergeFrom(data, obj, schema);
return obj;
}
@SuppressWarnings("unchecked")
private static <T> Schema<T> getSchema(Class<T> clazz) {
Schema<T> schema = (Schema<T>) schemaCache.get(clazz);
if (schema == null) {
schema = RuntimeSchema.getSchema(clazz);
if (schema != null) {
schemaCache.put(clazz, schema);
}
}
return schema;
}
}A common pitfall is using a static buffer in multithreaded scenarios; the same buffer may be reused before it is cleared, causing exceptions. The performance gain from reusing the buffer is negligible compared to the risk.
Custom Serialization Method
The decoder reads the raw bytes from Netty, then uses ProtostuffUtils.deserialize to obtain a HotKeyMsg instance:
public class MsgDecoder extends ByteToMessageDecoder {
@Override
protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
try {
byte[] body = new byte[in.readableBytes()];
in.readBytes(body);
out.add(ProtostuffUtils.deserialize(body, HotKeyMsg.class));
} catch (Exception e) {
e.printStackTrace();
}
}
}The encoder serializes the object and appends a custom delimiter defined in Constant.DELIMITER:
public class MsgEncoder extends MessageToByteEncoder {
@Override
public void encode(ChannelHandlerContext ctx, Object in, ByteBuf out) {
if (in instanceof HotKeyMsg) {
byte[] bytes = ProtostuffUtils.serialize(in);
byte[] delimiter = Constant.DELIMITER.getBytes();
byte[] total = new byte[bytes.length + delimiter.length];
System.arraycopy(bytes, 0, total, 0, bytes.length);
System.arraycopy(delimiter, 0, total, bytes.length, delimiter.length);
out.writeBytes(total);
}
}
}To avoid sticky‑packet problems when many objects are sent rapidly, a DelimiterBasedFrameDecoder is placed before MsgDecoder. The delimiter separates each serialized object, ensuring the decoder receives a complete byte stream for each message.
Server and client pipelines consist of the delimiter decoder, MsgDecoder, MsgEncoder, and the business handler. The same pipeline is used on both sides because the same HotKeyMsg definition is shared.
Stress‑test results show that with 40 client instances and 2 server instances, each server reliably processes about 400,000 objects per second without errors, provided the delimiter decoder is used and sending is either spaced (e.g., 100 ms) or synchronized with sync() to prevent packet collisions.
·END·
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
