Backend Development · 15 min read

Enhancing a Java RPC Framework: Protobuf/Kryo Serialization, Load‑Balancing Strategies, Connection Pooling and Performance Gains

This article describes a series of improvements to a Spring‑Boot based Java RPC framework—including support for Protobuf and Kryo serialization, multiple load‑balancing algorithms, client‑side service caching, TCP long‑connection reuse via Netty, and a performance test that reduces request latency by more than tenfold.

Architect's Tech Stack

1. Introduction

Inspired by a previous tutorial on building an RPC framework from scratch, the author rewrote the project and added several optimizations.

2. Main Changes

Added Protobuf and Kryo serialization protocols alongside the original Java serialization.

Implemented multiple load‑balancing algorithms (random, round‑robin, weighted round‑robin, smooth weighted round‑robin) configurable via properties.

Introduced a local service‑list cache on the client to improve performance.

Fixed a Netty‑related memory leak that occurred under high concurrency.

Switched from per‑request connections to persistent TCP long connections with reuse.

Added a server‑side thread pool to increase message‑handling capacity.

3. RPC Concepts

RPC (Remote Procedure Call) enables a client to invoke methods on a remote server as if they were local. The framework defines four roles: service consumer, service provider, registry (e.g., Zookeeper or Redis), and optional monitoring.
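The "as if they were local" part is usually achieved with a consumer-side stub. As a minimal illustration (not the article's actual code), a JDK dynamic proxy can intercept a local method call at the point where a real framework would marshal it into an RpcRequest and ship it over the network; here the remote hop is simulated in-process:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// A hypothetical service interface; in a real framework this is the contract
// shared by consumer and provider.
interface HelloService {
    String hello(String name);
}

class RpcProxyDemo {
    @SuppressWarnings("unchecked")
    static <T> T createStub(Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            // A real framework would: marshal method name + args, pick a
            // provider via load balancing, send through the network client,
            // and block on a future for the response. Simulated here.
            return "remote:" + method.getName() + ":" + args[0];
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }
}
```

The caller simply invokes `stub.hello("world")`; the network machinery stays hidden behind the interface.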

4. Implementation Overview

The rpc-spring-boot-starter uses Zookeeper for service registration, Netty for network communication, and Spring for configuration. Message encoding supports Protostuff (a Protobuf runtime), Kryo, and native Java serialization. Load‑balancing implementations are discovered via SPI.

4.1 Dynamic Load‑Balancing

Load‑balancing classes implement the LoadBalance interface and are annotated with a custom @LoadBalanceAno annotation. An example round‑robin implementation is shown below:

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface LoadBalanceAno {
    String value() default "";
}

@LoadBalanceAno(RpcConstant.BALANCE_ROUND)
public class FullRoundBalance implements LoadBalance {

    private static final Logger logger = LoggerFactory.getLogger(FullRoundBalance.class);

    // synchronized chooseOne already guarantees visibility of the cursor
    private int index;

    @Override
    public synchronized Service chooseOne(List<Service> services) {
        // >= (not ==) so the cursor also resets if the service list shrinks
        if (index >= services.size()) {
            index = 0;
        }
        return services.get(index++);
    }
}
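The smooth weighted round‑robin strategy listed among the changes is not shown in the article; a self-contained sketch of the Nginx-style algorithm (with plain String names standing in for Service instances) might look like this. Each pick adds the static weight to a running "current" weight, chooses the largest, then subtracts the total weight from the winner, which spreads heavy nodes out instead of bursting them:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical smooth weighted round-robin balancer; node names stand in
// for the framework's Service objects.
class SmoothWeightedRoundRobin {
    static class Node {
        final String name;
        final int weight;   // static, configured weight
        int current;        // running weight adjusted on every pick
        Node(String name, int weight) { this.name = name; this.weight = weight; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final int totalWeight;

    SmoothWeightedRoundRobin(Node... ns) {
        int total = 0;
        for (Node n : ns) { nodes.add(n); total += n.weight; }
        this.totalWeight = total;
    }

    synchronized String chooseOne() {
        Node best = null;
        for (Node n : nodes) {
            n.current += n.weight;                       // everyone gains its weight
            if (best == null || n.current > best.current) best = n;
        }
        best.current -= totalWeight;                     // winner pays the total
        return best.name;
    }
}
```

With weights a=5, b=1, c=1 the pick order interleaves as a, a, b, a, c, a, a rather than sending five consecutive requests to a.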

The configuration class RpcConfig now contains a loadBalance property that defaults to random.

@ConfigurationProperties(prefix = "sp.rpc")
public class RpcConfig {
    private String registerAddress = "127.0.0.1:2181";
    private Integer serverPort = 9999;
    private String protocol = "java";
    private String loadBalance = "random";
    private Integer weight = 1;
    // getters and setters omitted
}

The auto‑configuration class selects the appropriate load‑balancer via ServiceLoader based on this property.
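The selection step can be sketched as matching the configured name against each candidate's annotation value. This is a simplified stand-in, not the starter's actual code: the interface and annotation are reduced to essentials, services are plain address Strings, and the candidates are passed in as a list (the real starter obtains them from ServiceLoader):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.List;
import java.util.Random;

@Retention(RetentionPolicy.RUNTIME)
@interface LoadBalanceAno {
    String value() default "";
}

interface LoadBalance {
    String chooseOne(List<String> services);
}

@LoadBalanceAno("random")
class RandomBalance implements LoadBalance {
    private final Random random = new Random();
    public String chooseOne(List<String> services) {
        return services.get(random.nextInt(services.size()));
    }
}

@LoadBalanceAno("round")
class RoundBalance implements LoadBalance {
    private int index;
    public synchronized String chooseOne(List<String> services) {
        return services.get(index++ % services.size());
    }
}

class LoadBalanceSelector {
    // In the real starter the candidates come from
    // ServiceLoader.load(LoadBalance.class); they are injected here so the
    // sketch compiles on its own.
    static LoadBalance select(String configured, List<LoadBalance> candidates) {
        for (LoadBalance candidate : candidates) {
            LoadBalanceAno ano = candidate.getClass().getAnnotation(LoadBalanceAno.class);
            if (ano != null && ano.value().equals(configured)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("no load balancer named: " + configured);
    }
}
```

Reading the name from the annotation rather than hard-coding a switch is what lets new strategies plug in via SPI without touching the auto-configuration class.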

4.2 Local Service Cache

A thread‑safe ServerDiscoveryCache stores service lists in a ConcurrentHashMap. The client first checks the cache before querying Zookeeper, and a Zookeeper child‑node listener clears the cache when service instances change.

public class ServerDiscoveryCache {

    private static final Map<String, List<Service>> SERVER_MAP = new ConcurrentHashMap<>();

    private static final List<String> SERVICE_CLASS_NAMES = new ArrayList<>();

    public static void put(String serviceName, List<Service> serviceList) {
        SERVER_MAP.put(serviceName, serviceList);
    }

    public static List<Service> get(String serviceName) {
        return SERVER_MAP.get(serviceName);
    }

    public static void removeAll(String serviceName) {
        SERVER_MAP.remove(serviceName);
    }

    public static boolean isEmpty(String serviceName) {
        List<Service> serviceList = SERVER_MAP.get(serviceName);
        return serviceList == null || serviceList.isEmpty();
    }
}
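The cache-first lookup path described above can be sketched generically. In this standalone sketch a Supplier stands in for the Zookeeper query and addresses are plain Strings; the point is that the registry is only hit on a miss, and the listener's job is just to invalidate:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical cache-first discovery wrapper; the Supplier models the
// Zookeeper child-node query the real client performs on a cache miss.
class CachedDiscovery {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Supplier<List<String>> registryLookup;
    int registryHits = 0; // exposed only to make the demo observable

    CachedDiscovery(Supplier<List<String>> registryLookup) {
        this.registryLookup = registryLookup;
    }

    List<String> discover(String serviceName) {
        // Only query the registry when the service is not cached
        return cache.computeIfAbsent(serviceName, k -> {
            registryHits++;
            return registryLookup.get();
        });
    }

    // Called from the Zookeeper child-node listener when instances change;
    // the next discover() repopulates from the registry.
    void invalidate(String serviceName) {
        cache.remove(serviceName);
    }
}
```

Invalidation on change (rather than a TTL) keeps the cache simple and is why the listener only needs to clear entries, never update them.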

4.3 Netty Client with TCP Long Connection

The NettyNetClient maintains a static map, connectedServerNodes, that caches SendHandlerV2 instances per ip:port. When a request is dispatched, the client reuses an existing channel if one is present; otherwise it creates a new connection and stores the handler.

public class NettyNetClient implements NetClient {

    public static Map<String, SendHandlerV2> connectedServerNodes = new ConcurrentHashMap<>();

    @Override
    public RpcResponse sendRequest(RpcRequest rpcRequest, Service service, MessageProtocol protocol) {
        String address = service.getAddress();
        // Intern the address so every thread locks on the same String instance
        synchronized (address.intern()) {
            if (connectedServerNodes.containsKey(address)) {
                return connectedServerNodes.get(address).sendRequest(rpcRequest);
            }
            // Otherwise: create a new Netty channel, register a SendHandlerV2
            // for it, and store the handler in connectedServerNodes
            // ... (bootstrap code omitted for brevity; it produces `handler`)
            return handler.sendRequest(rpcRequest);
        }
    }
}
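One caveat with this pattern: synchronizing on an address String only provides mutual exclusion if all threads lock the same String instance (e.g. via String.intern()). An alternative sketch, not the article's code, is to let ConcurrentHashMap.computeIfAbsent do the once-per-key work, with a stub Handler standing in for SendHandlerV2:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lock-free connection cache: computeIfAbsent guarantees the
// factory runs at most once per key, so two threads racing on the same
// ip:port end up sharing a single connection handler.
class ConnectionPool {
    static class Handler {
        final String address;
        Handler(String address) { this.address = address; }
        // Real code would bootstrap a Netty channel here and expose
        // a sendRequest method.
    }

    private final Map<String, Handler> connected = new ConcurrentHashMap<>();

    Handler getOrConnect(String address) {
        return connected.computeIfAbsent(address, Handler::new);
    }
}
```

This removes both the explicit lock and the interning subtlety while preserving the one-handler-per-address invariant.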

The SendHandlerV2 uses a CountDownLatch to wait for channel registration, a custom RpcFuture map to correlate requests and responses, and cleans up the cache when the channel becomes inactive.

public class SendHandlerV2 extends ChannelInboundHandlerAdapter {

    private static final Map<String, RpcFuture<RpcResponse>> requestMap = new ConcurrentHashMap<>();

    private final MessageProtocol messageProtocol;
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile Channel channel;

    public SendHandlerV2(MessageProtocol messageProtocol) {
        this.messageProtocol = messageProtocol;
    }

    @Override
    public void channelRegistered(ChannelHandlerContext ctx) {
        this.channel = ctx.channel();
        latch.countDown();
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        ByteBuf buf = (ByteBuf) msg;
        byte[] resp = new byte[buf.readableBytes()];
        buf.readBytes(resp);
        // Release the buffer explicitly to avoid the leak seen under high concurrency
        ReferenceCountUtil.release(buf);
        RpcResponse response = messageProtocol.unmarshallingResponse(resp);
        RpcFuture<RpcResponse> future = requestMap.remove(response.getRequestId());
        if (future != null) {
            future.setResponse(response);
        }
    }

    public RpcResponse sendRequest(RpcRequest request) throws Exception {
        RpcFuture<RpcResponse> future = new RpcFuture<>();
        requestMap.put(request.getRequestId(), future);
        if (!latch.await(CHANNEL_WAIT_TIME, TimeUnit.SECONDS)) {
            throw new RpcException("establish channel timeout");
        }
        byte[] data = messageProtocol.marshallingRequest(request);
        channel.writeAndFlush(Unpooled.wrappedBuffer(data));
        return future.get(RESPONSE_WAIT_TIME, TimeUnit.SECONDS);
    }
}
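The RpcFuture type used above is not shown in the article. A minimal sketch of such a request/response future, assuming exactly one response per request id and wrapping the checked exceptions for brevity, can be built on a CountDownLatch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal future: the caller blocks in get() until the channel
// read handler delivers the matching response via setResponse().
class RpcFuture<T> {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile T response;

    public void setResponse(T response) {
        this.response = response;   // publish before releasing the waiter
        latch.countDown();
    }

    public T get(long timeout, TimeUnit unit) {
        try {
            if (!latch.await(timeout, unit)) {
                throw new IllegalStateException("response wait timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return response;
    }
}
```

Because the latch only counts down after the response field is written, the waiter is guaranteed to see the value once get() returns.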

5. Performance Test

The author performed an ApacheBench test with 4 concurrent threads sending 10,000 requests to a service running on localhost. The improved framework completed the test in 11 seconds, a more than ten‑fold improvement over the original 130‑second execution time.

Source code is available at:

rpc-spring-boot-starter

rpc-example

Tags: Java · Performance · RPC · Load Balancing · Zookeeper · Netty · Spring Boot
Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
