Big Data 17 min read

Analysis of Hadoop RPC Architecture and Implementation

The article examines Hadoop’s RPC framework—detailing its client‑server workflow, core classes (RPC, Client, Server), dynamic proxy handling, NIO‑based server threading, configurable concurrency controls such as FairCallQueue, and a practical HDFS mkdir command example, illustrating high‑performance distributed communication.

Didi Tech

Jun 22, 2019

Analysis of Hadoop RPC Architecture and Implementation

Hadoop Distributed File System (HDFS) relies on remote procedure calls (RPC) to coordinate DataNode, NameNode and client interactions. To reduce coupling, HDFS abstracts inter‑node communication into two interface families: HadoopRPC and stream‑based TCP/HTTP interfaces. This article provides an in‑depth analysis of HadoopRPC based on Hadoop 2.7 source code.

1. RPC Working Principle

RPC enables a client program to invoke methods on a remote server as if they were local calls, typically using TCP or UDP for transport. The RPC framework follows a client/server model where the client sends a request, the server processes it, and a response is returned.

The workflow includes:

client functions – the caller invokes a client stub

client stub – serializes arguments and sends them via sockets

sockets – network layer (TCP/UDP)

server stub – receives, deserializes, and forwards to server functions

server functions – execute business logic and return results

2. HadoopRPC Architecture Design

HadoopRPC implements the above workflow using three core classes in org.apache.hadoop.ipc: RPC, Client and Server. Communication is built on TCP/IP sockets.

public static ProtocolProxy getProxy(...){ ... }</code>
<code>public static Server RPC.Builder(Configuration).build(){ ... }

Typical usage steps:

Define an RPC protocol (e.g., ClientProtocol for NameNode communication).

Implement the protocol on the server side.

Construct and start an RPC.Server instance.

Create an RPC.getProxy on the client and invoke methods.

3. RPC Client Decoding

The client obtains a proxy via RPC.getProxy, which internally calls RPC.getProtocolProxy. The proxy uses a dynamic InvocationHandler (e.g., WritableRpcEngine.Invoker) that serializes the method call into a WritableRpcEngine.Invocation and forwards it to the server.

public static <span><T> ProtocolProxy<T> getProtocolProxy(...){
    return getProtocolEngine(protocol, conf).getProxy(...);
}

The Client.call method sends the request, waits for a response, and handles errors or remote exceptions.

public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest, ConnectionId remoteId, int serviceClass, AtomicBoolean fallbackToSimpleAuth) throws IOException {
    // create Call, obtain Connection, send request, wait for response
}

4. RPC Server Decoding

The server runs a single listener thread that accepts connections, then uses a pool of reader, handler, and responder threads (NIO Reactor pattern). Incoming data is read by a Reader, placed into a shared CallQueue, processed by a Handler, and the response is written back by a Responder.

5. Concurrency Optimizations

HadoopRPC provides configuration parameters to tune thread pools and queue sizes, e.g.: ipc.server.read.threadpool.size – number of reader threads (default 1). dfs.namenode.service.handler.count – number of handler threads (default 10). ipc.server.handler.queue.size – max queue length per handler.

To prevent a single user from monopolizing the queue, HadoopRPC offers FairCallQueue, which assigns priorities based on request frequency and uses weighted round‑robin scheduling.

/**
 * Determines which queue to start reading from, occasionally drawing from low‑priority queues
 * to prevent starvation. Example weights: 9,4,1 for three queues.
 */
public class WeightedRoundRobinMultiplexer implements RpcMultiplexer { ... }

6. Command‑Level Example

The article also shows how a typical HDFS command ( hadoop fs -mkdir /user/test) is parsed: the shell script resolves to org.apache.hadoop.fs.FsShell, which ultimately calls ClientProtocol.mkdirs via the HadoopRPC proxy.

public boolean primitiveMkdir(String src, FsPermission absPermission, boolean createParent){
    return namenode.mkdirs(src, absPermission, createParent);
}

@Override
public boolean mkdirs(String src, FsPermission masked, boolean createParent){
    return rpcProxy.mkdirs(null, req).getResult();
}

Conclusion

HadoopRPC is a high‑performance RPC framework that combines dynamic proxies, Java NIO, and flexible serialization (Writable, Protobuf). Its design patterns and concurrency mechanisms are valuable references for developers building distributed systems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Big Data RPC serialization NIO Hadoop

Written by

Didi Tech

Official Didi technology account

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.