Big Data 20 min read

Deep Dive into Flink's RPC Framework Implemented with Akka

This article explains how Apache Flink builds its RPC communication layer on top of Akka by detailing the Actor model, actor system creation, message passing patterns, key RPC interfaces such as RpcGateway and RpcEndpoint, and the internal workflow of request handling and execution.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Deep Dive into Flink's RPC Framework Implemented with Akka

Flink's internal RPC mechanism relies on Akka, an actor‑based framework that provides asynchronous message passing and fault‑tolerant concurrency. The article first introduces Akka’s core concepts, including actors, mailboxes, and the actor hierarchy.

To create an Akka system, developers must instantiate an ActorSystem and then create actors via actorOf. A typical snippet looks like:

ActorSystem system = ActorSystem.create("sys");
ActorRef helloActor = system.actorOf(Props.create(HelloActor.class), "helloActor");
helloActor.tell("hello helloActor", ActorRef.noSender());
system.terminate();

Each actor has a unique path, e.g., akka://sys/user/helloActor for local actors and akka.tcp://[email protected]:2020/user/remoteActor for remote ones. The path components (system name, user or system, and actor name) are explained.

Communication patterns include:

tell : fire‑and‑forget asynchronous send.

ask : request‑reply pattern that returns a Future wrapped in a scala.concurrent.Future.

Flink defines its own RPC contracts. The RpcGateway interface declares methods to obtain the address and hostname of an endpoint:

public interface RpcGateway {
    String getAddress();
    String getHostname();
}

The RpcEndpoint class implements RpcGateway and holds a reference to the underlying RpcService. Its constructor registers the endpoint with the service and starts an RpcServer:

protected RpcEndpoint(final RpcService rpcService, final String endpointId) {
    this.rpcService = checkNotNull(rpcService, "rpcService");
    this.endpointId = checkNotNull(endpointId, "endpointId");
    this.rpcServer = rpcService.startServer(this);
    this.mainThreadExecutor = new MainThreadExecutor(rpcServer, this::validateRunsInMainThread);
}

The RpcService interface provides methods to start and stop RPC servers, schedule tasks, and obtain RpcGateway instances. Flink’s concrete implementation is AkkaRpcService, which encapsulates an ActorSystem and maps RpcEndpoint objects to their corresponding ActorRef. When starting a server, it creates either an AkkaRpcActor or a FencedAkkaRpcActor depending on the endpoint type:

public <C extends RpcEndpoint & RpcGateway> RpcServer startServer(C rpcEndpoint) {
    checkNotNull(rpcEndpoint, "rpc endpoint");
    Props akkaRpcActorProps;
    if (rpcEndpoint instanceof FencedRpcEndpoint) {
        akkaRpcActorProps = Props.create(FencedAkkaRpcActor.class, rpcEndpoint, ...);
    } else {
        akkaRpcActorProps = Props.create(AkkaRpcActor.class, rpcEndpoint, ...);
    }
    ActorRef actorRef = actorSystem.actorOf(akkaRpcActorProps, rpcEndpoint.getEndpointId());
    actors.put(actorRef, rpcEndpoint);
    // create dynamic proxy for RPC calls
    RpcServer server = (RpcServer) Proxy.newProxyInstance(...);
    return server;
}

When a client invokes a remote method, the AkkaInvocationHandler builds an RpcInvocation message. If the method returns void, it uses tell; otherwise it uses ask and processes the Future result, handling serialization of SerializedValue objects.

private Object invokeRpc(Method method, Object[] args) throws Exception {
    String methodName = method.getName();
    RpcInvocation rpcInvocation = createRpcInvocationMessage(methodName, method.getParameterTypes(), args);
    if (method.getReturnType().equals(Void.TYPE)) {
        tell(rpcInvocation);
        return null;
    } else {
        CompletableFuture<?> resultFuture = ask(rpcInvocation, timeout);
        // deserialize if needed
        return resultFuture.get(...);
    }
}

The receiving AkkaRpcActor dispatches messages based on their type ( RunAsync, CallAsync, RpcInvocation) and either executes code directly on the main thread or forwards the call to the appropriate method on the endpoint. Errors are wrapped in Status.Failure and sent back to the caller.

Overall, Flink’s RPC layer abstracts Akka’s actor system behind clean Java interfaces, enabling both local and remote procedure calls, scheduled execution, and safe concurrency without exposing low‑level actor details to the rest of the framework.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed SystemsBig DataFlinkRPCAkkaactor-model
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.