Understanding the Actor Model and Akka in Big Data RPC Systems
This article introduces the Actor model, its fundamental rules, and how it underpins Flink and Spark RPC mechanisms, then explains the Akka framework, its actor hierarchy, supervision, lifecycle, and dispatcher, providing a concise foundation for distributed big‑data processing.
Preface
Recently, in my spare time I examined Flink's RPC infrastructure in depth and compared it with the Spark RPC mechanism I had previously analyzed. I found the Actor model remarkably elegant and decided to record a brief introduction that can serve as a basic guide for understanding Flink's RPC later.
Actor Model
The Actor model was proposed in 1973 by Hewitt, Bishop, and Steiger in the paper "A Universal Modular Actor Formalism for Artificial Intelligence". It is an innovative concurrency, distributed computing, and programming model that treats every entity as an Actor, which must obey several basic rules.
All computation is performed inside an Actor.
Actors can only communicate via messages, and messages are immutable.
Actors process messages sequentially. When an Actor handles a message, it may: <code>* Change its state or behavior;<br/>* Send a limited number of messages to other Actors;<br/>* Create a limited number of child Actors.</code>
The term "Actor" still lacks a universally accepted Chinese translation; some render it as "角色", which is roughly appropriate.
A simple system that conforms to the Actor model is shown below. An Actor essentially consists of three elements: state, behavior, and mailbox.
State : Variables and data maintained internally by the Actor. Each Actor keeps its own state isolated from others.
Behavior : A set of computation logic (e.g., functions) defined inside the Actor to process received messages and modify its state.
Mailbox : A FIFO queue associated with the receiving Actor. Because an Actor processes messages one at a time, messages that arrive while it is busy are stored in the mailbox and later retrieved sequentially. An Actor can be both sender and receiver.
The Actor model solves the most challenging problem in concurrent environments—shared data—by eliminating shared mutable state. Traditional approaches rely on synchronization mechanisms (locks, semaphores, atomic operations) that incur high overhead and can cause deadlocks. In contrast, the Actor model uses asynchronous, non‑blocking message passing, keeping state isolated and eliminating the need for explicit synchronization, which leads to simpler and more efficient designs.
Because an Actor runs on a single thread, it is very lightweight; a 1 GB memory space can host millions of Actor instances.
Many mature implementations of the Actor model exist; for example, Erlang's concurrency model is built entirely on it. One of the most active open‑source projects based on the Actor model is Akka, which also serves as the foundation of Flink's RPC system. Earlier versions of Spark used Akka for RPC as well; newer versions switched to Netty but still retain design ideas reminiscent of a simplified Akka.
Akka Actor System
The Akka homepage describes it as high‑concurrency, distributed, resilient, message‑driven, and JVM‑based—five keywords that capture the essence of the Actor model.
The Akka ecosystem consists of many libraries (modules) such as Actor, Remoting, Cluster, Persistence, Streams, HTTP, etc. The Actor library is the core of Akka, and the following summarizes its fundamental concepts.
Akka Actors are organized in a tree‑like hierarchy, as illustrated below.
Akka manages all Actors through an ActorSystem; each JVM instance contains a single ActorSystem. When an ActorSystem starts, three guardian Actors are created by default:
/ – the root guardian, analogous to the root of a file system; it is created first and destroyed last.
/system – the system guardian; Akka itself and modules built on Akka create child Actors under this path.
/user – the user guardian; all Actors you create in your application reside under this path. Calling ActorSystem.actorOf() creates an Actor directly under /user , while calling ActorContext.actorOf() creates a child Actor under the current Actor.
When you create or look up an Actor, the returned handle is an ActorRef, an immutable, serializable reference to the Actor. You use an ActorRef to send messages to the Actor.
Akka defines a three‑layer abstraction from low‑level to high‑level: Actor, ActorContext, and ActorRef, together with the corresponding ActorPath. The diagram below shows this relationship.
The actual hierarchical relationship of an Actor is maintained inside its ActorContext (which also contains the ActorRef of the sender). The path information is stored in the ActorRef. This design enables Actors belonging to different ActorSystems to communicate correctly.
The hierarchy also underpins the supervision mechanism. When an Actor fails, it notifies its parent, which can decide to resume, restart, stop, or propagate the failure upward. The following diagram shows a complete Akka Actor lifecycle.
Akka provides lifecycle hook methods that users can override to manage an Actor's start, stop, and restart behavior. Notably, when a parent Actor stops, all its child Actors are stopped as well.
The entire ActorSystem is driven by a central component called the Dispatcher, which schedules messages from an Actor's mailbox onto threads for execution. The principle is straightforward and is illustrated in the diagram below.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
