Aloha: A Scala‑Based Distributed Task Scheduling Framework – Overview, Extensions, and Architecture

Aloha is a distributed task scheduling and management framework implemented in Scala. Modeled on Spark's scheduler, it provides plug‑in extensions, a high‑availability Master‑Worker architecture, custom event listeners, and a lightweight Scala‑based RPC layer for managing long‑running jobs such as Spark, Flink, and ETL tasks.


Aloha is a Scala‑based distributed task scheduling and management framework. It provides a plug‑in extension mechanism and can schedule various types of jobs, such as Spark, Flink, and ETL tasks.

The core implementation draws on Spark's scheduler: it adapts the Master‑Worker components, offers a high‑availability configuration, and exposes a REST interface for submitting applications.

Extension – Different Application Types

Applications are represented by the Application trait. Implementations need only provide start(), shutdown(reason), and a few builder‑style methods for the description, application directory, and configuration. start() returns a Promise[ExitState] so that long‑running jobs can report their exit state asynchronously.

trait Application {
  // start the application
  def start(): Promise[ExitState]
  // force the application to stop
  def shutdown(reason: Option[String]): Unit
  // description supplied when the application was submitted
  def withDescription(desc: ApplicationDescription): Application
  // working directory while the application is running
  def withApplicationDir(appDir: File): Application
  // framework configuration
  def withAlohaConf(conf: AlohaConf): Application
  // cleanup after the application finishes
  def clean(): Unit
}

Example of launching an external process inside start() and handling shutdown:

override def start(): Promise[ExitState] = {
  // launch the external process
  val processBuilder = getProcessBuilder()
  process = processBuilder.start()
  // monitor the process and complete the promise once it exits
  stateMonitorThread = new Thread("app-state-monitor-thread") {
    override def run(): Unit = {
      val exitCode = process.waitFor()
      // the process has exited
      if (exitCode == 0) {
        result.success(ExitState(ExitCode.SUCCESS, Some("success")))
      } else {
        result.success(ExitState(ExitCode.FAILED, Some("failed")))
      }
    }
  }
  stateMonitorThread.start()
  result
}

override def shutdown(reason: Option[String]): Unit = {
  if (process != null) {
    // forcibly terminate the process
    val exitCode = Utils.terminateProcess(process, APP_TERMINATE_TIMEOUT_MS)
    if (exitCode.isEmpty) {
      logWarning("Failed to terminate process: " + process + ". This process will likely be orphaned.")
    }
  }
}

Custom Event Listener

Users can implement trait AlohaEventListener to receive application state change events, relaunch events, or other custom events. Multiple listeners can be registered dynamically when Aloha starts.

trait AlohaEventListener {
  def onApplicationStateChange(event: AppStateChangedEvent): Unit
  def onApplicationRelaunched(event: AppRelaunchedEvent): Unit
  def onOtherEvent(event: AlohaEvent): Unit
}
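As a concrete illustration, here is a minimal listener that records every event it sees. The event classes below are simplified stand‑ins for Aloha's actual types (assumed shapes, not the real API), so the trait is restated in terms of them:

```scala
// Simplified stand-ins for Aloha's event types (assumed, for illustration only).
trait AlohaEvent
case class AppStateChangedEvent(appId: String, state: String) extends AlohaEvent
case class AppRelaunchedEvent(appId: String) extends AlohaEvent

trait AlohaEventListener {
  def onApplicationStateChange(event: AppStateChangedEvent): Unit
  def onApplicationRelaunched(event: AppRelaunchedEvent): Unit
  def onOtherEvent(event: AlohaEvent): Unit
}

// A listener that simply records each event as a readable string.
class RecordingListener extends AlohaEventListener {
  val records = scala.collection.mutable.ArrayBuffer.empty[String]
  override def onApplicationStateChange(event: AppStateChangedEvent): Unit =
    records += s"${event.appId} -> ${event.state}"
  override def onApplicationRelaunched(event: AppRelaunchedEvent): Unit =
    records += s"${event.appId} relaunched"
  override def onOtherEvent(event: AlohaEvent): Unit =
    records += s"other: $event"
}
```

Such a listener could forward events to a monitoring or alerting system instead of a buffer.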

Module Design – Overall Architecture

Aloha follows a master‑worker architecture similar to Spark. The Master manages workers, receives application submissions via REST or RPC, and schedules jobs using a FIFO policy based on available resources. Workers register with the Master, send heartbeats, and execute applications in isolated class loaders.

High‑availability is achieved by running multiple Master instances with a LeaderElectionAgent; only the “Alive” Master handles scheduling while “Standby” masters stay passive. State is persisted in a pluggable engine (FileSystem or ZooKeeper) and recovered on failover.
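For comparison, Spark's standalone Master, on which this design is modeled, enables ZooKeeper‑based recovery with configuration like the following. These are Spark's property keys, shown only as an analogy; Aloha's exact configuration names may differ:

```properties
spark.deploy.recoveryMode=ZOOKEEPER
spark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181
spark.deploy.zookeeper.dir=/spark
```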

Task Scheduling Management

When an application is submitted, the Master assigns an applicationId, places it in a waiting queue, and selects a suitable worker. The application state transitions from SUBMITTED → LAUNCHING → RUNNING, and the Master receives status updates from the worker. Forced termination is propagated to the worker, which interrupts the application thread and calls shutdown.
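The lifecycle above can be sketched as a small state machine. This is a hypothetical simplification; Aloha's real state enumeration may include additional states and transitions:

```scala
// Hypothetical sketch of the application lifecycle described above.
object AppState extends Enumeration {
  val SUBMITTED, LAUNCHING, RUNNING, FINISHED, FAILED, UNKNOWN = Value
}

object Lifecycle {
  import AppState._
  // Legal forward transitions in the normal lifecycle (simplified).
  def canTransition(from: AppState.Value, to: AppState.Value): Boolean =
    (from, to) match {
      case (SUBMITTED, LAUNCHING)                  => true
      case (LAUNCHING, RUNNING)                    => true
      case (RUNNING, FINISHED) | (RUNNING, FAILED) => true
      case _                                       => false
    }
}
```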

Workers load application dependencies via a custom class loader. In future versions, a distributed file system such as HDFS could be used to distribute these dependencies.
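Per‑application isolation of this kind can be sketched with the JDK's URLClassLoader. This is a simplified stand‑in: Aloha's actual loader and its delegation policy are not detailed in the source:

```scala
import java.io.File
import java.net.URLClassLoader

object AppIsolation {
  // Build an isolated class loader over an application's jar files.
  // Giving each application its own loader keeps dependency versions
  // from clashing across applications running on the same worker.
  def appClassLoader(jars: Seq[File]): URLClassLoader =
    new URLClassLoader(jars.map(_.toURI.toURL).toArray, getClass.getClassLoader)
}
```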

Fault Tolerance

The Master persists worker and application states. Upon failover, the newly elected “Alive” Master reads the persisted state, restores the recorded workers and applications, and re‑establishes connections with them. Applications not yet scheduled are placed back in the waiting queue; running applications are marked UNKNOWN until their workers report their current status.

Event Bus

The Master creates an asynchronous event bus based on a blocking queue. When an application state changes, an event is posted to the bus and any registered AlohaEventListener receives it in real time.
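A minimal sketch of such a bus, assuming a single daemon dispatcher thread draining a LinkedBlockingQueue (the real implementation may differ in threading and error handling):

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ArrayBuffer

// Minimal asynchronous event bus sketch: post() enqueues an event, and a
// single daemon thread drains the queue, delivering each event to every
// registered listener in order.
class SimpleEventBus[E] {
  private val queue = new LinkedBlockingQueue[E]()
  private val listeners = ArrayBuffer.empty[E => Unit]

  private val dispatcher = new Thread("event-dispatcher") {
    override def run(): Unit = while (true) {
      val event = queue.take() // blocks until an event arrives
      listeners.synchronized { listeners.foreach(_.apply(event)) }
    }
  }
  dispatcher.setDaemon(true)
  dispatcher.start()

  def addListener(listener: E => Unit): Unit =
    listeners.synchronized { listeners += listener }

  def post(event: E): Unit = queue.put(event)
}
```

Because delivery happens on the dispatcher thread, posting an event never blocks the Master's scheduling path.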

RPC Layer

Aloha’s RPC module does not rely on an IDL; it uses Scala pattern matching to route messages. The key abstractions are RpcEndpoint, RpcEndpointRef, and RpcEnv. RpcEndpoint defines lifecycle callbacks (onStart, onStop, etc.) and two message‑handling methods: receive for fire‑and‑forget messages and receiveAndReply for request‑reply interactions.

//------------------------ Server side ----------------------------
object HelloWorldServer {
  def main(args: Array[String]): Unit = {
    val host = "localhost"
    val rpcEnv: RpcEnv = RpcEnv.create("hello-server", host, 52345, new AlohaConf())
    val helloEndpoint: RpcEndpoint = new HelloEndpoint(rpcEnv)
    rpcEnv.setupEndpoint("hello-service", helloEndpoint)
    rpcEnv.awaitTermination()
  }
}

class HelloEndpoint(override val rpcEnv: RpcEnv) extends RpcEndpoint {
  override def onStart(): Unit = {
    println("Service started.")
  }

  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
    case SayHi(msg) => context.reply(s"Aloha: $msg")
    case SayBye(msg) => context.reply(s"Bye :), $msg")
  }

  override def onStop(): Unit = {
    println("Stop hello endpoint")
  }
}

case class SayHi(msg: String)
case class SayBye(msg: String)

//--------------------------- Client side -------------------------------
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}

object HelloWorldClient {
  def main(args: Array[String]): Unit = {
    val host = "localhost"
    // the trailing `true` starts the RpcEnv in client mode
    val rpcEnv: RpcEnv = RpcEnv.create("hello-client", host, 52345, new AlohaConf(), true)
    val endPointRef: RpcEndpointRef = rpcEnv.retrieveEndpointRef(RpcAddress("localhost", 52345), "hello-service")
    val future: Future[String] = endPointRef.ask[String](SayHi("WALL-E"))
    future.onComplete {
      case Success(value) => println(s"Got response: $value")
      case Failure(e) => println(s"Got error: $e")
    }
    Await.result(future, Duration("30s"))
  }
}

The RpcEndpoint can also be used directly by Master and Worker components, making inter‑process communication look like local method calls.

Network Transmission

The underlying transport is built on Netty. Key components include TransportServer, TransportClient, TransportClientFactory, RpcHandler, TransportRequestHandler, TransportResponseHandler, and TransportChannelHandler. Messages are routed through a dispatcher, inboxes for local endpoints, and outboxes for remote endpoints.

Conclusion

Aloha is an open‑source distributed scheduling framework inspired by Spark. It supports plug‑in extensions, high availability, event‑driven monitoring, and a lightweight Scala‑based RPC mechanism, making it suitable for managing long‑running data‑processing jobs.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution driven by internet technologies. Idea‑driven, sharing‑oriented architects are welcome to exchange and learn together.
