
Design and Source Code Analysis of Apache DolphinScheduler

This article provides an in‑depth technical overview of Apache DolphinScheduler, covering its distributed design strategies, fault‑tolerance mechanisms, remote log access, source‑code module breakdown, API interfaces, Quartz integration, master‑worker execution flows, RPC communication, load‑balancing algorithms, logging services, and community contribution guidelines.

Big Data Technology Architecture

The article begins with an introduction to DolphinScheduler, explaining the motivation for studying its design and outlining the main concepts such as task definitions, workflow instances, and cron‑based scheduling.

Distributed Design – Two architectures are compared: centralized (master/slave) and decentralized (peer‑to‑peer). The decentralized approach relies on ZooKeeper for node registration and leader election, and the article illustrates the topology with diagrams.

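Fault Tolerance – Recovery is described for both service failures and task failures, including the failover paths for master and worker nodes. The system detects node‑removal events through ZooKeeper watchers and reschedules the affected tasks. The snippet reproduced here does not show watcher handling; it registers a Quartz job for a project's schedule:

public void setSchedule(int projectId, Schedule schedule) {
    // Register the schedule with Quartz under the owning project.
    quartzExecutor.addJob(ProcessScheduleJob.class, projectId, schedule);
}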
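The failover behaviour described in this section (notice that a node has disappeared, then hand its tasks to the surviving nodes) can be sketched without ZooKeeper. The class below is a minimal in‑memory stand‑in, not DolphinScheduler's code: `FailoverSketch`, `onWorkerRemoved`, and the "least loaded worker takes over" rule are all assumptions made for illustration. In a real deployment the removal callback would be driven by a registry watcher.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the registry plus failover logic.
// None of these names come from DolphinScheduler.
class FailoverSketch {
    // worker address -> ids of tasks assigned to it (TreeMap keeps iteration deterministic)
    private final Map<String, List<Integer>> assignments = new TreeMap<>();

    void registerWorker(String worker) {
        assignments.putIfAbsent(worker, new ArrayList<>());
    }

    void assign(String worker, int taskId) {
        assignments.get(worker).add(taskId);
    }

    // Called when the registry reports that a worker's node disappeared.
    // Returns taskId -> new worker for every task that was moved.
    Map<Integer, String> onWorkerRemoved(String deadWorker) {
        List<Integer> orphaned = assignments.remove(deadWorker);
        Map<Integer, String> moved = new HashMap<>();
        if (orphaned == null || assignments.isEmpty()) {
            // Unknown worker, or nobody left to take over.
            // A real system would keep such tasks pending instead of dropping them.
            return moved;
        }
        for (int taskId : orphaned) {
            String target = leastLoadedWorker();
            assignments.get(target).add(taskId);
            moved.put(taskId, target);
        }
        return moved;
    }

    // Picks the surviving worker with the fewest tasks; ties go to the first name in sort order.
    private String leastLoadedWorker() {
        String best = null;
        for (Map.Entry<String, List<Integer>> e : assignments.entrySet()) {
            if (best == null || e.getValue().size() < assignments.get(best).size()) {
                best = e.getKey();
            }
        }
        return best;
    }

    List<Integer> tasksOf(String worker) {
        return assignments.getOrDefault(worker, new ArrayList<>());
    }
}
```

With workers w1, w2, and w3 registered and w1 holding two tasks, removing w1 spreads those tasks over w2 and w3 rather than piling both onto one node.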

Source Code Analysis – The article enumerates DolphinScheduler modules (alert, api, common, dao, remote, server, worker, etc.) and presents key configuration files (common.properties, application.yaml). It also details the main API for publishing schedules:

public Map<String, Object> setScheduleState(User loginUser, long projectCode, Integer id, ReleaseState scheduleStatus) { ... }

Quartz Integration – The Quartz scheduler components (SchedulerFactory, Scheduler, Job, Trigger, JobStore) are explained, with flowcharts and code snippets showing job creation and trigger management.

Master and Worker Execution – The master’s Netty server initialization, task dispatch logic, and slot‑based command processing are described. Worker initialization, Netty server setup, and task execution flow are also covered, with diagrams of the communication sequence.
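Slot‑based command processing can be pictured as a modulo partition: each live master takes a slot number from its position in the sorted list of masters and only claims commands whose id maps to that slot. The sketch below assumes exactly that scheme; `SlotSketch`, `slotOf`, and `commandsForSlot` are hypothetical names, and the real master may select commands differently (for example inside a database query).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of slot-based command partitioning among masters.
class SlotSketch {
    // Position of this master in the sorted list of live masters, or -1 if it is not registered.
    static int slotOf(String self, List<String> sortedLiveMasters) {
        return sortedLiveMasters.indexOf(self);
    }

    // Keep only the commands whose id maps to this master's slot.
    static List<Integer> commandsForSlot(List<Integer> commandIds, int masterCount, int slot) {
        List<Integer> mine = new ArrayList<>();
        if (masterCount <= 0 || slot < 0) {
            return mine; // not registered yet: claim nothing
        }
        for (int id : commandIds) {
            // floorMod keeps the result in [0, masterCount) even for negative ids
            if (Math.floorMod(id, masterCount) == slot) {
                mine.add(id);
            }
        }
        return mine;
    }
}
```

Because every id falls into exactly one slot, no two masters claim the same command, and when the set of masters changes the slots simply get recomputed from the new list.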

RPC Communication – Master‑to‑worker and worker‑to‑master interactions use Netty remoting, and commands can be sent synchronously or asynchronously. The line below is the blocking form; note that await() waits for the write to complete, not for the peer's response:

ChannelFuture future = channel.writeAndFlush(command).await();
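The difference between the synchronous and asynchronous paths is easier to see when the transport is abstracted away. The following is a minimal sketch, assuming each request carries a numeric "opaque" id that the response echoes back; `RpcClientSketch` and its methods are hypothetical and use only the JDK, so this is not the Netty remoting code itself.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Hypothetical client that correlates responses to requests by an "opaque" id.
// The transport is a BiConsumer so the sketch runs without Netty.
class RpcClientSketch {
    private final AtomicLong nextOpaque = new AtomicLong(1);
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final BiConsumer<Long, String> transport; // (opaque, body) -> written to the wire

    RpcClientSketch(BiConsumer<Long, String> transport) {
        this.transport = transport;
    }

    // Asynchronous send: returns at once; the future completes when the response arrives.
    CompletableFuture<String> sendAsync(String body) {
        long opaque = nextOpaque.getAndIncrement();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(opaque, future);
        transport.accept(opaque, body);
        return future;
    }

    // Synchronous send: same path, but the caller blocks until the response or the timeout.
    String sendSync(String body, long timeoutMillis) {
        CompletableFuture<String> future = sendAsync(body);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for response", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("no response for: " + body, e);
        } finally {
            pending.values().remove(future); // avoid leaking entries after a timeout
        }
    }

    // Invoked by the I/O layer when a response frame has been decoded.
    void onResponse(long opaque, String body) {
        CompletableFuture<String> future = pending.remove(opaque);
        if (future != null) {
            future.complete(body);
        }
    }
}
```

Both paths share one pending‑request table, so the synchronous call is just the asynchronous call plus a bounded wait; a late response for a request that already timed out is silently dropped.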

Load‑Balancing Algorithms – Three strategies (random weighted, lower weight, smooth round‑robin) are implemented. Example implementation of the random weighted selector:

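// Pick a point in [0, totalWeight) and walk the hosts until it falls inside one host's share.
// totalWeight must be positive, otherwise nextInt throws IllegalArgumentException.
int offset = ThreadLocalRandom.current().nextInt(totalWeight);
for (int i = 0; i < size; i++) {
    offset -= weights[i];
    if (offset < 0) {
        return hosts.get(i);
    }
}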
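For comparison with the random selector, here is a minimal sketch of smooth weighted round‑robin in the form popularized by Nginx. It is offered as an illustration of the general technique, not as DolphinScheduler's selector; the class name `SmoothRoundRobin` and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Smooth weighted round-robin: on each pick every host gains its weight,
// the host with the highest running score wins and is charged the total weight.
class SmoothRoundRobin {
    private final List<String> hosts = new ArrayList<>();
    private final List<Integer> weights = new ArrayList<>();
    private final List<Integer> current = new ArrayList<>(); // running score per host
    private int totalWeight = 0;

    synchronized void add(String host, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        hosts.add(host);
        weights.add(weight);
        current.add(0);
        totalWeight += weight;
    }

    synchronized String next() {
        if (hosts.isEmpty()) {
            throw new IllegalStateException("no hosts registered");
        }
        int best = 0;
        for (int i = 0; i < hosts.size(); i++) {
            current.set(i, current.get(i) + weights.get(i));
            // Strict comparison: on a tie the earlier host keeps the pick.
            if (current.get(i) > current.get(best)) {
                best = i;
            }
        }
        current.set(best, current.get(best) - totalWeight);
        return hosts.get(best);
    }
}
```

With weights A=5, B=1, C=1, seven consecutive picks yield A, A, B, A, C, A, A: the heavy host still receives five of seven picks, but they are interleaved with the lighter hosts instead of arriving in one burst.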

Logging Service – The log client queries remote logs via RPC, converting requests to commands and handling responses.
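The request‑to‑command step can be illustrated with the server side of a paged log read. The sketch below assumes a request that names a log file, a number of lines to skip, and a maximum number of lines to return; `LogReadSketch` and `RollViewRequest` are hypothetical names, and the RPC transport is left out so that only the handling logic remains.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LogReadSketch {
    // Hypothetical request "command": which file, how many lines to skip, how many to return.
    static final class RollViewRequest {
        final Path path;
        final int skipLines;
        final int limit;

        RollViewRequest(Path path, int skipLines, int limit) {
            if (skipLines < 0 || limit < 0) {
                throw new IllegalArgumentException("skipLines and limit must be non-negative");
            }
            this.path = path;
            this.skipLines = skipLines;
            this.limit = limit;
        }
    }

    // Handler the log server would run for one request: returns one page of lines.
    static List<String> handle(RollViewRequest request) throws IOException {
        // Files.lines streams the file lazily, so large logs are not loaded whole.
        try (Stream<String> lines = Files.lines(request.path, StandardCharsets.UTF_8)) {
            return lines.skip(request.skipLines)
                        .limit(request.limit)
                        .collect(Collectors.toList());
        }
    }
}
```

A client that wants to follow a running task would call this repeatedly, advancing `skipLines` by the number of lines it received each time.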

Finally, the article encourages community participation, listing ways to contribute to the DolphinScheduler project and providing reference links.
