Backend Development 46 min read

DolphinScheduler Design, Architecture, and Source Code Analysis

This article provides a comprehensive overview of DolphinScheduler’s design strategies, distributed architecture, fault‑tolerance mechanisms, configuration files, core APIs, Quartz integration, master‑worker execution flow, RPC communication, load‑balancing algorithms, and logging services, accompanied by detailed code excerpts and diagrams.

Big Data Technology & Architecture

Oct 26, 2022

DolphinScheduler Design, Architecture, and Source Code Analysis

The article begins with an overview of DolphinScheduler’s terminology—task definition, task instance, workflow definition, workflow instance, and cron‑based scheduling—highlighting the distinction between centralized (Master/Slave) and decentralized designs.

It then describes the system’s modular architecture, including modules such as dolphinscheduler-alert, dolphinscheduler-api, dolphinscheduler-master, dolphinscheduler-worker, and the UI layer, each with its responsibilities.

Key configuration files are presented, for example the common properties file defining data directories, resource storage, and Zookeeper settings, and the application.yaml files for the API, Master, and Worker services, which configure ports, thread pools, and Quartz settings.

The core scheduling API is illustrated with the setScheduleState method that validates project and schedule state before invoking the Quartz scheduler:

public Map<String, Object> setScheduleState(User loginUser, long projectCode, Integer id, ReleaseState scheduleStatus) { ... }

Quartz job management is shown via the addJob method, which creates or updates a JobDetail and associates a CronTrigger with mis‑fire handling and timezone conversion:

public void addJob(Class<? extends Job> clazz, int projectId, final Schedule schedule) { ... }

The master node’s execution flow is explained, covering registration with Zookeeper, slot‑based command fetching, workflow creation, and the use of a WorkflowExecuteThread to run tasks. Fault‑tolerance for master and worker nodes is handled through Zookeeper watchers and recovery logic, with code snippets for master failover and slot management.

Worker startup, task plugin installation, and RPC communication are detailed. The Netty‑based RPC framework is used for master‑worker interaction, with examples of client initialization and request handling:

NettyRemotingClient client = new NettyRemotingClient(clientConfig);
client.registerProcessor(CommandType.TASK_EXECUTE_RESPONSE, taskExecuteResponseProcessor);

Load‑balancing algorithms (weighted random, lower weight, and smooth round‑robin) are described, including the weighted random selection logic:

int offset = ThreadLocalRandom.current().nextInt(totalWeight);
for (int i = 0; i < size; i++) {
    offset -= weights[i];
    if (offset < 0) return hosts.get(i);
}
return hosts.get(ThreadLocalRandom.current().nextInt(size));

Finally, the article covers RPC‑based log retrieval, showing how the client sends a RollViewLogRequestCommand to a remote host and parses the response.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems Backend Development RPC load balancing Quartz DolphinScheduler

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.