Analyzing Apache DolphinScheduler 2.x Source Code: Environment Setup, Service Startup, Task Execution, and Insights

This article presents a detailed walkthrough of Apache DolphinScheduler 2.x source code, covering environment preparation, service startup procedures for Master and Worker, task execution flow, key components like Netty and ZooKeeper, and personal reflections on open‑source learning and improvement.

Big Data Technology & Architecture

May 10, 2022

In the big‑data domain, many enterprises are adopting open‑source tools. This talk analyzes the source code of Apache DolphinScheduler 2.x to help readers evaluate and select a data scheduling tool.

The speaker, Xu Haihui, a software development engineer at China Mobile Cloud Capability Center, divides the presentation into four parts: source code environment preparation, service startup process, task execution flow, and personal thoughts and summary.

1. Environment Preparation

Set up the development environment and clone the DolphinScheduler source code.

Download the source from the official Apache DolphinScheduler download page.

2. Service Startup Process

The system consists of four services: UI, MasterServer, WorkerServer, and AlertServer (plus LoggerServer, which is not shown in the diagram). MasterServer and WorkerServer each register with ZooKeeper to form clusters, and they communicate with each other over Netty.
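To make the registration idea concrete, here is a minimal in-memory sketch of how ephemeral registration lets a cluster see which nodes are alive. It does not use the real ZooKeeper client; the class `InMemoryRegistry`, its methods, and the `/demo/nodes/...` paths are all hypothetical names chosen for illustration, and DolphinScheduler's actual registry code may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a ZooKeeper-style registry. This is NOT the ZooKeeper
// client API; class, method, and path names are hypothetical.
class InMemoryRegistry {
    // path -> owning session id; TreeMap keeps listings in a deterministic order
    private final Map<String, String> ephemeralNodes = new TreeMap<>();

    // Register a node that exists only while its session is alive.
    synchronized void registerEphemeral(String sessionId, String path) {
        ephemeralNodes.put(path, sessionId);
    }

    // Simulate a session ending (crash or shutdown): all of its nodes vanish.
    synchronized void closeSession(String sessionId) {
        ephemeralNodes.values().removeIf(owner -> owner.equals(sessionId));
    }

    // List the live children under a parent path, e.g. all registered workers.
    synchronized List<String> children(String parent) {
        List<String> result = new ArrayList<>();
        String prefix = parent + "/";
        for (String path : ephemeralNodes.keySet()) {
            if (path.startsWith(prefix)) {
                result.add(path.substring(prefix.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        registry.registerEphemeral("session-master-1", "/demo/nodes/master/host-a:5678");
        registry.registerEphemeral("session-worker-1", "/demo/nodes/worker/host-b:1234");
        registry.registerEphemeral("session-worker-2", "/demo/nodes/worker/host-c:1234");
        System.out.println(registry.children("/demo/nodes/worker")); // [host-b:1234, host-c:1234]

        registry.closeSession("session-worker-1"); // worker 1 goes away
        System.out.println(registry.children("/demo/nodes/worker")); // [host-c:1234]
    }
}
```

The point of the sketch is that a master never needs a static worker list: it lists the children of the worker path and sees whichever sessions are still alive.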

MasterServer Startup Steps

Start Netty server.

Start event processor.

Start scheduler timer tasks.

Start StateWheel processor.

WorkerServer Startup Steps

Start Netty server.

Maintain WorkerServer node status.

Start TaskExecuteThread.

Start RetryReportTaskStatusThread.
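The last step suggests a "resend until acknowledged" loop on the Worker side. The sketch below illustrates that general pattern only; it is an assumption about what such a thread does, based on its name, and the class `StatusReportRetrier` with all of its fields and methods is hypothetical rather than taken from the DolphinScheduler code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a worker-side "resend until acknowledged" mechanism.
class StatusReportRetrier {
    // A status report that the master has not acknowledged yet.
    static final class PendingReport {
        final int taskId;
        final String status;
        long lastSentMillis;

        PendingReport(int taskId, String status, long sentMillis) {
            this.taskId = taskId;
            this.status = status;
            this.lastSentMillis = sentMillis;
        }
    }

    private final Map<Integer, PendingReport> pending = new ConcurrentHashMap<>();
    private final long retryIntervalMillis;
    // Stand-in for the network: every send is appended here for inspection.
    final List<String> sentLog = new ArrayList<>();

    StatusReportRetrier(long retryIntervalMillis) {
        this.retryIntervalMillis = retryIntervalMillis;
    }

    // First report of a task status; it stays pending until acknowledged.
    void report(int taskId, String status, long nowMillis) {
        pending.put(taskId, new PendingReport(taskId, status, nowMillis));
        sentLog.add(taskId + ":" + status);
    }

    // Called when the master confirms it received the report.
    void acknowledge(int taskId) {
        pending.remove(taskId);
    }

    // One pass of the retry loop; a background thread would call this periodically.
    int retryOnce(long nowMillis) {
        int resent = 0;
        for (PendingReport report : pending.values()) {
            if (nowMillis - report.lastSentMillis >= retryIntervalMillis) {
                report.lastSentMillis = nowMillis;
                sentLog.add(report.taskId + ":" + report.status);
                resent++;
            }
        }
        return resent;
    }

    public static void main(String[] args) {
        StatusReportRetrier retrier = new StatusReportRetrier(1000);
        retrier.report(1, "SUCCESS", 0);
        retrier.report(2, "FAILURE", 0);
        retrier.acknowledge(1);                      // master confirmed task 1
        System.out.println(retrier.retryOnce(1000)); // 1 (only task 2 is resent)
    }
}
```

Passing the current time in as a parameter keeps the retry logic deterministic and easy to test; a real thread would simply call `retryOnce(System.currentTimeMillis())` in a loop.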

3. Task Execution Flow

Workflow execution is triggered from the UI. The API layer (ExecutorController.java) creates a command and stores it in the t_ds_command table. MasterSchedulerService periodically scans this table for commands, builds a DAG for each workflow, and dispatches its tasks to Workers.
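The "build a DAG, then dispatch" step can be illustrated with a small topological sort. This is a generic sketch using Kahn's algorithm, not DolphinScheduler's own DAG code; the class `WorkflowDag`, its methods, and the task names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Generic DAG sketch (Kahn's algorithm); not DolphinScheduler's implementation.
class WorkflowDag {
    // task -> downstream tasks; LinkedHashMap keeps insertion order deterministic
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    // task -> number of upstream tasks it still waits for
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addTask(String task) {
        downstream.putIfAbsent(task, new ArrayList<>());
        inDegree.putIfAbsent(task, 0);
    }

    // Declare that "to" may only run after "from" has finished.
    void addDependency(String from, String to) {
        addTask(from);
        addTask(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    // Return an order in which every task comes after all of its upstream tasks.
    List<String> dispatchOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : remaining.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String next : downstream.get(task)) {
                // A task becomes ready once its last upstream task is done.
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        WorkflowDag dag = new WorkflowDag();
        dag.addDependency("extract", "clean");
        dag.addDependency("extract", "validate");
        dag.addDependency("clean", "load");
        dag.addDependency("validate", "load");
        System.out.println(dag.dispatchOrder()); // [extract, clean, validate, load]
    }
}
```

In a real scheduler the "ready" set is consumed incrementally as tasks finish rather than computed up front, but the readiness rule (all upstream tasks done) is the same.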

Key classes involved include:

WorkflowExecuteThread: builds and submits tasks.

TaskProcessor: dispatches tasks to a priority queue.

NettyExecutorManager: sends commands via Netty.

NettyClientHandler: receives tasks on the Worker side.

TaskExecuteThread: executes tasks (Flink, Shell, Python, etc.).
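The priority queue mentioned above can be sketched with the JDK's `PriorityBlockingQueue`. The class `TaskDispatchQueue` and its fields are hypothetical, and the ordering rule (lower number first, ties broken by submission order) is an assumption of this sketch rather than DolphinScheduler's documented behavior.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a priority-ordered dispatch queue.
class TaskDispatchQueue {
    static final class QueuedTask {
        final String name;
        final int priority;  // assumption: lower number is dispatched first
        final long sequence; // submission order, used to break ties

        QueuedTask(String name, int priority, long sequence) {
            this.name = name;
            this.priority = priority;
            this.sequence = sequence;
        }
    }

    private final AtomicLong nextSequence = new AtomicLong();
    private final PriorityBlockingQueue<QueuedTask> queue = new PriorityBlockingQueue<>(
            11,
            Comparator.comparingInt((QueuedTask t) -> t.priority)
                      .thenComparingLong(t -> t.sequence));

    // Producer side: the master enqueues a task that is ready to run.
    void submit(String name, int priority) {
        queue.put(new QueuedTask(name, priority, nextSequence.getAndIncrement()));
    }

    // Consumer side: take the most urgent task, or null if the queue is empty.
    String pollNext() {
        QueuedTask task = queue.poll();
        return task == null ? null : task.name;
    }

    public static void main(String[] args) {
        TaskDispatchQueue dispatch = new TaskDispatchQueue();
        dispatch.submit("report", 2);
        dispatch.submit("ingest", 0);
        dispatch.submit("cleanup", 2);
        dispatch.submit("alert", 0);
        String next;
        while ((next = dispatch.pollNext()) != null) {
            System.out.println(next); // ingest, alert, report, cleanup
        }
    }
}
```

The sequence number matters because `PriorityBlockingQueue` makes no ordering guarantee between elements that compare as equal; without it, tasks of the same priority could be dispatched out of submission order.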

After execution, Workers send a response command back to the Master.

4. Personal Reflections and Summary

Key observations include the lack of bean‑based parameter handling in the API, inconsistent table naming, and opportunities for improving code readability and maintainability.
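As an illustration of what bean-based parameter handling could look like, the sketch below groups request parameters into one object that validates itself, instead of passing them as a long list of individual arguments. The class `StartWorkflowRequest` and its field names are hypothetical and are not taken from DolphinScheduler's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request bean; names are illustrative, not DolphinScheduler's API.
class StartWorkflowRequest {
    private long projectCode;
    private long workflowCode;
    private String failureStrategy = "CONTINUE"; // default applied in one place
    private Integer timeoutSeconds;              // optional, null means "not set"

    StartWorkflowRequest projectCode(long value) {
        this.projectCode = value;
        return this;
    }

    StartWorkflowRequest workflowCode(long value) {
        this.workflowCode = value;
        return this;
    }

    StartWorkflowRequest failureStrategy(String value) {
        this.failureStrategy = value;
        return this;
    }

    StartWorkflowRequest timeoutSeconds(int value) {
        this.timeoutSeconds = value;
        return this;
    }

    String getFailureStrategy() {
        return failureStrategy;
    }

    // Validation lives with the data instead of being repeated in each endpoint.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (projectCode <= 0) {
            errors.add("projectCode must be positive");
        }
        if (workflowCode <= 0) {
            errors.add("workflowCode must be positive");
        }
        if (timeoutSeconds != null && timeoutSeconds <= 0) {
            errors.add("timeoutSeconds must be positive when set");
        }
        return errors;
    }

    public static void main(String[] args) {
        StartWorkflowRequest request = new StartWorkflowRequest()
                .projectCode(1001L)
                .workflowCode(2002L);
        System.out.println(request.validate());           // []
        System.out.println(request.getFailureStrategy()); // CONTINUE
    }
}
```

With a bean like this, adding a parameter changes one class rather than every method signature along the call path, which is the maintainability gain the observation points at.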

Benefits of studying the source code:

Improves technical depth by exposing design patterns and concurrency mechanisms.

Accelerates mastery of related frameworks such as Netty and ZooKeeper.

Enables rapid troubleshooting of production issues.

Encourages participation in open‑source communities and personal branding.

Suggested learning approach:

Start with official documentation.

Identify a demo to follow the main execution path.

Draw flow diagrams and take notes on critical components.

Iteratively refine understanding and integrate insights into personal projects.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
