Applying Apache DolphinScheduler in a Big Data Platform: Architecture, Migration, and Future Plans
This presentation details the background, redesign, and migration of a large‑scale data platform at Dangbei Network Technology, focusing on the adoption of Apache DolphinScheduler, ClickHouse migration, storage and compute separation, monitoring solutions, and the roadmap for future upgrades and open‑source involvement.
In the afternoon session, Wang Yuxiang, a foundational development engineer on the big data platform team at Dangbei Network Technology, presented how Apache DolphinScheduler is used in their environment. The talk had four parts: platform background, big data platform reconstruction, scheduler platform construction, and future planning.
Background
Before adopting Apache DolphinScheduler, the platform suffered from outdated operating system versions, chaotic service deployment, slow MapReduce (MR) computation, insufficient storage, no high availability, no visual operations interface, and no alerting mechanism.
Platform Reconstruction Goals
The goals were to build an efficient and stable big data platform, achieve massive data storage, ensure a secure HA architecture, separate compute and storage, provide visual operations, and enable real‑time monitoring and alerts.
Architecture Design
The redesigned platform uses HDFS, OSS, ClickHouse, Elasticsearch, Kafka, and Hudi for storage; a compute layer based on Spark and MR; and a service layer for task scheduling, permission control, and API management. The architecture also integrates JindoFS for accelerated data access.
Problem Analysis and Solutions
Key issues identified included outdated OS versions, mixed component deployment, disk space shortage, and inadequate monitoring. Solutions involved upgrading CDH from 5.7 to 6.3.0, migrating MR jobs to Spark, adopting compute‑storage separation with YARN + OSS, and implementing Prometheus + Grafana for monitoring.
ClickHouse Migration
Several migration methods were evaluated. The chosen approach reads from the old cluster through the remote() table function for both full and incremental synchronization. Example commands:
select database, create_table_query from system.tables where database in ('athena', 'dmp', 'sony');
create database dmp ON CLUSTER cluster_clickhouse;
insert into dmp.dws_dmp_user_local SELECT * FROM remote('192.168.1.1:9000', dmp, dws_dmp_user, 'default', '');
Important migration notes: in the exported CREATE TABLE statements, replace the cluster name cluster_5shards_1replicas with cluster_clickhouse and append ON CLUSTER cluster_clickhouse after the table name (ON CLUSTER applies to DDL statements, not to INSERT).
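The DDL rewrite and remote() synchronization described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the team's actual tooling: the helper names rewrite_ddl and sync_statement, the sample table schema, and the dt column used for the incremental filter are hypothetical. Only the cluster names, host, database, and table names come from the commands in the talk.

```python
import re

OLD_CLUSTER = "cluster_5shards_1replicas"
NEW_CLUSTER = "cluster_clickhouse"


def rewrite_ddl(create_table_query: str) -> str:
    """Point an exported CREATE TABLE statement at the new cluster (hypothetical helper)."""
    # Swap the old cluster name wherever it appears (ON CLUSTER, Distributed engine args).
    ddl = create_table_query.replace(OLD_CLUSTER, NEW_CLUSTER)
    if "ON CLUSTER" not in ddl.upper():
        # No ON CLUSTER clause yet: add one right after the table name.
        ddl = re.sub(
            r"^(CREATE TABLE\s+(?:IF NOT EXISTS\s+)?[\w.`]+)",
            rf"\1 ON CLUSTER {NEW_CLUSTER}",
            ddl,
            count=1,
            flags=re.IGNORECASE,
        )
    return ddl


def sync_statement(src_host, database, src_table, dst_table,
                   where=None, user="default", password=""):
    """Build an INSERT ... SELECT that pulls rows from the old cluster via remote()."""
    sql = (
        f"INSERT INTO {database}.{dst_table} "
        f"SELECT * FROM remote('{src_host}', {database}, {src_table}, "
        f"'{user}', '{password}')"
    )
    if where:
        # An incremental sync copies only the rows matching the filter.
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    # Sample schema is made up for illustration.
    print(rewrite_ddl(
        "CREATE TABLE dmp.dws_dmp_user_local (id UInt64) "
        "ENGINE = MergeTree ORDER BY id"
    ))
    # Full sync: no filter.
    print(sync_statement("192.168.1.1:9000", "dmp",
                         "dws_dmp_user", "dws_dmp_user_local"))
    # Incremental sync: `dt` is an assumed date column, not from the talk.
    print(sync_statement("192.168.1.1:9000", "dmp",
                         "dws_dmp_user", "dws_dmp_user_local",
                         where="dt = '2021-11-01'"))
```

Generating the statements as strings keeps the sketch independent of any ClickHouse client library; in practice they would be reviewed and then run on the new cluster with whatever client the team already uses.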
Scheduler System Migration
The original Oozie scheduler lacked a visual UI, task retry, and multi‑tenant support, and it suffered from deadlocks. After evaluation, Apache DolphinScheduler was selected for its drag‑and‑drop DAG editing, dynamic task control, fault tolerance, HA, and rich set of task types. The production deployment runs version 1.3.8 with 2 Master nodes, 7 Workers, and 1 API node, handling over 200 daily workflows and 5,000 tasks.
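To make the DAG and retry ideas concrete, here is a minimal Python sketch of running tasks in dependency order with per-task retries. It illustrates the general concept only and is not how DolphinScheduler is implemented; the function run_workflow, the task names, and the retry limit are all hypothetical. It relies on graphlib from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def run_workflow(dag, tasks, max_retries=2):
    """Run tasks in dependency order, retrying each failing task.

    dag maps a task name to the set of task names it depends on.
    tasks maps a task name to a zero-argument callable.
    Returns the task names in the order they completed.
    """
    completed = []
    # static_order() yields every task after all of its dependencies.
    for name in TopologicalSorter(dag).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # task succeeded, stop retrying
            except Exception:
                if attempt == max_retries:
                    # Retries exhausted: fail the whole workflow.
                    raise RuntimeError(
                        f"task {name!r} failed after {attempt + 1} attempts"
                    )
        completed.append(name)
    return completed


if __name__ == "__main__":
    calls = {"transform": 0}

    def flaky_transform():
        # Fails on the first call, succeeds on the second.
        calls["transform"] += 1
        if calls["transform"] == 1:
            raise ValueError("transient failure")

    dag = {"load": {"transform"}, "transform": {"extract"}, "extract": set()}
    tasks = {
        "extract": lambda: None,
        "transform": flaky_transform,
        "load": lambda: None,
    }
    print(run_workflow(dag, tasks))
```

A real scheduler distributes these steps across Master and Worker nodes and persists task state; the sketch runs everything in one process to keep the ordering and retry logic visible.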
Current Issues with DolphinScheduler
Observed problems include distributed lock bottlenecks, low Master resource utilization, high database load, and tight coupling with ZooKeeper and worker dependencies.
Future Plans
The roadmap includes upgrading to the 2.0 architecture to resolve the 1.3 limitations, integrating with other company platforms, enabling cross‑cluster task calls, and improving monitoring and alerts. Additionally, the team plans to contribute to more Apache open‑source projects and promote open‑source participation.
Big Data Technology Architecture