Apache DolphinScheduler Practice at Xinwang Bank
Xinwang Bank uses Apache DolphinScheduler to run more than 9,000 task instances a day across real‑time, near‑real‑time, and offline batch scenarios. This article covers the background, application scenarios, optimizations, workflow improvements, import/export enhancements, alert system upgrades, and future plans to expand data‑ops capabilities.
Xinwang Bank generates a large number of task instances daily, most of them real‑time tasks. To manage them better, the bank adopted Apache DolphinScheduler and now uses it for real‑time, near‑real‑time, and offline batch processing across multiple projects.
The presentation by senior big‑data engineer Chen Wei from Xinwang Bank’s Data Center covered four parts: background of adopting DolphinScheduler, application scenarios, optimization and transformation, and future plans.
Background
The bank chose DolphinScheduler based on three main needs: unified development environments, optimized testing scenarios, and improved production deployment.
Application Scenarios
Three primary scenarios were identified:
Offline data development and task scheduling for data warehouses and data marts.
Near‑real‑time data development and scheduling, using Flink to process upstream message‑queue logs and store results in ClickHouse.
Non‑ETL user‑defined batch jobs, enabled through an internal low‑code platform allowing business users to define and run their own data batches.
Optimizations and Transformations
Five major improvements were made to DolphinScheduler:
Project‑level environment isolation for development, testing, and production.
Consistent environment variable naming across environments while keeping isolation.
Data source isolation per project and environment with unified naming.
Support for non‑JDBC data sources such as Elasticsearch and Livy for Spark jobs.
Enhanced task independence, allowing separate task development, debugging, and configuration with project environment variables.
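As a minimal sketch of how consistent variable naming can coexist with environment isolation, the snippet below resolves the same placeholder names to different values per project and environment. The project name, variable names, and paths are hypothetical and are not taken from the bank's deployment or from DolphinScheduler's own API.

```python
from string import Template

# Hypothetical project-level variables: the names are identical in every
# environment, only the values differ.
PROJECT_ENV_VARS = {
    ("risk_mart", "dev"): {"HDFS_ROOT": "/dev/risk_mart", "QUEUE": "dev_queue"},
    ("risk_mart", "test"): {"HDFS_ROOT": "/test/risk_mart", "QUEUE": "test_queue"},
    ("risk_mart", "prod"): {"HDFS_ROOT": "/data/risk_mart", "QUEUE": "prod_queue"},
}


def render_task(script: str, project: str, env: str) -> str:
    """Fill ${NAME} placeholders with the variables of one project/environment."""
    variables = PROJECT_ENV_VARS[(project, env)]
    # substitute() raises KeyError when the script uses an undefined name,
    # so a missing variable is caught before the task runs.
    return Template(script).substitute(variables)


# The task script is written once and never mentions an environment.
script = "spark-submit --queue ${QUEUE} job.py --input ${HDFS_ROOT}/orders"
dev_cmd = render_task(script, "risk_mart", "dev")
prod_cmd = render_task(script, "risk_mart", "prod")
print(dev_cmd)
print(prod_cmd)
```

Because the script refers only to variable names, promoting a task from development to production means changing the environment argument and leaving the task definition untouched.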
Additional enhancements include:
Task development and debugging support for SQL, Shell, Python, and XSQL tasks, with online log and result viewing.
Historical task integration to avoid rewriting existing code.
Separation of workflow and task to simplify orchestration.
Project‑wide environment variables to reduce per‑workflow configuration.
Data source lookup by name, supporting Phoenix and others.
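Looking up a data source by name, scoped to a project and environment, might look like the sketch below. The registry class, source names, and endpoints are illustrative assumptions; the talk summary does not describe the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str       # e.g. "phoenix" or "elasticsearch"
    endpoint: str   # hypothetical connection string


class DataSourceRegistry:
    """Stores data sources under (project, environment, name)."""

    def __init__(self):
        self._sources = {}

    def register(self, project: str, env: str, source: DataSource) -> None:
        self._sources[(project, env, source.name)] = source

    def lookup(self, project: str, env: str, name: str) -> DataSource:
        try:
            return self._sources[(project, env, name)]
        except KeyError:
            raise LookupError(
                f"no data source {name!r} for {project}/{env}"
            ) from None


registry = DataSourceRegistry()
# The same logical name points at a different endpoint in each environment.
registry.register("risk_mart", "test",
                  DataSource("orders_hbase", "phoenix", "phoenix-test.example:8765"))
registry.register("risk_mart", "prod",
                  DataSource("orders_hbase", "phoenix", "phoenix-prod.example:8765"))

print(registry.lookup("risk_mart", "prod", "orders_hbase").endpoint)
```

A task then references only `orders_hbase`, and the project and environment it runs in decide which physical endpoint is used.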
Workflow launch logic was refined to prevent duplicate runs, enable environment‑based switching (e.g., disaster‑recovery to production), and improve error detection.
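One common way to prevent duplicate runs is to key each launch on the workflow and its schedule time and refuse a second launch while the first is still active. The sketch below illustrates that idea only; it is an assumption about the general technique, not the bank's actual launch logic, and the in-memory set stands in for whatever shared store a real scheduler would use.

```python
import threading


class LaunchGuard:
    """Rejects a launch if the same (workflow, schedule_time) is already running."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()  # in-memory stand-in for a shared store

    def try_start(self, workflow: str, schedule_time: str) -> bool:
        key = (workflow, schedule_time)
        with self._lock:
            if key in self._active:
                return False  # duplicate: this instance is already running
            self._active.add(key)
            return True

    def finish(self, workflow: str, schedule_time: str) -> None:
        with self._lock:
            self._active.discard((workflow, schedule_time))


guard = LaunchGuard()
first = guard.try_start("daily_dw_load", "2024-01-01T02:00")
second = guard.try_start("daily_dw_load", "2024-01-01T02:00")
guard.finish("daily_dw_load", "2024-01-01T02:00")
after_finish = guard.try_start("daily_dw_load", "2024-01-01T02:00")
print(first, second, after_finish)
```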
Import/export capabilities were expanded to include tasks, configurations, and resource files, handling ID conflicts across databases and supporting version management.
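ID conflicts arise because IDs assigned in the source database may already be taken in the target. A minimal sketch of one way to handle this is to allocate fresh IDs on import and rewrite every reference through a mapping. The bundle layout (`tasks`, `workflows`, `task_ids`) is hypothetical and does not reflect DolphinScheduler's real export format.

```python
from itertools import count


def import_bundle(bundle: dict, next_id) -> tuple:
    """Re-key exported tasks and rewrite workflow references to the new IDs."""
    # Map each source-database ID to a fresh ID from the target database.
    id_map = {task["id"]: next(next_id) for task in bundle["tasks"]}
    tasks = [{**task, "id": id_map[task["id"]]} for task in bundle["tasks"]]
    workflows = [
        {**wf, "task_ids": [id_map[t] for t in wf["task_ids"]]}
        for wf in bundle["workflows"]
    ]
    return {"tasks": tasks, "workflows": workflows}, id_map


exported = {
    "tasks": [{"id": 1, "name": "extract"}, {"id": 2, "name": "load"}],
    "workflows": [{"name": "daily_dw_load", "task_ids": [1, 2]}],
}
# Assume the target database hands out IDs starting at 501.
imported, id_map = import_bundle(exported, count(501))
print(imported["workflows"][0]["task_ids"])
```

The original bundle is left unmodified, so the same export can be imported into several environments, each with its own ID allocation.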
The alert system was integrated with the bank’s internal alert platform, adding subscription‑based alerts, startup/delay alerts, and priority task notifications.
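Subscription-based delay alerts can be sketched as a periodic check that compares expected start times with the workflows that have actually started, then fans out to subscribers. The workflow names, subscriber names, and grace period below are assumptions for illustration; delivery to the bank's internal alert platform is not modeled.

```python
from datetime import datetime, timedelta

# Hypothetical subscriptions: workflow name -> subscribers to notify.
SUBSCRIPTIONS = {"daily_dw_load": ["dw-oncall", "risk-team"]}


def delay_alerts(expected_start: dict, started: set, now: datetime,
                 grace: timedelta = timedelta(minutes=10)) -> list:
    """Return (subscriber, message) pairs for workflows that are late to start."""
    alerts = []
    for workflow, due in expected_start.items():
        if workflow not in started and now > due + grace:
            message = f"{workflow} has not started; due {due:%H:%M}"
            for subscriber in SUBSCRIPTIONS.get(workflow, []):
                alerts.append((subscriber, message))
    return alerts


expected = {"daily_dw_load": datetime(2024, 1, 1, 2, 0)}
late = delay_alerts(expected, started=set(), now=datetime(2024, 1, 1, 2, 30))
on_time = delay_alerts(expected, started={"daily_dw_load"},
                       now=datetime(2024, 1, 1, 2, 30))
print(late)
```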
Integration with internal systems such as IAM SSO, model task monitoring, and report push monitoring was also implemented, with network‑based feature restrictions for security.
Future Plans
Planned initiatives include promoting the offline data development platform to more teams, gradually replacing existing schedulers, and integrating the scheduling system with the bank’s data‑research management platform.
Technical Goals
Build a more intelligent and automated scheduling and orchestration system.
Provide advanced monitoring, prediction, and completion‑time forecasting for operations.
Offer a global view with data lineage and impact analysis.
Modularize configuration to lower development costs.
Integrate with the data quality management platform.
Support user‑defined board tasks.
The speaker concluded the session with thanks and invited the community to contribute to the DolphinScheduler project.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.