
Design and Architecture of JD's Buffalo Distributed Workflow Scheduling System

The article introduces JD's Buffalo distributed workflow scheduling system, detailing its dual-layer entity model, instance-based scheduling, high‑availability three‑tier architecture, performance optimizations such as horizontal scaling and event‑driven execution, as well as cold‑hot data separation and open APIs for future enhancements.

JD Retail Technology

Workflow task scheduling plays a crucial role in large‑scale data processing: it has to offer flexible orchestration and diverse scheduling strategies while remaining stable and efficient. This article examines JD's self‑developed distributed workflow scheduling system, Buffalo, highlighting its key features and technical architecture.

Buffalo Scheduling System Overview

Buffalo is JD's proprietary distributed DAG job scheduling platform. It provides offline job orchestration, debugging, monitoring, and DAG scheduling for data engineers, algorithm engineers, and analysts, and aims to be a stable, efficient, user‑friendly, containerized, and open ETL scheduling system.

The core challenges include complex dependency relationships due to intricate business logic, massive task volume with high stability and performance demands, and diverse data processing scenarios requiring rich scheduling capabilities.

01 Scheduling System Introduction

Buffalo adopts a dual‑layer entity model built from two entities. An Action is the smallest executable unit and carries the script, parameters, environment, and so on. A Task is a DAG composed of one or more Actions plus trigger rules. Tasks can in turn depend on other Tasks, forming an outer DAG, so scheduling happens at two levels.

Because dependencies can be expressed both inside a Task and between Tasks, this model offers more orchestration flexibility than single‑layer designs.
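The two levels can be pictured with a minimal Python sketch. It is an illustration only, not Buffalo's actual data model: the class fields, the `trigger` value, and the helper names `action_order` and `task_order` are all assumptions, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever ordering logic the real system uses.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Action:
    """Smallest executable unit: a script plus its parameters."""
    name: str
    script: str
    params: dict = field(default_factory=dict)
    upstream: list = field(default_factory=list)  # Action names inside the same Task


@dataclass
class Task:
    """A DAG of Actions plus a trigger rule; may depend on other Tasks."""
    name: str
    actions: list
    trigger: str = "daily"                        # hypothetical trigger rule
    upstream: list = field(default_factory=list)  # names of other Tasks

    def action_order(self):
        """Inner DAG: order Actions so each runs after its upstream Actions."""
        graph = {a.name: set(a.upstream) for a in self.actions}
        return list(TopologicalSorter(graph).static_order())


def task_order(tasks):
    """Outer DAG: order Tasks so each runs after the Tasks it depends on."""
    graph = {t.name: set(t.upstream) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


extract = Action("extract", "extract.sh")
load = Action("load", "load.sh", upstream=["extract"])
ingest = Task("ingest", [load, extract])
report = Task("report", [Action("render", "render.py")], upstream=["ingest"])

inner = ingest.action_order()         # Actions within one Task
outer = task_order([report, ingest])  # Tasks across the outer DAG
```

Here `inner` orders the Actions of a single Task and `outer` orders whole Tasks, which is the two‑level scheduling the model describes.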

02 Core Technical Solutions

To support flexible business processing, rapid task‑volume growth, and system stability, optimizations focus on usability, reliability, and high performance.

Entity and Orchestration Model

The dual‑layer model enables instance‑based scheduling: task definitions are stateless; when a task reaches its run cycle, an executable instance is generated based on the configuration, making dependencies explicit and traceable.
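A rough sketch of instance generation is shown below. The names `TaskDef`, `TaskInstance`, and `generate_instance` are hypothetical, and the rule that an instance depends on the upstream instance of the same cycle is an assumption made for illustration; the article does not specify how Buffalo maps cycles between tasks.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TaskDef:
    """Stateless definition: describes what to run, never holds run state."""
    name: str
    upstream: tuple = ()   # names of upstream task definitions


@dataclass
class TaskInstance:
    """One concrete run of a definition for one cycle; state lives here."""
    task: str
    cycle: date
    depends_on: list       # explicit (task, cycle) keys of upstream instances
    status: str = "WAITING"

    @property
    def key(self):
        return (self.task, self.cycle)


def generate_instance(defn, cycle):
    """Materialize the instance for `cycle` (assumes same-cycle dependencies)."""
    deps = [(up, cycle) for up in defn.upstream]
    return TaskInstance(defn.name, cycle, deps)


report_def = TaskDef("report", upstream=("ingest",))
day1 = date(2024, 1, 1)
inst1 = generate_instance(report_def, day1)
inst2 = generate_instance(report_def, day1 + timedelta(days=1))
```

Each instance records exactly which upstream instances it waits for, so a dependency can be traced per cycle without touching the definition.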

Classification and hierarchical scheduling ensure important tasks receive priority resources, with priority information propagated to the underlying compute clusters.
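One simple way to picture priority‑aware dispatch is a heap keyed by priority level. This is a sketch under assumptions: the numeric levels, the `PriorityDispatcher` class, and the `cluster_priority` field are invented for illustration and are not Buffalo's real interface.

```python
import heapq
import itertools


class PriorityDispatcher:
    """Dispatch ready instances by level (lower number = more important)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a level

    def submit(self, instance_id, level):
        heapq.heappush(self._heap, (level, next(self._seq), instance_id))

    def next_job(self):
        """Pop the most important instance and attach its level to the job."""
        level, _, instance_id = heapq.heappop(self._heap)
        # The level travels with the job so the compute cluster can honor it too.
        return {"instance": instance_id, "cluster_priority": level}


d = PriorityDispatcher()
d.submit("adhoc_export", level=3)
d.submit("core_revenue_report", level=1)
d.submit("adhoc_cleanup", level=3)
order = [d.next_job()["instance"] for _ in range(3)]
```

The important task is dispatched first even though it was submitted second, and jobs at the same level keep their submission order.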

High‑Availability Architecture

Buffalo consists of three layers, each with high‑availability design:

Manager Layer: stateless task creation, management, and operation functions, horizontally scalable.

High‑Availability Scheduler: core engine generating and dispatching task instances, using active‑active + standby architecture with data sharding and idempotent state handling; fails over automatically when a node crashes.

Fault‑Tolerant Execution Layer: runs tasks on physical machines or Kubernetes containers, supporting high availability and flexible resource management.
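The scheduler layer's combination of sharding, failover, and idempotent state handling can be sketched as follows. Everything here is an assumption for illustration: the shard count, the round‑robin reassignment of shards to surviving nodes, the status names, and the "only move forward" rule are not taken from Buffalo's implementation.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count

# Rank of each status; an update is applied only if it moves an instance forward.
RANK = {"WAITING": 0, "RUNNING": 1, "SUCCESS": 2, "FAILED": 2}


def shard_of(task_id, num_shards=NUM_SHARDS):
    """Stable hash so every node computes the same shard for a task."""
    digest = hashlib.md5(task_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def assign_shards(live_nodes, num_shards=NUM_SHARDS):
    """Spread shards round-robin over whichever scheduler nodes are alive."""
    nodes = sorted(live_nodes)
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def apply_status(state, instance_id, new_status):
    """Idempotent update: a replayed or stale event leaves the state unchanged."""
    current = state.get(instance_id)
    if current is not None and RANK[new_status] <= RANK[current]:
        return False
    state[instance_id] = new_status
    return True


before = assign_shards({"sched-a", "sched-b"})
after = assign_shards({"sched-a"})  # sched-b crashed; sched-a takes over its shards

state = {}
first = apply_status(state, "report@2024-01-01", "SUCCESS")
replay = apply_status(state, "report@2024-01-01", "SUCCESS")  # duplicate after failover
stale = apply_status(state, "report@2024-01-01", "RUNNING")   # late, out-of-order event
```

Because updates are idempotent, a node that takes over a shard can safely reprocess events its predecessor may already have handled.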

High Performance

Performance is achieved through horizontal scaling (multi‑active scheduler with hash‑based sharding), event‑driven execution (triggered on dependency changes rather than periodic polling), and in‑memory resource scheduling that avoids distributed locks and external storage bottlenecks.
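The difference between event‑driven triggering and polling can be shown with a small in‑memory sketch. The class and method names are hypothetical; the point is only that a completion event touches the direct downstream of the finished instance instead of rescanning every waiting instance on a timer.

```python
from collections import defaultdict


class EventDrivenScheduler:
    def __init__(self, deps):
        # deps maps each instance to the set of upstream instances it waits for.
        self.pending = {inst: set(ups) for inst, ups in deps.items()}
        self.downstream = defaultdict(set)
        for inst, ups in deps.items():
            for up in ups:
                self.downstream[up].add(inst)
        self.dispatched = []

    def start(self):
        """Dispatch instances that have no upstream dependencies."""
        for inst in sorted(self.pending):
            if not self.pending[inst]:
                self._dispatch(inst)

    def on_success(self, finished):
        """Event handler: examine only the direct downstream of `finished`."""
        for inst in sorted(self.downstream[finished]):
            waiting = self.pending.get(inst)
            if waiting is None:        # already dispatched; ignore replayed events
                continue
            waiting.discard(finished)
            if not waiting:
                self._dispatch(inst)

    def _dispatch(self, inst):
        del self.pending[inst]
        self.dispatched.append(inst)


sched = EventDrivenScheduler({"a": set(), "b": {"a"}, "c": {"a", "b"}})
sched.start()
sched.on_success("a")   # unblocks b; c still waits for b
sched.on_success("b")   # unblocks c
```

The work done per event is proportional to the number of downstream instances, not to the total number of waiting instances.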

Cold‑hot data separation addresses rapid growth of task instance data: hot data (frequently accessed) remains in the primary store, while cold data (mostly read‑only historical instances) is stored separately, with indexing and primary‑key schemes enabling fast location and occasional operations such as re‑run or forced success.
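A minimal sketch of the hot/cold split, using two dictionaries in place of real storage. The seven‑day hot window, the `(task, cycle)` primary key, and the choice to move a re‑run instance back into the hot store are assumptions for illustration; the article does not describe Buffalo's actual key scheme or archiving policy.

```python
from datetime import date


class InstanceStore:
    def __init__(self, hot_days=7):
        self.hot = {}    # stands in for the primary store
        self.cold = {}   # stands in for the separate historical store
        self.hot_days = hot_days

    def put(self, task, cycle, record):
        self.hot[(task, cycle)] = record

    def archive(self, today):
        """Move finished instances older than the hot window into the cold store."""
        for key in list(self.hot):
            _, cycle = key
            finished = self.hot[key]["status"] in ("SUCCESS", "FAILED")
            if finished and (today - cycle).days > self.hot_days:
                self.cold[key] = self.hot.pop(key)

    def get(self, task, cycle):
        """The primary key locates a record in either store."""
        key = (task, cycle)
        if key in self.hot:
            return self.hot[key]
        return self.cold.get(key)

    def rerun(self, task, cycle):
        """Occasional operation on history: bring the instance back and reset it."""
        key = (task, cycle)
        if key in self.cold:
            self.hot[key] = self.cold.pop(key)
        self.hot[key]["status"] = "WAITING"
        return self.hot[key]


store = InstanceStore(hot_days=7)
store.put("report", date(2024, 1, 1), {"status": "SUCCESS"})
store.put("report", date(2024, 1, 20), {"status": "RUNNING"})
store.archive(today=date(2024, 1, 21))
old = store.get("report", date(2024, 1, 1))
```

After archiving, the old instance is still reachable through the same key, and a re‑run pulls it back into the hot store.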

Open Capabilities

Open APIs: task configuration, instance operations, and status and log queries are exposed over HTTP.

Open events: task and instance state changes are published through JDQ asynchronous messaging so that business systems can stay in sync.
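How a business system might consume such state‑change events can be sketched with an in‑process queue standing in for the message bus. This does not use or imitate the JDQ client API, which the article does not describe; the event fields `instance`, `from`, and `to` and both function names are hypothetical.

```python
import json
import queue


def publish_state_change(bus, instance_id, old, new):
    """Producer side: serialize a state change and put it on the bus."""
    event = {"instance": instance_id, "from": old, "to": new}
    bus.put(json.dumps(event))


def drain(bus, handler):
    """Consumer side: hand every queued event to `handler`; return the count."""
    handled = 0
    while True:
        try:
            raw = bus.get_nowait()
        except queue.Empty:
            return handled
        handler(json.loads(raw))
        handled += 1


bus = queue.Queue()  # stand-in for an asynchronous message topic
publish_state_change(bus, "ingest@2024-01-01", "RUNNING", "SUCCESS")
publish_state_change(bus, "report@2024-01-01", "RUNNING", "FAILED")

alerts = []


def alert_on_failure(event):
    # A downstream business system reacting only to failures.
    if event["to"] == "FAILED":
        alerts.append(event["instance"])


count = drain(bus, alert_on_failure)
```

The scheduler only publishes; each business system decides for itself which state changes matter.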

03 Future Plans

Buffalo will continue to evolve with enhancements in containerization, plugin extensibility, open capabilities, and fine‑grained resource management, inviting user feedback to build a more stable, efficient, and user‑friendly scheduling platform.
