
Why Rebuild a Distributed Scheduler? Inside a Custom Java Job Framework

This article explains the motivations behind creating a new distributed scheduling framework, compares existing solutions, and details the design choices—including gRPC communication, protobuf serialization, a custom NameServer for load balancing, and a built‑in message queue with persistence and retry mechanisms—to handle frequent task creation and dynamic parameter changes in a high‑concurrency environment.

Java Architect Essentials

Project Background

Although many mature scheduling frameworks exist (Quartz, xxl‑job, PowerJob), the author chose to build a custom framework that better supports frequent task creation and dynamic task modification in a distributed environment.

Problems with existing solutions include:

MQ delay queues cannot adjust task parameters dynamically.

Redis expiration requires long‑lived keys and may cause big keys.

xxl‑job lacks native OpenAPI and its DB‑lock scheduling ensures high availability but not high performance.

PowerJob’s HTTP‑based OpenAPI is synchronous and its group isolation requires manual configuration, limiting load balancing under high concurrency.

To retain control and learn from mainstream designs, a trimmed‑down custom scheduler was built.

Positioning

This is a rewritten and refactored version of PowerJob, extending its functionality to suit business needs.

Supports frequent creation and dynamic parameter changes via a lightweight API and internal message queue.

Handles massive concurrent tasks with group isolation and application‑level locking for load balancing.

Targets small tasks that need neither extensive configuration nor manual instance manipulation.

Technology Selection

<code>Communication : gRPC (Netty NIO)
Serialization : Protobuf
Load Balancing : Custom NameServer
    |___ Strategy : Server‑side minimum schedule count
    |___ Interaction : pull + push
Message Queue : Simple custom MQ
    |___ Send : async + timeout retry
    |___ Persistence : mmap + sync flush
    |___ Retry : multi‑level delay queue + dead‑letter queue
Scheduling : Time‑wheel algorithm
</code>
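The time‑wheel algorithm named above can be illustrated with a minimal hashed timing wheel. This is a sketch rather than the project's implementation; the class and method names (`TimingWheel`, `schedule`, `tick`) and the bucket structure are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hashed timing wheel: a task scheduled delayTicks ahead lands in
// bucket (currentTick + delayTicks) % wheelSize, with a rounds counter for
// delays longer than one full rotation.
class TimingWheel {
    static final class TimerTask {
        int rounds;            // full rotations remaining before the task fires
        final Runnable job;
        TimerTask(int rounds, Runnable job) { this.rounds = rounds; this.job = job; }
    }

    private final List<List<TimerTask>> buckets;
    private final int wheelSize;
    private long currentTick = 0;

    TimingWheel(int wheelSize) {
        this.wheelSize = wheelSize;
        this.buckets = new ArrayList<>(wheelSize);
        for (int i = 0; i < wheelSize; i++) buckets.add(new ArrayList<>());
    }

    // Schedule a job to fire after delayTicks ticks.
    void schedule(long delayTicks, Runnable job) {
        int slot = (int) ((currentTick + delayTicks) % wheelSize);
        buckets.get(slot).add(new TimerTask((int) (delayTicks / wheelSize), job));
    }

    // Advance one tick: run due tasks in the current slot, age the rest.
    void tick() {
        buckets.get((int) (currentTick % wheelSize)).removeIf(t -> {
            if (t.rounds == 0) { t.job.run(); return true; }
            t.rounds--;
            return false;
        });
        currentTick++;
    }
}
```

Compared with a single priority queue of timers, insertion and per‑tick processing are amortized O(1), which is why Netty, Kafka, and RocketMQ all use timing‑wheel variants for large timer counts.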

Project Structure

<code>├── LICENSE
├── k-job-common            // common dependencies, invisible to developers
├── k-job-nameServer        // registration center for server and worker, provides load balancing
├── k-job-producer          // OpenAPI jar, async message sending via internal MQ
├── k-job-server            // SpringBoot based scheduling server
├── k-job-worker-boot-starter // Spring Boot starter for workers
├── k-job-worker            // Jar required by applications that connect to k‑job‑server
└── pom.xml
</code>

Key Features

Load Balancing

Existing frameworks either lock the whole database for scheduling or bind a worker group to a single server, which can lead to OOM on that server or a performance bottleneck. This design introduces a NameServer that records each server's schedule count and assigns workers to the server with the fewest assignments, avoiding duplicate scheduling and improving cluster utilization.

Conditions for assigning a worker to a server:

workerNum > threshold within the same app group.

server.scheduleTimes > minServerScheduleTime × 2.

When these conditions are met, the NameServer splits the app into a sub‑app and balances the load.
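The selection and split rules above can be sketched as follows. All names here (`ServerInfo`, `pickServer`, `shouldSplit`, the worker threshold value) are illustrative assumptions, not the project's actual API:

```java
import java.util.Comparator;
import java.util.List;

// Illustrative model of a server as seen by the NameServer.
class ServerInfo {
    final String address;
    long scheduleTimes;   // how many assignments this server currently holds
    ServerInfo(String address, long scheduleTimes) {
        this.address = address;
        this.scheduleTimes = scheduleTimes;
    }
}

class NameServerBalancer {
    private static final int WORKER_NUM_THRESHOLD = 16; // illustrative value

    // Assignment rule: give the worker to the server with the fewest assignments.
    static ServerInfo pickServer(List<ServerInfo> servers) {
        ServerInfo least = servers.stream()
                .min(Comparator.comparingLong((ServerInfo s) -> s.scheduleTimes))
                .orElseThrow(IllegalStateException::new);
        least.scheduleTimes++;
        return least;
    }

    // Split condition from the article: enough workers in the app group AND
    // the server's schedule count exceeds twice the cluster minimum.
    static boolean shouldSplit(int workerNum, long serverScheduleTimes, long minServerScheduleTimes) {
        return workerNum > WORKER_NUM_THRESHOLD
                && serverScheduleTimes > minServerScheduleTimes * 2;
    }
}
```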

Message Queue

To avoid blocking RPC calls and potential message loss, the system merges the broker and server, letting each server maintain its own persistent queue. Persistence uses mmap‑based commit log and consumer queue with synchronous flush, similar to RocketMQ.

<code>// Example of commit log and consumer queue buffers
// (comments inferred from the field names; semantics may differ in the project)
private MappedByteBuffer commitLogBuffer;        // mmap view of the commit log file
private MappedByteBuffer consumerQueueBuffer;    // mmap view of the consumer queue index
private final AtomicLong commitLogBufferPosition = new AtomicLong(0);      // write position within the mapped commit log
private final AtomicLong commitLogCurPosition = new AtomicLong(0);         // current commit log offset
private final AtomicLong lastProcessedOffset = new AtomicLong(0);          // last commit log offset indexed into the consumer queue
private final AtomicLong currentConsumerQueuePosition = new AtomicLong(0); // write position within the consumer queue
private final AtomicLong consumerPosition = new AtomicLong(0);             // consumer read position
</code>
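A minimal sketch of what an mmap‑backed commit log with synchronous flush can look like, assuming a fixed‑size pre‑allocated segment and a simple `[length][payload]` record layout. The `CommitLog` class and its methods are illustrative, not the project's code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of an mmap-backed commit log with synchronous flush. The segment is
// pre-sized and records are stored as [int length][payload bytes].
class CommitLog {
    private static final int SEGMENT_SIZE = 1 << 20;   // 1 MB segment, illustrative
    private final MappedByteBuffer buffer;
    private final AtomicLong writePosition = new AtomicLong(0);

    CommitLog(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            buffer = ch.map(FileChannel.MapMode.READ_WRITE, 0, SEGMENT_SIZE);
        }
    }

    // Append a record and flush dirty pages before returning, so an
    // acknowledged message survives a crash (RocketMQ calls this sync flush).
    synchronized long append(byte[] payload) {
        long offset = writePosition.getAndAdd(4 + payload.length);
        buffer.position((int) offset);
        buffer.putInt(payload.length);
        buffer.put(payload);
        buffer.force();
        return offset;
    }

    // Read the record stored at a given offset via an independent cursor,
    // so readers do not disturb the writer's position.
    byte[] read(long offset) {
        ByteBuffer view = buffer.duplicate();
        view.position((int) offset);
        byte[] data = new byte[view.getInt()];
        view.get(data);
        return data;
    }
}
```

The `force()` call is what makes the flush synchronous; an async‑flush variant would instead let a background thread call it periodically, trading durability for throughput.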

Message retry uses a multi‑level delay queue; failed messages eventually move to a dead‑letter queue for manual intervention.

<code>// Dead-letter store and per-level delay queues
// (MqCausa.Message appears to be the project's protobuf-generated message type)
private static final Deque<MqCausa.Message> deadMessageQueue = new ArrayDeque<>();
private static final List<DelayQueue<DelayedMessage>> delayQueueList = new ArrayList<>(2);
// ... implementation of DelayedMessage and retry logic ...
</code>
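The retry path can be sketched as below, assuming illustrative delay levels and a plain `String` payload standing in for the project's `MqCausa.Message`; the `RetryScheduler` class and its method names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Sketch of multi-level retry: each failed send re-enqueues the message at the
// next delay level; once all levels are exhausted it moves to the dead-letter
// queue for manual handling. Delay levels and names are illustrative.
class RetryScheduler {
    private static final long[] DELAY_LEVELS_MS = {1_000, 5_000, 30_000}; // illustrative

    static final class DelayedMessage implements Delayed {
        final String payload;
        final long fireAtMs;   // wall-clock time when the retry becomes due
        DelayedMessage(String payload, long delayMs) {
            this.payload = payload;
            this.fireAtMs = System.currentTimeMillis() + delayMs;
        }
        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(fireAtMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
        }
        @Override public int compareTo(Delayed o) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final List<DelayQueue<DelayedMessage>> delayQueues = new ArrayList<>();
    private final Deque<String> deadLetterQueue = new ArrayDeque<>();

    RetryScheduler() {
        for (int i = 0; i < DELAY_LEVELS_MS.length; i++) delayQueues.add(new DelayQueue<>());
    }

    // Called after a failed attempt at attemptedLevel (-1 for the first send):
    // escalate to the next level, or dead-letter once levels are exhausted.
    void onSendFailure(String payload, int attemptedLevel) {
        int next = attemptedLevel + 1;
        if (next >= DELAY_LEVELS_MS.length) {
            deadLetterQueue.addLast(payload);   // manual intervention required
        } else {
            delayQueues.get(next).offer(new DelayedMessage(payload, DELAY_LEVELS_MS[next]));
        }
    }

    int pending(int level) { return delayQueues.get(level).size(); }
    boolean isDeadLettered(String payload) { return deadLetterQueue.contains(payload); }
}
```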

Additional Diagrams

Service discovery flow: (diagram not reproduced here)

Scheduling flow: (diagram not reproduced here)

Tags: Java, distributed scheduling, load balancing, gRPC, message queue, time wheel
Written by Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
