Why We Rebuilt a Java Scheduler and How the New Lightweight Framework Works

Faced with limitations of existing tools like Quartz, XXL-Job, and PowerJob, the author explains the motivation for creating a custom scheduling framework, describes its architecture—including gRPC communication, protobuf serialization, a self-implemented name server for load balancing, a simple message queue, and time-wheel scheduling—provides code examples, and shares diagrams of discovery and dispatch processes.

Project Background

The existing scheduling frameworks (Quartz, XXL‑Job, PowerJob) are mature, but the business requires frequent creation and modification of timed tasks in a distributed environment. Specific pain points include:

MQ delay queues cannot dynamically adjust task parameters.

Redis key‑expiration schemes accumulate oversized keys (the BigKey problem).

XXL‑Job lacks a native OpenAPI, and its DB‑lock‑based scheduler favors high availability over performance.

PowerJob’s OpenAPI is synchronous HTTP, which makes load balancing and high‑concurrency scheduling difficult.

To retain full control, trim unnecessary features, and learn from mainstream designs, a lightweight, refactored scheduling framework was built.

Positioning

Supports frequent creation of timed tasks and dynamic parameter changes via a lightweight API and an internal message queue.

Handles massive concurrent task execution with server‑side load balancing (group isolation + application‑level locks).

Targets small tasks, requiring minimal configuration and no direct task‑instance manipulation.

Technical Selection

Communication : gRPC (Netty NIO)
Serialization : Protobuf
Load Balancing : Custom NameServer registry
    └─ Strategy : Server‑side minimum schedule count
    └─ Interaction : pull + push
Message Queue : Simple custom MQ
    └─ Send : async + timeout retry
    └─ Persistence : mmap + synchronous flush
    └─ Retry : multi‑level delay queue + dead‑letter queue
Scheduling : Time‑wheel algorithm
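The time‑wheel choice at the bottom of this stack can be sketched in plain Java. This is a minimal single‑level wheel with manual tick advancement; the bucket count and class shape are illustrative assumptions, not the framework's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal single-level time wheel: tasks are hashed into buckets by expiry tick.
// Delays longer than the wheel size would need a rounds counter or a
// hierarchical wheel, both omitted here for brevity.
class TimeWheel {
    private final List<List<Runnable>> buckets;
    private final int size;
    private int currentTick = 0;

    TimeWheel(int size) {
        this.size = size;
        this.buckets = new ArrayList<>(size);
        for (int i = 0; i < size; i++) buckets.add(new ArrayList<>());
    }

    // Schedule a task to fire after the given number of ticks.
    void schedule(Runnable task, int delayTicks) {
        int slot = (currentTick + delayTicks) % size;
        buckets.get(slot).add(task);
    }

    // Advance one tick and run everything in the bucket that just expired.
    void tick() {
        currentTick = (currentTick + 1) % size;
        List<Runnable> expired = buckets.get(currentTick);
        for (Runnable r : expired) r.run();
        expired.clear();
    }
}
```

In a real scheduler the tick() call would be driven by a timer thread at a fixed interval; keeping the tick manual here makes the bucket mechanics easy to follow.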

Project Structure

├── LICENSE
├── k-job-common              // shared dependencies, not used directly by developers
├── k-job-nameServer          // registration center for servers and workers; provides load balancing
├── k-job-producer            // plain JAR; provides the OpenAPI and async message sending
├── k-job-server              // Spring Boot–based scheduling server
├── k-job-worker-boot-starter // Spring Boot starter for k-job-worker
├── k-job-worker              // plain JAR; client library for connecting to k-job-server
└── pom.xml

Key Features

Load Balancing (massive concurrent task execution)

To avoid OOM when a single server fetches too many tasks, thread‑pool bottlenecks, and request concentration on a single OpenAPI endpoint, a registration center (the NameServer) records each server’s schedule count and routes each sub‑app to the server with the smallest count.

Future work may add weight‑based distribution based on server geographic location and communication efficiency.
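The minimum‑schedule‑count strategy itself reduces to picking the entry with the smallest counter. A sketch, where the map name mirrors the article's serverAddress2ScheduleTimesMap but the class shape is an assumption:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Pick the server with the fewest scheduled tasks for a sub-app.
// Values are cumulative schedule counts reported by each server.
class LeastLoadSelector {
    private final Map<String, Long> scheduleCounts = new ConcurrentHashMap<>();

    void report(String serverAddress, long count) {
        scheduleCounts.put(serverAddress, count);
    }

    // Returns the least-loaded server address, or null if none is registered.
    String select() {
        return scheduleCounts.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```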

Custom Message Queue (frequent task creation and parameter changes)

Instead of PowerJob’s synchronous HTTP OpenAPI, the design uses gRPC futureStub for asynchronous calls. Because an in‑memory BlockingQueue loses unacknowledged messages if the server crashes, a custom MQ is built in which each server maintains its own persistent queue.

Reliable Message Design

Inspired by RocketMQ: producers send to a broker (here each server acts as its own broker) and receive an ACK; consumers pull messages from their own queue. Persistence uses mmap‑based commit log and consumer‑queue files with synchronous flush to guarantee durability.

private MappedByteBuffer commitLogBuffer;       // commit log file mapped to memory
private MappedByteBuffer consumerQueueBuffer;   // consumer queue file mapped to memory
private final AtomicLong commitLogBufferPosition = new AtomicLong(0);
private final AtomicLong commitLogCurPosition = new AtomicLong(0);
private final AtomicLong lastProcessedOffset = new AtomicLong(0);
private final AtomicLong currentConsumerQueuePosition = new AtomicLong(0);
private final AtomicLong consumerPosition = new AtomicLong(0);
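The mmap‑plus‑synchronous‑flush write path can be sketched with the JDK's FileChannel. The record framing (a 4‑byte length prefix) and the class shape are illustrative assumptions; the framework's actual commit‑log format is not shown in the article:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Append length-prefixed records to a memory-mapped file, forcing dirty
// pages to disk on every append so the message is durable before an ACK.
class MappedCommitLog implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    MappedCommitLog(Path file, int capacity) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
    }

    // Write a 4-byte length prefix plus payload, then flush synchronously.
    // Throws BufferOverflowException if the mapped region is exhausted;
    // a real commit log would roll over to a new file instead.
    synchronized void append(byte[] payload) {
        buffer.putInt(payload.length);
        buffer.put(payload);
        buffer.force(); // synchronous flush: durable before the ACK is returned
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The trade‑off is the usual one: force() on every append gives RocketMQ‑style synchronous durability at the cost of write latency.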

Message Retry Mechanism

Producers retry on a different server after timeout or error. Consumers use a multi‑level delay queue; if a message fails repeatedly it moves to a dead‑letter queue for manual intervention.

private static final Deque<MqCausa.Message> deadMessageQueue = new ArrayDeque<>();
private static final List<DelayQueue<DelayedMessage>> delayQueueList = new ArrayList<>(2);
private static final List<Long> delayTimes = Lists.newArrayList(10000L, 5000L);

static void init(Consumer consumer) {
    delayQueueList.add(new DelayQueue<>());
    delayQueueList.add(new DelayQueue<>());
    Thread consumerThread = new Thread(() -> {
        try {
            while (true) {
                // take() blocks until a message's delay has expired and removes
                // it from the queue, so no busy-wait or explicit remove() is needed
                DelayedMessage msg = delayQueueList.get(0).take();
                consumer.consume(msg.getMessage());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });
    consumerThread.setDaemon(true);
    consumerThread.start();
}
static class DelayedMessage implements Delayed {
    private final MqCausa.Message message;
    private final long triggerTime; // expiry timestamp
    DelayedMessage(MqCausa.Message message, long delay) {
        this.message = message;
        this.triggerTime = System.currentTimeMillis() + delay;
    }
    public long getDelay(TimeUnit unit) {
        return unit.convert(triggerTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }
    public int compareTo(Delayed other) {
        return Long.compare(this.triggerTime, ((DelayedMessage) other).triggerTime);
    }
    public MqCausa.Message getMessage() { return message; }
}

App Auto‑Split and Worker Dynamic Allocation

The framework can automatically split an app into sub‑apps when the number of workers exceeds a configurable threshold. Each sub‑app is then assigned to the server with the minimum schedule count, achieving dynamic load balancing without developer intervention.

Server Rebalance Logic

public ReBalanceInfo getServerAddressReBalanceList(String serverAddress, String appName) {
    ReBalanceInfo reBalanceInfo = new ReBalanceInfo();
    // First registration: no server assigned yet, return the full server list
    if (serverAddress.isEmpty()) {
        reBalanceInfo.setSplit(false);
        reBalanceInfo.setServerIpList(new ArrayList<>(serverAddressSet));
        reBalanceInfo.setSubAppName("");
        return reBalanceInfo;
    }
    // Servers sorted by schedule count, ascending: least-loaded server first
    List<String> newServerIpList = serverAddress2ScheduleTimesMap.keySet().stream()
        .sorted((o1, o2) -> Long.compare(serverAddress2ScheduleTimesMap.get(o1),
                                         serverAddress2ScheduleTimesMap.get(o2)))
        .collect(Collectors.toList());
    // Split the app into a sub-app once its worker count crosses the threshold
    int workerNum = appName2WorkerNumMap.getOrDefault(appName, 0);
    if (workerNum > maxWorkerNum && workerNum % maxWorkerNum == 1) {
        reBalanceInfo.setSplit(true);
        reBalanceInfo.setChangeServer(false);
        reBalanceInfo.setServerIpList(newServerIpList);
        reBalanceInfo.setSubAppName(appName + ":" + appName2WorkerNumMap.size());
        return reBalanceInfo;
    }
    // If the current server schedules more than twice as much as the
    // least-loaded one, migrate this sub-app to rebalance the cluster
    long leastScheduleTimes = serverAddress2ScheduleTimesMap.get(newServerIpList.get(0));
    long comparedScheduleTimes = leastScheduleTimes == 0 ? 1 : leastScheduleTimes;
    if (serverAddress2ScheduleTimesMap.get(serverAddress) / comparedScheduleTimes > 2) {
        reBalanceInfo.setSplit(false);
        reBalanceInfo.setChangeServer(true);
        reBalanceInfo.setServerIpList(newServerIpList);
        reBalanceInfo.setSubAppName("");
        return reBalanceInfo;
    }
    // Otherwise keep the current assignment
    reBalanceInfo.setSplit(false);
    reBalanceInfo.setServerIpList(new ArrayList<>(serverAddressSet));
    reBalanceInfo.setSubAppName("");
    return reBalanceInfo;
}

Architecture Diagrams

Service discovery, scheduling flow and overall architecture are illustrated below.

[Figure: overall architecture]
[Figure: service discovery]
[Figure: scheduling flow]
Tags: distributed-systems, Java, Load Balancing, gRPC, scheduling, Message Queue, OpenAPI
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
