Interview on xxl-job Task Scheduling Framework and Handling Overlapping Tasks
The interview covers the routing and blocking strategies of the xxl-job distributed task scheduling framework. It explains which task-overlap and idempotency problems xxl-job addresses and which it leaves open, and it presents practical solutions: single‑machine execution, locking, and a stored business date that avoids date‑related problems.
Interviewer: Let's talk about the task scheduling frameworks you have used.
Candidate: There are many options, such as Quartz, Spring Batch, xxl-job, and the newer PowerJob. I use xxl-job the most.
Interviewer: Have you encountered task overlap problems with xxl-job?
Candidate: Overlap is common in batch scheduling. Most distributed scheduling frameworks can prevent some duplicate execution, but not all of it.
Interviewer: Which overlap issues does xxl-job solve, and which remain?
Candidate: Let's start with the routing strategies xxl-job offers:
FIRST: always select the first machine.
LAST: always select the last machine.
ROUND: round‑robin based on the registration order.
RANDOM: randomly select an online machine.
CONSISTENT_HASH: each job is mapped to a fixed machine by consistent hashing, and different jobs are spread evenly across machines.
LEAST_FREQUENTLY_USED: select the machine used least frequently.
LEAST_RECENTLY_USED: select the machine that has been idle the longest.
FAILOVER: send a heartbeat check to the machines in order and choose the first one that responds successfully.
BUSYOVER: check the machines in order for idleness and choose the first idle one.
SHARDING_BROADCAST: broadcast to all machines in the cluster with shard parameters.
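To make a few of these strategies concrete, here is a minimal Python model of FIRST, LAST, ROUND, and CONSISTENT_HASH. It is a simplified sketch, not xxl-job's actual Java implementation; all function and class names are hypothetical, and the consistent-hash ring with 100 virtual nodes per machine is an assumed, common way to get an even spread.

```python
import bisect
import hashlib
import itertools


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def route_first(addresses):
    # FIRST: always the first registered executor.
    return addresses[0]


def route_last(addresses):
    # LAST: always the last registered executor.
    return addresses[-1]


class RoundRobin:
    # ROUND: cycle through executors in registration order.
    def __init__(self):
        self._counter = itertools.count()

    def route(self, addresses):
        return addresses[next(self._counter) % len(addresses)]


class ConsistentHash:
    # CONSISTENT_HASH: the same job id always maps to the same executor,
    # while virtual nodes spread different jobs evenly across executors.
    def __init__(self, addresses, vnodes=100):
        ring = sorted(
            (_hash(f"{addr}#{v}"), addr) for addr in addresses for v in range(vnodes)
        )
        self._keys = [h for h, _ in ring]
        self._addrs = [a for _, a in ring]

    def route(self, job_id):
        # First ring position clockwise from the job's hash (wraps around).
        i = bisect.bisect(self._keys, _hash(str(job_id))) % len(self._keys)
        return self._addrs[i]
```

FIRST and LAST pin every trigger to one machine, which is exactly the setup in which overlapping runs queue up behind each other; ROUND moves each new trigger to the next machine.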
In practice, the most commonly used strategies are the fixed-machine ones (FIRST/LAST) and round‑robin (ROUND). For a high‑frequency job that runs every two minutes, pinning it to a single machine can cause overlap: if one execution takes longer than two minutes, the next trigger fires before it has finished.
Interviewer: If we pin the job to a single machine (e.g., with FIRST or LAST) and several executions overlap, how does xxl-job handle it?
Candidate: For closely spaced triggers, xxl-job provides three blocking strategies:
Single‑machine serial (SERIAL_EXECUTION, the default): new triggers enter a FIFO queue and run one after another.
Discard later schedules (DISCARD_LATER): if the job is already running, the new trigger is discarded and recorded as failed.
Cover earlier schedule (COVER_EARLY): the running execution is terminated, the queue is cleared, and the new trigger runs immediately.
With the default serial strategy, later jobs wait in line until the previous one finishes.
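The three behaviors can be modeled as a small state machine for one job on one executor. This is a toy sketch, not xxl-job code: the strategy strings mirror the names of xxl-job's blocking-strategy enum, but the `JobExecutor` class, its methods, and the returned status strings are hypothetical.

```python
from collections import deque

STRATEGIES = {"SERIAL_EXECUTION", "DISCARD_LATER", "COVER_EARLY"}


class JobExecutor:
    """Toy model of one job's execution slot on a single executor."""

    def __init__(self, strategy="SERIAL_EXECUTION"):
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self.running = None      # id of the run currently executing
        self.queue = deque()     # waiting runs (serial strategy only)
        self.log = []

    def trigger(self, run_id):
        if self.running is None:
            self.running = run_id
            return "started"
        if self.strategy == "SERIAL_EXECUTION":
            self.queue.append(run_id)          # wait behind the current run
            return "queued"
        if self.strategy == "DISCARD_LATER":
            self.log.append(("discarded", run_id))
            return "failed"
        # COVER_EARLY: kill the current run, drop the queue, start the new one.
        self.log.append(("killed", self.running))
        self.queue.clear()
        self.running = run_id
        return "started"

    def finish(self):
        """The current run completes; the next queued run, if any, starts."""
        if self.running is not None:
            self.log.append(("finished", self.running))
        self.running = self.queue.popleft() if self.queue else None
```

Under SERIAL_EXECUTION a slow run delays every later one, which is what sets up the date problem discussed next.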
Interviewer: Have you seen any issues with this approach?
Candidate: Generally it works, but if a job depends on the current date, a delayed run can slip past midnight into the next day and then operate on the wrong date, which has a real business impact.
Interviewer: Can you give a concrete example?
Candidate: In a loan system, a batch job sends repayment reminder SMS to customers whose due date is the next day. The job queries users with a due date of "tomorrow" and sends messages. If the job is queued and runs after the date changes, some customers won't receive the reminder.
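The failure is easy to reproduce when "tomorrow" is computed from the wall clock at execution time. A minimal sketch with a hypothetical two-customer loan book; the dates and names are illustrative only.

```python
from datetime import date, datetime, timedelta

# Hypothetical loan book: customer -> repayment due date.
LOANS = {
    "alice": date(2024, 5, 2),
    "bob": date(2024, 5, 3),
}


def customers_to_remind(now: datetime):
    # "Tomorrow" is derived from the system clock at the moment the job runs.
    tomorrow = now.date() + timedelta(days=1)
    return sorted(c for c, due in LOANS.items() if due == tomorrow)


# The trigger fires at 23:58 on May 1, but the run waits in the serial queue.
on_time = customers_to_remind(datetime(2024, 5, 1, 23, 58))
delayed = customers_to_remind(datetime(2024, 5, 2, 0, 5))
print(on_time)  # ['alice']
print(delayed)  # ['bob']: alice, due May 2, is never reminded
```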
Interviewer: Does xxl-job provide a solution for this scenario?
Candidate: A round‑robin routing strategy spreads successive triggers across multiple machines, so they no longer queue up behind one another on a single node.
Interviewer: If the backlog is caused by slow downstream APIs or poor SQL performance, can round‑robin help?
Candidate: No. Round‑robin only alleviates resource contention on a single machine; it cannot fix slow external calls or inefficient queries.
Interviewer: What solutions exist for those problems?
Candidate: Finance systems use a "cut‑date" (day‑end cutover) concept: business logic reads a stored accounting date instead of the system clock, and that date is advanced only after all related batch jobs have finished. A job that runs late therefore still works against the date it was scheduled for.
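A minimal sketch of the cut-date idea, assuming the business date lives in a small store that the batch jobs report to. `BusinessCalendar`, `reminder_due_date`, and the job names are hypothetical; a real system would persist this state in a database.

```python
from datetime import date, timedelta


class BusinessCalendar:
    """Hypothetical store for the accounting (business) date."""

    def __init__(self, business_date, required_jobs):
        self.business_date = business_date
        self.required_jobs = frozenset(required_jobs)
        self.finished = set()

    def mark_finished(self, job):
        self.finished.add(job)

    def try_cutover(self):
        # Advance the date only once every batch job for today has finished.
        if self.required_jobs <= self.finished:
            self.business_date += timedelta(days=1)
            self.finished.clear()
            return True
        return False


def reminder_due_date(cal):
    # Reads the stored business date, never the system clock, so a run
    # that slips past midnight still targets the intended due date.
    return cal.business_date + timedelta(days=1)


cal = BusinessCalendar(date(2024, 5, 1), {"repayment_reminder", "interest_accrual"})
print(reminder_due_date(cal))  # 2024-05-02, whatever the wall clock says
```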
Interviewer: What other issues can arise from using round‑robin?
Candidate: Idempotency problems. For example, a job runs every two minutes, fetches 100 "unprocessed" records, processes them, and marks them "processed". If runs on two machines overlap, both can fetch the same records before either one marks them, so those records are processed twice.
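The race can be shown with a deterministic interleaving: both runs read before either writes. A minimal in-memory sketch; the record layout and helper names are hypothetical stand-ins for the table query.

```python
def fetch_unprocessed(records, limit):
    # Equivalent of SELECT id ... WHERE status = 'unprocessed' LIMIT n, with no lock.
    return [r["id"] for r in records if r["status"] == "unprocessed"][:limit]


def mark_processed(records, ids):
    for r in records:
        if r["id"] in ids:
            r["status"] = "processed"


def overlapping_runs(total=150, limit=100):
    records = [{"id": i, "status": "unprocessed"} for i in range(1, total + 1)]
    # Runs on machines A and B overlap: both read before either one writes.
    batch_a = fetch_unprocessed(records, limit)
    batch_b = fetch_unprocessed(records, limit)
    mark_processed(records, set(batch_a))
    mark_processed(records, set(batch_b))
    return set(batch_a) & set(batch_b)


print(len(overlapping_runs()))  # 100: every record in the batch is handled twice
```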
Interviewer: How can we solve the idempotency issue?
Candidate: Two approaches:
Run the job on a single machine.
Introduce an intermediate "processing" state. Select the records with a row‑level exclusive lock (SELECT ... FOR UPDATE), update them to "processing", and commit the transaction. Then execute the business logic, and finally set the state to "processed".
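The claim-then-process flow of the second approach can be sketched end to end in Python with two concurrent workers. Assumptions: the table and column names are hypothetical, and because SQLite has no SELECT ... FOR UPDATE, this sketch swaps in BEGIN IMMEDIATE, which takes a database-wide write lock; that is much coarser than InnoDB's row locks but gives the same guarantee that only one worker claims a given row.

```python
import os
import sqlite3
import tempfile
import threading


def set_status(conn, ids, status):
    # One UPDATE for the whole batch: "... WHERE id IN (?, ?, ...)".
    marks = ",".join("?" * len(ids))
    conn.execute(f"UPDATE task SET status = ? WHERE id IN ({marks})", [status, *ids])


def claim_batch(conn, size):
    # Step 1: lock, mark the batch 'processing', commit.
    # BEGIN IMMEDIATE stands in for SELECT ... FOR UPDATE (see lead-in).
    conn.execute("BEGIN IMMEDIATE")
    try:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM task WHERE status = 'unprocessed' ORDER BY id LIMIT ?",
            (size,))]
        if ids:
            set_status(conn, ids, "processing")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


def worker(path, size, handled, guard):
    # Own connection per thread; isolation_level=None means we issue
    # BEGIN/COMMIT ourselves, and timeout is how long to wait for the lock.
    conn = sqlite3.connect(path, isolation_level=None, timeout=10)
    try:
        while ids := claim_batch(conn, size):
            # Step 2: business logic runs after the commit, outside the lock.
            with guard:
                handled.extend(ids)
            # Step 3: mark the batch 'processed'.
            set_status(conn, ids, "processed")
    finally:
        conn.close()


def run_demo(total=250, size=100, workers=2):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        setup = sqlite3.connect(path, isolation_level=None)
        setup.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
        setup.execute("BEGIN")
        setup.executemany("INSERT INTO task (id, status) VALUES (?, 'unprocessed')",
                          [(i,) for i in range(1, total + 1)])
        setup.execute("COMMIT")
        handled, guard = [], threading.Lock()
        threads = [threading.Thread(target=worker, args=(path, size, handled, guard))
                   for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        statuses = {s for (s,) in setup.execute("SELECT status FROM task")}
        setup.close()
    return handled, statuses
```

The lock is held only while claiming, so slow business logic does not block other workers. One caveat of the pattern: if a worker dies after step 1, its rows stay in "processing", so a real system needs some way to reset stale claims.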
start transaction;
select * from xxx where status='unprocessed' limit 100 for update;  -- no offset: claimed rows drop out of the filter
update xxx set status='processing' where id in (...);
commit;
Interviewer: What if the data source cannot be locked, such as email or API queries?
Candidate: Record a unique key for each fetched item in a database table where that key is the primary key. Later runs skip keys that are already recorded, and a duplicate insert fails on the primary-key constraint, which ensures idempotency.
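A minimal sketch of the unique-key approach, using SQLite's PRIMARY KEY constraint as the deduplication point. The table name and the `msg-N` keys are hypothetical; for email, a message identifier would be a natural key.

```python
import sqlite3


def open_key_store():
    # Hypothetical table: one row per item already taken by some run.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE processed_key (item_key TEXT PRIMARY KEY)")
    return conn


def try_claim(conn, key):
    # The PRIMARY KEY constraint makes the claim atomic: only the first
    # insert of a given key succeeds, even across concurrent runs.
    try:
        conn.execute("INSERT INTO processed_key (item_key) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        return False


def run_job(conn, fetched_keys, handled):
    for key in fetched_keys:
        if try_claim(conn, key):
            handled.append(key)  # business logic would go here


store, handled = open_key_store(), []
run_job(store, ["msg-1", "msg-2", "msg-3"], handled)
run_job(store, ["msg-2", "msg-3", "msg-4"], handled)  # overlapping fetch
print(handled)  # ['msg-1', 'msg-2', 'msg-3', 'msg-4']
```

Claiming the key before processing means a failed item is skipped unless its key is deleted on failure; recording the key after processing instead risks a duplicate if the run crashes in between.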
Interviewer: Great, congratulations on moving to the next round.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
