Achieving High Availability for Elastic Job Lite with Dual‑Data‑Center Sharding
This article explains how to move from risky single-node deployments of Elastic Job Lite to a robust high-availability setup across two data centers, covering ZooKeeper-based sharding consistency, priority scheduling, and custom sharding strategies with Java code examples.
Introduction
When using Elastic Job Lite for scheduled tasks, many teams deploy a single instance, which is risky for critical data updates. Elastic Job Lite actually supports high availability, but most articles overlook this.
Single‑node deployment risk
Developers often deploy a single instance because they fear that several instances will trigger the same task at once and execute it twice. Elastic Job rules this out through job-sharding consistency: the instances coordinate through ZooKeeper, an elected leader assigns the shards, and in a distributed environment each shard is executed by only one instance.
Job sharding consistency ensures that the same shard is executed by only one instance.
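To make the guarantee concrete, here is a minimal sketch of how a fixed number of shards could be split evenly across instances so that every shard has exactly one owner. It is plain Java that only mimics the idea of an even split; the class `ShardAllocationSketch` and the method `allocate` are hypothetical names, not part of Elastic Job's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardAllocationSketch {

    // Splits shard items 0..shardingTotalCount-1 across instances; each shard gets exactly one owner.
    public static Map<String, List<Integer>> allocate(List<String> instances, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (instances.isEmpty()) {
            return result;
        }
        int perInstance = shardingTotalCount / instances.size();
        int next = 0;
        // First pass: every instance receives the same number of consecutive shards.
        for (String instance : instances) {
            List<Integer> items = new ArrayList<>();
            for (int i = 0; i < perInstance; i++) {
                items.add(next++);
            }
            result.put(instance, items);
        }
        // Second pass: hand out the remainder one by one, starting from the first instance.
        for (String instance : instances) {
            if (next >= shardingTotalCount) {
                break;
            }
            result.get(instance).add(next++);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allocate(Arrays.asList("10.10.10.1", "10.10.10.2", "10.11.10.1"), 8));
    }
}
```

With three instances and eight shards, `main` prints `{10.10.10.1=[0, 1, 6], 10.10.10.2=[2, 3, 7], 10.11.10.1=[4, 5]}`: no shard appears twice, which is the property that makes running several instances safe.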
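The single-node architecture shown in the diagram therefore adds no protection against duplicate execution. It works only while that one instance stays available; as soon as it goes down, the job stops and there is no HA at all.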
Dual‑data‑center HA architecture
To meet higher HA requirements, a same‑city dual‑data‑center deployment can be used. If one data center fails, the other takes over, and Elastic Job still ensures that a shard runs on only one instance across both sites.
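Note: ZooKeeper itself does not provide cross-data-center HA. A single large ensemble spanning two sites is insufficient, because it needs a majority of its nodes to keep serving, and losing the site that holds the majority stops the whole ensemble.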
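The quorum rule behind this note can be made concrete with a little arithmetic. The sketch below assumes ZooKeeper's default majority quorum and a hypothetical five-node ensemble split 3/2 over two sites; the class and method names are illustrative only.

```java
public class QuorumSketch {

    // With majority quorums, an ensemble of n voting members needs floor(n/2) + 1 of them.
    public static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True if the members left after one site is lost can still form a quorum.
    public static boolean survivesLossOf(int lostSiteMembers, int remainingSiteMembers) {
        int ensembleSize = lostSiteMembers + remainingSiteMembers;
        return remainingSiteMembers >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int siteA = 3; // assumed number of nodes in the first data center
        int siteB = 2; // assumed number of nodes in the second data center
        System.out.println("lose B, keep A: " + survivesLossOf(siteB, siteA)); // prints true
        System.out.println("lose A, keep B: " + survivesLossOf(siteA, siteB)); // prints false
    }
}
```

However the nodes are divided, at most one of the two sites can hold a majority, so there is always one site whose loss stops the ensemble.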
Priority scheduling
Tasks may depend on a primary‑secondary data source. Write‑heavy tasks should run in the primary data‑center, while read‑only tasks can run in the secondary to avoid cross‑site latency.
Custom sharding strategy
Elastic Job allows custom sharding strategies by implementing JobShardingStrategy. Using the decorator pattern, a strategy can filter standby instances before delegating to the built‑in average allocation.
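import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

import com.alibaba.fastjson.JSON;
import com.dangdang.ddframe.job.lite.api.strategy.JobInstance;
import com.dangdang.ddframe.job.lite.api.strategy.JobShardingStrategy;
import com.dangdang.ddframe.job.lite.api.strategy.impl.AverageAllocationJobShardingStrategy;
import com.dangdang.ddframe.job.util.env.IpUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Imports assume Elastic Job Lite 2.x, fastjson and SLF4J; adjust them to the versions you use.
public abstract class JobShardingStrategyActiveStandbyDecorator implements JobShardingStrategy {

    private static final Logger log = LoggerFactory.getLogger(JobShardingStrategyActiveStandbyDecorator.class);

    // Built-in strategy that performs the actual allocation once standby instances are filtered out.
    private final JobShardingStrategy inner = new AverageAllocationJobShardingStrategy();

    // Subclasses decide whether an instance is standby for the given job.
    protected abstract boolean isStandby(JobInstance jobInstance, String jobName);

    @Override
    public Map<JobInstance, List<Integer>> sharding(List<JobInstance> jobInstances, String jobName, int shardingTotalCount) {
        List<JobInstance> candidates = new ArrayList<>(jobInstances);
        List<JobInstance> removed = new ArrayList<>();
        boolean removeSelf = false;
        for (JobInstance jobInstance : jobInstances) {
            boolean standby = false;
            try {
                standby = isStandby(jobInstance, jobName);
            } catch (Exception e) {
                log.warn("isStandby threw an exception, treating the instance as not standby", e);
            }
            if (standby) {
                if (IpUtils.getIp().equals(jobInstance.getIp())) {
                    removeSelf = true;
                }
                candidates.remove(jobInstance);
                removed.add(jobInstance);
            }
        }
        if (candidates.isEmpty()) {
            // Only standby instances are alive: use them all. Copy so the caller's list is never sorted in place.
            candidates = new ArrayList<>(jobInstances);
            log.info("[{}] ATTENTION!! Only backup job instances exist, but do sharding with them anyway {}", jobName, JSON.toJSONString(candidates));
        }
        if (!candidates.equals(jobInstances)) {
            log.info("[{}] remove backup before really do sharding, removeSelf: {}, removed instances: {}", jobName, removeSelf, JSON.toJSONString(removed));
        } else {
            log.info("[{}] job instances just remain the same {}", jobName, JSON.toJSONString(candidates));
        }
        // Sort so that every node computes the same allocation regardless of input order.
        candidates.sort(Comparator.comparing(JobInstance::getJobInstanceId));
        return inner.sharding(candidates, jobName, shardingTotalCount);
    }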
}
Concrete implementations specify which IPs are active. For example:
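import java.util.Arrays;

import com.dangdang.ddframe.job.lite.api.strategy.JobInstance;

public class ActiveStandbyESJobStrategy extends JobShardingStrategyActiveStandbyDecorator {

    @Override
    protected boolean isStandby(JobInstance jobInstance, String jobName) {
        // Default set of active IPs.
        String activeIps = "10.10.10.1,10.10.10.2";
        // TASK_B_FIRST is given a different active set.
        if ("TASK_B_FIRST".equals(jobName)) {
            activeIps = "10.11.10.1,10.11.10.2";
        }
        // Any instance whose IP is not in the active list is treated as standby.
        return !Arrays.asList(activeIps.split(",")).contains(jobInstance.getIp());
    }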
}
When building the job configuration, set the custom strategy class:
// jobClass, cron, shardingTotalCount and shardingItemParameters are supplied by the caller.
JobCoreConfiguration core = JobCoreConfiguration.newBuilder(jobClass.getName(), cron, shardingTotalCount)
        .shardingItemParameters(shardingItemParameters)
        .build();
SimpleJobConfiguration jobConfig = new SimpleJobConfiguration(core, jobClass.getCanonicalName());
LiteJobConfiguration liteConfig = LiteJobConfiguration.newBuilder(jobConfig)
        // Fully qualified class name of the custom sharding strategy.
        .jobShardingStrategyClass("com.xxx.yyy.job.ActiveStandbyESJobStrategy")
        .build();
Result
The approach provides:
High availability across two data centers.
Priority scheduling so tasks run in the preferred site.
However, the standby site may still face unknown issues (e.g., missing DB permissions), and true active‑active traffic sharing would require additional logic.
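Both properties can be checked without a running cluster. The sketch below reimplements only the filtering step of the decorator over plain IP strings; the class `StandbyFilterSketch` and its method are hypothetical, and the IPs are the sample values used earlier. Standby instances are dropped while any active instance is alive, and kept when they are all that is left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StandbyFilterSketch {

    // Returns the instances that should take part in sharding.
    public static List<String> candidates(List<String> aliveIps, Set<String> activeIps) {
        List<String> result = new ArrayList<>();
        for (String ip : aliveIps) {
            if (activeIps.contains(ip)) {
                result.add(ip);
            }
        }
        // Fall back to every alive instance when no active one is left (preferred site down).
        return result.isEmpty() ? new ArrayList<>(aliveIps) : result;
    }

    public static void main(String[] args) {
        Set<String> active = new HashSet<>(Arrays.asList("10.10.10.1", "10.10.10.2"));

        // Normal operation: both sites alive, only the active site receives shards.
        System.out.println(candidates(
                Arrays.asList("10.10.10.1", "10.10.10.2", "10.11.10.1", "10.11.10.2"), active));

        // Failover: the active site is gone, so the standby instances take over.
        System.out.println(candidates(Arrays.asList("10.11.10.1", "10.11.10.2"), active));
    }
}
```

The first call prints `[10.10.10.1, 10.10.10.2]` and the second prints `[10.11.10.1, 10.11.10.2]`. The same simulation does not exercise what the standby site actually needs at runtime, such as database permissions, so a real failover drill is still worthwhile.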
