
Juice: An Open‑Source Distributed Task Scheduling Framework Built on Apache Mesos

Juice is an open‑source Mesos‑based distributed task scheduling framework that abstracts Docker tasks, provides REST APIs, leverages Redis queues, and implements custom resource allocation, offering high resource utilization and fault‑tolerant execution for audio‑video processing workloads.

Hujiang Technology

Before introducing Juice, it helps to look at Apache Mesos as a two‑level scheduler: the Master uses an internal Allocator for the first‑level Master→Framework scheduling, and each Framework performs the second‑level resource→task allocation.

About Mesos Framework

The overall architecture consists of a Scheduler and an Executor. The Scheduler (available via HTTP REST API since Mesos 1.0) receives callbacks from the Master (resource offers, task status updates, etc.) and interacts with Agents, which launch either the built‑in Mesos Executor or a Docker Executor depending on the container type.

Mesos Framework Interaction API

Two main APIs are used: the Scheduler API (http://mesos.apache.org/documentation/latest/scheduler-http-api/) and the Executor API (http://mesos.apache.org/documentation/latest/executor-http-api/). The typical workflow is:

Scheduler sends a SUBSCRIBE request to the Master, providing a unique framework_info.id.

Master issues an OFFERS event with available resources.

Scheduler calls ACCEPT to assign tasks to the offered resources.

Master forwards the tasks to the corresponding Agent.

Agent launches the appropriate Executor (Docker or Mesos) and runs the task.

Executor reports UPDATE (task status) back to the Agent.

Agent forwards the status to the Master, which triggers a Scheduler UPDATE callback.

Scheduler acknowledges the update with ACKNOWLEDGE.
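
The first and last steps of this workflow can be sketched as the JSON bodies a scheduler would POST to the Master's /api/v1/scheduler endpoint. This is a minimal illustration based on the public Scheduler HTTP API, not Juice's own code (Juice uses ProtoBuf). The class name SchedulerCalls and the sample values are hypothetical, and the strings are concatenated without JSON escaping to keep the sketch short.

```java
/** Builds JSON bodies for two Mesos v1 Scheduler API calls (hypothetical helper, not part of Juice). */
final class SchedulerCalls {
    private SchedulerCalls() {}

    /** SUBSCRIBE call that identifies the framework by user, name, and framework ID. */
    static String subscribe(String user, String name, String frameworkId) {
        return "{\"framework_id\":{\"value\":\"" + frameworkId + "\"},"
                + "\"type\":\"SUBSCRIBE\",\"subscribe\":{\"framework_info\":{"
                + "\"user\":\"" + user + "\","
                + "\"name\":\"" + name + "\","
                + "\"id\":{\"value\":\"" + frameworkId + "\"}}}}";
    }

    /** ACKNOWLEDGE call; uuid is the base64 value carried by the UPDATE event being confirmed. */
    static String acknowledge(String frameworkId, String agentId, String taskId, String uuid) {
        return "{\"framework_id\":{\"value\":\"" + frameworkId + "\"},"
                + "\"type\":\"ACKNOWLEDGE\",\"acknowledge\":{"
                + "\"agent_id\":{\"value\":\"" + agentId + "\"},"
                + "\"task_id\":{\"value\":\"" + taskId + "\"},"
                + "\"uuid\":\"" + uuid + "\"}}";
    }
}
```

Every call after SUBSCRIBE must also carry the Mesos-Stream-Id header returned with the subscription response, which is why the connecting() method shown later stores streamId.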

Task Status and Agent Failure Handling

Mesos defines 13 task states; the most common are:

TASK_STAGING – task assigned but not yet running.
TASK_RUNNING – task is executing on an Agent.
TASK_FINISHED – task completed successfully.
TASK_KILLED – task was terminated via the KILL API.
TASK_FAILED – task execution failed.
TASK_LOST – task lost, typically due to Agent crash.
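
A scheduler usually needs to know which of these states end a task's life, because only then can it release bookkeeping or fire a callback. The enum below is a small sketch covering just the six states listed above; it is a local stand-in for the Mesos ProtoBuf type, and the method names isTerminal and isSuccess are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Local stand-in for the six task states discussed above (not the Mesos ProtoBuf enum). */
enum TaskState {
    TASK_STAGING, TASK_RUNNING, TASK_FINISHED, TASK_KILLED, TASK_FAILED, TASK_LOST;

    // States after which the task will not run again under the same task ID.
    private static final Set<TaskState> TERMINAL =
            EnumSet.of(TASK_FINISHED, TASK_KILLED, TASK_FAILED, TASK_LOST);

    /** True when no further status change is expected for the task. */
    boolean isTerminal() {
        return TERMINAL.contains(this);
    }

    /** True only for a task that ran to completion without error. */
    boolean isSuccess() {
        return this == TASK_FINISHED;
    }
}
```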

If an Agent crashes, the Master pings the Agent; after a configurable timeout and retry count, the Master removes the Agent, deletes its tasks and executors, revokes offers, and removes the Agent from the replicated log.
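
The time before a silent Agent is removed is the ping timeout multiplied by the number of tolerated missed pings. The sketch below only does that arithmetic. The values 15 seconds and 5 are what I understand to be the defaults of the Master flags --agent_ping_timeout and --max_agent_ping_timeouts; treat them as assumptions and check the configuration of your Mesos version.

```java
import java.time.Duration;

/** Computes how long the Master waits before removing an unresponsive Agent (illustrative helper). */
final class AgentRemovalWindow {
    private AgentRemovalWindow() {}

    /** Window = timeout of one ping * number of consecutive missed pings tolerated. */
    static Duration removalWindow(Duration pingTimeout, int maxPingTimeouts) {
        return pingTimeout.multipliedBy(maxPingTimeouts);
    }

    public static void main(String[] args) {
        // Assumed defaults: 15 s per ping, 5 missed pings tolerated.
        Duration window = removalWindow(Duration.ofSeconds(15), 5);
        System.out.println("Agent removed after about " + window.getSeconds() + " s");
    }
}
```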

Using Marathon for Deployment

Many open‑source frameworks (e.g., Marathon) run on top of Mesos. The author’s production environment uses Marathon to run long‑running services, providing start/stop, scaling, and health‑check capabilities, integrated with Jenkins and Docker for automated deployment.

Why Juice

Juice is a custom Mesos Framework developed to replace an internal TaskCenter that lacked distributed scheduling. The goals are high resource utilization, support for any task type (encapsulated in Docker containers), and platform stability.

Juice Architecture

Juice consists of two components:

Juice‑Rest : a Spring‑Boot REST API layer for CRUD operations on Juice tasks.

Juice‑Service : the core framework that communicates with the Mesos Master (ProtoBuf over a long‑lived HTTP connection) to handle resource offers, task submission, and status updates. Multiple Juice‑Rest instances can be deployed for scaling, and Juice‑Service runs in a master‑slave mode coordinated by ZooKeeper.

Juice‑Rest Parameter Settings

Juice‑Rest is built with Spring‑Boot. When submitting a Docker task, the JSON payload includes fields such as callbackUrl, taskName, env, args, and a container object specifying the Docker image and type.

{
  "callbackUrl":"http://www.XXXXXXXX.com/v5/tasks/callback",
  "taskName":"demo-task",
  "env":{"name":"environment","value":"dev"},
  "args":["this is a test"],
  "container":{
    "docker":{ "image":"dockerhub.XXXX.com/demo-slice" },
    "type":"DOCKER"
  }
}
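
A payload like the one above could be submitted with the JDK's built-in HTTP client (Java 11+). This is a sketch under assumptions: the /v1/tasks path is borrowed from the SDK example later in this article, the Content-Type header is a guess at what Juice‑Rest expects, and the class name SubmitRequestSketch is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) a POST request carrying a Juice task payload. */
final class SubmitRequestSketch {
    private SubmitRequestSketch() {}

    static HttpRequest build(String endpoint, String jsonPayload) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                // Assumed: Juice-Rest accepts JSON bodies.
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                .build();
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and inspect the response body for the task ID.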

Only Docker containers are supported currently, but a Mesos container type placeholder exists for future extension.

A commands mode (e.g., "commands":"/home/app/entrypoint.sh") is also available. It runs an arbitrary shell script on the Agent, which is useful for simple scripts or legacy JAR‑based jobs.

Middleware Used by Juice

When a task is received by Juice‑Rest, it is first placed into a Redis List queue (LPUSH/RPOP). Redis was chosen for its lightweight nature and mature HA solutions. The task metadata is also stored in a MySQL table (Juice_tasks) for persistence and future retry/recovery mechanisms.
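
Pairing LPUSH with RPOP turns a Redis List into a first-in, first-out queue: new tasks enter at the head and the oldest task leaves from the tail. The sketch below imitates that behavior with an in-memory Deque so it runs without a Redis server; it is not Juice's code, and the class name ListQueueSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory imitation of a Redis List used as a FIFO queue. */
final class ListQueueSketch {
    private final Deque<String> list = new ArrayDeque<>();

    /** Like LPUSH: insert at the head and return the new length. */
    long lpush(String value) {
        list.addFirst(value);
        return list.size();
    }

    /** Like RPOP: remove from the tail, or return null when the list is empty. */
    String rpop() {
        return list.pollLast();
    }
}
```

Pushing task-1 and then task-2 and popping twice returns task-1 first, so tasks are scheduled in submission order.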

Juice‑Service Internal Processing Flow

Juice‑Service establishes a long‑lived HTTP connection to the Mesos Master using ProtoBuf calls. The connecting() method sends a SUBSCRIBE request, reads chunked events from the stream, and dispatches them to onEvent().

private void connecting() throws Exception {
    InputStream stream = null;
    Response res = null;
    try {
        Protos.Call call = subscribeCall();
        res = Restty.create(getUrl())
                .addAccept(protocol.mediaType())
                .addMediaType(protocol.mediaType())
                .addKeepAlive()
                .requestBody(protocol.getSendBytes(call))
                .post();
        streamId = res.header(STREAM_ID);
        stream = res.body().byteStream();
        log.info("send subscribe, frameworkId : " + frameworkId + " , url " + getUrl() + ", streamId : " + streamId);
        if (stream == null) {
            log.warn("stream is null");
            throw new DriverException("stream is null");
        }
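        // Each event arrives as a length-prefixed chunk; loop until the stream breaks.
        while (true) {
            int size = SendUtils.readChunkSize(stream);
            byte[] event = SendUtils.readChunk(stream, size);
            onEvent(event);
        }
    } catch (Exception e) {
        // Pass the exception itself so the stack trace is logged, not just its message.
        log.error("service handle error", e);
        throw e;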
    } finally {
        if (stream != null) stream.close();
        if (res != null) res.close();
        streamId = null;
    }
}
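
Mesos streams events in RecordIO format: the record length as ASCII digits, a newline, then that many bytes of payload. The sketch below shows one way readChunkSize and readChunk could work on that format. It is an assumption about what SendUtils does, not a copy of it, and the class name RecordIoReader and the decodeAll helper are hypothetical. decodeAll turns records into strings only to keep the demonstration readable; Juice's records are ProtoBuf bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Minimal RecordIO reader: "<length>\n<length bytes of payload>" repeated. */
final class RecordIoReader {
    private RecordIoReader() {}

    /** Reads the decimal length prefix; returns -1 on a clean end of stream. */
    static int readChunkSize(InputStream in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return Integer.parseInt(digits.toString());
            }
            digits.append((char) b);
        }
        if (digits.length() == 0) {
            return -1; // stream ended between records
        }
        throw new EOFException("stream ended inside a length prefix");
    }

    /** Reads exactly size bytes, looping because read() may return fewer. */
    static byte[] readChunk(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(buf, off, size - off);
            if (n == -1) {
                throw new EOFException("stream ended inside a record");
            }
            off += n;
        }
        return buf;
    }

    /** Demonstration helper: decodes every record in a byte array as UTF-8 text. */
    static List<String> decodeAll(byte[] data) {
        List<String> records = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int size;
            while ((size = readChunkSize(in)) != -1) {
                records.add(new String(readChunk(in, size), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```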

The main callback events processed are:

SUBSCRIBED : registers the FrameworkID in the database.

OFFERS : triggers resource‑task matching and submission to the Master.

UPDATE : propagates task completion status from Agent → Master → Juice‑Service.

ERROR : handles fatal errors (e.g., missing FrameworkID) by resetting the service or generating a new FrameworkID.

Offer handling involves filtering based on Agent attributes (e.g., mesos.framework.attr=lms,qa,mid|big), declining unsuitable offers, and constructing TaskInfo objects for accepted offers.

private void onEvent(byte[] bytes) {
    // ...
    switch (event.getType()) {
        case OFFERS:
            try {
                event.getOffers().getOffersList().stream()
                    .filter(of -> {
                        if (SchedulerService.filterAndAddAttrSys(of, attrMap)) {
                            return true;
                        }
                        declines.add(of.getId());
                        return false;
                    })
                    .forEach(of -> {
                        List<TaskInfo> tasks = newArrayList();
                        // build tasks and accept offers
                    });
                if (declines.size() > 0) {
                    AuxiliaryService.declineOffer(...);
                }
            } finally {
                declines.clear();
                attrMap.clear();
            }
            break;
        // other cases ...
    }
}
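
The snippet above records declined offers as a side effect inside filter(). An alternative is to split the offers into a suitable group and a declined group in one pass, which avoids the shared declines list. The sketch below shows that pattern with a simplified Offer type; the type, the env attribute, and the class name OfferPartition are hypothetical, and the suitability rule is supplied by the caller because the exact semantics of Juice's attribute filter are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Splits resource offers into suitable (key true) and declined (key false) groups. */
final class OfferPartition {
    private OfferPartition() {}

    /** Simplified stand-in for a Mesos offer: an ID plus Agent attributes. */
    static final class Offer {
        final String id;
        final Map<String, String> attributes;

        Offer(String id, Map<String, String> attributes) {
            this.id = id;
            this.attributes = attributes;
        }
    }

    /** One pass over the offers; no shared mutable state is needed. */
    static Map<Boolean, List<Offer>> partition(List<Offer> offers, Predicate<Offer> suitable) {
        return offers.stream().collect(Collectors.partitioningBy(suitable));
    }

    /** Extracts offer IDs, e.g. to build a DECLINE call for the rejected group. */
    static List<String> ids(List<Offer> offers) {
        List<String> result = new ArrayList<>();
        for (Offer o : offers) {
            result.add(o.id);
        }
        return result;
    }
}
```

The group under key true goes on to task construction, and the IDs of the group under key false are declined in a single call.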

Juice‑Service also maintains several internal queues:

juice.task.queue : primary task queue.

juice.task.retry.queue : holds tasks that could not be scheduled on the current offer; prioritized on the next offer.

juice.task.result.queue : stores task results for callbacks.

juice.management.queue : used for reconciliation and kill commands.
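
The interplay between the primary queue and the retry queue can be sketched as follows: tasks waiting in the retry queue get the first chance at a new offer, and any task that does not fit the remaining resources is parked in the retry queue for the next offer. This is a simplified model, not Juice's allocation algorithm; the class OfferMatcher, the Task type, and the rule of comparing only CPUs and memory are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Matches queued tasks against one offer, giving retry-queue tasks priority. */
final class OfferMatcher {
    private OfferMatcher() {}

    /** Hypothetical task with its resource demand. */
    static final class Task {
        final String name;
        final double cpus;
        final double memMb;

        Task(String name, double cpus, double memMb) {
            this.name = name;
            this.cpus = cpus;
            this.memMb = memMb;
        }
    }

    /** Returns the tasks to launch; tasks that do not fit end up in retryQueue. */
    static List<Task> match(double offerCpus, double offerMemMb,
                            Deque<Task> retryQueue, Deque<Task> taskQueue) {
        List<Task> launched = new ArrayList<>();
        double cpus = offerCpus;
        double mem = offerMemMb;

        // 1. Previously deferred tasks are examined first, once each.
        int pending = retryQueue.size();
        for (int i = 0; i < pending; i++) {
            Task t = retryQueue.pollFirst();
            if (t.cpus <= cpus && t.memMb <= mem) {
                launched.add(t);
                cpus -= t.cpus;
                mem -= t.memMb;
            } else {
                retryQueue.addLast(t); // still too big, wait for another offer
            }
        }

        // 2. Pull from the primary queue only while the offer has capacity left.
        while (cpus > 0 && mem > 0 && !taskQueue.isEmpty()) {
            Task t = taskQueue.pollFirst();
            if (t.cpus <= cpus && t.memMb <= mem) {
                launched.add(t);
                cpus -= t.cpus;
                mem -= t.memMb;
            } else {
                retryQueue.addLast(t); // defer to the next offer
            }
        }
        return launched;
    }
}
```

Stopping the second loop once the offer is exhausted keeps untouched tasks in the primary queue instead of flooding the retry queue.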

Submitting a Task via the SDK

@Test
public void submitsDocker() {
    Submits submitsDocker = Submits.create()
            .setDockerImage("dockerhub.XXXX.com/demo-slice")
            .setTaskName("demo-slice")
            .addArgs("/10002/res/L2.mp4")
            .addEnv("environment", "dev")
            .addResources(2.0, 2048.0);
    Long taskId = JuiceClient.create("http://your-juice-rest-host/v1/tasks", "your-system-id-in-string")
            .setOperations(submitsDocker)
            .handle();
    if (taskId != null) {
        System.out.println("submitsDocker, taskId --> " + taskId);
    }
}

Summary and Future Directions

Juice 1.1.0 is in testing; the upcoming 1.2.0 will add features such as task prioritization (via priority=1) and automatic retry (via retry=1, up to three attempts). The project is open‑source; contributors are encouraged to fork the repository or contact the author for further enhancements.

Q&A

Key differences between Juice and Elastic‑Job, details on the resource‑allocation algorithm, handling of temporary tasks, Docker packaging support, and a brief comparison of Mesos vs. Kubernetes are discussed.

