
Designing a High‑Performance Go‑Job Scheduler: Architecture, SDK & Task Flow

This article presents a comprehensive technical deep‑dive into Go‑Job, a Go‑native distributed task scheduling framework, covering its background, three‑layer architecture, service and task design, SDK modules, code examples, practical integration steps, and future enhancements for robust backend operations.

Architect

Jul 26, 2024

Background

When evaluating task‑scheduling platforms, the team found that popular open‑source projects such as XXL‑Job and Elastic‑Job are Java‑centric, which makes them hard to integrate with Go‑based services. To meet their own business requirements, they built Go‑Job, a Go‑native distributed task‑scheduling framework.

Architecture Design

The overall architecture is divided into three parts:

Web Console: Manages task configuration, creation, editing, history, and permission control.

Scheduler Service: Handles trigger management, task generation, and task‑matching logic.

Executor: Receives tasks, executes them, and reports status back to the scheduler.

Key terminology:

namespace: Resource isolation.

handler: User‑defined type that implements the task‑processing logic.

worker: Service that runs a handler and communicates with the Go‑Job server via the SDK.

trigger: Scheduling rule definition on the platform.

task: Information generated according to a trigger.

runInstance: Smallest execution unit of a task.

The diagram above shows the three‑layer split and the flow of task scheduling.
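As a rough illustration of how these terms relate, the sketch below models a trigger, the task it generates, and the run instances the task fans out into. Every type, field, and function name here is hypothetical and chosen only for illustration; the article does not show Go‑Job's actual data model, and treating ShardNum as a zero-based shard index is an assumption.

```go
package main

import "fmt"

// Trigger is a scheduling rule defined on the platform (hypothetical fields).
type Trigger struct {
	Namespace string // resource-isolation scope
	Handler   string // name the worker registered its handler under
	Spec      string // schedule expression; the format is an assumption
	Shards    int    // run instances per task; values below 1 are treated as 1
}

// RunInstance is the smallest execution unit of a task.
type RunInstance struct {
	ID       string
	ShardNum int // assumed here to be a zero-based shard index
}

// Task is what the scheduler generates each time a trigger fires.
type Task struct {
	ID        string
	Trigger   Trigger
	Instances []RunInstance
}

// NewTask expands one firing of a trigger into a task and its run instances.
func NewTask(id string, t Trigger) Task {
	n := t.Shards
	if n < 1 {
		n = 1
	}
	task := Task{ID: id, Trigger: t}
	for i := 0; i < n; i++ {
		task.Instances = append(task.Instances, RunInstance{
			ID:       fmt.Sprintf("%s-%d", id, i),
			ShardNum: i,
		})
	}
	return task
}

func main() {
	t := Trigger{Namespace: "test", Handler: "hello-world1", Spec: "*/5 * * * *", Shards: 2}
	task := NewTask("task-1", t)
	for _, ri := range task.Instances {
		fmt.Println(ri.ID, ri.ShardNum)
	}
}
```

With Shards set to 2, one firing of the trigger yields one task and two run instances, each of which could be handed to a different worker.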

Service Design

The scheduler’s backend consists of three core modules:

Controller: Unified entry point providing RESTful and RPC APIs for task CRUD and SDK connection management. It forwards requests to other modules.

Trigger: Core module that listens to trigger states, creates tasks at scheduled times, and pushes them to the Matching service.

Matching: Matches generated tasks with appropriate client nodes based on label compatibility, then pushes full task data to the SDK.

This modular collaboration ensures high‑throughput task processing and isolation between different scripts.
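The article does not define "label compatibility", so the following is only one plausible reading: an executor is compatible when it carries every label the task requires. The Labels, Executor, compatible, and Match names are all hypothetical.

```go
package main

import "fmt"

// Labels is a set of key/value pairs attached to tasks and executors.
type Labels map[string]string

// Executor is a connected SDK client (hypothetical shape).
type Executor struct {
	ID     string
	Labels Labels
}

// compatible reports whether offered contains every required key
// with the same value. Extra labels on the executor are ignored.
func compatible(required, offered Labels) bool {
	for k, v := range required {
		if got, ok := offered[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Match returns the first executor whose labels satisfy the requirements.
func Match(required Labels, executors []Executor) (Executor, bool) {
	for _, e := range executors {
		if compatible(required, e.Labels) {
			return e, true
		}
	}
	return Executor{}, false
}

func main() {
	executors := []Executor{
		{ID: "pod-a", Labels: Labels{"env": "staging"}},
		{ID: "pod-b", Labels: Labels{"env": "prod", "zone": "z1"}},
	}
	if e, ok := Match(Labels{"env": "prod"}, executors); ok {
		fmt.Println("matched", e.ID)
	}
}
```

A first-match policy keeps the sketch short; a real matcher would probably also weigh load or fairness across executors, which the article does not describe.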

Task Design

Task design is the core of the system and includes generation, matching, lifecycle, and execution phases.

Task Generation: When a trigger fires, the system calculates the next execution time, creates a trigger event, and inserts it into the task queue. The task manager asynchronously fetches due events, creates Task objects, and pushes them to the Matching service.

Task Matching: Consists of a task queue, client request cache, and matching module. The queue holds pending tasks; the client cache records all SDK‑connected executors; the matching module filters and matches tasks to executors based on labels.

Task Lifecycle: States include Created, WaitingToRun, Running, Canceled, RunToCompletion, and Faulted. Each transition is illustrated in the lifecycle diagram.

Task Sharding: For long‑running tasks, the trigger can specify a shard count, splitting a task into multiple sub‑tasks that run concurrently on different nodes, improving throughput.

Example: Task A runs on a single pod for 8 minutes; Task B does the same work split into four shards that run concurrently, each completing in 2 minutes, which cuts wall‑clock time by 75% (from 8 minutes to 2).
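To make the sharding idea concrete, here is a minimal sketch of how a handler might pick its slice of the work, assuming each run instance knows its zero-based shard index and the total shard count. The itemsForShard helper and its modulo assignment are illustrative assumptions, not Go‑Job's documented behavior.

```go
package main

import "fmt"

// itemsForShard returns the items one shard should process. Items are
// assigned by position modulo shardTotal, so each item goes to exactly
// one shard and the shards are as evenly sized as possible.
func itemsForShard(items []string, shardIndex, shardTotal int) []string {
	if shardTotal < 1 {
		shardTotal = 1
	}
	var out []string
	for i, it := range items {
		if i%shardTotal == shardIndex {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	items := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	const shardTotal = 4
	for idx := 0; idx < shardTotal; idx++ {
		fmt.Println(idx, itemsForShard(items, idx, shardTotal))
	}
}
```

With eight items and four shards, every shard receives two items, which mirrors the 8‑minute versus 2‑minute example as long as items take roughly equal time and the shards really do run in parallel.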

SDK Design

The SDK bridges the business service and the Go‑Job platform. It has two parts:

Script Part: User‑defined handler that implements the Do method.

SDK Part: Provides connection management, bidirectional gRPC streaming, task execution, health checking, and configuration management.

Key SDK modules:

Connection Management: Establishes and maintains gRPC streams, and handles reconnection on failure.

Data Exchange: Sends and receives task data via the stream.

Task Execution: Asynchronously invokes the user‑defined Do method.

Health Check: Periodic heartbeat to keep the connection alive.

Configuration Management: Manages environment variables, labels, and server addresses.

Task Execution Flow in SDK

Receive task from data‑exchange module.

Initialize task context (function init, timeout, etc.).

Execute the user‑defined Do function.

Report execution result and clean up.

Connection Lifecycle

Both client and server start in an unconnected state. The client initiates a gRPC stream, the server acknowledges, and both sides trigger OnConnected. If the connection drops, the client enters a reconnection loop; once it reconnects, it re‑registers its scripts and resumes data exchange.

Practical Guide

Five steps to create a custom script and integrate it with Go‑Job:

Import the SDK package:

import "go-job-sdk/config"

type Config struct {
    JobConfig config.Config `yaml:"jobConfig"`
}

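Create a handler struct and implement the Do method:

// The job and worker import paths are assumed by analogy with go-job-sdk/config;
// the original shows only the package names.
import (
    "context"
    "fmt"

    "go-job-sdk/job"
    "go-job-sdk/worker"
)

type HelloHandler struct{}

// Do is invoked by the SDK for each run instance dispatched to this handler.
func (w *HelloHandler) Do(ctx job.Context) error {
    info := job.GetTaskInfo(ctx)
    fmt.Printf("Params: %s\n", info.Param)
    fmt.Printf("RunInstanceId: %s\n", info.RunInstanceId)
    fmt.Printf("ShardNum: %d\n", info.ShardNum)
    fmt.Printf("RunTimeout: %v\n", info.RunTimeout)
    fmt.Printf("TaskId: %v\n", info.TaskId)
    return nil
}

Register the handlers and start the worker group:

func main() {
    // cfg is the Config value from the first step, populated from config.yaml
    // (the loading code is not shown in the original).
    group := worker.NewWorkerGroup(context.Background(), cfg.JobConfig)
    group.Add("hello-world1", &HelloHandler{})
    group.Add("hello-world2", &HelloHandler{})
    if err := group.Start(); err != nil {
        fmt.Println(err)
        return
    }
    // Wait blocks until the worker group stops.
    if err := group.Wait(); err != nil {
        fmt.Println(err)
        return
    }
}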

Configure the job center and handlers in config.yaml:

jobConfig:
  app: "demo"
  disable: false
  jobCenterService:
    rpcAddress: "xx"
    namespace: "test"
  handlers:
    hello-world1:
      disable: false

Create a trigger via the web console, specifying schedule, retry count, sharding, timeout, and alert settings.

Results & Outlook

Since launch, Go‑Job has been adopted by multiple internal teams, satisfying diverse business scenarios. Future plans include:

Monitoring & Alerts: Real‑time dashboards and advanced alert rules (e.g., long execution, resource limits).

Rich Scheduling Strategies: Multi‑level priority, holiday/workday policies.

Security & Compliance: Fine‑grained permission management.

The architecture diagram below summarizes the complete system.
