Master Spring Batch: Core Concepts, Architecture, and Best Practices

This article provides a comprehensive guide to Spring Batch, covering its purpose, architecture, core components such as Job, Step, ItemReader/Writer/Processor, chunk processing, skip strategies, and practical tips for configuring and optimizing batch jobs in Java applications.

macrozheng

Spring Batch Overview

Spring Batch is a lightweight, comprehensive batch‑processing framework provided by the Spring ecosystem. It is designed for enterprise applications that need to process large volumes of data without user interaction, handling tasks such as end‑of‑month calculations, insurance benefit determinations, and massive daily transaction processing.

Spring Batch Architecture

A typical batch job reads a large number of records from a database, file, or queue, processes the data, and writes the transformed records to a destination.

The overall Spring Batch architecture consists of Jobs, Steps, and supporting components such as JobRepository and JobLauncher.

Core Concepts of Spring Batch

What is a Job

A Job represents the entire batch process and is the top‑level abstraction. It defines a sequence of Steps and can be configured with listeners, restart policies, and parameters.

/**
 * Batch domain object representing a job. Job is an explicit abstraction
 * representing the configuration of a job specified by a developer. It should
 * be noted that restart policy is applied to the job as a whole and not to a
 * step.
 */
public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}

Jobs are implemented mainly by SimpleJob and FlowJob. A Job contains one or more Steps.

What is a JobInstance

A JobInstance is a logical run of a Job, identified by the Job together with a specific set of JobParameters. It exposes the instance ID and the job name, shown here in simplified interface form.

public interface JobInstance {
    /** Get unique id for this JobInstance. */
    long getInstanceId();
    /** Get job name. */
    String getJobName();
}

Each logical run (e.g., an end‑of‑day job) creates a distinct JobInstance.

What is JobParameters

JobParameters are a set of key‑value pairs that uniquely identify a JobInstance. They are typically used to pass values such as the execution date.
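The identity rule can be modelled in a few lines of plain Java. This is a minimal sketch and not the Spring Batch API: the class `InstanceRegistrySketch`, its `resolveInstance` method, and the `run.date` parameter are hypothetical names chosen for illustration. It only shows that the same job name plus the same parameters resolve to the same instance, while a different parameter value yields a new one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of JobInstance identity; not Spring Batch code.
class InstanceRegistrySketch {
    private final Map<String, Long> instances = new HashMap<>();
    private long nextId = 1;

    // Returns the existing instance id for (jobName, params) or allocates a new one.
    long resolveInstance(String jobName, Map<String, String> params) {
        // TreeMap gives a stable key order, so equal parameter sets build equal keys.
        String key = jobName + "|" + new TreeMap<>(params);
        Long existing = instances.get(key);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        instances.put(key, id);
        return id;
    }

    public static void main(String[] args) {
        InstanceRegistrySketch registry = new InstanceRegistrySketch();
        Map<String, String> day1 = new HashMap<>();
        day1.put("run.date", "2024-01-01"); // illustrative value
        Map<String, String> day2 = new HashMap<>();
        day2.put("run.date", "2024-01-02"); // illustrative value

        System.out.println(registry.resolveInstance("endOfDay", day1));
        System.out.println(registry.resolveInstance("endOfDay", day1)); // same id as the first call
        System.out.println(registry.resolveInstance("endOfDay", day2)); // a new id
    }
}
```

In the real framework this lookup is handled by the JobRepository; the sketch only mirrors the rule that parameters, not launch time, decide which instance a run belongs to.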

What is JobExecution

JobExecution represents a single attempt to run a Job. A JobInstance that fails and is restarted has one JobExecution per attempt. Each JobExecution records its status, start and end times, exit status, and the associated JobParameters.

public interface JobExecution {
    long getExecutionId();
    String getJobName();
    BatchStatus getBatchStatus();
    Date getStartTime();
    Date getEndTime();
    String getExitStatus();
    Date getCreateTime();
    Date getLastUpdatedTime();
    Properties getJobParameters();
}

BatchStatus is an enum with the values STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, and ABANDONED.

What is a Step

A Step encapsulates an independent phase of a batch job. A chunk-oriented Step has its own ItemReader, an optional ItemProcessor, and an ItemWriter. Steps can be simple or complex depending on the business logic.

What is StepExecution

StepExecution records a single execution of a Step, including commit and rollback counts, read and write counts, start and end times, and a reference to the parent JobExecution.

What is ExecutionContext

ExecutionContext stores key‑value pairs for a StepExecution or JobExecution, allowing data to be persisted between restarts.

ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
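How persisted key-value state enables a restart can be sketched in plain Java. This is an illustrative model, not Spring Batch code: a `Map` stands in for the ExecutionContext, and the class name `CheckpointSketch`, the key `items.processed`, and the `failAt` parameter are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of checkpointing; a Map stands in for ExecutionContext.
class CheckpointSketch {
    static final String KEY = "items.processed"; // illustrative key name

    // Processes items starting at the saved position; fails once at index failAt (if >= 0).
    static void run(List<String> items, Map<String, Object> context, List<String> out, int failAt) {
        int start = (Integer) context.getOrDefault(KEY, 0);
        for (int i = start; i < items.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at index " + i);
            }
            out.add(items.get(i).toUpperCase());
            context.put(KEY, i + 1); // record progress after each item
        }
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        Map<String, Object> context = new HashMap<>(); // survives the failed attempt
        List<String> out = new ArrayList<>();
        try {
            run(items, context, out, 2); // first attempt fails on the third item
        } catch (IllegalStateException e) {
            System.out.println("first attempt: " + e.getMessage());
        }
        run(items, context, out, -1); // the restart resumes at index 2
        System.out.println(out);
    }
}
```

The second call does not reprocess the first two items because the saved position outlives the failed attempt, which is the role the persisted ExecutionContext plays on restart.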

What is JobRepository

JobRepository persists metadata about Jobs, Steps, JobExecutions, and StepExecutions, providing CRUD operations for these entities.

What is JobLauncher

JobLauncher launches a Job with given JobParameters, returning a JobExecution.

public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters)
        throws JobExecutionAlreadyRunningException, JobRestartException,
               JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

What is ItemReader

ItemReader reads data for a Step. Spring Batch provides many implementations (e.g., JdbcPagingItemReader, JdbcCursorItemReader) that can read from databases, files, or streams.

@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    // Named parameters bound into the paging query.
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");
    // customerCreditMapper() is assumed to be a RowMapper bean method defined elsewhere.
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(parameterValues)
            .rowMapper(customerCreditMapper())
            .pageSize(1000) // rows fetched per page
            .build();
}

What is ItemWriter

ItemWriter writes processed data to a destination. In a chunk-oriented step it receives the items of a chunk together and writes them as a batch rather than one record at a time.

What is ItemProcessor

ItemProcessor applies business logic to each item between reading and writing. Returning null filters the item out, so it is never passed to the ItemWriter.
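The null-return convention can be illustrated with a plain-Java model. This is a sketch under stated assumptions, not the framework's interface: a `java.util.function.Function` stands in for an ItemProcessor, and `ProcessorFilterSketch`, `processAll`, and the trimming rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java model: a Function stands in for an ItemProcessor.
class ProcessorFilterSketch {
    // Applies the processor to each item; a null result means "do not write this item".
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor) {
        List<O> toWrite = new ArrayList<>();
        for (I item : items) {
            O result = processor.apply(item);
            if (result != null) {
                toWrite.add(result);
            }
        }
        return toWrite;
    }

    public static void main(String[] args) {
        // Illustrative rule: drop blank lines, trim everything else.
        Function<String, String> trimOrDrop = s -> s.trim().isEmpty() ? null : s.trim();
        System.out.println(processAll(Arrays.asList(" alice ", "   ", "bob"), trimOrDrop));
        // prints [alice, bob]
    }
}
```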

Chunk Processing

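Spring Batch can process items in chunks. The chunk size determines how many items are read and processed before they are written together and the transaction commits. Chunk size is set on each Step; the job below simply chains three steps (playerLoad, gameLoad, and playerSummarization), which are assumed to be defined elsewhere.

@Bean
public Job footballJob() {
    // Run the three steps in sequence; each step manages its own chunks and commits.
    return jobBuilderFactory.get("footballJob")
            .start(playerLoad())
            .next(gameLoad())
            .next(playerSummarization())
            .build();
}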
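The read, process, write cycle inside a single chunk-oriented step can be sketched without the framework. The model below is plain Java and not Spring Batch code: `ChunkLoopSketch`, its `run` method, and the multiply-by-ten transform are hypothetical, and a "commit" is represented simply by appending the finished chunk to a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical plain-Java model of chunk-oriented processing; not Spring Batch code.
class ChunkLoopSketch {
    // Returns the written chunks, one list entry per simulated commit.
    static List<List<Integer>> run(Iterator<Integer> reader, int chunkSize) {
        List<List<Integer>> commits = new ArrayList<>();
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<Integer> read = new ArrayList<>();
            while (read.size() < chunkSize && reader.hasNext()) {
                read.add(reader.next());
            }
            // 2. Process each item that was read (illustrative transform).
            List<Integer> processed = new ArrayList<>();
            for (Integer item : read) {
                processed.add(item * 10);
            }
            // 3. Write the whole chunk at once, then commit.
            commits.add(processed);
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5);
        // With a chunk size of 2, five items produce three commits.
        System.out.println(run(source.iterator(), 2));
        // prints [[10, 20], [30, 40], [50]]
    }
}
```

A larger chunk size means fewer commits but more work lost and repeated if a chunk fails, which is the trade-off to weigh when tuning it.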

Skip Strategy and Failure Handling

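Batch jobs can be configured to tolerate a limited number of failures using skipLimit(), skip(), and noSkip(). Exceptions registered with skip() are skipped until the skipLimit() is reached; exceptions excluded with noSkip(), or not declared skippable at all, cause the step to fail immediately.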
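The interaction between a skippable exception type and a skip limit can be modelled in plain Java. This is a sketch, not the framework's implementation: `SkipLimitSketch` and `parseAll` are hypothetical names, `NumberFormatException` plays the role of the skippable exception, and the model assumes that a limit of n tolerates n skips and fails on the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of skip handling; not Spring Batch code.
class SkipLimitSketch {
    // Parses each line; NumberFormatException is treated as skippable up to skipLimit.
    // Any other exception, or one skip too many, propagates and fails the run.
    static List<Integer> parseAll(List<String> lines, int skipLimit) {
        List<Integer> parsed = new ArrayList<>();
        int skipped = 0;
        for (String line : lines) {
            try {
                parsed.add(Integer.parseInt(line));
            } catch (NumberFormatException e) {
                skipped++;
                if (skipped > skipLimit) {
                    throw new IllegalStateException("skip limit of " + skipLimit + " exceeded", e);
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1", "x", "2", "y", "3");
        System.out.println(parseAll(lines, 2)); // both bad lines are tolerated
        try {
            parseAll(lines, 1);                 // the second bad line exceeds the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```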

Batch Processing Guidelines

Key principles for building robust batch solutions include simplifying job logic, keeping processing close to the data store, minimizing I/O, allocating sufficient memory upfront, validating data integrity, and performing early performance testing.

Design the architecture to suit the batch workload.

Avoid overly complex logic within a single batch application.

Keep data processing and storage physically close.

Minimize system resource usage, especially I/O.

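Analyze SQL statements to avoid unnecessary table or index scans, and make sure WHERE clauses specify key values when a key exists.

Do not do the same work twice: if summary data is needed, update the totals while the data is first processed instead of reprocessing it later.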

Allocate enough memory at startup to prevent runtime reallocations.

Assume worst‑case data integrity and add validation checks.

Use checksums for internal verification.

Conduct stress testing with realistic data volumes.

Plan and test backup strategies for both databases and files.
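The checksum guideline in the list above can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed Spring Batch feature: it uses `java.util.zip.CRC32` from the JDK, and the class `ChecksumSketch`, the `checksum` method, and the sample records are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch: compare a checksum taken at write time with one taken at read time.
class ChecksumSketch {
    // Computes a CRC32 over all records, in order.
    static long checksum(List<String> records) {
        CRC32 crc = new CRC32();
        for (String record : records) {
            crc.update(record.getBytes(StandardCharsets.UTF_8));
            crc.update('\n'); // separator so record boundaries affect the result
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        List<String> written = Arrays.asList("1,alice", "2,bob");  // illustrative records
        List<String> readBack = Arrays.asList("1,alice", "2,bob");
        long expected = checksum(written);
        // A mismatch would indicate that the data changed between the two stages.
        System.out.println("intact: " + (checksum(readBack) == expected));
    }
}
```

A producing step could store the expected value alongside the output, and a consuming step could recompute it before processing and stop early on a mismatch.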

Preventing Automatic Job Startup

To stop Spring Batch jobs from running automatically at application startup, set the following property in application.properties:

spring.batch.job.enabled=false

Handling Out‑of‑Memory Issues

If a batch job reads all records at once and exhausts heap memory, consider paging the reader or increasing the JVM heap size.

Resource exhaustion event: the JVM was unable to allocate memory from the heap.
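Why paging bounds memory use can be shown with a plain-Java model. This is a sketch with a simulated data source, not Spring Batch code: `PagingSketch`, `fetchPage`, and `sumInPages` are hypothetical names, and `fetchPage` stands in for a query that returns one page of rows at a given offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of paged reading; the data source is simulated.
class PagingSketch {
    // Stand-in for a paged query: returns rows [offset, offset + pageSize) out of totalRows.
    static List<Integer> fetchPage(int totalRows, int offset, int pageSize) {
        List<Integer> page = new ArrayList<>();
        int end = Math.min(offset + pageSize, totalRows);
        for (int i = offset; i < end; i++) {
            page.add(i);
        }
        return page;
    }

    // Sums every row while holding at most one page in memory.
    // Returns {sum, largest page size seen}.
    static long[] sumInPages(int totalRows, int pageSize) {
        long sum = 0;
        long maxHeld = 0;
        int offset = 0;
        List<Integer> page = fetchPage(totalRows, offset, pageSize);
        while (!page.isEmpty()) {
            maxHeld = Math.max(maxHeld, page.size());
            for (int row : page) {
                sum += row;
            }
            offset += page.size();
            page = fetchPage(totalRows, offset, pageSize);
        }
        return new long[] {sum, maxHeld};
    }

    public static void main(String[] args) {
        long[] result = sumInPages(10, 3);
        System.out.println("sum=" + result[0] + ", max rows in memory=" + result[1]);
        // prints sum=45, max rows in memory=3
    }
}
```

The number of rows held at once depends on the page size rather than on the table size, which is the property a paging reader such as JdbcPagingItemReader relies on.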