
Master Spring Batch: Core Concepts, Architecture, and Best Practices

This article provides a comprehensive overview of Spring Batch, covering its purpose, architecture, core components such as Job, Step, ItemReader/Writer/Processor, execution contexts, chunk processing, skip strategies, and practical tips for configuration and memory management.


Spring Batch Overview

Spring Batch is a lightweight, comprehensive batch processing framework provided by Spring, designed for building robust batch applications essential to enterprise daily operations. It handles large‑scale data processing without user interaction, supports complex business rules, and integrates data from internal and external systems.

Spring Batch Architecture Overview

A typical batch application reads a large number of records from a database, file, or queue, processes the data, and writes the results back.

The overall architecture of Spring Batch consists of Jobs composed of multiple Steps. Each Step can define its own <code>ItemReader</code>, <code>ItemProcessor</code>, and <code>ItemWriter</code>. Jobs are stored in a <code>JobRepository</code> and launched via a <code>JobLauncher</code>.

Core Concepts of Spring Batch

What is a Job

A Job represents the entire batch process and is the top‑level abstraction. It contains one or more Steps and can be configured with listeners, restart policies, and parameters.

<code>/**
 * Batch domain object representing a job. Job is an explicit abstraction
 * representing the configuration of a job specified by a developer.
 */
public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}
</code>

A simple implementation is <code>SimpleJob</code>, which provides default behavior.

<code>@Bean
public Job footballJob() {
    return this.jobBuilderFactory.get("footballJob")
        .start(playerLoad())
        .next(gameLoad())
        .next(playerSummarization())
        .build();
}
</code>

What is a JobInstance

A <code>JobInstance</code> uniquely identifies a job definition with a specific set of parameters.

<code>public interface JobInstance {
    /** Get unique id for this JobInstance. */
    long getInstanceId();
    /** Get job name. */
    String getJobName();
}
</code>

What are JobParameters

<code>JobParameters</code> hold the values used to launch a job, allowing each execution to be distinguished (e.g., by date).
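The identity rule implied here — one JobInstance per job name plus parameter set — can be sketched in plain Java. This is a conceptual model with hypothetical names, not the framework's actual implementation:

```java
import java.util.Map;

public class JobInstanceIdentityDemo {
    // Hypothetical value type: a JobInstance is identified by job name plus its parameters.
    record JobInstanceKey(String jobName, Map<String, String> parameters) {}

    // Launching with the same name and parameters targets the same JobInstance (a restart);
    // changing any parameter creates a new JobInstance (a fresh run).
    static boolean sameInstance(JobInstanceKey a, JobInstanceKey b) {
        return a.equals(b);
    }
}
```

Re-running "footballJob" with runDate=2024-01-01 addresses the same instance; switching to runDate=2024-01-02 creates a new one.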

What is a JobExecution

A <code>JobExecution</code> represents a single attempt to run a job, containing status, start/end times, and the associated <code>JobParameters</code>.

<code>// The JSR-352 (javax.batch.runtime) view of a job execution
public interface JobExecution {
    long getExecutionId();
    String getJobName();
    BatchStatus getBatchStatus();
    Date getStartTime();
    Date getEndTime();
    String getExitStatus();
    Date getCreateTime();
    Date getLastUpdatedTime();
    Properties getJobParameters();
}
</code>

The batch status enum includes <code>STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, ABANDONED</code>.

What is a Step

A Step encapsulates an independent phase of a batch job. Each Step can have its own reader, processor, and writer.

What is a StepExecution

A <code>StepExecution</code> records the runtime details of a Step, including its status, commit count, and timestamps.

What is an ExecutionContext

An <code>ExecutionContext</code> stores key-value pairs for a Step or Job, enabling data sharing and restartability.

<code>ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
</code>

What is a JobRepository

The <code>JobRepository</code> persists Jobs, Steps, and their executions, providing CRUD operations for the batch infrastructure.

What is a JobLauncher

The <code>JobLauncher</code> starts a Job with given parameters.

<code>public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters)
        throws JobExecutionAlreadyRunningException, JobRestartException,
               JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
</code>

What is an ItemReader

An <code>ItemReader</code> abstracts data input for a Step. Spring Batch offers many implementations such as <code>JdbcPagingItemReader</code> and <code>JdbcCursorItemReader</code>.

<code>@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
        .name("creditReader")
        .dataSource(dataSource)
        .queryProvider(queryProvider)
        .parameterValues(parameterValues)
        .rowMapper(customerCreditMapper())
        .pageSize(1000)
        .build();
}
</code>

What is an ItemWriter

An <code>ItemWriter</code> abstracts data output. It can write one record at a time or a chunk of records.

What is an ItemProcessor

An <code>ItemProcessor</code> applies business logic between reading and writing; returning <code>null</code> filters the item out, so it is never passed to the writer.
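For illustration, here is a self-contained sketch of that contract: a local interface mirroring the shape of the framework's <code>ItemProcessor&lt;I, O&gt;</code> (simplified — the real interface declares a checked exception) and a processor that filters blank items by returning null. The names here are illustrative, not Spring's classes:

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessorDemo {
    // Minimal stand-in for the ItemProcessor<I, O> contract (checked exception omitted).
    interface Processor<I, O> {
        O process(I item);
    }

    // Upper-cases names and filters out blank ones by returning null.
    static final Processor<String, String> NAME_PROCESSOR = item ->
        item.isBlank() ? null : item.toUpperCase();

    // The chunk loop drops null results instead of passing them to the writer.
    static List<String> applyToChunk(Processor<String, String> p, List<String> chunk) {
        List<String> out = new ArrayList<>();
        for (String item : chunk) {
            String processed = p.process(item);
            if (processed != null) {
                out.add(processed);
            }
        }
        return out;
    }
}
```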

Chunk Processing

Chunk processing groups a configurable number of items before committing them as a single transaction, improving performance.
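The read-then-write loop behind this can be sketched in plain Java. This is a conceptual model of the commit-interval behavior, not Spring Batch internals: items are buffered until the interval is reached, then the whole chunk is written (in the framework, inside one transaction).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopDemo {
    /** Consume the source in chunks of commitInterval; each emitted chunk models one transaction. */
    static List<List<Integer>> readInChunks(Iterator<Integer> source, int commitInterval) {
        List<List<Integer>> writes = new ArrayList<>();
        List<Integer> chunk = new ArrayList<>();
        while (source.hasNext()) {
            chunk.add(source.next());            // read one item
            if (chunk.size() == commitInterval) {
                writes.add(List.copyOf(chunk));  // "write" the whole chunk at once
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writes.add(List.copyOf(chunk));      // final partial chunk
        }
        return writes;
    }
}
```

With five items and a commit interval of 2, the writer is called three times: two full chunks and one partial chunk.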

Skip Strategy and Failure Handling

Skip policies allow a Step to ignore a limited number of exceptions: <code>skipLimit()</code> sets the maximum number of skips, <code>skip()</code> defines which exceptions can be skipped, and <code>noSkip()</code> excludes exceptions from being skipped.
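The skip-limit rule itself is simple to state; here is a plain-Java sketch of counting skippable failures against a limit (a conceptual illustration, not the framework's fault-tolerance code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SkipLimitDemo {
    /** Process items, skipping up to skipLimit parse failures; beyond that, the whole step fails. */
    static List<Integer> processWithSkips(List<String> items, Function<String, Integer> parse, int skipLimit) {
        List<Integer> results = new ArrayList<>();
        int skips = 0;
        for (String item : items) {
            try {
                results.add(parse.apply(item));
            } catch (NumberFormatException e) {   // plays the role of a "skippable" exception
                if (++skips > skipLimit) {
                    throw new IllegalStateException("Skip limit of " + skipLimit + " exceeded", e);
                }
                // within the limit: drop the bad item and continue
            }
        }
        return results;
    }
}
```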

Batch Processing Guidelines

Design the batch architecture to minimize complexity.

Keep data processing close to storage to reduce I/O.

Maximize in‑memory operations and limit unnecessary I/O.

Analyze SQL statements to avoid redundant scans and missing indexes.

Avoid duplicate work; aggregate data during the initial processing phase.

Allocate sufficient memory at startup to prevent runtime reallocations.

Assume worst‑case data integrity; add validation and checksums.

Conduct performance testing with realistic data volumes.

Plan and test backup strategies for both databases and files.

How to Prevent Job Auto‑Start

By default, Spring Boot runs all defined Spring Batch jobs on application startup. To disable this behavior, add the following property:

<code>spring.batch.job.enabled=false</code>

Out‑of‑Memory When Reading Data

If a job reads all records at once without paging, the JVM may run out of heap memory, resulting in a "Resource exhaustion event". Solutions include implementing a paging <code>ItemReader</code> or increasing the JVM heap size.
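If paging cannot be introduced quickly, raising the heap is a stopgap. An illustrative example of JVM sizing flags (the application name and sizes are hypothetical, not from the article, and must be tuned to the workload):

<code># Illustrative only: fixed 2 GB heap plus a heap dump on OOM for diagnosis.
# "batch-app.jar" is a hypothetical application name.
java -Xms2g -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -jar batch-app.jar
</code>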

Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author's GitHub project "mall" has 50K+ stars.
