Master Spring Batch: Core Concepts, Architecture, and Best Practices
This article provides a comprehensive overview of Spring Batch, covering its purpose, architecture, core components such as Job, Step, ItemReader/Writer/Processor, execution contexts, chunk processing, skip strategies, and practical tips for configuration and memory management.
Spring Batch Overview
Spring Batch is a lightweight, comprehensive batch processing framework from the Spring project, designed for building the robust batch applications that enterprise systems depend on for daily operations. It handles large‑scale data processing without user interaction, supports complex business rules, and integrates data from internal and external systems.
Spring Batch Architecture Overview
A typical batch application reads a large number of records from a database, file, or queue, processes the data, and writes the results back.
In Spring Batch, a Job is composed of one or more Steps. Each Step can define its own ItemReader, ItemProcessor, and ItemWriter. Metadata about Jobs and their executions is stored in a JobRepository, and Jobs are launched via a JobLauncher.
Core Concepts of Spring Batch
What is a Job
A Job represents the entire batch process and is the top‑level abstraction. It contains one or more Steps and can be configured with listeners, restart policies, and parameters.
<code>/**
 * Batch domain object representing a job. Job is an explicit abstraction
 * representing the configuration of a job specified by a developer.
 */
public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}
</code>
Spring Batch provides a default implementation, SimpleJob, which runs its Steps in sequence. With Java configuration, a builder assembles the Job:
<code>@Bean
public Job footballJob() {
    // Steps run in the order they are chained; start(Step) needs no end() call.
    return this.jobBuilderFactory.get("footballJob")
            .start(playerLoad())
            .next(gameLoad())
            .next(playerSummarization())
            .build();
}
</code>
What is a JobInstance
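A JobInstance represents a logical run of a job: the job definition combined with a specific set of identifying parameters. The interface below matches the JSR‑352 type javax.batch.runtime.JobInstance.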
<code>public interface JobInstance {
    /** Get unique id for this JobInstance. */
    long getInstanceId();

    /** Get job name. */
    String getJobName();
}
</code>
What are JobParameters
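JobParameters hold the values used to launch a job. A JobInstance is identified by the job name together with its identifying parameters, so launching the same job with a different value (e.g., a different date) creates a new JobInstance.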
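The identity rule can be sketched in plain Java. This is not Spring Batch code: the class name, the job name "endOfDay", and the "date" parameter are assumptions made up for illustration. It only shows that the same name and parameters resolve to the same instance, while a changed parameter yields a new one.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (not Spring Batch code): a job instance is identified
// by the job name plus its identifying parameters.
final class JobInstanceKeySketch {

    // Key = (job name, immutable copy of the parameters). Hypothetical layout.
    private final Map<Map.Entry<String, Map<String, String>>, Long> instanceIds = new HashMap<>();
    private long nextId = 1;

    // Returns the existing instance id for a known key, or registers a new one.
    long instanceIdFor(String jobName, Map<String, String> parameters) {
        Map.Entry<String, Map<String, String>> key = Map.entry(jobName, Map.copyOf(parameters));
        return instanceIds.computeIfAbsent(key, k -> nextId++);
    }

    public static void main(String[] args) {
        JobInstanceKeySketch registry = new JobInstanceKeySketch();
        long first = registry.instanceIdFor("endOfDay", Map.of("date", "2024-01-01"));
        long rerun = registry.instanceIdFor("endOfDay", Map.of("date", "2024-01-01"));
        long nextDay = registry.instanceIdFor("endOfDay", Map.of("date", "2024-01-02"));
        System.out.println(first == rerun);   // same name and parameters: same instance
        System.out.println(first == nextDay); // different date: a new instance
    }
}
```

In Spring Batch itself this bookkeeping is handled by the framework; the sketch is only meant to make the "name plus parameters" rule concrete.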
What is a JobExecution
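A JobExecution represents a single attempt to run a job. It carries the status, the start and end times, and the associated JobParameters. The interface below matches the JSR‑352 type javax.batch.runtime.JobExecution; Spring Batch's own JobExecution is a class that exposes similar information.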
<code>public interface JobExecution {
    long getExecutionId();
    String getJobName();
    BatchStatus getBatchStatus();
    Date getStartTime();
    Date getEndTime();
    String getExitStatus();
    Date getCreateTime();
    Date getLastUpdatedTime();
    Properties getJobParameters();
}
</code>
The batch status enum includes STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, and ABANDONED.
What is a Step
A Step encapsulates an independent phase of a batch job. Each Step can have its own reader, processor, and writer.
What is a StepExecution
A StepExecution records the runtime details of a single attempt to run a Step, including its status, commit count, and timestamps.
What is an ExecutionContext
An ExecutionContext stores key‑value pairs for a Step or Job, enabling data sharing and restartability.
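How a key‑value context enables restartability can be sketched in plain Java. This is not Spring Batch code: a plain Map stands in for the context, and the class name and the key "reader.position" are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (not Spring Batch code): a key-value context records how
// far a reader got, so a later attempt can resume instead of starting over.
final class RestartableReaderSketch {

    static final String POSITION_KEY = "reader.position"; // hypothetical key name

    private final List<String> source;
    private int position;

    RestartableReaderSketch(List<String> source, Map<String, Object> context) {
        this.source = source;
        // Resume from the saved position if a previous attempt stored one.
        this.position = (Integer) context.getOrDefault(POSITION_KEY, 0);
    }

    // Returns the next item, or null when the source is exhausted.
    String read() {
        return position < source.size() ? source.get(position++) : null;
    }

    // Saves the current position; a framework would persist the context
    // at each commit point.
    void update(Map<String, Object> context) {
        context.put(POSITION_KEY, position);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d");
        Map<String, Object> context = new HashMap<>();

        RestartableReaderSketch firstAttempt = new RestartableReaderSketch(rows, context);
        firstAttempt.read();
        firstAttempt.read();
        firstAttempt.update(context); // simulate a commit followed by a failure

        RestartableReaderSketch secondAttempt = new RestartableReaderSketch(rows, context);
        System.out.println(secondAttempt.read()); // resumes at "c"
    }
}
```

In Spring Batch, each StepExecution and each JobExecution has its own context, obtained as follows: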
<code>ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
</code>
What is a JobRepository
The JobRepository persists batch metadata such as JobInstances, JobExecutions, and StepExecutions, providing CRUD operations for the batch infrastructure.
What is a JobLauncher
The JobLauncher starts a Job with a given set of JobParameters.
<code>public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
</code>
What is an ItemReader
An ItemReader abstracts data input for a Step, returning one item per call. Spring Batch offers many implementations, such as JdbcPagingItemReader and JdbcCursorItemReader.
<code>@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource,
                                                       PagingQueryProvider queryProvider) {
    // Values bound to the named parameters of the paging query.
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");

    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(parameterValues)
            .rowMapper(customerCreditMapper())
            .pageSize(1000) // rows fetched per page
            .build();
}
</code>
What is an ItemWriter
An ItemWriter abstracts data output for a Step. Depending on the target, it can write one record at a time or a whole chunk of records at once.
What is an ItemProcessor
An ItemProcessor applies business logic between reading and writing. Returning null filters the item out, so it is never passed to the ItemWriter.
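The null‑return convention can be illustrated with a plain‑Java sketch. This is not Spring Batch code: a java.util.function.Function stands in for the processor, and the class name and the "drop blank names" rule are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch (not Spring Batch code): a processor transforms each item,
// and a null result means "filter this item out".
final class ProcessorFilterSketch {

    // Applies the processor to every item and keeps only non-null results.
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor) {
        List<O> output = new ArrayList<>();
        for (I item : items) {
            O result = processor.apply(item);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical rule: drop blank names, upper-case the rest.
        Function<String, String> processor =
                name -> name.trim().isEmpty() ? null : name.toUpperCase();
        System.out.println(processAll(List.of("ann", " ", "bob"), processor)); // [ANN, BOB]
    }
}
```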
Chunk Processing
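Chunk processing reads and processes items one at a time, collects them until a configurable number (the commit interval) is reached, and then writes the whole chunk and commits it as a single transaction. Committing once per chunk rather than once per item improves performance.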
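The loop behind this idea can be sketched in plain Java. This is not Spring Batch's implementation: an Iterator stands in for the reader, a Consumer for the writer, and the class and method names are assumptions for illustration. Transactions are only indicated by comments.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch (not Spring Batch code) of chunk-oriented processing:
// read items one at a time, and hand them to the writer one chunk at a time.
final class ChunkLoopSketch {

    // Returns the number of chunks written. In a real framework, each
    // writer call would be wrapped in its own transaction.
    static <T> int run(Iterator<T> reader, int chunkSize, Consumer<List<T>> writer) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // commit point
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // final, partial chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5);
        int chunks = run(items.iterator(), 2, chunk -> System.out.println("write " + chunk));
        System.out.println(chunks + " chunks"); // 3 chunks: [1, 2], [3, 4], [5]
    }
}
```

A larger chunk size means fewer commits but more work lost and redone if a chunk fails, so the value is usually tuned per job.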
Skip Strategy and Failure Handling
A skip policy allows a Step to tolerate a limited number of exceptions instead of failing. skipLimit() sets the maximum number of skips, skip() declares which exception types may be skipped, and noSkip() excludes exception types from being skipped.
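The effect of a skip limit can be sketched in plain Java. This is not Spring Batch code and does not use its builder methods: the class name, the method processWithSkips, and the parsing example are assumptions for illustration. It tolerates a bounded number of failures of one exception type and fails once that budget is spent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch (not Spring Batch code) of a skip limit: tolerate a bounded
// number of failures of one exception type, and fail once the limit is exceeded.
final class SkipLimitSketch {

    static <I, O> List<O> processWithSkips(List<I> items,
                                           Function<I, O> processor,
                                           Class<? extends RuntimeException> skippable,
                                           int skipLimit) {
        List<O> output = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                output.add(processor.apply(item));
            } catch (RuntimeException e) {
                // Rethrow if the exception is not skippable or the budget is spent.
                if (!skippable.isInstance(e) || skipped >= skipLimit) {
                    throw e;
                }
                skipped++;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // "x" and "y" fail to parse; with a skip limit of 2 both are tolerated.
        List<Integer> parsed = processWithSkips(
                List.of("1", "x", "2", "y"), Integer::parseInt,
                NumberFormatException.class, 2);
        System.out.println(parsed); // [1, 2]
    }
}
```

With a limit of 1 the same input would fail on "y", which is the behavior a skip limit is meant to enforce: a few bad records are acceptable, a stream of them is not.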
Batch Processing Guidelines
Design the batch architecture to minimize complexity.
Keep data processing close to storage to reduce I/O.
Maximize in‑memory operations and limit unnecessary I/O.
Analyze SQL statements to avoid redundant scans and missing indexes.
Avoid duplicate work; aggregate data during the initial processing phase.
Allocate sufficient memory at startup to prevent runtime reallocations.
Assume worst‑case data integrity; add validation and checksums.
Conduct performance testing with realistic data volumes.
Plan and test backup strategies for both databases and files.
How to Prevent Job Auto‑Start
By default, a Spring Boot application runs all Job beans found in the application context on startup. To disable this behavior, add the following property:
<code>spring.batch.job.enabled=false</code>
Out‑of‑Memory When Reading Data
If a job reads all records at once without paging, the JVM may run out of heap memory, resulting in a "Resource exhaustion event". Solutions include using a paging ItemReader, which holds only one page of records in memory at a time, or increasing the JVM heap size.
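Why paging bounds memory use can be seen in a plain‑Java sketch. This is not Spring Batch code (the framework's JdbcPagingItemReader plays this role against a database): the class name and the (offset, limit) page‑fetching function are assumptions, and an in‑memory list stands in for the table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java sketch (not Spring Batch code) of a paging reader: only one page
// of records is buffered in memory at a time.
final class PagingReaderSketch<T> {

    private final BiFunction<Integer, Integer, List<T>> pageFetcher; // (offset, limit) -> rows
    private final int pageSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int offset;
    private boolean exhausted;

    PagingReaderSketch(BiFunction<Integer, Integer, List<T>> pageFetcher, int pageSize) {
        this.pageFetcher = pageFetcher;
        this.pageSize = pageSize;
    }

    // Returns the next item, or null when no more data is available.
    T read() {
        if (buffer.isEmpty() && !exhausted) {
            List<T> page = pageFetcher.apply(offset, pageSize);
            buffer.addAll(page);
            offset += page.size();
            exhausted = page.size() < pageSize; // a short page is the last page
        }
        return buffer.poll();
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "table" standing in for a paged database query.
        List<String> table = List.of("r1", "r2", "r3", "r4", "r5");
        PagingReaderSketch<String> reader = new PagingReaderSketch<>(
                (from, limit) -> table.subList(from, Math.min(from + limit, table.size())),
                2);
        for (String row = reader.read(); row != null; row = reader.read()) {
            System.out.println(row);
        }
    }
}
```

The buffer never holds more than pageSize items, so heap usage depends on the page size rather than on the total number of records.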
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.