Master Spring Batch: Core Concepts, Architecture, and Best Practices
This article provides a comprehensive guide to Spring Batch, covering its purpose, architecture, core components such as Job, Step, ItemReader/Writer/Processor, chunk processing, skip strategies, and practical tips for configuring and optimizing batch jobs in Java applications.
Spring Batch Overview
Spring Batch is a lightweight, comprehensive batch‑processing framework provided by the Spring ecosystem. It is designed for enterprise applications that need to process large volumes of data without user interaction, handling tasks such as end‑of‑month calculations, insurance benefit determinations, and massive daily transaction processing.
Spring Batch Architecture
A typical batch job reads a large number of records from a database, file, or queue, processes the data, and writes the transformed records to a destination.
The overall Spring Batch architecture consists of Jobs, Steps, and supporting components such as JobRepository and JobLauncher.
Core Concepts of Spring Batch
What is a Job
A Job represents the entire batch process and is the top‑level abstraction. It defines a sequence of Steps and can be configured with listeners, restart policies, and parameters.
/**
 * Batch domain object representing a job. Job is an explicit abstraction
 * representing the configuration of a job specified by a developer. It should
 * be noted that restart policy is applied to the job as a whole and not to a
 * step.
 */
public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}
Jobs are implemented mainly by SimpleJob and FlowJob. A Job contains one or more Steps.
What is a JobInstance
A JobInstance represents one logical run of a Job, identified by the Job's name and a specific set of JobParameters. It exposes the instance ID and the job name.
public interface JobInstance {
    /** Get unique id for this JobInstance. */
    long getInstanceId();
    /** Get job name. */
    String getJobName();
}
Each logical run (e.g., an end‑of‑day job) creates a distinct JobInstance.
What is JobParameters
JobParameters are a set of key‑value pairs that uniquely identify a JobInstance. They are typically used to pass values such as the execution date.
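The identity rule can be illustrated with a small, framework-free model. This is only a sketch under assumptions: `JobInstanceRegistrySketch`, `instanceIdFor`, the job name `endOfDayJob`, and the parameter `runDate` are hypothetical names, and none of this is Spring Batch code. It mimics the rule that the same job name with the same parameters maps to the same JobInstance, while different parameters yield a new one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (not Spring Batch code) of how JobParameters identify a JobInstance. */
final class JobInstanceRegistrySketch {
    // Key: job name plus its parameters; value: the assigned instance id.
    private final Map<List<Object>, Long> instances = new HashMap<>();
    private long nextId = 1;

    /** Returns the existing instance id for this name/parameter pair, or assigns a new one. */
    long instanceIdFor(String jobName, Map<String, String> parameters) {
        // Defensive copy so later changes to the caller's map do not alter the key.
        List<Object> key = List.of(jobName, Map.copyOf(parameters));
        return instances.computeIfAbsent(key, k -> nextId++);
    }

    public static void main(String[] args) {
        JobInstanceRegistrySketch registry = new JobInstanceRegistrySketch();
        long jan31 = registry.instanceIdFor("endOfDayJob", Map.of("runDate", "2024-01-31"));
        long jan31Again = registry.instanceIdFor("endOfDayJob", Map.of("runDate", "2024-01-31"));
        long feb01 = registry.instanceIdFor("endOfDayJob", Map.of("runDate", "2024-02-01"));
        System.out.println(jan31 == jan31Again); // true: same parameters, same instance
        System.out.println(jan31 != feb01);      // true: new parameters, new instance
    }
}
```

In this model, rerunning the end-of-day job for a date that already has an instance resolves to that instance rather than creating a second one.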
What is JobExecution
JobExecution represents a single attempt to run a Job. It records status, start/end times, exit status, and the associated JobParameters.
public interface JobExecution {
    long getExecutionId();
    String getJobName();
    BatchStatus getBatchStatus();
    Date getStartTime();
    Date getEndTime();
    String getExitStatus();
    Date getCreateTime();
    Date getLastUpdatedTime();
    Properties getJobParameters();
}
BatchStatus is an enum with the values STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, and ABANDONED.
What is a Step
A Step encapsulates an independent phase of a batch job, containing its own ItemReader, ItemProcessor, and ItemWriter. Steps can be simple or complex depending on the business logic.
What is StepExecution
StepExecution records a single attempt to execute a Step, including commit and rollback counts, start/end times, and a reference to the parent JobExecution.
What is ExecutionContext
ExecutionContext stores key‑value pairs for a StepExecution or JobExecution, allowing data to be persisted between restarts.
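How saved state enables a restart can be sketched without the framework. The example below is a simplified model under assumptions: a plain `Map` stands in for the ExecutionContext, and `RestartableReaderSketch`, `readSome`, and the key `reader.position` are hypothetical names, not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (not Spring Batch code) of restart state kept in a key-value context. */
final class RestartableReaderSketch {
    static final String POSITION_KEY = "reader.position"; // hypothetical key name

    /**
     * Reads up to maxItems items, starting at the position stored in the context,
     * and stores the new position back so that a later run can resume from it.
     */
    static List<String> readSome(List<String> source, Map<String, Object> context, int maxItems) {
        int position = (Integer) context.getOrDefault(POSITION_KEY, 0);
        int end = Math.min(source.size(), position + maxItems);
        List<String> items = new ArrayList<>(source.subList(position, end));
        context.put(POSITION_KEY, end); // state that would survive until the next run
        return items;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c", "d", "e");
        Map<String, Object> context = new HashMap<>(); // stand-in for a persisted context
        System.out.println(readSome(source, context, 3)); // [a, b, c]
        System.out.println(readSome(source, context, 3)); // [d, e], resumed from the saved position
    }
}
```

The second call plays the role of a restarted execution: because the position was written to the context, already-read items are not read again.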
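The step-scoped and job-scoped contexts are separate objects, each obtained from its own execution:
ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
What is JobRepository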
JobRepository persists metadata about Jobs, Steps, JobExecutions, and StepExecutions, providing CRUD operations for these entities.
What is JobLauncher
JobLauncher launches a Job with given JobParameters, returning a JobExecution.
public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
What is ItemReader
ItemReader reads data for a Step. Spring Batch provides many implementations (e.g., JdbcPagingItemReader, JdbcCursorItemReader) that can read from databases, files, or streams.
@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    // Values for the named parameters used by the paging query
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(parameterValues)
            .rowMapper(customerCreditMapper())
            .pageSize(1000) // rows fetched per page
            .build();
}
What is ItemWriter
ItemWriter writes processed data to a destination. It can write one record at a time or in chunks.
What is ItemProcessor
ItemProcessor applies business logic to each item between reading and writing. Returning null filters the item out, so it is never passed to the ItemWriter.
Chunk Processing
Spring Batch can process items in chunks. A chunk size determines how many items are read, processed, and written before a transaction commit.
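The read, process, write cycle can be modeled in a few lines of plain Java. This is a sketch under assumptions, not the framework's implementation: `ChunkLoopSketch` and `run` are hypothetical names, an `Iterator` stands in for the reader, a `Function` for the processor, and a `Consumer` for the writer. Transactions are not modeled; the returned count marks the points where a real step would commit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Simplified model (not Spring Batch code) of chunk-oriented processing. */
final class ChunkLoopSketch {
    /** Processes the input chunk by chunk and returns the number of chunks written. */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int commits = 0;
        while (reader.hasNext()) {
            // Read phase: collect up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process phase: a null result filters the item out of the chunk.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // Write phase: the chunk is written as a whole; a real step would commit here.
            writer.accept(outputs);
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = n -> n == 3 ? null : "item-" + n; // drop item 3
        Consumer<List<String>> writer = written::add;
        int commits = run(List.of(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        System.out.println(written); // [[item-1, item-2], [item-4], [item-5]]
        System.out.println(commits); // 3
    }
}
```

Note that the chunk boundary follows the number of items read, so a chunk can reach the writer with fewer items than the chunk size when the processor filters some out.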
The Steps of a Job are chained in the order they should run:
@Bean
public Job footballJob() {
    return jobBuilderFactory.get("footballJob")
            .start(playerLoad())
            .next(gameLoad())
            .next(playerSummarization())
            .build();
}
Skip Strategy and Failure Handling
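A step can be configured to tolerate a limited number of failures using skipLimit(), skip(), and noSkip(). skip() declares the exception types that may be skipped, skipLimit() caps how many skips are allowed, and noSkip() excludes specific types from skipping. An exception that is not skippable, or a skip beyond the limit, causes the step to fail.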
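The interplay between skippable exception types and the skip limit can be sketched in plain Java. This is a simplified model under assumptions: `SkipLimitSketch` and `run` are hypothetical names, only the skip list and the limit are modeled (no exclusion list), and the original exception is rethrown on failure, which is a simplification rather than the framework's exact behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Simplified model (not Spring Batch code) of skipping a limited number of failures. */
final class SkipLimitSketch {
    /**
     * Applies the processor to every item. Exceptions whose class is listed as
     * skippable are tolerated up to skipLimit times; any other exception, or
     * one skip too many, is rethrown and ends the run.
     */
    static <I, O> List<O> run(List<I> items, Function<I, O> processor,
                              Set<Class<? extends RuntimeException>> skippable, int skipLimit) {
        List<O> results = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (RuntimeException e) {
                boolean canSkip = skippable.stream().anyMatch(type -> type.isInstance(e));
                if (!canSkip || skipped >= skipLimit) {
                    throw e; // not skippable, or the limit is exhausted
                }
                skipped++; // tolerate the bad item and continue
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // NumberFormatException is declared skippable; one skip is allowed.
        Set<Class<? extends RuntimeException>> skippable = Set.of(NumberFormatException.class);
        Function<String, Integer> parse = Integer::parseInt;
        System.out.println(run(List.of("1", "x", "3"), parse, skippable, 1)); // [1, 3]
    }
}
```

With a second malformed record in the input, the same call would exceed the limit of one skip and the run would end with the exception.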
Batch Processing Guidelines
Key principles for building robust batch solutions include simplifying job logic, keeping processing close to the data store, minimizing I/O, allocating sufficient memory upfront, validating data integrity, and performing early performance testing.
Design the architecture to suit the batch workload.
Avoid overly complex logic within a single batch application.
Keep data processing and storage physically close.
Minimize system resource usage, especially I/O.
Analyze SQL statements to avoid unnecessary table or index scans and to make sure key values are specified in the WHERE clause.
Perform data aggregation during the initial processing to avoid duplicate work.
Allocate enough memory at startup to prevent runtime reallocations.
Assume worst‑case data integrity and add validation checks.
Use checksums for internal verification.
Conduct stress testing with realistic data volumes.
Plan and test backup strategies for both databases and files.
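As one illustration of the checksum guideline above, the sketch below computes a CRC32 over a list of records using the JDK's `java.util.zip.CRC32`. The class name `BatchChecksumSketch`, the record layout, and the choice of CRC32 are assumptions made for the example; the article does not prescribe a particular algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/** Sketch of a record-level checksum used to verify that a batch of records is intact. */
final class BatchChecksumSketch {
    /** Computes a CRC32 over all records, in order, each followed by a newline. */
    static long checksum(List<String> records) {
        CRC32 crc = new CRC32();
        for (String record : records) {
            crc.update(record.getBytes(StandardCharsets.UTF_8));
            crc.update('\n'); // separator so that ["ab", "c"] differs from ["a", "bc"]
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        List<String> sent = List.of("1001,42.50", "1002,17.00");
        List<String> received = List.of("1001,42.50", "1002,71.00"); // digits transposed
        System.out.println(checksum(sent) == checksum(sent));     // true
        System.out.println(checksum(sent) == checksum(received)); // false
    }
}
```

A producer could store the checksum alongside the data, and the consuming step could recompute it before processing and reject the input on a mismatch.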
Preventing Automatic Job Startup
To stop Spring Batch jobs from running automatically at application startup, set the following property in application.properties:
spring.batch.job.enabled=false
Handling Out‑of‑Memory Issues
If a batch job reads all records at once, it can exhaust heap memory and fail with an error such as:
Resource exhaustion event: the JVM was unable to allocate memory from the heap.
To avoid this, page the reader so that only a bounded number of records is held in memory at a time, or increase the JVM heap size.
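Why paging bounds memory can be seen in a small framework-free model. This is a sketch under assumptions: `PagedReadSketch`, `readAll`, and the `fetchPage(offset, limit)` function are hypothetical, and an in-memory list stands in for a database table.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

/** Simplified model (not Spring Batch code) of reading in pages instead of all at once. */
final class PagedReadSketch {
    /**
     * Fetches pageSize records at a time through fetchPage(offset, limit) and
     * passes each page to the handler, so only one page is held at a time.
     * Returns the total number of records seen.
     */
    static <T> int readAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                           int pageSize, Consumer<List<T>> handler) {
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            if (page.isEmpty()) {
                return offset; // no more records
            }
            handler.accept(page);
            offset += page.size();
        }
    }

    public static void main(String[] args) {
        List<Integer> table = List.of(10, 20, 30, 40, 50); // stand-in for a database table
        BiFunction<Integer, Integer, List<Integer>> fetchPage = (offset, limit) ->
                table.subList(Math.min(offset, table.size()), Math.min(offset + limit, table.size()));
        Consumer<List<Integer>> printPage = page -> System.out.println(page);
        System.out.println(readAll(fetchPage, 2, printPage)); // prints three pages, then 5
    }
}
```

The peak number of records in memory is set by the page size rather than by the size of the table, which is the property a paging reader relies on.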
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
