Master Spring Batch: From Core Concepts to Advanced Chunk Processing
This article provides a comprehensive introduction to Spring Batch, covering its purpose, core architecture, key concepts such as Job, Step, ItemReader/Processor/Writer, chunk processing, skip handling, best practices, and common pitfalls like memory exhaustion, all illustrated with code examples and diagrams.
Spring Batch Overview
Spring Batch is a data processing framework provided by Spring for enterprise batch jobs that need to handle large volumes of data without user interaction. Typical use cases include time‑based events, periodic processing of massive datasets, and integration of data from internal or external systems in a transactional manner.
Spring Batch is a lightweight, comprehensive batch framework that builds on Spring’s productivity and POJO‑based development model while allowing access to advanced enterprise services. It is not a scheduling framework.
Key reusable features include record tracking, transaction management, job statistics, job restart, skip logic, and resource management, as well as high‑throughput capabilities through optimization and partitioning.
It can be used for simple scenarios such as reading a file into a database or invoking a stored procedure, as well as complex large‑scale data migrations between databases.
Spring Batch Architecture
A typical batch application reads a large number of records from a database, file, or queue, processes the data, and writes the results back.
The overall architecture consists of jobs composed of steps. Each step can define its own ItemReader, ItemProcessor, and ItemWriter. A JobLauncher starts a job, and a JobRepository persists the metadata about jobs, steps, and their executions.
Core Concepts
Job
A Job is the top‑level abstraction that encapsulates an entire batch process. The interface defines methods such as getName(), isRestartable(), execute(JobExecution), getJobParametersIncrementer(), and getJobParametersValidator(). Implementations include SimpleJob and FlowJob. A job contains one or more Step objects.
public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}
Jobs are associated with JobInstance and JobExecution objects that represent the logical instance and a single run, respectively.
JobInstance
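A JobInstance represents a logical run of a job, identified by a unique ID and the job name. The simplified interface below matches the JSR-352 contract (javax.batch.runtime.JobInstance); in Spring Batch itself, JobInstance is a class.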
public interface JobInstance {
    long getInstanceId();
    String getJobName();
}
JobParameters
JobParameters hold the set of parameters used to start a job. They also distinguish different instances of the same job, for example by processing date.
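The relationship between a job, its instances, and its executions can be sketched in plain Java. This is a minimal model under stated assumptions, not Spring Batch code: the class InstanceRegistry, its methods, and the parameter name processingDate are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Spring Batch classes: a job instance is identified by
// the job name plus its parameters, and every launch adds one execution to it.
final class InstanceRegistry {
    // instance key -> execution ids recorded for that instance
    private final Map<String, List<Long>> executionsByInstance = new HashMap<>();
    private long nextExecutionId = 1;

    // Builds a stable key; the TreeMap sorts parameter names so their order does not matter.
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<String, String>(parameters).toString();
    }

    // Records a new execution for the matching instance and returns its id.
    long launch(String jobName, Map<String, String> parameters) {
        String key = instanceKey(jobName, parameters);
        long id = nextExecutionId++;
        executionsByInstance.computeIfAbsent(key, k -> new ArrayList<>()).add(id);
        return id;
    }

    // Number of distinct (job name, parameters) combinations seen so far.
    int instanceCount() {
        return executionsByInstance.size();
    }

    // Number of executions recorded for one instance.
    int executionCount(String jobName, Map<String, String> parameters) {
        List<Long> ids = executionsByInstance.get(instanceKey(jobName, parameters));
        return ids == null ? 0 : ids.size();
    }
}
```

Launching the same job twice with the same processingDate yields one instance with two executions, while a different date yields a second instance. Spring Batch is stricter than this model: as the JobLauncher signature suggests, relaunching an instance that has already completed is rejected with JobInstanceAlreadyCompleteException.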
JobExecution
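JobExecution represents a single attempt to run a job. It provides the execution ID, job name, batch status, start and end times, exit status, and the associated job parameters. The simplified interface below follows the JSR-352 contract (javax.batch.runtime.JobExecution); Spring Batch's own JobExecution is a class that exposes similar information.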
public interface JobExecution {
    long getExecutionId();
    String getJobName();
    BatchStatus getBatchStatus();
    Date getStartTime();
    Date getEndTime();
    String getExitStatus();
    Date getCreateTime();
    Date getLastUpdatedTime();
    Properties getJobParameters();
}
Step and StepExecution
A Step encapsulates an independent phase of a job. Each step has a corresponding StepExecution that records its runtime context, including commit counts and timestamps.
ExecutionContext
ExecutionContext is a key‑value store attached to a StepExecution or JobExecution. The framework persists it so that state, such as the current read position, survives a restart.
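The idea of saving state for a restart can be illustrated with a minimal plain-Java sketch. It is not Spring Batch code: a java.util.Map stands in for the persisted context, and the class ResumableReader and the key reader.position are hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a restartable reader. The Map plays the role of the
// persisted key-value context; this is not the Spring Batch API.
final class ResumableReader {
    static final String POSITION_KEY = "reader.position"; // assumed key name

    private final List<String> items;
    private int position;

    // Resumes from the position stored in the context, or from 0 on a first run.
    ResumableReader(List<String> items, Map<String, Object> context) {
        this.items = items;
        Object saved = context.get(POSITION_KEY);
        this.position = saved == null ? 0 : (Integer) saved;
    }

    // Returns the next item, or null when the input is exhausted.
    String read() {
        return position < items.size() ? items.get(position++) : null;
    }

    // Stores the current position so that a later run can continue from here.
    void update(Map<String, Object> context) {
        context.put(POSITION_KEY, position);
    }
}
```

A first run that reads two items and calls update leaves the position in the map; a second reader built from the same map continues with the third item instead of starting over.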
JobRepository and JobLauncher
JobRepository persists jobs, steps, and execution metadata. JobLauncher launches a job with the given JobParameters.
public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
ItemReader, ItemProcessor, ItemWriter
ItemReader reads input data for a step, ItemProcessor applies business logic, and ItemWriter writes the processed output. Spring Batch ships many ready-made implementations, such as JdbcPagingItemReader and JdbcCursorItemReader.
@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    // Named parameter values used by the paging query
    Map<String, Object> params = new HashMap<>();
    params.put("status", "NEW");
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(params)
            .rowMapper(customerCreditMapper()) // row mapper assumed to be defined elsewhere
            .pageSize(1000)                    // rows fetched per page
            .build();
}
Chunk Processing
Spring Batch can group items into chunks. Items are read and processed one at a time; when the number of items reaches the configured chunk size, the whole chunk is passed to the ItemWriter and the transaction is committed. This performs better than committing each item individually.
For example, with a chunk size of 10, the ItemWriter is invoked once for every ten items.
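The chunk loop described above can be sketched in plain Java. This is a simplified illustration, not the Spring Batch implementation: the class ChunkSketch and its run method are hypothetical, an Iterator stands in for the ItemReader, and each writer call stands in for one committed transaction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java sketch of chunk-oriented processing. Names are illustrative and
// this is not how Spring Batch is implemented internally.
final class ChunkSketch {
    // Reads items one by one, processes each, and hands them to the writer in
    // lists of at most chunkSize. Returns the number of writer calls, which
    // stands in for the number of commits.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int writes = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // one write (commit) per full chunk
                writes++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { // final partial chunk
            writer.accept(new ArrayList<>(chunk));
            writes++;
        }
        return writes;
    }
}
```

With 25 input items and a chunk size of 10, the writer is called three times, with 10, 10, and 5 items, instead of 25 times with one item each.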
Skip and Failure Handling
Batch steps can be configured with skipLimit(), skip(), and noSkip() to control how many exceptions may be ignored and which exceptions are fatal.
Note: If skipLimit is not set, the default is 0.
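The effect of a skip limit can be shown with a small plain-Java sketch. It illustrates the idea only and does not use the Spring Batch API: the class SkipSketch and its process method are hypothetical, and for simplicity the original exception is rethrown when the limit is exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch of skip handling. SkipSketch and process are hypothetical
// names; this is not the Spring Batch API.
final class SkipSketch {
    // Processes every item. Exceptions of the skippable type are tolerated up
    // to skipLimit occurrences; anything else, or one skip too many, is rethrown.
    static <I, O> List<O> process(List<I> items, Function<I, O> processor,
                                  Class<? extends RuntimeException> skippable, int skipLimit) {
        List<O> results = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (RuntimeException e) {
                boolean canSkip = skippable.isInstance(e) && skipped < skipLimit;
                if (!canSkip) {
                    throw e; // fatal: non-skippable type or limit exhausted
                }
                skipped++; // drop the bad item and continue
            }
        }
        return results;
    }
}
```

Parsing the inputs "1", "x", "3", "y" as integers with NumberFormatException as the skippable type and a limit of 2 returns the two valid numbers. With a limit of 1, the second bad item makes the whole run fail.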
Best Practices for Batch Jobs
Design the batch architecture to minimize complexity.
Keep processing and storage physically close.
Minimize I/O and maximize in‑memory operations.
Avoid redundant work; aggregate data during the initial processing.
Allocate sufficient memory at startup to prevent frequent reallocations.
Assume worst‑case data integrity and add validation checks.
Perform checksum validation for internal consistency.
Conduct performance testing with realistic data volumes.
Plan and test backup strategies for both databases and files.
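As one illustration of the checksum recommendation above, the following sketch computes a CRC-32 value per record with the JDK's java.util.zip.CRC32. The class ChecksumSketch and the comma-separated record layout are assumptions made for this example; the choice of CRC-32 is also an assumption, and a real job might use a different algorithm or a file-level checksum.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative per-record checksum check. The record layout is an assumption.
final class ChecksumSketch {
    // Computes a CRC-32 checksum over the UTF-8 bytes of the payload.
    static long checksum(String payload) {
        CRC32 crc = new CRC32();
        crc.update(payload.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns true when the payload still matches the checksum recorded for it.
    static boolean isIntact(String payload, long expectedChecksum) {
        return checksum(payload) == expectedChecksum;
    }
}
```

A producer would store the checksum alongside each record, and the batch step would call isIntact before processing, rejecting or skipping records that no longer match.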
Disabling Automatic Job Startup
To prevent Spring Batch jobs from running automatically at application startup, set the property:
spring.batch.job.enabled=false
Memory Exhaustion Issue
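If a job reads all records into memory at once, the JVM may run out of heap space. The preferred fix is to use a paging reader so that only one page of records is held in memory at a time; increasing the JVM heap size is a fallback that only postpones the problem as data volumes grow.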
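The paging approach can be sketched in plain Java as follows. This is an illustration of the technique rather than the JdbcPagingItemReader itself: the class PagedReader is hypothetical, and the pageLoader function stands in for a query that returns at most pageSize rows starting at a given offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java sketch of page-by-page reading. PagedReader is a hypothetical
// class, not the Spring Batch JdbcPagingItemReader.
final class PagedReader<T> {
    private final BiFunction<Integer, Integer, List<T>> pageLoader; // (offset, pageSize) -> rows
    private final int pageSize;
    private List<T> page = new ArrayList<>();
    private int indexInPage = 0;
    private int offset = 0;
    private boolean exhausted = false;

    PagedReader(BiFunction<Integer, Integer, List<T>> pageLoader, int pageSize) {
        this.pageLoader = pageLoader;
        this.pageSize = pageSize;
    }

    // Returns the next item, or null at the end. At most one page is held in memory.
    T read() {
        if (indexInPage >= page.size() && !exhausted) {
            page = pageLoader.apply(offset, pageSize); // replaces the previous page
            offset += page.size();
            indexInPage = 0;
            exhausted = page.size() < pageSize; // a short page means no more data
        }
        return indexInPage < page.size() ? page.get(indexInPage++) : null;
    }
}
```

Memory use is bounded by the page size rather than by the total number of records, which is the property that prevents heap exhaustion on large inputs.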
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
