Master Spring Batch: From Core Concepts to Advanced Chunk Processing

This article provides a comprehensive introduction to Spring Batch, covering its purpose, core architecture, key concepts such as Job, Step, ItemReader/Processor/Writer, chunk processing, skip handling, best practices, and common pitfalls like memory exhaustion, all illustrated with code examples and diagrams.

Programmer DD

Jul 22, 2022

Spring Batch Overview

Spring Batch is a data processing framework provided by Spring for enterprise batch jobs that need to handle large volumes of data without user interaction. Typical use cases include time‑based events, periodic processing of massive datasets, and integration of data from internal or external systems in a transactional manner.

Spring Batch is a lightweight, comprehensive batch framework that builds on Spring’s productivity and POJO‑based development model while allowing access to advanced enterprise services. It is not a scheduling framework.

Key reusable features include record tracking, transaction management, job statistics, job restart, skip logic, and resource management, as well as high‑throughput capabilities through optimization and partitioning.

It can be used for simple scenarios such as reading a file into a database or invoking a stored procedure, as well as complex large‑scale data migrations between databases.

Spring Batch Architecture

A typical batch application reads a large number of records from a database, file, or queue, processes the data, and writes the results back.

The overall architecture consists of jobs composed of steps. Each step can define its own ItemReader, ItemProcessor, and ItemWriter. Jobs are launched via a JobLauncher, and metadata about their executions is persisted in a JobRepository.

Core Concepts

Job

A Job is the top‑level abstraction that encapsulates an entire batch process. The interface defines methods such as getName(), isRestartable(), execute(JobExecution), getJobParametersIncrementer(), and getJobParametersValidator(). Implementations include SimpleJob and FlowJob. A job contains one or more Step objects.

public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}

Jobs are associated with JobInstance and JobExecution objects that represent the logical instance and a single run, respectively.

JobInstance

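A JobInstance represents a logical run of a job, identified by a unique ID and the job name. The interface below has the shape of the JSR-352 javax.batch.runtime.JobInstance contract; Spring Batch's own JobInstance is a concrete class.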

public interface JobInstance {
    long getInstanceId();
    String getJobName();
}

JobParameters

JobParameters holds a set of parameters used to start a job and to distinguish different instances of the same job, such as a processing date.
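As a rough illustration of how parameters distinguish instances, the plain-Java sketch below models a registry keyed by job name plus parameters. InstanceRegistrySketch and instanceIdFor are hypothetical names chosen for this example, not Spring Batch classes; in the framework this bookkeeping lives in the JobRepository.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (not the Spring Batch API): a job instance is identified
// by the job name together with its parameters.
final class InstanceRegistrySketch {
    private final Map<List<Object>, Long> instances = new HashMap<>();
    private long nextId = 1;

    // Returns the existing instance id for this name/parameter pair,
    // or allocates a new id if the pair has not been seen before.
    long instanceIdFor(String jobName, Map<String, String> parameters) {
        List<Object> key = List.of(jobName, new HashMap<>(parameters));
        Long existing = instances.get(key);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        instances.put(key, id);
        return id;
    }
}
```

Under this model, running the same job with two different processing dates yields two instances, while repeating a date maps back to the instance that already exists.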

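JobExecution

JobExecution represents a single attempt to run a job. It provides the execution ID, job name, batch status, start and end times, exit status, and the associated job parameters. The listing below follows the JSR-352 javax.batch.runtime.JobExecution contract; the Spring Batch JobExecution type is a class that exposes similar information.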

public interface JobExecution {
    long getExecutionId();
    String getJobName();
    BatchStatus getBatchStatus();
    Date getStartTime();
    Date getEndTime();
    String getExitStatus();
    Date getCreateTime();
    Date getLastUpdatedTime();
    Properties getJobParameters();
}

Step and StepExecution

A Step encapsulates an independent phase of a job. Each step has a corresponding StepExecution that records its runtime context, including commit counts and timestamps.

ExecutionContext

ExecutionContext is a key‑value store attached to a StepExecution or JobExecution for persisting state between restarts.
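To make the restart idea concrete, here is a minimal plain-Java sketch of a reader that records its position in a key-value context. RestartableReaderSketch, readSome, and the "reader.position" key are assumptions made for illustration; a plain Map stands in for the real ExecutionContext.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified model (not the Spring Batch API): a reader that records its
// position in a key-value context so a later run can resume from there.
final class RestartableReaderSketch {
    static final String POSITION_KEY = "reader.position"; // hypothetical key name

    // Reads up to maxItems starting at the stored position, then saves the new position.
    static List<String> readSome(List<String> source, Map<String, Object> context, int maxItems) {
        int start = (Integer) context.getOrDefault(POSITION_KEY, 0);
        int end = Math.min(source.size(), start + maxItems);
        List<String> items = new ArrayList<>(source.subList(start, end));
        context.put(POSITION_KEY, end); // state that a restarted run would pick up
        return items;
    }
}
```

If the context is persisted between runs, a restarted run continues after the last recorded position instead of re-reading from the beginning.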

JobRepository and JobLauncher

JobRepository persists jobs, steps, and execution metadata. JobLauncher launches a job with given JobParameters.

public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters)
        throws JobExecutionAlreadyRunningException, JobRestartException,
               JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

ItemReader, ItemProcessor, ItemWriter

ItemReader reads input data for a step; ItemProcessor applies business logic; ItemWriter writes the processed output. Spring Batch provides many implementations such as JdbcPagingItemReader, JdbcCursorItemReader, etc.

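import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.PagingQueryProvider;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
import org.springframework.context.annotation.Bean;

// Declared inside a @Configuration class; CustomerCredit and
// customerCreditMapper() are defined elsewhere in the application.
@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    // Named parameter values bound into the paging query
    Map<String, Object> params = new HashMap<>();
    params.put("status", "NEW");
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(params)
            .rowMapper(customerCreditMapper())
            .pageSize(1000) // rows fetched per page
            .build();
}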

Chunk Processing

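Spring Batch can group items into chunks. Items are read and processed one at a time and collected; when the number of collected items reaches the configured chunk size (the commit interval), the whole chunk is passed to the ItemWriter and the transaction is committed. Committing once per chunk performs better than committing each item individually.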

With a chunk size of 10, for example, the ItemWriter is invoked once for every ten items.
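The chunk loop can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not the framework's implementation: ChunkLoopSketch and run are hypothetical names, an Iterator stands in for the ItemReader, a Function for the ItemProcessor, and appending to the result list stands in for the ItemWriter call and the commit. It assumes chunkSize is positive.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified model of chunk-oriented processing (not the Spring Batch API).
final class ChunkLoopSketch {
    // Reads and processes items one at a time, buffers them, and hands each
    // full chunk to the "writer" in a single call. Returns the chunks written.
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == chunkSize) {
                written.add(buffer); // stands in for the write call plus commit
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            written.add(buffer); // final, partially filled chunk
        }
        return written;
    }
}
```

With 25 input items and a chunk size of 10, this model performs three writes: two full chunks of ten and a final chunk of five.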

Skip and Failure Handling

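Fault-tolerant steps (enabled by calling faultTolerant() on the step builder) can be configured with skipLimit(), skip(), and noSkip() to control how many exceptions may be skipped, which exception types are skippable, and which are always fatal.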

Note: If skipLimit is not set, the default is 0.
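The effect of a skip limit can be modeled in a few lines of plain Java. The sketch below is an assumption-laden illustration rather than Spring Batch code: SkipLimitSketch and process are hypothetical names, and IllegalArgumentException is arbitrarily chosen as the one skippable exception type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified model of skip handling (not the Spring Batch API).
final class SkipLimitSketch {
    // Processes each item. A skippable failure (here, IllegalArgumentException)
    // is counted and ignored until skipLimit skips have been used; the next
    // such failure aborts the run. Any other exception propagates immediately.
    static <I, O> List<O> process(List<I> items, Function<I, O> processor, int skipLimit) {
        List<O> results = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (IllegalArgumentException e) {
                if (skipped >= skipLimit) {
                    throw new IllegalStateException("Skip limit of " + skipLimit + " exceeded", e);
                }
                skipped++;
            }
        }
        return results;
    }
}
```

In this model a skip limit of 0 means the first skippable failure already fails the run, which is why a limit has to be set explicitly for skipping to have any effect.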

Best Practices for Batch Jobs

Design the batch architecture to minimize complexity.

Keep processing and storage physically close.

Minimize I/O and maximize in‑memory operations.

Avoid redundant work; aggregate data during the initial processing.

Allocate sufficient memory at startup to prevent frequent reallocations.

Assume worst‑case data integrity and add validation checks.

Perform checksum validation for internal consistency.

Conduct performance testing with realistic data volumes.

Plan and test backup strategies for both databases and files.

Disabling Automatic Job Startup

To prevent Spring Batch jobs from running automatically at application startup in a Spring Boot application, set the following property:

spring.batch.job.enabled=false

Memory Exhaustion Issue

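If a job reads all records into memory at once, the JVM may run out of heap space. The preferred fix is a paging reader (such as JdbcPagingItemReader), which holds only one page of records at a time; increasing the memory available to the service is a fallback.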
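The difference paging makes can be sketched in plain Java. PagedReadSketch, PageSource, and readAll are hypothetical names for this example and do not reflect how JdbcPagingItemReader is implemented; the point is only that memory use is bounded by the page size rather than the table size.

```java
import java.util.List;
import java.util.function.Consumer;

// Simplified model of paged reading (not the Spring Batch API).
final class PagedReadSketch {
    // Hypothetical data-access abstraction: returns at most `limit` rows
    // starting at `offset`, or an empty list when no rows remain.
    interface PageSource<T> {
        List<T> fetch(int offset, int limit);
    }

    // Passes every row to the handler while holding only one page at a time.
    // Returns the largest number of rows held in memory at once.
    static <T> int readAll(PageSource<T> source, int pageSize, Consumer<T> handler) {
        int offset = 0;
        int maxHeld = 0;
        while (true) {
            List<T> page = source.fetch(offset, pageSize);
            if (page.isEmpty()) {
                return maxHeld;
            }
            maxHeld = Math.max(maxHeld, page.size());
            page.forEach(handler);
            offset += page.size();
        }
    }
}
```

Reading 2,500 rows with a page size of 1,000 through this loop never holds more than 1,000 rows at once, whereas a read-everything approach would hold all 2,500.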
