Spring Batch Basics: Building Efficient Spring Boot Batch Jobs
This article explains why naive for-loop database operations break down on large data sets, introduces Spring Batch's chunk, transaction, retry, skip, and restart features, and walks through Spring Boot configuration, Tasklet and chunk job code samples, database and CSV readers and writers, and manual or scheduled job triggering. The samples use the Spring Batch 5 builder style (StepBuilder and JobBuilder constructed with a JobRepository), which corresponds to Spring Boot 3.
Problems with naive for‑loop batch implementations
Processing millions of rows with a single for loop and immediate DB writes leads to:
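Out-of-memory (OOM) errors when the whole data set is loaded into memory first.
Very high I/O, because each record is written with its own statement and round trip.
No checkpoint: an exception midway forces a rerun from the start, or rolls back everything if the loop runs in one transaction.
No retry or skip mechanism, so a single dirty record fails the whole task.
No persisted task status, which makes monitoring and reruns difficult.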
Spring Batch core concepts
Batch refers to large‑volume, offline, non‑real‑time, repeatable data tasks such as nightly reconciliation, CSV/Excel import‑export, data migration, archiving, and bulk messaging.
Spring Batch provides a four-layer model: Job → Step → Execution logic → Context. A Job contains one or more Steps, which run sequentially by default. Steps can be implemented as:
Tasklet: a single action for simple jobs (e.g., table truncation).
Chunk: the read-process-write pattern for massive data handling.
Chunk processing follows a three-phase flow: ItemReader → ItemProcessor → ItemWriter. Each chunk reads N items, processes them, writes the results in one batch, and commits a single transaction. Only one chunk is held in memory at a time, and a failure rolls back only the current chunk.
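The read-process-write cycle can be pictured with a small plain-Java model. This is only a sketch of the idea, not Spring Batch's implementation: the class `ChunkLoopSketch`, its `run` method, and the sample data are hypothetical, and the "commit" is just a counter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Simplified model of a chunk-oriented step (illustration only). */
final class ChunkLoopSketch {

    /**
     * Reads up to chunkSize items, processes each one (a null result means
     * "filtered out"), then passes the surviving items to the writer in a
     * single call. Returns the number of chunks written, which stands in
     * for the number of transaction commits.
     */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int commits = 0;
        while (reader.hasNext()) {
            List<O> outputs = new ArrayList<>();
            for (int read = 0; read < chunkSize && reader.hasNext(); read++) {
                O result = processor.apply(reader.next());
                if (result != null) {
                    outputs.add(result);
                }
            }
            writer.accept(outputs); // one batched write per chunk
            commits++;              // a real step would commit the transaction here
        }
        return commits;
    }

    public static void main(String[] args) {
        List<List<Integer>> written = new ArrayList<>();
        // Drop even numbers, multiply odd numbers by 10, chunk size 3.
        Function<Integer, Integer> processor = n -> n % 2 == 0 ? null : n * 10;
        Consumer<List<Integer>> writer = written::add;
        int commits = run(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), processor, writer, 3);
        // prints: 3 commits, written = [[10, 30], [50], [70]]
        System.out.println(commits + " commits, written = " + written);
    }
}
```

Note that the chunk size counts items read, so a chunk can write fewer items than it read when the processor filters some out.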
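Spring Batch records job name, job instance, parameters, status, start/end time, step progress, and failure point in a set of metadata tables (six BATCH_* tables plus three sequence objects, which MySQL emulates as tables). This enables restart, duplicate-run protection, and monitoring. For a non-embedded database such as MySQL, Spring Boot creates these tables only when spring.batch.jdbc.initialize-schema is set to always.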
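Why persisted progress makes restart possible can be shown with a toy model. In this sketch a plain `Map` stands in for the metadata tables; the class `RestartSketch`, the `runStep` method, and the step name are hypothetical and only illustrate the checkpoint idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of checkpoint-based restart; the map stands in for the metadata tables. */
final class RestartSketch {

    /**
     * Processes items in chunks and records progress after each committed chunk.
     * Throws when the chunk containing failAtIndex is reached (use -1 for "never fail").
     * Returns the items committed by this run.
     */
    static List<String> runStep(Map<String, Integer> checkpoints, String stepName,
                                List<String> items, int chunkSize, int failAtIndex) {
        List<String> committed = new ArrayList<>();
        int start = checkpoints.getOrDefault(stepName, 0); // resume from the last checkpoint
        for (int from = start; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            if (failAtIndex >= from && failAtIndex < to) {
                // The failing chunk is rolled back, so the checkpoint does not advance.
                throw new IllegalStateException("failure at item " + failAtIndex);
            }
            committed.addAll(items.subList(from, to));
            checkpoints.put(stepName, to); // checkpoint is saved together with the commit
        }
        return committed;
    }

    public static void main(String[] args) {
        Map<String, Integer> checkpoints = new HashMap<>();
        List<String> items = List.of("a", "b", "c", "d", "e", "f", "g");
        try {
            runStep(checkpoints, "import-step", items, 3, 4);
        } catch (IllegalStateException e) {
            // prints: first run failed, checkpoint = 3
            System.out.println("first run failed, checkpoint = " + checkpoints.get("import-step"));
        }
        // prints: restart handled [d, e, f, g]
        System.out.println("restart handled " + runStep(checkpoints, "import-step", items, 3, -1));
    }
}
```

The first chunk is not processed again on the second run because its checkpoint survived the failure.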
Environment setup & configuration
Maven core dependencies
<!-- Spring Boot core -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
<!-- Spring Batch core -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<!-- JDBC for metadata persistence -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<!-- Web starter, needed only for the REST trigger shown later -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- MySQL driver (runtime) -->
<dependency>
    <groupId>com.mysql</groupId>
    <artifactId>mysql-connector-j</artifactId>
    <scope>runtime</scope>
</dependency>
<!-- Druid connection pool -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid-spring-boot-starter</artifactId>
    <version>1.2.16</version>
</dependency>
<!-- CSV utilities -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.10.0</version>
</dependency>
<!-- Lombok (optional) -->
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
application.yml configuration
spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/batch_db?useUnicode=true&serverTimezone=Asia/Shanghai&allowMultiQueries=true
    username: root
    password: root
  batch:
    job:
      enabled: false # disable auto-run at startup; launch manually or on a schedule
    jdbc:
      initialize-schema: always # create the metadata tables at startup (the older spring.batch.initialize-schema key is not supported in Spring Boot 3)
Enable batch processing in the main class
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// With Spring Boot 3, spring-boot-starter-batch auto-configures Spring Batch.
// @EnableBatchProcessing is left out on purpose: adding it makes Boot's batch
// auto-configuration back off, so spring.batch.* settings such as
// initialize-schema would no longer apply.
@SpringBootApplication
public class BatchApplication {
    public static void main(String[] args) {
        SpringApplication.run(BatchApplication.class, args);
    }
}
Tasklet – simple one-step job
Custom Tasklet
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.stereotype.Component;

@Component
public class CleanTempDataTask implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        System.out.println("[Tasklet] Execute temporary data cleanup and cache refresh");
        // business logic: truncate temp tables, delete expired data, refresh configs, etc.
        return RepeatStatus.FINISHED; // FINISHED ends the step; CONTINUABLE would call execute again
    }
}
Job and Step assembly
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class TaskletJobConfig {
    @Autowired
    private CleanTempDataTask cleanTempDataTask;

    @Bean
    public Step cleanDataStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("clean-data-step", jobRepository)
                .tasklet(cleanTempDataTask, transactionManager)
                .build();
    }

    @Bean
    public Job cleanDataJob(JobRepository jobRepository, Step cleanDataStep) {
        return new JobBuilder("clean-data-job", jobRepository)
                .start(cleanDataStep)
                .build();
    }
}
Chunk – standard batch processing
Domain model
CREATE TABLE user_info (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(32),
    age INT,
    email VARCHAR(64),
    status TINYINT
);

import lombok.Data;

@Data
public class UserInfo {
    private Long id;
    private String username;
    private Integer age;
    private String email;
    private Integer status;
}
ItemReader – in-memory example
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.ArrayList;
import java.util.List;

@Configuration
public class UserReaderConfig {
    @Bean
    public ItemReader<UserInfo> userInfoReader() {
        // Build 50 sample users with ages cycling through 20..29
        List<UserInfo> list = new ArrayList<>();
        for (int i = 1; i <= 50; i++) {
            UserInfo user = new UserInfo();
            user.setUsername("user_" + i);
            user.setAge(20 + i % 10);
            user.setEmail("user" + i + "@qq.com");
            user.setStatus(1);
            list.add(user);
        }
        return new ListItemReader<>(list);
    }
}
ItemProcessor – filtering & transformation
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
public class UserInfoProcessor implements ItemProcessor<UserInfo, UserInfo> {
    @Override
    public UserInfo process(UserInfo user) {
        // Filter: returning null drops the record, so it never reaches the writer
        if (user.getAge() > 25) {
            return null;
        }
        // Transform
        user.setUsername(user.getUsername().toUpperCase());
        user.setStatus(2);
        return user;
    }
}
ItemWriter – console output (placeholder for DB batch write)
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;
import org.springframework.stereotype.Component;

@Component
public class UserInfoWriter implements ItemWriter<UserInfo> {
    // Spring Batch 5 passes a Chunk to write(); the List-based signature belongs to Spring Batch 4.
    @Override
    public void write(Chunk<? extends UserInfo> chunk) {
        System.out.println("[Batch write] Items count: " + chunk.size());
        chunk.forEach(System.out::println);
        // Real implementation would batch insert/update the DB
    }
}
Chunk job assembly
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class UserChunkJobConfig {
    // Injected by field name when several ItemReader<UserInfo> beans exist
    @Autowired
    private ItemReader<UserInfo> userInfoReader;
    @Autowired
    private UserInfoProcessor userInfoProcessor;
    @Autowired
    private UserInfoWriter userInfoWriter;

    @Bean
    public Step userChunkStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("user-chunk-step", jobRepository)
                .<UserInfo, UserInfo>chunk(10, transactionManager)
                .reader(userInfoReader)
                .processor(userInfoProcessor)
                .writer(userInfoWriter)
                .faultTolerant()
                .retry(Exception.class).retryLimit(3)
                .skip(Exception.class).skipLimit(100)
                .build();
    }

    @Bean
    public Job userChunkJob(JobRepository jobRepository, Step userChunkStep) {
        return new JobBuilder("user-chunk-job", jobRepository)
                .start(userChunkStep)
                .build();
    }
}
chunk(10, transactionManager) sets the commit interval to ten: each transaction reads, processes, and writes ten items, so memory use stays bounded by the chunk size. Larger chunks mean fewer commits and usually higher throughput, at the cost of more work redone when a chunk rolls back.
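For the 50 sample users above, the effect of the chunk size can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper (`ChunkCountSketch` is not part of Spring Batch) that assumes the same age formula as the sample reader, `20 + i % 10`, and the same filter as the sample processor, and counts how many items the writer would receive per chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts items reaching the writer per chunk for the sample data (illustration only). */
final class ChunkCountSketch {

    /**
     * For ids 1..totalItems with age = 20 + id % 10, returns how many items
     * per chunk pass the "age <= maxAge" filter. The chunk size counts items
     * read, so filtered items still consume a slot in the chunk.
     */
    static List<Integer> writtenPerChunk(int totalItems, int chunkSize, int maxAge) {
        List<Integer> counts = new ArrayList<>();
        for (int first = 1; first <= totalItems; first += chunkSize) {
            int last = Math.min(first + chunkSize - 1, totalItems);
            int kept = 0;
            for (int id = first; id <= last; id++) {
                int age = 20 + id % 10;
                if (age <= maxAge) {
                    kept++;
                }
            }
            counts.add(kept);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints: [6, 6, 6, 6, 6]
        System.out.println(writtenPerChunk(50, 10, 25));
    }
}
```

With a chunk size of 10 the job therefore commits five times, and each write call receives six items because four of every ten users are older than 25.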
Fault tolerance – retry, skip, and restart
Exception retry configuration
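.faultTolerant()
.retry(Exception.class) // broad for the demo; in practice list only transient exceptions
.retryLimit(3)
Skip dirty data configuration
.skip(Exception.class) // broad for the demo; in practice list only data-quality exceptions
.skipLimit(100)
Complete fault-tolerant step definition
@Bean
public Step userChunkStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
    return new StepBuilder("user-chunk-step", jobRepository)
            .<UserInfo, UserInfo>chunk(10, transactionManager)
            .reader(userInfoReader)
            .processor(userInfoProcessor)
            .writer(userInfoWriter)
            .faultTolerant()
            .retry(Exception.class).retryLimit(3)
            .skip(Exception.class).skipLimit(100)
            .build();
}
Spring Batch persists step progress in the execution context at each chunk commit. When a failed job instance is relaunched with the same identifying job parameters, a reader that saves its state (for example JdbcCursorItemReader or FlatFileItemReader) resumes after the last committed chunk instead of from the beginning, which improves stability for massive tasks. The in-memory ListItemReader used above keeps no such state.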
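The interaction of the retry and skip limits configured above can be sketched in plain Java. This is a simplified per-item model, not Spring Batch's actual algorithm (which also rolls back and re-processes chunks). The class `RetrySkipSketch` and its method are hypothetical, and the sketch assumes the retry limit counts total attempts including the first one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of per-item retry and skip limits (illustration only). */
final class RetrySkipSketch {

    /**
     * Tries each item up to maxAttempts times. An item that still fails is
     * added to skipped, unless skipLimit items were already skipped, in which
     * case the whole run fails.
     */
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor,
                                     int maxAttempts, int skipLimit, List<I> skipped) {
        List<O> results = new ArrayList<>();
        for (I item : items) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    results.add(processor.apply(item));
                    done = true;
                } catch (RuntimeException e) {
                    // swallow the failure and retry until attempts run out
                }
            }
            if (!done) {
                if (skipped.size() >= skipLimit) {
                    throw new IllegalStateException("skip limit " + skipLimit + " exceeded at " + item);
                }
                skipped.add(item);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Integer> attempts = new HashMap<>();
        // "flaky" succeeds on its third attempt; "bad" never succeeds.
        Function<String, String> processor = s -> {
            int n = attempts.merge(s, 1, Integer::sum);
            if (s.equals("bad") || (s.equals("flaky") && n < 3)) {
                throw new IllegalArgumentException(s);
            }
            return s.toUpperCase();
        };
        List<String> skipped = new ArrayList<>();
        List<String> results = processAll(List.of("a", "flaky", "bad", "b"), processor, 3, 100, skipped);
        // prints: results = [A, FLAKY, B], skipped = [bad]
        System.out.println("results = " + results + ", skipped = " + skipped);
    }
}
```

Retry is meant for failures that may succeed on a later attempt, and skip for records that never will, so using distinct exception types for the two usually makes the configuration easier to reason about.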
Database batch read/write
JdbcCursorItemReader – streamed DB read
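// Goes inside a @Configuration class. Required imports:
// javax.sql.DataSource
// org.springframework.batch.item.ItemReader
// org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder
@Bean
public ItemReader<UserInfo> dbUserReader(DataSource dataSource) {
    String sql = "select id,username,age,email,status from user_info";
    // Note: MySQL Connector/J buffers the whole result set by default; real
    // streaming needs driver-specific settings (for example useCursorFetch=true
    // in the JDBC URL together with a fetch size on the reader).
    return new JdbcCursorItemReaderBuilder<UserInfo>()
            .name("user-db-reader")
            .dataSource(dataSource)
            .sql(sql)
            .rowMapper((rs, rowNum) -> {
                UserInfo user = new UserInfo();
                user.setId(rs.getLong("id"));
                user.setUsername(rs.getString("username"));
                user.setAge(rs.getInt("age"));
                user.setEmail(rs.getString("email"));
                user.setStatus(rs.getInt("status"));
                return user;
            })
            .build();
}
JdbcBatchItemWriter – bulk DB write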
@Bean
public ItemWriter<UserInfo> dbUserWriter(DataSource dataSource) {
    String sql = "insert into user_info(username,age,email,status) values (?,?,?,?)";
    // All items of a chunk are sent as one JDBC batch
    return new JdbcBatchItemWriterBuilder<UserInfo>()
            .dataSource(dataSource)
            .sql(sql)
            .itemPreparedStatementSetter((item, ps) -> {
                ps.setString(1, item.getUsername());
                ps.setInt(2, item.getAge());
                ps.setString(3, item.getEmail());
                ps.setInt(4, item.getStatus());
            })
            .build();
}
CSV file import/export
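FlatFileItemReader – CSV import
@Bean
public FlatFileItemReader<UserInfo> csvReader() {
    return new FlatFileItemReaderBuilder<UserInfo>()
            .name("user-csv-reader") // required because the reader saves state for restart
            .resource(new FileSystemResource("data/user.csv"))
            .delimited()
            .names("username", "age", "email")
            // No explicit lineMapper: an empty DefaultLineMapper would replace the
            // tokenizer and fieldSetMapper configured here and fail at read time.
            .fieldSetMapper(fieldSet -> {
                UserInfo user = new UserInfo();
                user.setUsername(fieldSet.readString("username"));
                user.setAge(fieldSet.readInt("age"));
                user.setEmail(fieldSet.readString("email"));
                // status is not in the file and stays null
                return user;
            })
            .build();
}
FlatFileItemWriter – CSV export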
@Bean
public FlatFileItemWriter<UserInfo> csvWriter() {
    return new FlatFileItemWriterBuilder<UserInfo>()
            .name("user-csv-writer") // required because the writer saves state for restart
            .resource(new FileSystemResource("output/user_out.csv"))
            .delimited()
            .names("id", "username", "age", "email", "status") // bean properties written per line
            .build();
}
Job triggering methods
Manual REST endpoint
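import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/batch")
public class BatchController {
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job userChunkJob;

    @GetMapping("/run")
    public String runBatch() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
        // The default JobLauncher runs synchronously, so the status is final here.
        JobExecution execution = jobLauncher.run(userChunkJob, params);
        return "Job finished with status: " + execution.getStatus();
    }
}
Scheduled execution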
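import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@EnableScheduling
public class BatchSchedule {
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job userChunkJob;

    // Executes daily at 02:00
    @Scheduled(cron = "0 0 2 * * ?")
    public void scheduleRun() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addLong("scheduleTime", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(userChunkJob, params);
    }
}
Adding a time-based parameter gives every launch a new job instance, so the same job can run repeatedly. The trade-off is that a failed run is never resumed: restart only applies when a failed instance is relaunched with the same identifying parameters.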
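The role of the parameters in identifying a job instance can be modeled in a few lines. This is a toy sketch: `JobInstanceSketch` and its `launch` method are hypothetical, and where this model returns false, Spring Batch itself rejects the relaunch of a completed instance with a JobInstanceAlreadyCompleteException.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of job-instance identity: job name plus identifying parameters. */
final class JobInstanceSketch {

    private final Set<String> completedInstances = new HashSet<>();

    /**
     * Returns true if the launch creates a new instance and false if an
     * instance with the same name and parameters has already completed.
     */
    boolean launch(String jobName, Map<String, Object> params) {
        // Sort the parameters so the key does not depend on map ordering.
        String key = jobName + new TreeMap<>(params);
        return completedInstances.add(key);
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        Map<String, Object> fixed = Map.of("date", "2024-01-01");
        System.out.println(repository.launch("user-chunk-job", fixed)); // true
        System.out.println(repository.launch("user-chunk-job", fixed)); // false: same instance
        // A changing value such as a timestamp yields a new instance each time.
        System.out.println(repository.launch("user-chunk-job", Map.of("time", 1L))); // true
        System.out.println(repository.launch("user-chunk-job", Map.of("time", 2L))); // true
    }
}
```

A business-level parameter such as the processing date is a common alternative to a timestamp when a failed run should be restartable.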
Java Tech Workshop
Focused on Java backend technologies, sharing fundamentals, multithreading, JVM, the Spring ecosystem, microservices, distributed systems, high concurrency, source‑code analysis, and practical experience. Continuously delivers high‑quality original content, interview guides, and learning roadmaps to help Java developers progress from beginner to advanced, enhancing technical skills and core competitiveness.
