Why Spring Batch? Real‑World Scenarios, Core Architecture and Hands‑On Guide
This article explains why batch processing matters, surveys typical use cases (daily interest calculation, e‑commerce order archiving, log analysis, medical data migration), then dives into Spring Batch's core components with step‑by‑step code examples, performance‑tuning tips, production‑grade fault tolerance, monitoring, and a FAQ.
1. Why Batch Processing?
1.1 Application Scenarios
Scenario 1: Daily interest calculation for banks
Pain point: Need to scan millions of accounts at midnight; manual calculation easily misses records.
Spring Batch solution: Partitioned reading, batch interest calculation, automatic retry on failure.
Actual case: Processing time reduced from 4 hours to 23 minutes.
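The per‑account arithmetic inside such a job is simple; what Spring Batch adds is the partitioned, restartable execution around it. A minimal plain‑Java sketch of the calculation itself (the `Account` record and the 365‑day convention are illustrative assumptions, not details from the original system):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

public class InterestCalc {
    // Hypothetical account shape, for illustration only
    record Account(String id, BigDecimal balance, BigDecimal annualRate) {}

    // Daily interest = balance * annualRate / 365, banker's rounding to 2 decimals
    static BigDecimal dailyInterest(Account a) {
        return a.balance()
                .multiply(a.annualRate())
                .divide(BigDecimal.valueOf(365), 2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        List<Account> chunk = List.of(
                new Account("A1", new BigDecimal("10000.00"), new BigDecimal("0.0365")));
        chunk.forEach(a -> System.out.println(a.id() + " -> " + dailyInterest(a)));
    }
}
```

In the real job this function would sit inside an `ItemProcessor`, with each partition handling a disjoint range of account ids.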
Scenario 2: E‑commerce order archiving
<code>-- Traditional SQL approach (performance problem)
DELETE FROM active_orders
WHERE create_time < '2023-01-01'
LIMIT 5000; -- repeat until no rows are left</code>Problem: deleting millions of rows this way holds long locks on the live table for the whole run.
Correct approach: Use Spring Batch to page‑read, write to history table, then batch delete.
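The slicing behind that pattern is simple; a plain‑Java sketch of fixed‑size batching (the helper name is ours, not a Spring Batch API) shows the shape of each transaction‑sized unit:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSlicer {
    // Split a list of primary keys into fixed-size batches; each batch becomes
    // one small transaction: copy the rows to history, then delete them by id
    static <T> List<List<T>> batches(List<T> ids, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return out;
    }
}
```

Each batch then maps to an `INSERT INTO history_orders ... WHERE id IN (...)` followed by a `DELETE FROM active_orders WHERE id IN (...)`, keeping each lock window short.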
Scenario 3: Log analysis
Typical demand: Analyze API response time distribution from Nginx logs.
Special challenge: Memory control when processing GB‑level text files.
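One way to keep memory flat is to stream the file line by line instead of loading it whole. A sketch with `java.nio` (the log format here, with response time as the last whitespace‑separated field, is an assumption for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LatencyHistogram {
    // Streams lines lazily (constant memory even for GB-scale files) and
    // buckets the assumed trailing response-time field into 100 ms bins
    static Map<Long, Long> histogram(Path accessLog) throws IOException {
        try (Stream<String> lines = Files.lines(accessLog)) {
            return lines
                    .map(l -> l.substring(l.lastIndexOf(' ') + 1))
                    .mapToLong(s -> (long) Double.parseDouble(s))
                    .boxed()
                    .collect(Collectors.groupingBy(
                            ms -> ms / 100 * 100,        // 100 ms bucket floor
                            TreeMap::new,
                            Collectors.counting()));
        }
    }
}
```

Spring Batch's `FlatFileItemReader` applies the same idea, reading one chunk of lines at a time instead of the whole file.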
2. Traditional Pain Points
Complex resource management.
Thread‑pool misconfiguration leads to OOM or connection leaks.
Fault‑tolerance black holes.
<code>// Typical hand-rolled multithreading mistakes
ExecutorService executor = Executors.newFixedThreadPool(8);
try {
    while (hasNextPage()) {
        List<Data> page = fetchNextPage();
        // unbounded submission: pages are queued faster than they are
        // processed, so heap usage grows without limit
        executor.submit(() -> processPage(page));
    }
} finally {
    executor.shutdown(); // forgetting this (and awaitTermination) leaves threads piled up
}</code>3. Spring Batch Core Architecture
3.1 Four Core Components
Component 1: Job
Core role: Define a complete batch pipeline (e.g., monthly report generation).
Real case: A bank's end‑of‑day reconciliation Job contains three Steps.
<code>@Bean
public Job reconciliationJob(JobRepository jobRepository) {
    // Spring Batch 5 (Spring Boot 3.x) removed JobBuilderFactory;
    // jobs are built directly against the JobRepository
    return new JobBuilder("dailyReconciliation", jobRepository)
            .start(downloadBankFileStep())
            .next(validateDataStep())
            .next(generateReportStep())
            .build();
}</code>Component 2: Step
Implements the chunk‑oriented processing model.
<code>@Bean
public Step csvProcessingStep(JobRepository jobRepository,
                              PlatformTransactionManager transactionManager) {
    return new StepBuilder("csvProcessing", jobRepository)
            .<User, User>chunk(500, transactionManager)
            .reader(csvReader())
            .processor(validationProcessor())
            .writer(dbWriter())
            .faultTolerant()
            .skipLimit(10)
            .skip(DataIntegrityViolationException.class)
            .build();
}</code>Component 3: ItemReader
<code>@Bean
public FlatFileItemReader<User> csvReader() {
    return new FlatFileItemReaderBuilder<User>()
            .name("userReader")
            .resource(new FileSystemResource("data/users.csv"))
            .delimited().delimiter(",")
            .names("name", "age", "email")   // must match the writer's columns
            .targetType(User.class)
            .linesToSkip(1)
            .build();
}</code>Component 4: ItemWriter
<code>@Bean
public JdbcBatchItemWriter<User> userWriter(DataSource ds) {
return new JdbcBatchItemWriterBuilder<User>()
.dataSource(ds)
.sql("INSERT INTO users (name, age, email) VALUES (:name, :age, :email)")
.beanMapped()
.build();
}</code>4. Hands‑On Development Guide
4.1 Environment Setup (pom.xml excerpt)
<code><!-- pom.xml excerpt -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.1.5</version>
</parent>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies></code>
<code># application.properties
spring.batch.jdbc.initialize-schema=always
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver</code>4.2 First Batch Job
<code>@Configuration
public class BatchConfig {
    // With Spring Boot 3 the batch infrastructure is auto-configured;
    // @EnableBatchProcessing is no longer needed and would switch that off
    @Bean
    public Job importUserJob(JobRepository jobRepository, Step csvProcessingStep) {
        return new JobBuilder("importUserJob", jobRepository)
                .start(csvProcessingStep)
                .build();
    }
    @Bean
    public Step csvProcessingStep(JobRepository jobRepository,
                                  PlatformTransactionManager transactionManager,
                                  JdbcBatchItemWriter<User> userWriter) {
        return new StepBuilder("csvProcessing", jobRepository)
                .<User, User>chunk(100, transactionManager)
                .reader(userReader())
                .processor(userProcessor())
                .writer(userWriter)   // injected: the writer bean needs a DataSource
                .build();
    }
    @Bean
    public FlatFileItemReader<User> userReader() {
        return new FlatFileItemReaderBuilder<User>()
                .name("userReader")
                .resource(new ClassPathResource("users.csv"))
                .delimited().delimiter(",")
                .names("name", "age", "email")
                .targetType(User.class)
                .linesToSkip(1)
                .build();
    }
    @Bean
    public ItemProcessor<User, User> userProcessor() {
        return user -> {
            if (user.getAge() < 0) {
                throw new IllegalArgumentException("Age cannot be negative: " + user);
            }
            return user.toBuilder()   // requires Lombok @Builder(toBuilder = true)
                    .email(user.getEmail().toLowerCase())
                    .build();
        };
    }
    @Bean
    public JdbcBatchItemWriter<User> userWriter(DataSource ds) {
        return new JdbcBatchItemWriterBuilder<User>()
                .dataSource(ds)
                .sql("INSERT INTO users (name, age, email) VALUES (:name, :age, :email)")
                .beanMapped()
                .build();
    }
}</code>5. Real‑World Case: Bank Transaction Reconciliation
5.1 Requirements
Dual data source: file from bank + internal database.
Millions of records need efficient matching.
Differences must be persisted quickly.
Run in a distributed environment.
5.2 Architecture Overview
5.3 Domain Model
<code>@Data @AllArgsConstructor @NoArgsConstructor
public class Transaction {
    private String transactionId;
    private LocalDateTime tradeTime;
    private BigDecimal amount;
    private String bankSerialNo;
    private BigDecimal bankAmount;
    private String internalOrderNo;
    private BigDecimal systemAmount;
    private ReconStatus status;
    private String discrepancyType;
    private String alertLevel;   // set by DiscrepancyClassifier
}
enum ReconStatus { MATCHED, AMOUNT_DIFF, STATUS_DIFF, ONLY_IN_BANK, ONLY_IN_SYSTEM }
</code>5.4 Job Definition
<code>@Configuration
public class BankReconJobConfig {
    @Bean
    public Job bankReconciliationJob(JobRepository jobRepository,
                                     Step downloadStep, Step reconStep, Step reportStep) {
        return new JobBuilder("bankReconciliationJob", jobRepository)
                .start(downloadStep)
                .next(reconStep)
                .next(reportStep)
                .build();
    }
    // downloadStep, reconStep and reportStep are omitted for brevity – each follows
    // the same reader → processor → writer pattern with fault-tolerance, skip and retry settings.
}
</code>5.5 Core Processors
<code>public class DataMatchingProcessor implements ItemProcessor<Transaction, Transaction> {
@Override
public Transaction process(Transaction item) {
if (item.getBankSerialNo() == null) {
item.setStatus(ReconStatus.ONLY_IN_SYSTEM);
} else if (item.getInternalOrderNo() == null) {
item.setStatus(ReconStatus.ONLY_IN_BANK);
} else {
compareAmounts(item);
compareStatuses(item);
}
return item;
}
// compareAmounts and compareStatuses implementations omitted for brevity
}
public class DiscrepancyClassifier implements ItemProcessor<Transaction, Transaction> {
@Override
public Transaction process(Transaction item) {
if (item.getStatus() != ReconStatus.MATCHED) {
item.setAlertLevel(calculateAlertLevel(item));
}
return item;
}
// calculateAlertLevel implementation omitted for brevity
}
</code>6. Production‑Grade Features
6.1 Fault Tolerance
<code>@Bean
public Step secureStep(JobRepository jobRepository,
                       PlatformTransactionManager transactionManager) {
    return new StepBuilder("secureStep", jobRepository)
            .<Input, Output>chunk(500, transactionManager)
            .reader(jdbcReader())
            .processor(secureProcessor())
            .writer(restApiWriter())
            .faultTolerant()
            .retryLimit(3)
            .retry(ConnectException.class)
            .retry(DeadlockLoserDataAccessException.class)
            .skipLimit(100)
            .skip(DataIntegrityViolationException.class)
            .skip(InvalidDataAccessApiUsageException.class)
            .noRollback(ValidationException.class)
            .listener(new ErrorLogListener())
            .build();
}
</code>6.2 Performance Optimisation
Chunk size tuned to available memory (e.g., 2000).
HikariCP pool size increased to match concurrent steps and partitions.
JPA/Hibernate JDBC batch size set to 1000.
For further scaling: parallel steps, partitioning, and asynchronous ItemProcessors.
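In Spring Boot terms, the pool and JPA settings above map to properties like these (the values are illustrative starting points, not universal recommendations):

```properties
# HikariCP: enough connections for concurrent steps/partitions
spring.datasource.hikari.maximum-pool-size=20
# Hibernate: group inserts/updates into JDBC batches
spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```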
6.3 Monitoring & Management
Integrate Micrometer with Prometheus and expose custom metrics for processed records, step duration, and error counts.
<code>@Bean
public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
    return registry -> registry.config().commonTags("application", "batch-service");
}
// afterStep belongs to StepExecutionListener, not JobExecutionListenerSupport
public class BatchMetricsListener implements StepExecutionListener {
    private final Counter processedRecords = Counter.builder("batch.records.processed")
            .description("Total processed records")
            .register(Metrics.globalRegistry);
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        processedRecords.increment(stepExecution.getWriteCount());
        return stepExecution.getExitStatus();
    }
}
</code>7. FAQ & Troubleshooting
7.1 Restarting a Failed Job
<code>-- Query failed executions
SELECT * FROM BATCH_JOB_EXECUTION WHERE STATUS = 'FAILED';

// Restart via JobOperator using the failed execution's id.
// A restart must reuse the original JobParameters; launching with new
// parameters would start a new job instance instead of resuming this one.
Long newExecutionId = jobOperator.restart(failedExecutionId);
</code>7.2 Handling Power Failure
Spring Batch persists the step execution context in the BATCH_STEP_EXECUTION_CONTEXT table, so a restarted job resumes from the last committed chunk instead of starting over.
7.3 Passing Dynamic Parameters
<code># Command line: key=value arguments become JobParameters
java -jar batch.jar --spring.batch.job.name=dataImportJob date=2023-10-01

// Programmatic launch
JobParameters params = new JobParametersBuilder()
    .addString("mode", "forceUpdate")
    .addLong("timestamp", System.currentTimeMillis())
    .toJobParameters();
jobLauncher.run(importJob, params);
</code>7.4 Performance Checklist
Database indexes for batch‑read/write columns.
HikariCP max‑pool‑size tuned to workload.
JVM flags: -XX:+UseStringDeduplication -XX:+UseCompressedOops -XX:MaxMetaspaceSize=512m -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
Spring Batch chunk size adjusted to available memory.
— END —
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.