Why Spring Batch? Real‑World Scenarios, Core Architecture and Hands‑On Guide
This article explains why batch processing matters, surveys typical use cases (daily interest calculation, e‑commerce order archiving, log analysis, medical data migration), then dives into Spring Batch's core components with step‑by‑step code examples, performance‑tuning tips, production‑grade fault tolerance, monitoring, and a FAQ.
1. Why Batch Processing?
1.1 Application Scenarios
Scenario 1: Daily interest calculation for banks
Pain point: Need to scan millions of accounts at midnight; manual calculation easily misses records.
Spring Batch solution: Partitioned reading, batch interest calculation, automatic retry on failure.
Actual case: Processing time reduced from 4 hours to 23 minutes.
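The per‑account arithmetic inside such a job is simple; what Spring Batch adds is the partitioned, restartable execution around it. A minimal plain‑Java sketch of the calculation itself (the `Account` record and the 365‑day convention are illustrative assumptions, not details from the original system):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

public class InterestCalc {
    // Hypothetical account shape, for illustration only
    record Account(String id, BigDecimal balance, BigDecimal annualRate) {}

    // Daily interest = balance * annualRate / 365, banker's rounding to 2 decimals
    static BigDecimal dailyInterest(Account a) {
        return a.balance()
                .multiply(a.annualRate())
                .divide(BigDecimal.valueOf(365), 2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        List<Account> chunk = List.of(
                new Account("A1", new BigDecimal("10000.00"), new BigDecimal("0.0365")));
        chunk.forEach(a -> System.out.println(a.id() + " -> " + dailyInterest(a)));
    }
}
```

In the real job this function would sit inside an `ItemProcessor`, with each partition handling a disjoint range of account ids.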
Scenario 2: E‑commerce order archiving
<code>-- Traditional SQL approach (performance problem)
DELETE FROM active_orders
WHERE create_time < '2023-01-01'
LIMIT 5000; -- repeat until no rows are left</code>Problem: deleting millions of rows this way holds long locks on the live table for the whole run.
Correct approach: Use Spring Batch to page‑read, write to history table, then batch delete.
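The slicing behind that pattern is simple; a plain‑Java sketch of fixed‑size batching (the helper name is ours, not a Spring Batch API) shows the shape of each transaction‑sized unit:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSlicer {
    // Split a list of primary keys into fixed-size batches; each batch becomes
    // one small transaction: copy the rows to history, then delete them by id
    static <T> List<List<T>> batches(List<T> ids, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return out;
    }
}
```

Each batch then maps to an `INSERT INTO history_orders ... WHERE id IN (...)` followed by a `DELETE FROM active_orders WHERE id IN (...)`, keeping each lock window short.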
Scenario 3: Log analysis
Typical demand: Analyze API response time distribution from Nginx logs.
Special challenge: Memory control when processing GB‑level text files.
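One way to keep memory flat is to stream the file line by line instead of loading it whole. A sketch with `java.nio` (the log format here, with response time as the last whitespace‑separated field, is an assumption for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LatencyHistogram {
    // Streams lines lazily (constant memory even for GB-scale files) and
    // buckets the assumed trailing response-time field into 100 ms bins
    static Map<Long, Long> histogram(Path accessLog) throws IOException {
        try (Stream<String> lines = Files.lines(accessLog)) {
            return lines
                    .map(l -> l.substring(l.lastIndexOf(' ') + 1))
                    .mapToLong(s -> (long) Double.parseDouble(s))
                    .boxed()
                    .collect(Collectors.groupingBy(
                            ms -> ms / 100 * 100,        // 100 ms bucket floor
                            TreeMap::new,
                            Collectors.counting()));
        }
    }
}
```

Spring Batch's `FlatFileItemReader` applies the same idea, reading one chunk of lines at a time instead of the whole file.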
2. Traditional Pain Points
Complex resource management.
Thread‑pool misconfiguration leads to OOM or connection leaks.
Fault‑tolerance black holes.
<code>// Typical hand-rolled multithreading mistakes
ExecutorService executor = Executors.newFixedThreadPool(8);
try {
    while (hasNextPage()) {
        List<Data> page = fetchNextPage();
        // unbounded submission: pages are queued faster than they are
        // processed, so heap usage grows without limit
        executor.submit(() -> processPage(page));
    }
} finally {
    executor.shutdown(); // forgetting this (and awaitTermination) leaves threads piled up
}</code>3. Spring Batch Core Architecture
3.1 Four Core Components
Component 1: Job
Core role: Define a complete batch pipeline (e.g., monthly report generation).
Real case: A bank's end‑of‑day reconciliation Job contains three Steps.
<code>@Bean
public Job reconciliationJob(JobRepository jobRepository) {
    // Spring Batch 5 (Spring Boot 3.x) removed JobBuilderFactory;
    // jobs are built directly against the JobRepository
    return new JobBuilder("dailyReconciliation", jobRepository)
            .start(downloadBankFileStep())
            .next(validateDataStep())
            .next(generateReportStep())
            .build();
}</code>Component 2: Step
Implements the chunk‑oriented processing model.
<code>@Bean
public Step csvProcessingStep(JobRepository jobRepository,
                              PlatformTransactionManager transactionManager) {
    return new StepBuilder("csvProcessing", jobRepository)
            .<User, User>chunk(500, transactionManager)
            .reader(csvReader())
            .processor(validationProcessor())
            .writer(dbWriter())
            .faultTolerant()
            .skipLimit(10)
            .skip(DataIntegrityViolationException.class)
            .build();
}</code>Component 3: ItemReader
<code>@Bean
public FlatFileItemReader<User> csvReader() {
    return new FlatFileItemReaderBuilder<User>()
            .name("userReader")
            .resource(new FileSystemResource("data/users.csv"))
            .delimited().delimiter(",")
            .names("name", "age", "email")   // must match the writer's columns
            .targetType(User.class)
            .linesToSkip(1)
            .build();
}</code>Component 4: ItemWriter
<code>@Bean
public JdbcBatchItemWriter<User> userWriter(DataSource ds) {
return new JdbcBatchItemWriterBuilder<User>()
.dataSource(ds)
.sql("INSERT INTO users (name, age, email) VALUES (:name, :age, :email)")
.beanMapped()
.build();
}</code>4. Hands‑On Development Guide
4.1 Environment Setup (pom.xml excerpt)
<code><!-- pom.xml excerpt -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.1.5</version>
</parent>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies></code>
<code># application.properties
spring.batch.jdbc.initialize-schema=always
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver</code>4.2 First Batch Job
<code>@Configuration
public class BatchConfig {
    // With Spring Boot 3 the batch infrastructure is auto-configured;
    // @EnableBatchProcessing is no longer needed and would switch that off
    @Bean
    public Job importUserJob(JobRepository jobRepository, Step csvProcessingStep) {
        return new JobBuilder("importUserJob", jobRepository)
                .start(csvProcessingStep)
                .build();
    }
    @Bean
    public Step csvProcessingStep(JobRepository jobRepository,
                                  PlatformTransactionManager transactionManager,
                                  JdbcBatchItemWriter<User> userWriter) {
        return new StepBuilder("csvProcessing", jobRepository)
                .<User, User>chunk(100, transactionManager)
                .reader(userReader())
                .processor(userProcessor())
                .writer(userWriter)   // injected: the writer bean needs a DataSource
                .build();
    }
    @Bean
    public FlatFileItemReader<User> userReader() {
        return new FlatFileItemReaderBuilder<User>()
                .name("userReader")
                .resource(new ClassPathResource("users.csv"))
                .delimited().delimiter(",")
                .names("name", "age", "email")
                .targetType(User.class)
                .linesToSkip(1)
                .build();
    }
    @Bean
    public ItemProcessor<User, User> userProcessor() {
        return user -> {
            if (user.getAge() < 0) {
                throw new IllegalArgumentException("Age cannot be negative: " + user);
            }
            return user.toBuilder()   // requires Lombok @Builder(toBuilder = true)
                    .email(user.getEmail().toLowerCase())
                    .build();
        };
    }
    @Bean
    public JdbcBatchItemWriter<User> userWriter(DataSource ds) {
        return new JdbcBatchItemWriterBuilder<User>()
                .dataSource(ds)
                .sql("INSERT INTO users (name, age, email) VALUES (:name, :age, :email)")
                .beanMapped()
                .build();
    }
}</code>5. Real‑World Case: Bank Transaction Reconciliation
5.1 Requirements
Dual data source: file from bank + internal database.
Millions of records need efficient matching.
Differences must be persisted quickly.
Run in a distributed environment.
5.2 Architecture Overview
5.3 Domain Model
<code>@Data @AllArgsConstructor @NoArgsConstructor
public class Transaction {
    private String transactionId;
    private LocalDateTime tradeTime;
    private BigDecimal amount;
    private String bankSerialNo;
    private BigDecimal bankAmount;
    private String internalOrderNo;
    private BigDecimal systemAmount;
    private ReconStatus status;
    private String discrepancyType;
    private String alertLevel;   // set by DiscrepancyClassifier
}
enum ReconStatus { MATCHED, AMOUNT_DIFF, STATUS_DIFF, ONLY_IN_BANK, ONLY_IN_SYSTEM }
</code>5.4 Job Definition
<code>@Configuration
public class BankReconJobConfig {
    @Bean
    public Job bankReconciliationJob(JobRepository jobRepository,
                                     Step downloadStep, Step reconStep, Step reportStep) {
        return new JobBuilder("bankReconciliationJob", jobRepository)
                .start(downloadStep)
                .next(reconStep)
                .next(reportStep)
                .build();
    }
    // downloadStep, reconStep and reportStep are omitted for brevity – each follows
    // the same reader → processor → writer pattern with fault-tolerance, skip and retry settings.
}
</code>5.5 Core Processors
<code>public class DataMatchingProcessor implements ItemProcessor<Transaction, Transaction> {
@Override
public Transaction process(Transaction item) {
if (item.getBankSerialNo() == null) {
item.setStatus(ReconStatus.ONLY_IN_SYSTEM);
} else if (item.getInternalOrderNo() == null) {
item.setStatus(ReconStatus.ONLY_IN_BANK);
} else {
compareAmounts(item);
compareStatuses(item);
}
return item;
}
// compareAmounts and compareStatuses implementations omitted for brevity
}
public class DiscrepancyClassifier implements ItemProcessor<Transaction, Transaction> {
@Override
public Transaction process(Transaction item) {
if (item.getStatus() != ReconStatus.MATCHED) {
item.setAlertLevel(calculateAlertLevel(item));
}
return item;
}
// calculateAlertLevel implementation omitted for brevity
}
</code>6. Production‑Grade Features
6.1 Fault Tolerance
<code>@Bean
public Step secureStep(JobRepository jobRepository,
                       PlatformTransactionManager transactionManager) {
    return new StepBuilder("secureStep", jobRepository)
            .<Input, Output>chunk(500, transactionManager)
            .reader(jdbcReader())
            .processor(secureProcessor())
            .writer(restApiWriter())
            .faultTolerant()
            .retryLimit(3)
            .retry(ConnectException.class)
            .retry(DeadlockLoserDataAccessException.class)
            .skipLimit(100)
            .skip(DataIntegrityViolationException.class)
            .skip(InvalidDataAccessApiUsageException.class)
            .noRollback(ValidationException.class)
            .listener(new ErrorLogListener())
            .build();
}
</code>6.2 Performance Optimisation
Chunk size tuned to available memory (e.g., 2000).
HikariCP pool size increased to match concurrent steps and partitions.
JPA/Hibernate JDBC batch size set to 1000.
For further scaling: parallel steps, partitioning, and asynchronous ItemProcessors.
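In Spring Boot terms, the pool and JPA settings above map to properties like these (the values are illustrative starting points, not universal recommendations):

```properties
# HikariCP: enough connections for concurrent steps/partitions
spring.datasource.hikari.maximum-pool-size=20
# Hibernate: group inserts/updates into JDBC batches
spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```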
6.3 Monitoring & Management
Integrate Micrometer with Prometheus and expose custom metrics for processed records, step duration, and error counts.
<code>@Bean
public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
    return registry -> registry.config().commonTags("application", "batch-service");
}
// afterStep belongs to StepExecutionListener, not JobExecutionListenerSupport
public class BatchMetricsListener implements StepExecutionListener {
    private final Counter processedRecords = Counter.builder("batch.records.processed")
            .description("Total processed records")
            .register(Metrics.globalRegistry);
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        processedRecords.increment(stepExecution.getWriteCount());
        return stepExecution.getExitStatus();
    }
}
</code>7. FAQ & Troubleshooting
7.1 Restarting a Failed Job
<code>-- Query failed executions
SELECT * FROM BATCH_JOB_EXECUTION WHERE STATUS = 'FAILED';

// Restart via JobOperator using the failed execution's id.
// A restart must reuse the original JobParameters; launching with new
// parameters would start a new job instance instead of resuming this one.
Long newExecutionId = jobOperator.restart(failedExecutionId);
</code>7.2 Handling Power Failure
Spring Batch persists the step execution context in the BATCH_STEP_EXECUTION_CONTEXT table, so a restarted job resumes from the last committed chunk instead of starting over.
7.3 Passing Dynamic Parameters
<code># Command line: key=value arguments become JobParameters
java -jar batch.jar --spring.batch.job.name=dataImportJob date=2023-10-01

// Programmatic launch
JobParameters params = new JobParametersBuilder()
    .addString("mode", "forceUpdate")
    .addLong("timestamp", System.currentTimeMillis())
    .toJobParameters();
jobLauncher.run(importJob, params);
</code>7.4 Performance Checklist
Database indexes for batch‑read/write columns.
HikariCP max‑pool‑size tuned to workload.
JVM flags: -XX:+UseStringDeduplication -XX:+UseCompressedOops -XX:MaxMetaspaceSize=512m -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
Spring Batch chunk size adjusted to available memory.
— END —
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.