Managing Memory for Large‑Scale Data Migration with Spring Batch Readers
This article explains how to avoid out‑of‑memory errors during massive data migrations by choosing appropriate Spring Batch readers, comparing JdbcCursorItemReader and JdbcPagingItemReader, and showing code examples and memory‑usage diagrams.
Overview
This blog records a critical issue encountered when using Spring Batch for data migration: how to ensure memory stability when the migration volume is large. When using Spring Batch, a step is typically configured with three components: a reader, a processor (optional), and a writer.
What Is the Problem?
The problem is that the reader stage can consume excessive memory when the amount of data to be read is huge. For small data sets (hundreds of thousands of rows) the memory impact is minimal, but when the data reaches millions or tens of millions of rows, loading all records into memory at once can exceed the JVM heap.
For example, if the JVM has only 4 GB of heap and the database contains 8 GB of data, a single‑shot read will inevitably fail.
Spring‑Provided Reader Implementations
JdbcCursorItemReader
JdbcCursorItemReader opens a cursor over a single query, but with many JDBC drivers' default settings (MySQL's Connector/J, for example) the client still buffers the entire result set, so in practice it reads all rows in one go. Example code:
@Bean
public JdbcCursorItemReader<CustomerCredit> itemReader() {
    return new JdbcCursorItemReaderBuilder<CustomerCredit>()
            .dataSource(this.dataSource)
            .name("creditReader")
            .sql("select ID, NAME, CREDIT from CUSTOMER")
            .rowMapper(new CustomerCreditRowMapper())
            .build();
}

This simplicity comes at a cost: the query returns the entire result set in one shot, so the JVM must hold every row at once; the resulting objects are promoted to the old generation and eventually trigger the error:
Resource exhaustion event: The JVM was unable to allocate memory from the heap.
Heap usage climbs steadily as the job reads and never falls back, because the entire result set stays referenced until the step completes.
Therefore, JdbcCursorItemReader should not be used for large data volumes.
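If a cursor reader is unavoidable, some drivers can stream rows instead of buffering them, controlled through the reader's fetch-size hint. A sketch, assuming MySQL's Connector/J, which streams only when the fetch size is Integer.MIN_VALUE (PostgreSQL streams with any positive fetch size, provided auto-commit is off; other drivers vary):

```java
@Bean
public JdbcCursorItemReader<CustomerCredit> streamingItemReader() {
    // Sketch only: whether the driver honours this hint varies by vendor.
    // MySQL Connector/J streams row-by-row only for Integer.MIN_VALUE.
    return new JdbcCursorItemReaderBuilder<CustomerCredit>()
            .dataSource(this.dataSource)
            .name("streamingCreditReader")
            .sql("select ID, NAME, CREDIT from CUSTOMER")
            .rowMapper(new CustomerCreditRowMapper())
            .fetchSize(Integer.MIN_VALUE)
            .build();
}
```

Even when streaming works, it ties up one connection for the whole read and is fragile across drivers, so the paging reader below remains the safer default.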
JdbcPagingItemReader
JdbcPagingItemReader reads data page by page, allowing you to control the page size and keep memory usage low. Example code:
@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(parameterValues)
            .rowMapper(customerCreditMapper())
            .pageSize(1000)
            .build();
}
@Bean
public SqlPagingQueryProviderFactoryBean queryProvider() {
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setSelectClause("select id, name, credit");
    provider.setFromClause("from customer");
    provider.setWhereClause("where status=:status");
    provider.setSortKey("id");
    return provider;
}

By setting a page size, each read returns only a small subset of rows, which are allocated in the young generation and reclaimed quickly by minor GCs, keeping the old generation stable.
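For context, the paging reader plugs into a chunk-oriented step like any other ItemReader. A minimal sketch, assuming Spring Batch 5's StepBuilder API; the step name, chunk size, and writer body are illustrative:

```java
@Bean
public Step migrationStep(JobRepository jobRepository,
                          PlatformTransactionManager transactionManager,
                          JdbcPagingItemReader<CustomerCredit> itemReader) {
    // chunk(1000) controls how many items are buffered per transaction
    // before the writer runs; it is independent of the reader's pageSize,
    // though aligning the two keeps memory behaviour easy to reason about.
    return new StepBuilder("migrationStep", jobRepository)
            .<CustomerCredit, CustomerCredit>chunk(1000, transactionManager)
            .reader(itemReader)
            .writer(chunk -> { /* insert into the target datasource */ })
            .build();
}
```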
When using JdbcPagingItemReader, a sort key must be specified (and should be a unique key) to guarantee consistent pagination without data loss.
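To see why the sort key should be unique, here is a minimal pure-Java simulation (not Spring Batch code) of the keyset-style query the paging reader issues for each page, conceptually "WHERE sortKey > :lastValue ORDER BY sortKey LIMIT :pageSize". With a duplicated sort key, rows that share the page-boundary value are silently skipped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class KeysetPagingDemo {
    // Each iteration simulates one page query:
    // WHERE sortKey > :last ORDER BY sortKey LIMIT :pageSize
    static List<int[]> readAll(List<int[]> table, int pageSize) {
        List<int[]> out = new ArrayList<>();
        int last = Integer.MIN_VALUE;
        while (true) {
            final int lastSeen = last;
            List<int[]> page = table.stream()
                    .filter(r -> r[0] > lastSeen)               // WHERE sortKey > :last
                    .sorted(Comparator.comparingInt(r -> r[0])) // ORDER BY sortKey
                    .limit(pageSize)                            // LIMIT :pageSize
                    .collect(Collectors.toList());
            if (page.isEmpty()) break;
            out.addAll(page);
            last = page.get(page.size() - 1)[0]; // remember the boundary key
        }
        return out;
    }

    public static void main(String[] args) {
        // rows are {sortKey, rowId}; sort key 2 is duplicated
        List<int[]> rows = Arrays.asList(
                new int[]{1, 100}, new int[]{2, 101}, new int[]{2, 102},
                new int[]{3, 103}, new int[]{4, 104});
        System.out.println(readAll(rows, 2).size()); // prints 4, not 5
    }
}
```

Only 4 of the 5 rows are read: the second row with sort key 2 falls just past the first page boundary, and the next page's "sortKey > 2" condition skips it. A unique sort key (such as a primary key) makes this data loss impossible.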
Conclusion
For small data volumes the choice of reader makes little difference, but for large migrations a paging reader such as JdbcPagingItemReader is essential for good memory performance, despite the additional configuration overhead.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.