Managing Memory for Large‑Scale Data Migration with Spring Batch Readers
This article explains how to avoid out‑of‑memory errors during massive data migrations by choosing appropriate Spring Batch readers, comparing JdbcCursorItemReader and JdbcPagingItemReader, and showing code examples and memory‑usage diagrams.
Overview
This blog records a critical issue encountered when using Spring Batch for data migration: how to ensure memory stability when the migration volume is large. When using Spring Batch, a step is typically configured with three components: a reader, a processor (optional), and a writer.
What Is the Problem?
The problem is that the reader stage can consume excessive memory when the amount of data to be read is huge. For small data sets (hundreds of thousands of rows) the memory impact is minimal, but when the data reaches millions or tens of millions of rows, loading all records into memory at once can exceed the JVM heap.
For example, if the JVM has only 4 GB of heap and the database contains 8 GB of data, a single‑shot read will inevitably fail.
Spring‑Provided Reader Implementations
JdbcCursorItemReader
JdbcCursorItemReader opens a cursor over a single query, but with many JDBC drivers' default settings (MySQL's Connector/J, for example) the client still buffers the entire result set, so in practice it reads all rows in one go. Example code:
@Bean
public JdbcCursorItemReader<CustomerCredit> itemReader() {
    return new JdbcCursorItemReaderBuilder<CustomerCredit>()
            .dataSource(this.dataSource)
            .name("creditReader")
            .sql("select ID, NAME, CREDIT from CUSTOMER")
            .rowMapper(new CustomerCreditRowMapper())
            .build();
}

This simplicity comes at a cost: the query returns the entire result set in one shot, so the JVM must hold every row at once; the resulting objects are promoted to the old generation and eventually trigger the error:
Resource exhaustion event: The JVM was unable to allocate memory from the heap.
Heap usage climbs steadily as the job reads and never falls back, because the entire result set stays referenced until the step completes.
Therefore, JdbcCursorItemReader should not be used for large data volumes.
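If a cursor reader is unavoidable, some drivers can stream rows instead of buffering them, controlled through the reader's fetch-size hint. A sketch, assuming MySQL's Connector/J, which streams only when the fetch size is Integer.MIN_VALUE (PostgreSQL streams with any positive fetch size, provided auto-commit is off; other drivers vary):

```java
@Bean
public JdbcCursorItemReader<CustomerCredit> streamingItemReader() {
    // Sketch only: whether the driver honours this hint varies by vendor.
    // MySQL Connector/J streams row-by-row only for Integer.MIN_VALUE.
    return new JdbcCursorItemReaderBuilder<CustomerCredit>()
            .dataSource(this.dataSource)
            .name("streamingCreditReader")
            .sql("select ID, NAME, CREDIT from CUSTOMER")
            .rowMapper(new CustomerCreditRowMapper())
            .fetchSize(Integer.MIN_VALUE)
            .build();
}
```

Even when streaming works, it ties up one connection for the whole read and is fragile across drivers, so the paging reader below remains the safer default.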
JdbcPagingItemReader
JdbcPagingItemReader reads data page by page, allowing you to control the page size and keep memory usage low. Example code:
@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");
    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
            .name("creditReader")
            .dataSource(dataSource)
            .queryProvider(queryProvider)
            .parameterValues(parameterValues)
            .rowMapper(customerCreditMapper())
            .pageSize(1000)
            .build();
}
@Bean
public SqlPagingQueryProviderFactoryBean queryProvider() {
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setSelectClause("select id, name, credit");
    provider.setFromClause("from customer");
    provider.setWhereClause("where status=:status");
    provider.setSortKey("id");
    return provider;
}

By setting a page size, each read returns only a small subset of rows, which are allocated in the young generation and reclaimed quickly by minor GCs, keeping the old generation stable.
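For context, the paging reader plugs into a chunk-oriented step like any other ItemReader. A minimal sketch, assuming Spring Batch 5's StepBuilder API; the step name, chunk size, and writer body are illustrative:

```java
@Bean
public Step migrationStep(JobRepository jobRepository,
                          PlatformTransactionManager transactionManager,
                          JdbcPagingItemReader<CustomerCredit> itemReader) {
    // chunk(1000) controls how many items are buffered per transaction
    // before the writer runs; it is independent of the reader's pageSize,
    // though aligning the two keeps memory behaviour easy to reason about.
    return new StepBuilder("migrationStep", jobRepository)
            .<CustomerCredit, CustomerCredit>chunk(1000, transactionManager)
            .reader(itemReader)
            .writer(chunk -> { /* insert into the target datasource */ })
            .build();
}
```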
When using JdbcPagingItemReader, a sort key must be specified (and should be a unique key) to guarantee consistent pagination without data loss.
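To see why the sort key should be unique, here is a minimal pure-Java simulation (not Spring Batch code) of the keyset-style query the paging reader issues for each page, conceptually "WHERE sortKey > :lastValue ORDER BY sortKey LIMIT :pageSize". With a duplicated sort key, rows that share the page-boundary value are silently skipped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class KeysetPagingDemo {
    // Each iteration simulates one page query:
    // WHERE sortKey > :last ORDER BY sortKey LIMIT :pageSize
    static List<int[]> readAll(List<int[]> table, int pageSize) {
        List<int[]> out = new ArrayList<>();
        int last = Integer.MIN_VALUE;
        while (true) {
            final int lastSeen = last;
            List<int[]> page = table.stream()
                    .filter(r -> r[0] > lastSeen)               // WHERE sortKey > :last
                    .sorted(Comparator.comparingInt(r -> r[0])) // ORDER BY sortKey
                    .limit(pageSize)                            // LIMIT :pageSize
                    .collect(Collectors.toList());
            if (page.isEmpty()) break;
            out.addAll(page);
            last = page.get(page.size() - 1)[0]; // remember the boundary key
        }
        return out;
    }

    public static void main(String[] args) {
        // rows are {sortKey, rowId}; sort key 2 is duplicated
        List<int[]> rows = Arrays.asList(
                new int[]{1, 100}, new int[]{2, 101}, new int[]{2, 102},
                new int[]{3, 103}, new int[]{4, 104});
        System.out.println(readAll(rows, 2).size()); // prints 4, not 5
    }
}
```

Only 4 of the 5 rows are read: the second row with sort key 2 falls just past the first page boundary, and the next page's "sortKey > 2" condition skips it. A unique sort key (such as a primary key) makes this data loss impossible.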
Conclusion
For small data volumes the choice of reader makes little difference, but for large migrations a paging reader such as JdbcPagingItemReader is essential for good memory performance, despite the additional configuration overhead.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.