Big Data 10 min read

Handling Large Data Queries in MySQL with MyBatis: Regular, Stream, and Cursor Approaches

The article explains how to efficiently retrieve and process massive MySQL result sets in Java using MyBatis, comparing regular pagination, streaming queries via Cursor, and cursor-based fetchSize techniques, and provides practical code examples and best‑practice tips to avoid OOM and improve performance.

Architect's Guide
Architect's Guide
Architect's Guide
Handling Large Data Queries in MySQL with MyBatis: Regular, Stream, and Cursor Approaches

Background

When dealing with operations that involve a large amount of data—such as data migration, data export, or batch processing—loading the entire result set into memory can cause OOM (Out‑Of‑Memory) errors and slow down queries because the framework spends a lot of time wrapping rows into entity objects.

Example scenario

Suppose a business system needs to read 1,000,000 rows from a MySQL table for processing. The usual approaches are:

Regular query: Load all 1,000,000 rows into JVM memory at once, or use pagination.

Stream query: Open a long‑lived connection and use a server‑side cursor to fetch one row at a time.

Cursor query: Control the number of rows fetched per round via the fetchSize parameter (multiple rows per fetch).

Regular query

By default, the full result set is stored in memory. This is efficient for small tables but impractical for a table with 1,000,000 rows. A typical MyBatis‑Plus mapper method looks like:

@Mapper
public interface BigDataSearchMapper extends BaseMapper<BigDataSearchEntity> {
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    Page<BigDataSearchEntity> pageList(@Param("page") Page<BigDataSearchEntity> page,
                                      @Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper);
}
Note: The example uses MyBatis‑Plus.

This simple approach will likely cause the database server to become a bottleneck or make the query take minutes, hours, or even days.

Stream query

A stream query returns an iterator instead of a full collection, allowing the application to fetch one row at a time, which dramatically reduces memory consumption.

In MyBatis, streaming is achieved via the org.apache.ibatis.cursor.Cursor interface, which extends java.io.Closeable and java.lang.Iterable . The cursor provides three useful methods:

isOpen(): Checks whether the cursor is still open before fetching data.

isConsumed(): Determines if all rows have been read.

getCurrentIndex(): Returns the number of rows already retrieved.

When using a stream, the connection remains open, so the application must close it after processing the data.

Why use a stream query?

If a query returns a huge result set that cannot fit into client memory, a stream query allows processing without loading everything at once. Even in sharding scenarios where a query spans multiple databases, streaming (except for GROUP BY and ORDER BY ) helps keep memory usage low.

Cursor query

Cursor queries also avoid OOM by fetching rows in batches using the fetchSize setting. MyBatis supports this with two annotation‑based approaches:

@Mapper
public interface BigDataSearchMapper extends BaseMapper<BigDataSearchEntity> {
    // Method 1: fetch many rows per round
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 1000000)
    Page<BigDataSearchEntity> pageList(@Param("page") Page<BigDataSearchEntity> page,
                                      @Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper);

    // Method 2: fetch one row per round
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 100000)
    @ResultType(BigDataSearchEntity.class)
    void listData(@Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper,
                  ResultHandler<BigDataSearchEntity> handler);
}

The @Options annotation can set ResultSetType.FORWARD_ONLY (cursor can only move forward) and fetchSize (batch size). The @ResultType annotation specifies the entity class for the result.

Note: The method returning void must accept a ResultHandler to process each row; otherwise the framework cannot close the cursor automatically.

In practice, the two methods differ:

Method 1 returns multiple rows per fetch.

Method 2 returns a single row per fetch.

Oracle fetches a batch of rows to the client, while MySQL returns rows one‑by‑one via ResultSet.next() , which can block if the network buffer fills.

Non‑stream vs. stream query differences

Non‑stream query: Memory usage grows linearly with the number of rows.

Stream query: Memory stays stable; its size depends on the configured batch size ( BATCH_SIZE ).

Remember to clear temporary containers (e.g., gxids.clear() ) after processing each batch.

MySQLMyBatisCursorlarge-dataStreaming QueryFetchSize
Architect's Guide
Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.