
Efficient MySQL Queries for Millions of Rows: Regular, Stream, and Cursor

When a MySQL query returns a massive result set, loading every row into JVM memory can trigger Out‑Of‑Memory (OOM) errors and degrade performance. This guide compares three approaches (regular pagination, streaming queries through a MyBatis Cursor, and cursor‑based queries with fetchSize control) and covers their implementations, MyBatis configuration, and trade‑offs.

Architect

Overview

When a MySQL query returns millions of rows, loading the entire result set into JVM memory can cause Out‑Of‑Memory (OOM) errors and severe performance degradation. MyBatis offers three approaches to handle large result sets while keeping memory usage low:

Regular pagination (LIMIT/OFFSET)

Streaming query using org.apache.ibatis.cursor.Cursor

Cursor‑based query with configurable fetchSize

1. Regular Pagination

Pagination retrieves a subset of rows per request, preventing the whole table from being loaded at once. It is simple to implement but suffers from performance issues when the OFFSET becomes large because MySQL still scans the preceding rows.

@Mapper
public interface BigDataSearchMapper extends BaseMapper<BigDataSearchEntity> {
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    Page<BigDataSearchEntity> pageList(@Param("page") Page<BigDataSearchEntity> page,
                                      @Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper);
}

Use this method when the required page size is modest and deep pagination is not needed. For very deep pages consider alternative strategies.
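One commonly used alternative for deep pages is keyset (seek) pagination. It is not part of the original article: instead of skipping rows with OFFSET, each request filters on the last primary key already seen, along the lines of WHERE id > ? ORDER BY id LIMIT ?. The sketch below only simulates that access pattern against an in‑memory sorted map so that it runs without a database. The class name KeysetPagingSketch, the id column, and the TreeMap standing in for an indexed table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeysetPagingSketch {

    // Hypothetical stand-in for a table with an indexed primary key (id -> row payload).
    static NavigableMap<Long, String> buildTable(int rows) {
        NavigableMap<Long, String> table = new TreeMap<>();
        for (long id = 1; id <= rows; id++) {
            table.put(id, "row-" + id);
        }
        return table;
    }

    // Mimics: SELECT ... WHERE id > lastSeenId ORDER BY id LIMIT pageSize
    static List<Map.Entry<Long, String>> nextPage(NavigableMap<Long, String> table,
                                                  long lastSeenId, int pageSize) {
        List<Map.Entry<Long, String>> page = new ArrayList<>(pageSize);
        // tailMap(key, false) jumps straight past lastSeenId, like an index seek.
        for (Map.Entry<Long, String> entry : table.tailMap(lastSeenId, false).entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(entry);
        }
        return page;
    }

    // Walks the whole table page by page and returns how many rows were visited.
    static long scanAll(NavigableMap<Long, String> table, int pageSize) {
        long lastSeenId = 0;   // assumes ids are positive
        long visited = 0;
        while (true) {
            List<Map.Entry<Long, String>> page = nextPage(table, lastSeenId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            visited += page.size();
            // Remember the last key of this page; the next request starts after it.
            lastSeenId = page.get(page.size() - 1).getKey();
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> table = buildTable(10_000);
        System.out.println("rows visited: " + scanAll(table, 1_000));
    }
}
```

The cost of each page depends on the page size rather than on how far into the table the page sits. The trade‑off is that callers can only move to the next page, not jump to an arbitrary page number.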

2. Streaming Query

A streaming query returns a Cursor that implements java.io.Closeable and java.lang.Iterable. The application iterates over the cursor, fetching one row at a time, while the JDBC connection remains open.

Close the cursor (or the underlying connection) after processing to avoid leaks.

All rows must be consumed or the result set closed before issuing another statement on the same connection.

Key Cursor methods:

isOpen() – checks whether the cursor is still open.

isConsumed() – true when all rows have been read.

getCurrentIndex() – zero‑based index of the most recently fetched row; returns -1 before the first row has been read.

Streaming is especially useful in sharding scenarios where results from many tables need to be merged without exhausting client memory.
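The consumption pattern described in this section can be sketched as follows. In a real mapper the method would be declared to return org.apache.ibatis.cursor.Cursor<BigDataSearchEntity> and iterated while the session is still open. To keep the example runnable without MyBatis or a database, it uses RowCursor, a hypothetical stand‑in interface with the same method shape, backed by an in‑memory list. The names CursorLoopSketch, RowCursor, cursorOver, and consume are assumptions, not MyBatis API.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class CursorLoopSketch {

    // Hypothetical stand-in shaped like org.apache.ibatis.cursor.Cursor<T>.
    interface RowCursor<T> extends Closeable, Iterable<T> {
        boolean isOpen();
        boolean isConsumed();
        int getCurrentIndex();
        @Override
        void close();   // narrowed so this sketch needs no checked-exception handling
    }

    // In-memory implementation so the sketch runs without a database.
    static <T> RowCursor<T> cursorOver(List<T> rows) {
        return new RowCursor<T>() {
            private final Iterator<T> source = rows.iterator();
            private boolean open = true;
            private int index = -1;   // -1 until the first row is fetched

            public boolean isOpen() { return open; }
            public boolean isConsumed() { return !source.hasNext(); }
            public int getCurrentIndex() { return index; }
            public void close() { open = false; }

            public Iterator<T> iterator() {
                return new Iterator<T>() {
                    public boolean hasNext() { return open && source.hasNext(); }
                    public T next() {
                        T row = source.next();
                        index++;
                        return row;
                    }
                };
            }
        };
    }

    // try-with-resources closes the cursor even if the per-row callback throws.
    static <T> int consume(RowCursor<T> cursor, Consumer<T> perRow) {
        try (RowCursor<T> c = cursor) {
            for (T row : c) {
                perRow.accept(row);   // only one row is held by this loop at a time
            }
            return c.getCurrentIndex() + 1;   // rows fetched = last index + 1
        }
    }

    public static void main(String[] args) {
        RowCursor<String> cursor = cursorOver(List.of("a", "b", "c"));
        int fetched = consume(cursor, row -> System.out.println("processing " + row));
        System.out.println(fetched + " rows fetched, open=" + cursor.isOpen());
    }
}
```

Wrapping the cursor in try‑with‑resources is the design choice worth copying: it covers the "close after processing" rule above without relying on every code path remembering to call close().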

3. Cursor Query with fetchSize

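Using the @Options annotation, MyBatis can ask the JDBC driver to fetch a fixed batch of rows per round‑trip. The driver keeps the connection open, but only a limited number of rows are transferred at a time. Note that MySQL Connector/J honors a positive fetchSize only when useCursorFetch=true is set on the JDBC URL; without it the driver buffers the whole result set on the client, unless fetchSize is Integer.MIN_VALUE, which switches it to row‑by‑row streaming.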

@Mapper
public interface BigDataSearchMapper extends BaseMapper<BigDataSearchEntity> {
    // Strategy 1 – fetch many rows per batch (e.g., 1,000,000)
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 1000000)
    Page<BigDataSearchEntity> pageList(@Param("page") Page<BigDataSearchEntity> page,
                                      @Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper);

    // Strategy 2 – fetch a smaller batch (e.g., 100,000) and process each row via ResultHandler
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 100000)
    @ResultType(BigDataSearchEntity.class)
    void listData(@Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper,
                  ResultHandler<BigDataSearchEntity> handler);
}

Important @Options settings:

resultSetType = ResultSetType.FORWARD_ONLY – the cursor can only move forward, which is the most efficient mode for streaming.

fetchSize – number of rows retrieved per network round‑trip. Larger values reduce round‑trips but increase memory usage.
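The fetchSize trade‑off can be made concrete with a little arithmetic. The sketch below is a rough estimate only: the table size of 5,000,000 rows and the average row size of 200 bytes are assumed figures, and the class and method names are hypothetical. It compares how many round‑trips and how much buffered data per batch different fetchSize values imply.

```java
class FetchSizeTradeoff {

    // Round-trips needed to transfer totalRows when the driver pulls fetchSize rows each time.
    static long roundTrips(long totalRows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (totalRows + fetchSize - 1) / fetchSize;   // ceiling division
    }

    // Rough upper bound on the raw row data buffered on the client for one batch.
    static long batchBytes(int fetchSize, int avgRowBytes) {
        return (long) fetchSize * avgRowBytes;
    }

    public static void main(String[] args) {
        long totalRows = 5_000_000L;   // assumed table size
        int avgRowBytes = 200;         // assumed average row size
        for (int fetchSize : new int[] {1_000, 100_000, 1_000_000}) {
            System.out.printf("fetchSize=%d -> %d round-trips, ~%d KB buffered per batch%n",
                    fetchSize,
                    roundTrips(totalRows, fetchSize),
                    batchBytes(fetchSize, avgRowBytes) / 1_000);
        }
    }
}
```

Real memory use will be higher than the raw byte count because each row is mapped to a Java object, so figures like these are best treated as a lower bound when choosing a value.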

The method that uses a ResultHandler must return void because the handler processes each row as it arrives.
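For readers who want to see what these @Options settings roughly correspond to at the JDBC level, here is a minimal sketch using only java.sql. It assumes MySQL Connector/J, where a positive fetch size takes effect only with useCursorFetch=true on the JDBC URL. JdbcFetchSizeSketch, RowCallback, withCursorFetch, and forEachRow are hypothetical names, and the URL in main is a placeholder. This is not how MyBatis is implemented internally, just the equivalent plain JDBC calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class JdbcFetchSizeSketch {

    // Hypothetical per-row callback, playing the role a ResultHandler plays in MyBatis.
    interface RowCallback {
        void onRow(ResultSet rs) throws SQLException;
    }

    // Appends useCursorFetch=true to a JDBC URL (assumes the flag is not already present).
    static String withCursorFetch(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "useCursorFetch=true";
    }

    // Runs the query forward-only and read-only, pulling fetchSize rows per round-trip.
    static long forEachRow(Connection conn, String sql, int fetchSize, RowCallback callback)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize);
            long count = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    callback.onRow(rs);   // process the current row, then let it go
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; forEachRow needs a live connection and is not called here.
        System.out.println(withCursorFetch("jdbc:mysql://localhost:3306/demo"));
    }
}
```

Both the statement and the result set are closed by try‑with‑resources, which matters here because an unclosed streaming result set blocks further statements on the same connection.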

Comparison

Non‑streaming (full list or pagination with large OFFSET): memory grows linearly with the number of rows loaded; query time can stretch to minutes or hours.

Streaming / Cursor with fetchSize: memory stays roughly constant, bounded by the configured batch size (e.g., fetchSize or a custom BATCH_SIZE variable). If the processing code accumulates rows in temporary collections, clear them after each batch (e.g., gxids.clear()) to release memory.
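The "process a batch, then clear" idea can be sketched as a small helper. BatchingRowSink is a hypothetical class, not part of MyBatis. In a real ResultHandler, handleResult could pass resultContext.getResultObject() to accept, and the caller would invoke finish once the query returns. The batch size and the flush action are assumptions supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

class BatchingRowSink<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction;
    private final List<T> buffer;
    private int maxBuffered = 0;

    BatchingRowSink(int batchSize, Consumer<List<T>> flushAction) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushAction = flushAction;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Called once per row; flushes as soon as the buffer reaches batchSize.
    void accept(T row) {
        buffer.add(row);
        maxBuffered = Math.max(maxBuffered, buffer.size());
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Called after the last row to flush a final, partially filled batch.
    void finish() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    // Largest number of rows ever held at once; never exceeds batchSize.
    int maxBuffered() {
        return maxBuffered;
    }

    private void flush() {
        // The callback sees a read-only view and must not keep a reference to it,
        // because the buffer is cleared and reused right afterwards.
        flushAction.accept(Collections.unmodifiableList(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        long[] written = {0};
        BatchingRowSink<Integer> sink =
                new BatchingRowSink<>(1_000, batch -> written[0] += batch.size());
        for (int i = 0; i < 10_500; i++) {
            sink.accept(i);
        }
        sink.finish();
        System.out.println("written=" + written[0] + ", peak buffered=" + sink.maxBuffered());
    }
}
```

Reusing one buffer and clearing it after every flush is what keeps the peak at batchSize rows regardless of how many rows the query returns.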

Figure: Large data processing diagram