Efficient MySQL Queries for Millions of Rows: Regular, Stream, and Cursor
Loading every row of a massive MySQL result set into JVM memory can cause OOM errors and slow performance. This guide compares three approaches (regular pagination, streaming queries, and cursor‑based queries with fetchSize control) and details their implementations, MyBatis configurations, and trade‑offs.
Overview
When a MySQL query returns millions of rows, loading the entire result set into JVM memory can cause Out‑Of‑Memory (OOM) errors and severe performance degradation. MyBatis offers three approaches to handle large result sets while keeping memory usage low:
Regular pagination (LIMIT/OFFSET)
Streaming query using org.apache.ibatis.cursor.Cursor
Cursor‑based query with configurable fetchSize
1. Regular Pagination
Pagination retrieves a subset of rows per request, preventing the whole table from being loaded at once. It is simple to implement but suffers from performance issues when the OFFSET becomes large because MySQL still scans the preceding rows.
@Mapper
public interface BigDataSearchMapper extends BaseMapper<BigDataSearchEntity> {
    // Page, BaseMapper, QueryWrapper and Constants come from MyBatis-Plus
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    Page<BigDataSearchEntity> pageList(@Param("page") Page<BigDataSearchEntity> page,
                                       @Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper);
}
Use this method when the required page size is modest and deep pagination is not needed. For very deep pages, consider keyset (seek) pagination or one of the streaming approaches described in the following sections.
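One alternative for deep pages is keyset (seek) pagination: instead of an OFFSET, each request filters on the last id seen, so an index can jump straight to the next page. The sketch below is a minimal illustration under stated assumptions, not the article's own code. The KeysetPaging class, Row type, and PageFetcher hook are hypothetical names, the table is assumed to have an indexed, positive, ascending id column, and the in-memory fetcher only stands in for a real query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class KeysetPaging {

    /** Hypothetical row type; only the id is needed for paging. */
    static final class Row {
        final long id;
        final String payload;
        Row(long id, String payload) { this.id = id; this.payload = payload; }
    }

    /**
     * Hypothetical data-access hook. A real implementation would run something like:
     * SELECT * FROM big_data_search WHERE id > ? ORDER BY id LIMIT ?
     */
    interface PageFetcher {
        List<Row> fetchAfter(long lastId, int limit);
    }

    /** Visits every row page by page and returns how many rows were seen. */
    static long forEachRow(PageFetcher fetcher, int pageSize, Consumer<Row> action) {
        long lastId = 0L; // assumption: ids are positive
        long seen = 0L;
        while (true) {
            List<Row> page = fetcher.fetchAfter(lastId, pageSize);
            for (Row row : page) {
                action.accept(row);
            }
            seen += page.size();
            if (page.size() < pageSize) {
                return seen; // a short (or empty) page means the end was reached
            }
            lastId = page.get(page.size() - 1).id; // seek key for the next page
        }
    }

    /** In-memory stand-in for the table, assumed sorted by id ascending. */
    static PageFetcher inMemory(List<Row> sortedRows) {
        return (lastId, limit) -> {
            List<Row> page = new ArrayList<>();
            for (Row row : sortedRows) {
                if (row.id > lastId && page.size() < limit) {
                    page.add(row);
                }
            }
            return page;
        };
    }
}
```

Only one page is held in memory at a time, and the cost of each request does not grow with the page depth, provided the id column is indexed.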
2. Streaming Query
A streaming query returns a Cursor that implements java.io.Closeable and java.lang.Iterable. The application iterates over the cursor, fetching one row at a time, while the JDBC connection remains open.
Close the cursor (or the underlying connection) after processing to avoid leaks.
All rows must be consumed or the result set closed before issuing another statement on the same connection.
Key Cursor methods:
isOpen() – checks whether the cursor is still open.
isConsumed() – true when all rows have been read.
getCurrentIndex() – zero‑based index of the most recently fetched row; returns -1 before the first row has been read.
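The consumption pattern for a cursor can be sketched without a database. DemoCursor below is a hypothetical stand-in that mirrors the shape of org.apache.ibatis.cursor.Cursor (Closeable, Iterable, isOpen, isConsumed, getCurrentIndex); it is not the MyBatis class, and its internals are simplified. In a real mapper the method would return a Cursor of the entity type, and the loop would have to run while the session or transaction is still open.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

/** Hypothetical stand-in with the same public shape as MyBatis' Cursor. */
class DemoCursor<T> implements Closeable, Iterable<T> {
    private final Iterator<T> rows;
    private boolean open = true;
    private boolean iteratorTaken = false;
    private int currentIndex = -1; // -1 until the first row has been read

    DemoCursor(Iterator<T> rows) { this.rows = rows; }

    boolean isOpen() { return open; }
    boolean isConsumed() { return !rows.hasNext(); }
    int getCurrentIndex() { return currentIndex; }

    @Override
    public Iterator<T> iterator() {
        if (!open || iteratorTaken) {
            throw new IllegalStateException("cursor is closed or already being iterated");
        }
        iteratorTaken = true;
        return new Iterator<T>() {
            @Override public boolean hasNext() { return open && rows.hasNext(); }
            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                currentIndex++;
                return rows.next();
            }
        };
    }

    // The real Cursor.close() may throw IOException; the stand-in never does.
    @Override
    public void close() { open = false; }

    /** Consumes the cursor row by row and always closes it, even on failure. */
    static <R> long consume(DemoCursor<R> cursor, Consumer<R> action) {
        long count = 0;
        try (DemoCursor<R> c = cursor) {
            for (R row : c) {
                action.accept(row);
                count++;
            }
        }
        return count;
    }
}
```

The try-with-resources block is the important part: it guarantees the cursor is closed whether the loop finishes or throws, which is what prevents connection leaks.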
Streaming is especially useful in sharding scenarios where results from many tables need to be merged without exhausting client memory.
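As an illustration of the sharding case, the sketch below merges several already-sorted row streams into one sorted stream while holding only one pending row per shard in memory. It is a generic k-way merge over plain iterators, not MyBatis or sharding-middleware code; ShardMerge and mergeSorted are hypothetical names, and each shard is assumed to return its rows in the same sort order (for example via ORDER BY on the shard query).

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

class ShardMerge {

    /** Current head row of one shard plus the iterator it came from. */
    private static final class Head<T> {
        T value;
        final Iterator<T> source;
        Head(T value, Iterator<T> source) { this.value = value; this.source = source; }
    }

    /** Lazily merges sorted shard iterators; memory use is one row per shard. */
    static <T> Iterator<T> mergeSorted(List<Iterator<T>> shards, Comparator<? super T> order) {
        Comparator<Head<T>> byValue = (a, b) -> order.compare(a.value, b.value);
        PriorityQueue<Head<T>> heap = new PriorityQueue<>(byValue);
        for (Iterator<T> shard : shards) {
            if (shard.hasNext()) {
                heap.add(new Head<>(shard.next(), shard));
            }
        }
        return new Iterator<T>() {
            @Override public boolean hasNext() { return !heap.isEmpty(); }
            @Override public T next() {
                Head<T> head = heap.poll();
                if (head == null) throw new NoSuchElementException();
                T result = head.value;
                if (head.source.hasNext()) {
                    head.value = head.source.next(); // pull the shard's next row
                    heap.add(head);
                }
                return result;
            }
        };
    }
}
```

Each shard iterator could be backed by its own streaming cursor, so the merged output is produced without materializing any shard's full result.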
3. Cursor Query with fetchSize
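MyBatis can ask the JDBC driver to fetch rows in batches of a configurable size per round‑trip through the @Options annotation. The connection stays open for the duration of the query, but only a limited number of rows is transferred at a time. Note that MySQL Connector/J honors a positive fetchSize only when useCursorFetch=true is set on the JDBC URL; without it, the driver streams row by row only when fetchSize is Integer.MIN_VALUE and otherwise buffers the entire result set on the client.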
@Mapper
public interface BigDataSearchMapper extends BaseMapper<BigDataSearchEntity> {
    // Strategy 1 – fetch many rows per batch (e.g., 1,000,000)
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 1000000)
    Page<BigDataSearchEntity> pageList(@Param("page") Page<BigDataSearchEntity> page,
                                       @Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper);

    // Strategy 2 – fetch a smaller batch (e.g., 100,000) and process each row via ResultHandler
    @Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment}")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 100000)
    @ResultType(BigDataSearchEntity.class)
    void listData(@Param(Constants.WRAPPER) QueryWrapper<BigDataSearchEntity> queryWrapper,
                  ResultHandler<BigDataSearchEntity> handler);
}
Important @Options settings:
resultSetType = ResultSetType.FORWARD_ONLY – cursor can only move forward, which is the most efficient for streaming.
fetchSize – number of rows retrieved per network round‑trip. Larger values reduce round‑trips but increase memory usage.
The method that uses a ResultHandler must return void because the handler processes each row as it arrives.
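At the JDBC level, the resultSetType and fetchSize options map onto the statement's result-set type and fetch size. The following is a minimal sketch in plain JDBC, assuming MySQL Connector/J, where a forward-only, read-only statement with the fetch size set to Integer.MIN_VALUE asks the driver to stream rows one at a time instead of buffering the full result. MySqlStreamingScan and RowCallback are hypothetical names; RowCallback plays the role that ResultHandler plays in MyBatis.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class MySqlStreamingScan {

    /** Hypothetical per-row callback, analogous to a MyBatis ResultHandler. */
    interface RowCallback {
        void onRow(ResultSet row) throws SQLException;
    }

    /** Streams the query result row by row and returns the number of rows seen. */
    static long scan(Connection conn, String sql, RowCallback callback) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Assumption: MySQL Connector/J, which treats Integer.MIN_VALUE
            // as the signal to stream rather than buffer the whole result.
            ps.setFetchSize(Integer.MIN_VALUE);
            long rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    callback.onRow(rs); // process immediately; nothing is accumulated
                    rows++;
                }
            }
            return rows;
        }
    }
}
```

While the result set is open, the connection cannot run other statements, so the callback should not issue queries on the same connection.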
Comparison
Non‑streaming (full list or pagination with large OFFSET): memory grows linearly with the number of rows, and query time can stretch to minutes or hours.
Streaming / Cursor with fetchSize: memory stays roughly constant, bounded by the configured batch size (e.g., fetchSize or a custom BATCH_SIZE variable). After each batch, clear temporary collections (e.g., gxids.clear()) to release memory.
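The "process a batch, then clear the temporary collection" idea can be sketched as a small buffering handler. BatchingRowHandler is a hypothetical name and uses java.util.function.Consumer so that it runs without MyBatis; in a MyBatis application it would instead implement org.apache.ibatis.session.ResultHandler and pass resultContext.getResultObject() to accept from handleResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers incoming rows and hands them to a sink in fixed-size batches. */
class BatchingRowHandler<T> implements Consumer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    BatchingRowHandler(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Called once per row as the driver delivers it. */
    @Override
    public void accept(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered rows to the sink; call once more after the query ends. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, so the sink may keep the list
        buffer.clear();                       // release the rows held by this handler
    }
}
```

Memory held by the handler never exceeds one batch, regardless of how many rows the query returns. The final flush after the query is needed because the last batch is usually smaller than batchSize.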
