Databases 12 min read

Optimizing Distributed Database Sorting: In-Memory Merge to Disk Buffering

This article examines the challenges of sorting queries in distributed databases, outlines the limitations of in‑memory proxy sorting, and presents a step‑by‑step optimized approach that leverages per‑shard sorting, disk‑based buffering, and priority‑queue merging to reduce memory pressure and I/O overhead.

ITPUB
ITPUB
ITPUB
Optimizing Distributed Database Sorting: In-Memory Merge to Disk Buffering

Background

Typical distributed database architecture consists of a global transaction manager (gtm) providing a global clock, a metadata service (catalog), horizontally sharded data groups, and a stateless proxy that routes client requests to the appropriate shards and coordinates distributed transactions.

Distributed database architecture diagram
Distributed database architecture diagram

gtm : global transaction manager (global clock), primary‑backup.

catalog : metadata management, primary‑backup.

group : horizontal shard, each group has a primary and replicas.

proxy : stateless coordinator that forwards queries, aggregates results, and ensures transactional consistency.

Sorting Problem

A query such as SELECT * FROM t1 ORDER BY field1 may need rows from many shards. The proxy must merge the ordered streams from each shard into a globally ordered result. Keeping all rows in memory works only for small result sets; large result sets cause out‑of‑memory (OOM) or excessive disk usage.

Solution Overview

1. In‑Memory Merge Sorting

Each shard sorts locally before sending data to the proxy, so the proxy receives already ordered streams. The proxy merges these streams using a heap‑based k‑way merge.

In‑memory sorting flow diagram
In‑memory sorting flow diagram

Client sends SELECT * FROM t1 ORDER BY id to the proxy.

Proxy forwards the query to the relevant shard groups.

Each shard sorts locally and streams the ordered rows back.

Proxy stores each shard’s stream in a per‑shard sort buffer (default 10 MB, split proportionally when multiple shards are involved).

Proxy merges the buffers with a priority‑queue (heap) and returns the globally ordered result.

2. Disk‑Based Sorting for Large Result Sets

If the total result exceeds available memory, the proxy writes each shard’s ordered stream to a temporary disk file and performs a heap‑based merge directly from disk.

Disk‑based sorting flow diagram
Disk‑based sorting flow diagram

Client issues SELECT * FROM t1 ORDER BY id.

Proxy forwards the query to each shard.

Shards sort locally and stream ordered rows.

Proxy writes each shard’s stream to a separate disk file.

Proxy builds a heap where each node holds the next row from a file; the heap size equals the number of shards.

When the heap top is emitted, the proxy reads the next row from the same file and pushes it into the heap.

Process continues until all rows are sent.

3. Final Optimized Scheme (Batch Pull & Merge)

The optimized design avoids persisting any shard data on the proxy’s disk. The proxy repeatedly pulls a fixed‑size batch of ordered rows from each shard, fills per‑shard sort buffers, and merges them with a priority queue. This loop continues until the client’s requested row count is satisfied, minimizing both memory and I/O.

Final scheme diagram
Final scheme diagram

Client sends SELECT * FROM t1 ORDER BY id to the proxy.

Proxy forwards the query to the relevant shard groups.

Each shard returns a fixed‑size ordered batch.

Proxy stores the batch in the shard’s sort buffer.

Proxy merges buffers using a heap: the smallest row is sent to the client, and the corresponding buffer is refilled from its shard when depleted.

The loop repeats until the client’s limit (e.g., 1 million rows) is reached.

Analysis of the Final Scheme

The design addresses three limitations of the previous disk‑based approach:

Disk capacity : No shard data is persisted on the proxy’s disk, eliminating storage exhaustion.

Disk I/O : Data stays in memory buffers; only small batches are read from shards, virtually removing disk I/O.

Network bandwidth : The proxy pulls only as many rows as needed to fill the buffers, avoiding the transfer of the full result set from every shard.

Assuming a per‑shard sort buffer of 2 MB and 50 shards, the worst‑case data transferred is 2 MB × 50 + 1 million rows, while the best case approaches 1 million + 50 rows when data is evenly distributed.

Usage Constraints

Shard nodes must support local sorting (e.g., MySQL, PostgreSQL).

Shards must allow batch reads, such as streaming or cursor queries.

For databases that retain query context across execution, batch reads can be even more efficient.

References

JDBC操作MySQL(3)—查询

MySQL JDBC StreamResult通信原理浅析

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

shardingpriority-queuedisk bufferingproxy mergesorting optimization
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.