Databases 12 min read

Optimizing Distributed Database Sorting: From In‑Memory Merge to Disk‑Based Priority Queues

This article examines the challenges of ordering results in distributed databases and presents a step‑by‑step optimization that moves sorting from proxy memory to disk‑based priority‑queue processing, reducing memory pressure, disk I/O, and network waste.

ITPUB
ITPUB
ITPUB
Optimizing Distributed Database Sorting: From In‑Memory Merge to Disk‑Based Priority Queues

Background

Typical distributed database deployments consist of a stateless proxy (coordinator), multiple horizontal shards (group), a metadata service (catalog) and a global transaction manager (gtm). The proxy receives client SQL, forwards it to the relevant shards, and merges the shard responses. When a query contains ORDER BY, each shard returns its rows already sorted according to the same ordering key, and the proxy must produce a globally ordered result set.

Components

gtm : global transaction manager, primary‑backup.

catalog : metadata store, primary‑backup.

group : set of shard nodes, each with primary‑backup storage.

proxy : stateless router that aggregates shard results and enforces distributed transaction consistency.

Sorting challenge

For SELECT * FROM t1 ORDER BY field1 the rows are partitioned across N shards. The proxy must merge N already‑sorted streams. If the total result fits in memory the merge can be performed in‑memory; otherwise the proxy risks out‑of‑memory (OOM) or excessive disk I/O.

Initial in‑memory merge sort

Procedure

Client sends the ordered query to the proxy.

Proxy forwards the query to each shard.

Each shard sorts locally and streams the full ordered result.

Proxy allocates a sort buffer (default 10 MB) and divides it equally among the N shards (≈10 MB/N per shard).

Proxy stores each shard’s rows in its slice of the buffer and merges the streams using a priority‑queue (k‑way merge).

Proxy returns the merged rows to the client.

Limitations

The approach scales only while the total result size is smaller than the allocated buffer. Increasing the buffer consumes more proxy RAM, which cannot be expanded indefinitely.

Disk‑based extension

To support larger result sets the proxy can spill each shard’s sorted output to temporary files and perform a disk‑based k‑way merge.

Optimized disk‑based sorting

Workflow

Client issues the ordered query.

Proxy forwards it to the relevant shards.

Each shard sorts locally and streams a fixed‑size chunk (e.g., 2 MB) of ordered rows.

Proxy writes each chunk to a per‑shard temporary file.

Proxy builds a min‑heap where each node holds the next row from a shard’s buffer.

When the heap top is emitted, the proxy reads the next row from the same shard’s file (refilling the in‑memory buffer as needed) and pushes it back into the heap.

Steps 5–6 repeat until all rows are produced.

Drawbacks

Even with chunking, the proxy may need to store N × chunk‑size data on disk, which can be prohibitive for very large N.

Heavy disk I/O is introduced.

Example: with 50 shards and a client LIMIT 1 M, the proxy would temporarily hold 50 M rows on disk, most of which are unnecessary.

Batch‑pull refinement

Instead of pulling the entire sorted stream, the proxy repeatedly fetches a small batch from each shard, merges the batches, returns the merged rows, and then requests the next batch. This limits the amount of data kept on disk and reduces I/O.

Final end‑to‑end solution

Algorithm

Client sends SELECT * FROM t1 ORDER BY id [LIMIT L] to the proxy.

Proxy forwards the query to all shards.

Each shard sorts locally and streams a fixed‑size ordered batch (e.g., 2 MB) to the proxy.

Proxy stores each batch in a per‑shard in‑memory sort buffer.

Proxy builds a priority‑queue heap from the first row of each buffer.

Emit the heap top to the client, then replace it with the next row from the same buffer (reading from the shard’s file if the buffer is exhausted).

When a buffer is empty, the proxy pulls the next batch from the corresponding shard and refills the buffer.

Continue until the client’s limit is satisfied or all rows are consumed.

Analysis

Memory usage : Only one batch per shard resides in memory (e.g., 2 MB × N). No full shard data is ever persisted on the proxy’s disk.

Disk I/O : Disk writes occur only for the current batch when a shard’s buffer overflows; the I/O volume is bounded by the batch size.

Network efficiency : The proxy requests rows only as needed to satisfy the client’s limit, avoiding the “pull‑all‑then‑discard” pattern of naïve approaches.

Assuming 2 MB buffers and 50 shards, the worst‑case data transferred from shards to the proxy is 2 MB × 50 + L × rowSize. In the best case (uniform distribution) the proxy may pull roughly L + N rows.

Usage constraints

Shard nodes must support ordered queries (e.g., MySQL, PostgreSQL).

Shards must expose a streaming or cursor interface that allows fetching rows in fixed‑size batches.

References

JDBC操作MySQL(3)—查询

MySQL JDBC StreamResult通信原理浅析

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Memory Managementdistributed databaseproxy mergesorting optimizationbatch fetchingdisk-based priority queue
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.