PHP Implementation of Simple Multi‑way Merge Sort for Large Files

This article explains how to implement a simple multi‑way merge sort in PHP to sort a massive 1 GB file using only 100 MB of memory, detailing the file‑splitting, incremental merging steps, and providing a complete PHP code demo.

php Courses
php Courses
php Courses
PHP Implementation of Simple Multi‑way Merge Sort for Large Files

Sorting extremely large files with limited memory is a common interview and engineering challenge. This guide shows how to sort a 1 GB file containing unordered numbers while only using about 100 MB of RAM by applying a multi‑way merge sort algorithm written in PHP.

The process consists of three main stages: (1) read the large file line by line, group every 10,000 lines, sort each group, and write the sorted chunks to temporary files (e.g., t1.txt, t2.txt, …); (2) open all temporary files, read the first line from each, place those values into a temporary array, repeatedly extract the smallest value, write it to the final result file, and replace the extracted value with the next line from the same file; (3) continue this merge loop until all temporary files are exhausted, producing a fully sorted output.

The following PHP code demonstrates a minimal skeleton of this multi‑way merge approach. In a production setting the "min" selection could be optimized with a fixed‑size min‑heap or a priority queue that also tracks the originating file.

function multiWaySort()
{
    // Read all file descriptors
    $fds = [];
    $dir = scandir('./runtime/txt/');
    foreach ($dir as $file) {
        if ($file != '.' && $file != '..') {
            $fds[] = fopen('./runtime/txt/' . $file, 'rb');
        }
    }

    // Read the first line of each file into a temporary sorting array
    $tmpNums = [];
    foreach ($fds as $fd) {
        $tmpNums[] = (int)fgets($fd);
    }

    $nums = [];
    $count = 0;
    while (true) {
        if (empty($tmpNums)) break;

        // Put the smallest value into the temporary storage array
        $value = min($tmpNums);
        $nums[] = $value;

        // Read the next line from the file that provided the smallest value
        $idx = array_search($value, $tmpNums);
        $next = fgets($fds[$idx]);
        if (!$next) {
            unset($tmpNums[$idx]);
            unset($fds[$idx]);
        } else {
            $tmpNums[$idx] = (int)$next;
        }

        // When the temporary storage array reaches a certain size, flush it to the result file
        if (count($nums) == 20) {
            foreach ($nums as $value) {
                $f = fopen('./runtime/result.txt', 'ab+');
                fwrite($f, $value . PHP_EOL);
            }
            $nums = [];
        }

        if ($count == 4999999) {
            continue;
        }
        $count++;
    }
}

By adapting the min‑value extraction to a more efficient data structure such as a binary heap or SPLPriorityQueue, the algorithm can handle even larger datasets while keeping memory usage low.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

BackendPHPmerge sortlarge-file
php Courses
Written by

php Courses

php中文网's platform for the latest courses and technical articles, helping PHP learners advance quickly.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.