Tagged articles

external sort

7 articles · Page 1 of 1

Sep 22, 2025 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

Learn four practical techniques (simple sorting, hashmap deduplication, external merge sort, and bitmap bit‑set optimization) for removing duplicate QQ numbers from a 4‑billion‑record file within a strict 1 GB memory limit, including how to cope with a tighter 100 MB constraint.

Big Data · Deduplication · algorithm

0 likes · 9 min read

Big Data Technology & Architecture

Dec 24, 2020 · Big Data

Common Techniques for Processing Massive Data Sets

This article summarizes a range of practical methods (Bloom filters, hashing, bit‑maps, heaps, bucket partitioning, database indexes, inverted indexes, external sorting, tries, and MapReduce) that are commonly used to handle, deduplicate, and query extremely large data volumes in big‑data applications.

Big Data · Hashing · external sort

0 likes · 11 min read

Architect's Tech Stack

Oct 23, 2020 · Big Data

Sorting a 4.6 GB File with 500 Million Integers: Internal, Bitmap, and External Sorting Techniques

The article explains how to sort a massive 4.6 GB file containing 500 million random integers by first attempting in‑memory quicksort and merge sort, then using a bitmap approach, and finally applying an external sort that splits the data into manageable chunks and merges them efficiently.

Big Data · algorithm · bitmap

0 likes · 8 min read

Top Architect

Feb 25, 2020 · Big Data

External Sorting of a 4.6 GB File Containing 500 Million Integers: Strategies, Implementations, and Performance

The article presents a practical case of sorting a 4.6 GB file with 500 million random integers, evaluates in‑memory quicksort and merge‑sort implementations, discusses bitmap sorting, and finally details a multi‑phase external‑sort algorithm with measured runtimes and resource considerations.

Sorting Algorithm · bitmap sort · external sort

0 likes · 11 min read

Architecture Digest

Feb 12, 2020 · Big Data

External Sorting of a 4.6 GB File with 500 Million Integers: Strategies and Implementation

This article explains how to sort a 4.6 GB file containing 500 million random integers, covering in‑memory quicksort and merge sort attempts, the Unix sort command, a bitmap‑based method, and a detailed external sorting strategy with multi‑way merge, and discusses performance and resource constraints.

bitmap sort · external sort · merge sort

0 likes · 10 min read

Java Backend Technology

Feb 3, 2020 · Big Data

Sorting a 4.6GB File of 500M Numbers: Internal, Merge, Bitmap & External Techniques

This article explores how to sort a massive 4.6 GB file containing 500 million random integers by applying internal quicksort with median‑of‑three, merge sort, a bitmap‑based approach, and an external‑merge strategy, comparing their performance, memory usage, and implementation details in Java.

algorithm · external sort · sorting

0 likes · 10 min read

Big Data Technology & Architecture

Jun 26, 2019 · Big Data

Common Techniques for Processing Massive Data Sets

This article summarizes a variety of practical methods—including Bloom filters, hashing, bit‑maps, heaps, bucket partitioning, database indexes, inverted indexes, external sorting, tries, and MapReduce—that can be used to efficiently handle and analyze extremely large data volumes in real‑world scenarios.

Data Structures · Hashing · external sort

0 likes · 15 min read
