Code Ape Tech Column
Mar 15, 2021 · Big Data
How to Find Common URLs in Two 5-Billion-Entry Files with Only 4 GB RAM
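Given two files, each containing 5 billion 64-byte URLs (≈320 GB per file), and only 4 GB of memory, the solution partitions each file by URL hash modulo 1000 into 1,000 smaller files (≈320 MB each on average). Identical URLs always hash to the same partition index, so only same-numbered partitions need to be compared: one partition is loaded into a hash set and the matching partition from the other file is streamed against it to find the common URLs.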
Big Data · Memory Optimization · Hash Partition
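The approach summarized above can be sketched in Python as follows. This is a minimal illustration rather than the article's own code: the function names (`bucket_of`, `partition`, `common_urls`), the one-URL-per-line file layout, and the use of MD5 as the partitioning hash are all assumptions made for the example.

```python
import hashlib
import os

NUM_BUCKETS = 1000  # matches the "hash modulo 1000" in the summary


def bucket_of(url, num_buckets=NUM_BUCKETS):
    """Map a URL to a stable bucket index.

    Python's built-in hash() is salted per process for strings, so a
    digest-based hash is used here (an assumption of this sketch).
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition(src_path, out_dir, prefix, num_buckets=NUM_BUCKETS):
    """Stream src_path once, writing each URL to its bucket file."""
    # Keeping 1,000 files open at once may require raising the
    # operating system's open-file limit.
    outs = [
        open(os.path.join(out_dir, f"{prefix}_{i}.txt"), "w", encoding="utf-8")
        for i in range(num_buckets)
    ]
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                url = line.strip()
                if url:
                    outs[bucket_of(url, num_buckets)].write(url + "\n")
    finally:
        for f in outs:
            f.close()


def common_urls(path_a, path_b, work_dir, num_buckets=NUM_BUCKETS):
    """Yield each URL that appears in both files, once."""
    partition(path_a, work_dir, "a", num_buckets)
    partition(path_b, work_dir, "b", num_buckets)
    for i in range(num_buckets):
        # Only one bucket of file A is held in memory at a time.
        with open(os.path.join(work_dir, f"a_{i}.txt"), encoding="utf-8") as fa:
            seen = {line.strip() for line in fa}
        # The matching bucket of file B is streamed, never fully loaded.
        with open(os.path.join(work_dir, f"b_{i}.txt"), encoding="utf-8") as fb:
            for line in fb:
                url = line.strip()
                if url in seen:
                    seen.discard(url)  # avoid reporting duplicates twice
                    yield url
```

With an even hash distribution, each bucket would hold about 5 million URLs (roughly 320 MB of raw text). A Python set of strings carries noticeable per-object overhead on top of that, so a larger bucket count may be a safer choice if memory turns out to be tight.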
