Why Use Lossy Compression? Algorithms, RLE, Huffman, and Compression Bombs

This article explains the principles of data compression, contrasting lossy and lossless methods, introduces common algorithms such as Run‑Length Encoding, dictionary coding, and Huffman coding, and discusses extreme cases like compression bombs, illustrating concepts with examples and visual diagrams.

ELab Team

Apr 30, 2021

1 Lossy Compression

Lossy compression exploits the fact that human eyes and ears are insensitive to certain frequency components. Discarding that information achieves much higher compression ratios, which is why lossy methods are widely used for audio, image, and video data.

After lossy compression, the discarded data is permanently lost and the original cannot be fully restored. The trade‑off is usually acceptable because the lost information rarely affects usability, so the much smaller file is a good bargain.

Common examples include the sticker and meme images shared in everyday chats, where the loss of detail yields smaller, more shareable files.
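The trade-off can be illustrated with a toy quantizer. This is only a sketch of the principle, not how any real audio or image codec works; the function names `quantize` and `dequantize` and the sample values are made up for illustration. Each 8-bit sample keeps only its top 4 bits, which halves the storage but makes exact recovery impossible.

```python
def quantize(samples, bits_kept=4):
    """Drop the low-order bits of each 8-bit sample (the lossy step)."""
    shift = 8 - bits_kept
    return [s >> shift for s in samples]


def dequantize(codes, bits_kept=4):
    """Rebuild approximate 8-bit samples; the dropped bits cannot be recovered."""
    shift = 8 - bits_kept
    half_step = (1 << shift) >> 1  # place the result in the middle of its bucket
    return [(c << shift) + half_step for c in codes]


original = [12, 13, 200, 203, 255]
restored = dequantize(quantize(original))
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(restored)   # [8, 8, 200, 200, 248]
print(max_error)  # 7
```

The restored values are close to the originals but not identical, which is the defining property of a lossy scheme: the error is bounded and small, yet the exact input is gone.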

2 Lossless Compression

Lossless compression removes statistical redundancy so that the original data can be reconstructed exactly. Typical compression ratios range from 2:1 to 5:1. It is used for text, program binaries, and specialized images such as fingerprints or medical scans, where no loss is acceptable.

Modern lossless algorithms can shrink files to 30‑40% of their original size, though stronger compression generally costs more processing time.
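A minimal sketch of the lossless guarantee, using Python's standard `zlib` module (a DEFLATE implementation, chosen here only as a convenient example). The sample text is artificially repetitive, so the ratio it achieves is much better than typical real-world files would see.

```python
import zlib

# Hypothetical sample input: one sentence repeated many times.
text = ("lossless compression removes statistical redundancy. " * 200).encode("utf-8")

packed = zlib.compress(text, 9)     # 9 = strongest (and slowest) compression level
unpacked = zlib.decompress(packed)

ratio = len(packed) / len(text)
print(f"{len(text)} bytes -> {len(packed)} bytes ({ratio:.1%} of original)")
print(unpacked == text)             # True: every byte is restored exactly
```

The equality check at the end is what distinguishes lossless from lossy compression: decompression returns the input byte for byte.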

Examples of lossless techniques include Run‑Length Encoding (RLE), dictionary algorithms, and Huffman coding.

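RLE compresses runs of repeated characters by storing the run length followed by the character. For example, bbbbbb yyy ttttt e ddd aaa nn ccccccc eeee (spaces added here only to separate the runs) becomes 6b3y5t1e3d3a2n7c4e, shrinking 34 characters to 18. Its drawback is that data without long runs can expand: abc becomes 1a1b1c, twice the original length.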
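The count-then-character scheme described above can be sketched as follows. The names `rle_encode` and `rle_decode` are illustrative, and the decoder assumes the original text contains no digits, since a digit in the data would be indistinguishable from a run length. The sample input is assembled run by run so that it encodes to the string 6b3y5t1e3d3a2n7c4e.

```python
from itertools import groupby


def rle_encode(text):
    """Encode each run as <count><char>, e.g. 'aaab' -> '3a1b'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode; assumes the original text contained no digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch              # run lengths may span several digits
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


# Runs of b, y, t, e, d, a, n, c, e with lengths 6, 3, 5, 1, 3, 3, 2, 7, 4.
sample = "b" * 6 + "y" * 3 + "t" * 5 + "e" + "d" * 3 + "a" * 3 + "n" * 2 + "c" * 7 + "e" * 4

print(rle_encode(sample))   # 6b3y5t1e3d3a2n7c4e
print(rle_encode("abc"))    # 1a1b1c (longer than the input)
```

Decoding the encoded string returns the sample unchanged, and the `abc` case shows the expansion that occurs when the input has no runs to exploit.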

Dictionary algorithms replace frequently occurring words or sequences with short codes, much like referring to someone by a nickname instead of a full name.
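A toy word-level version of the idea, as a sketch only: practical dictionary coders work on byte sequences and build their tables differently. The helper names, the sample sentence, and the `max_entries` parameter are all hypothetical. The most frequent words are mapped to small integers, and the decoder needs the same table to reverse the mapping.

```python
from collections import Counter


def build_dictionary(text, max_entries=3):
    """Map the most frequent words to short integer codes."""
    counts = Counter(text.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_entries))}


def dict_encode(text, table):
    # Words in the table become integer codes; all other words pass through unchanged.
    return [table.get(word, word) for word in text.split()]


def dict_decode(tokens, table):
    reverse = {code: word for word, code in table.items()}
    return " ".join(reverse[t] if isinstance(t, int) else t for t in tokens)


text = "the cat saw the dog and the dog saw the cat"
table = build_dictionary(text)
tokens = dict_encode(text, table)

print(tokens)
print(dict_decode(tokens, table) == text)   # True
```

Because "the" is the most frequent word, it receives the shortest code (0), and every occurrence is replaced by that single token.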

Huffman coding, invented by David Huffman in 1952, assigns shorter bit patterns to more frequent symbols. In the sequence 1,50,20,50,50,18,50,25,32,18, the value 50 occurs four times, 18 occurs twice, and every other value occurs once, which produces the following codes:

50: 00

18: 01

1: 100

20: 101

25: 110

32: 111

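Applying these codes transforms the original data into 100,00,101,00,00,01,00,110,111,01, a total of 24 bits. Storing each of the ten values in a single byte would take 80 bits, so the encoded form is less than a third of that size (not counting the code table, which the decoder also needs).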
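The construction can be sketched with a priority queue: repeatedly merge the two least frequent nodes, prefixing 0 to the codes on one side and 1 to the other. The function names are illustrative. When frequencies tie, the merge order is arbitrary, so the exact bit patterns may differ from the table above, but any Huffman code for this sequence has the same total length of 24 bits.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Return {symbol: bitstring} built by repeatedly merging the two rarest nodes."""
    freq = Counter(symbols)
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_decode(bits, codes):
    """Walk the bitstring, emitting a symbol whenever a full code is matched."""
    reverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:  # safe because Huffman codes are prefix-free
            out.append(reverse[buf])
            buf = ""
    return out


data = [1, 50, 20, 50, 50, 18, 50, 25, 32, 18]
codes = huffman_codes(data)
encoded = "".join(codes[v] for v in data)

print(codes)
print(len(encoded))                           # 24
print(huffman_decode(encoded, codes) == data) # True
```

No separators are needed between codes in the bitstream: because no code is a prefix of another, the decoder always knows where one symbol ends and the next begins.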

3 Compression Bombs

A compression bomb is a tiny archive, often only tens of kilobytes, that expands to an enormous size (potentially petabytes) when decompressed, overwhelming system resources.

Examples include a 42 KB file that expands to 4.5 PB once all of its nested archives are unpacked, and a 28 KB file that extracts a copy of itself, so recursive extraction never terminates.

These bombs exploit the same principle that makes compression work: removing redundancy. Because the payload is deliberately repetitive, its information entropy is very low, which allows extreme size reduction.
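A small, harmless sketch of why repetitive data compresses so well. It measures the Shannon entropy of the byte distribution (a simplified measure that ignores ordering) and compresses 1 MiB of identical bytes with `zlib`. The helper name `entropy_bits_per_byte` and the buffer size are assumptions for illustration.

```python
import math
import zlib
from collections import Counter


def entropy_bits_per_byte(data):
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


zeros = bytes(1024 * 1024)          # 1 MiB consisting of a single repeated byte
packed = zlib.compress(zeros, 9)

print(entropy_bits_per_byte(zeros))             # zero: the data carries no surprise
print(entropy_bits_per_byte(bytes(range(256)))) # 8.0: every byte value equally likely
print(len(zeros), "->", len(packed), "bytes")
```

The all-zero buffer has zero entropy and shrinks by several hundred times in a single pass. Nesting archives inside archives, as the bombs described above do, multiplies that factor at every layer.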

Historically, compression bombs have been used to evade or disable antivirus scanning: a scanner that tries to inspect the archive is forced to decompress a massive amount of data and exhausts its resources. Modern security software can detect and mitigate such attacks.
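One common style of mitigation is to cap how much output a decompressor may produce. The sketch below is an illustration of that idea, not a description of how any particular security product works. It uses the `max_length` argument of `zlib`'s decompression object; the function name `safe_decompress` and the default limit are hypothetical.

```python
import zlib


def safe_decompress(payload, limit=1_000_000):
    """Inflate a zlib stream, refusing to produce more than `limit` bytes."""
    inflater = zlib.decompressobj()
    # Ask for at most limit + 1 bytes: one extra byte is enough to prove overflow.
    out = inflater.decompress(payload, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed size exceeds limit; possible compression bomb")
    if not inflater.eof:
        raise ValueError("truncated or incomplete stream")
    return out


small = zlib.compress(b"hello")
oversized = zlib.compress(bytes(5_000_000))   # a few KB that inflate to 5 MB

print(safe_decompress(small))                 # b'hello'
try:
    safe_decompress(oversized)
except ValueError as err:
    print("rejected:", err)
```

The key design choice is that the limit is enforced during decompression, so memory use stays bounded no matter what size the archive claims or actually expands to.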
