Why Does a .tar.gz File Have Two Extensions? The History Behind the Dual Suffix
The article explains that the .tar.gz suffix reflects two separate Unix tools—tar for archiving and gzip for compression—whose combined use via a pipeline dates back to the early 1980s and embodies the Unix philosophy of single‑purpose programs.
Origin of tar
tar first appeared in January 1979 as part of Unix Version 7 at AT&T Bell Labs. Its name, Tape ARchive, reflects its original purpose: concatenate a collection of files into a single byte stream that can be written sequentially to magnetic tape.
tar does not perform compression; it merely packs files. A 100 MB directory archived with tar remains 100 MB. The internal block size of tar is 512 bytes, matching the disk sector size of the V7 file system, a design choice that persists today.
Origin of gzip
In 1992 Jean‑loup Gailly and Mark Adler released gzip 0.1 (31 Oct 1992) as a free replacement for the patented compress utility. gzip implements the DEFLATE algorithm and compresses a raw byte stream; it does not understand directory structures.
Combining tar and gzip
Because tar does not compress and gzip does not archive, the two tools are chained with a Unix pipe: tar cf - mydir | gzip > mydir.tar.gz Explanation: tar cf - mydir creates a stream of the directory contents and writes it to standard output ("-").
The pipe symbol | passes that stream to gzip. gzip compresses the incoming stream and writes the result to mydir.tar.gz via redirection.
The double suffix ".tar.gz" records the two processing steps: first tar, then gzip.
GNU tar’s -z option
GNU tar added the -z flag so that tar can invoke gzip automatically, enabling the common tar -xzvf command. The flag does not give tar native compression; it merely delegates the compression step to gzip.
Unix philosophy
Doug McIlroy’s 1978 principle—"write programs to do one thing and do it well, let them cooperate via text streams"—underlies this design. The pipe ( |) is the concrete mechanism for that cooperation.
Write programs to do one thing, and do it well. Write programs to cooperate with each other. Prefer text streams as the universal interface.
Contrast with Windows/DOS archiving
In the DOS/Windows world, tools such as PKZIP (1989, Phil Katz) introduced the .zip format, which combines archiving and compression in a single executable. Later formats like RAR (1993, Eugene Roshal) and 7‑Zip’s .7z (1999, Igor Pavlov) followed the same “all‑in‑one” approach, replacing older tools when new compression algorithms became available.
Unix retained the separation of concerns: later compressors such as bzip2 (1996, Julian Seward) and xz (2009, Lasse Collin) are used with tar via the same pipeline, producing extensions like .tar.bz2 and .tar.xz. The philosophy of “two tools, one pipe” remains unchanged.
Historical quote
McIlroy’s 1978 Bell System Technical Journal article states three core ideas, which are reproduced in the quote above. McIlroy also invented the pipe symbol ( |) that enables the tar‑gzip pipeline.
Why the double suffix persists
Modern open‑source releases (Linux kernel, nginx, curl, redis, etc.) continue to be distributed as .tar.gz because the Unix approach of separate archiving and compression remains effective and widely supported.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
