
How to Efficiently Split and Merge Massive Log Files on Linux

Opening a multi-gigabyte log file in an editor such as vim, or dumping it with cat, is slow and can exhaust memory. The Linux split command divides such a file by line count or by size, which makes analysis, transfer, and later merging far more manageable. The examples below show both modes and how to recombine the pieces.

Open Source Linux

1. The pain of large files

Analyzing a huge log file with everyday tools such as vim, cat, grep, or awk quickly becomes painful. An editor like vim tries to open the whole file at once, while stream tools such as cat, grep, and awk have to read it from start to finish on every run. In practice this causes four problems:

Slow execution: every pass reads the entire file, and an editor may stall while opening it.

High resource consumption: loading a 4 GB log fully into memory needs at least 4 GB of RAM.

Hard-to-reuse output: results piped out of one huge file are awkward to save, inspect, and revisit.

Costly transfer: a single large file is expensive to move over the network.

2. Splitting files with split

Offline big-data frameworks such as Hadoop can handle files of this size, but they take a long time to run and require custom MapReduce jobs. For a single log file, Linux offers a much simpler tool: the split utility, which cuts a large file into many smaller ones.

The split command offers two ways to cut a file:

By line count using the -l option.

By size using the -b option.

2.1 Split by line count

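Example: split a 3.4 GB log file into pieces of 50,000 lines each. The -d option requests numeric suffixes, --verbose prints each piece as it is created, and split-line is the prefix for the piece names. Line-based splitting assumes plain text, so a genuinely gzip-compressed file would need to be decompressed first or split by size instead.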

# source file size
ls -lh happylauliu.cn.gz
-rw-r--r-- 1 root root 3.4G Jan 17 09:42 happylauliu.cn.gz

# split by line count
split -l 50000 -d --verbose happylauliu.cn.gz split-line
Creating file "split-line00"
Creating file "split-line01"
... (continues up to split-line9171)

# verify line count of a piece
wc -l split-line00   # 50000
wc -l split-line9171 # 1020

# check size of pieces
ls -lh split-line0[0-9]
-rw-r--r-- 1 root root 14M ... split-line00

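After splitting, each piece is about 14 MB, small enough to open and analyze comfortably. The four-digit name split-line9171 does not mean 9,172 pieces: GNU split starts with two-digit suffixes (00 to 89) and widens to four digits (9000 onward) when those run out, so this run produces 262 pieces.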
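The line-count workflow above can be tried safely on a small stand-in file. The following is a minimal sketch, assuming GNU split; the file name demo.log, the prefix demo-line-, and the 120-line sample are hypothetical and chosen only for illustration. It also uses -a 4 to request uniform four-digit suffixes from the start, instead of letting the suffix widen partway through.

```shell
set -eu

# Hypothetical demo data: 120 short lines stand in for a real log.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 120 | sed 's|^|request |' > demo.log

# -l 50 : 50 lines per piece
# -d    : numeric suffixes instead of aa, ab, ...
# -a 4  : always use four-digit suffixes (0000, 0001, ...)
split -l 50 -d -a 4 demo.log demo-line-

# Each small piece can now be processed on its own; here we only count lines.
for piece in demo-line-*; do
    printf '%s %s\n' "$piece" "$(wc -l < "$piece")"
done
```

With 120 input lines and 50 lines per piece, this yields two full pieces and a final piece holding the remaining 20 lines, mirroring the shorter last file in the example above.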

2.2 Split by size

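The -b option splits by size in bytes and accepts suffixes such as K, M, and G (powers of 1024 in GNU split). Because -b counts bytes rather than lines, a piece can end in the middle of a log line. The following command splits the same log into 500 MB chunks.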

# split by size
split -b 500M -d --verbose happylauliu.cn.gz split-size
Creating file "split-size00"
Creating file "split-size01"
Creating file "split-size02"
Creating file "split-size03"
Creating file "split-size04"
Creating file "split-size05"
Creating file "split-size06"

# list resulting files
ls -lh split-size0*
-rw-r--r-- 1 root root 500M ... split-size00
-rw-r--r-- 1 root root 500M ... split-size01
-rw-r--r-- 1 root root 500M ... split-size02
-rw-r--r-- 1 root root 500M ... split-size03
-rw-r--r-- 1 root root 500M ... split-size04
-rw-r--r-- 1 root root 500M ... split-size05
-rw-r--r-- 1 root root 444M ... split-size06
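Size-based splitting with -b counts bytes, so a piece boundary can land in the middle of a log line. If each piece should contain only whole lines, GNU split also provides -C (--line-bytes), which packs as many complete lines as fit under the given size. The sketch below compares the two on a small hypothetical file; demo.log and the prefixes bytes- and lines- are illustrative names, not part of the original example.

```shell
set -eu

# Hypothetical demo data: 1000 short lines, about 15 KB in total.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 1000 | sed 's|^|request id |' > demo.log

# -b 4K: exactly 4096 bytes per piece, even if that cuts a line in half.
split -b 4K -d demo.log bytes-

# -C 4K: at most 4096 bytes per piece, made up of complete lines only.
split -C 4K -d demo.log lines-

# Report size and line count of every line-aligned piece.
for piece in lines-*; do
    printf '%s: %s bytes, %s lines\n' "$piece" "$(wc -c < "$piece")" "$(wc -l < "$piece")"
done
```

The -C pieces come out slightly smaller than 4096 bytes, because split stops at the last line that still fits. That trade-off is usually worthwhile when the pieces will be fed to grep or awk separately.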

2.3 Merge multiple files

To recombine split parts, concatenate them in order with cat and redirect standard output to a new file.

# merge two split files
cat split-size01 split-size02 > two-file-merge
ls -lh two-file-merge
-rw-r--r-- 1 root root 1000M two-file-merge

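Merging rewrites every byte of every part, so it costs I/O time and disk space in proportion to the total size. It is still the right tool when you need the original file back, for example after transferring the pieces to another machine.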
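When the goal is to restore the complete original rather than join two parts, all pieces have to be concatenated in order, and it is worth checking the result. The following is a minimal sketch under assumed names (original.log, part-, and restored.log are hypothetical); it relies on the shell expanding part-* in sorted order, which matches the order in which split numbered the pieces.

```shell
set -eu

# Hypothetical demo data: about 575 KB of numbered lines.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100000 > original.log

# Cut into 100 KB pieces with numeric suffixes.
split -b 100K -d original.log part-

# The glob expands in sorted order, so the pieces are joined in sequence.
cat part-* > restored.log

# cmp exits non-zero (and set -e aborts) if the two files differ.
cmp original.log restored.log
echo "restored.log is identical to original.log"

# Checksums are handy when the original lives on another machine.
sha256sum original.log restored.log
```

After a network transfer, comparing a checksum computed on the source host with one computed on the merged file gives the same assurance without needing both copies in one place.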

Source: https://cloud.tencent.com/developer/article/1576576
