How to Efficiently Split and Merge Massive Log Files on Linux
Opening a huge log file in an editor such as vim can be painfully slow and memory‑intensive, and even streaming tools like cat become unwieldy at that scale. Linux’s split command divides a massive file by line count or by size, which makes analysis, transfer, and later merging far more manageable. The practical examples here show how.
1. The pain of large files
Analyzing a huge log file as a single unit with tools such as vim, cat, grep, or awk quickly becomes a nightmare:
Slow execution: an editor such as vim tries to load the whole file, and even streaming tools like cat, grep, and awk must read every byte on each pass.
High resource consumption: loading a 4 GB log into an editor needs roughly 4 GB of RAM or more.
Output is hard to reuse when the whole file is piped through a chain of commands.
Large files are costly to transfer over the network.
Big‑data offline frameworks like Hadoop can handle such scenarios, but they require long processing times and custom MapReduce jobs. Linux provides a simple split utility that can cut a large file into many smaller ones.
The split command offers two ways to cut a file:
By line count, using the -l option.
By size, using the -b option.
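The difference between the two modes is easier to see on a tiny file. The following is a minimal sketch, assuming GNU split (for the -d option); the scratch directory, sample.log, and the by-line- and by-size- prefixes are made up for illustration. It suggests that -l always cuts on line boundaries, while -b cuts at a byte offset and can leave a line split across two pieces.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory and sample file, used only for this demo.
# Remove "$workdir" by hand when you are done.
workdir=$(mktemp -d)
cd "$workdir"

# Ten lines of 11 bytes each ("line-00001" plus newline), 110 bytes in total.
seq -f 'line-%05g' 1 10 > sample.log

# Split by line count: 4 + 4 + 2 lines.
split -l 4 -d sample.log by-line-

# Split by size: 50 + 50 + 10 bytes.
# Byte 50 falls in the middle of the fifth line, so that line is cut in two.
split -b 50 -d sample.log by-size-

line_pieces=$(ls by-line-* | wc -l | tr -d ' ')
size_pieces=$(ls by-size-* | wc -l | tr -d ' ')
last_line_of_first_chunk=$(tail -n 1 by-size-00)

echo "pieces by line: $line_pieces"
echo "pieces by size: $size_pieces"
echo "by-size-00 ends with: $last_line_of_first_chunk"
```

For line-oriented analysis with grep or awk, splitting by line count is usually the safer choice; splitting by size is a better fit when the pieces only need to be transferred and merged again.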
2.1 Split by line count
Example: split a 3.4 GB log file into pieces of 50,000 lines each, using split-line as the output prefix and numeric suffixes (-d).
# source file size
ls -lh happylauliu.cn.gz
-rw-r--r-- 1 root root 3.4G Jan 17 09:42 happylauliu.cn.gz
# split by line count
split -l 50000 -d --verbose happylauliu.cn.gz split-line
Creating file "split-line00"
Creating file "split-line01"
... (continues up to split-line9171)
# verify line count of a piece
wc -l split-line00 # 50000
wc -l split-line9171 # 1020
# check size of pieces
ls -lh split-line0[0-9]
-rw-r--r-- 1 root root 14M ... split-line00
After splitting, each file is about 14 MB, making further analysis much easier.
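Once a log is cut into pieces, each piece can be analyzed on its own and the results combined. The sketch below is one way to do that, not the original author's workflow: it generates a made-up 1000-line sample, splits it into 300-line pieces, and sums per-piece line counts and matches. The file names, the piece- prefix, and the ERROR pattern are all hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory; remove "$workdir" by hand when you are done.
workdir=$(mktemp -d)
cd "$workdir"

# Generate 1000 sample lines; every 10th line is marked ERROR (made-up data).
awk 'BEGIN {
    for (i = 1; i <= 1000; i++) {
        level = (i % 10 == 0) ? "ERROR" : "INFO"
        print level, "request", i
    }
}' > sample.log

# Cut into pieces of 300 lines: 300 + 300 + 300 + 100.
split -l 300 -d sample.log piece-

total_lines=0
total_errors=0
for f in piece-*; do
    lines=$(wc -l < "$f")
    # grep -c exits non-zero when nothing matches, so tolerate that under set -e.
    errors=$(grep -c '^ERROR' "$f" || true)
    echo "$f: $lines lines, $errors errors"
    total_lines=$((total_lines + lines))
    total_errors=$((total_errors + errors))
done

echo "total: $total_lines lines, $total_errors errors"
```

Checking that the summed line count equals the line count of the source file is a cheap way to confirm that no lines were lost during the split.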
2.2 Split by size
The -b option allows splitting by byte size (K, M, G, …). The following command splits the same log into 500 MB chunks.
# split by size
split -b 500M -d --verbose happylauliu.cn.gz split-size
Creating file "split-size00"
Creating file "split-size01"
Creating file "split-size02"
Creating file "split-size03"
Creating file "split-size04"
Creating file "split-size05"
Creating file "split-size06"
# list resulting files
ls -lh split-size0*
-rw-r--r-- 1 root root 500M ... split-size00
-rw-r--r-- 1 root root 500M ... split-size01
-rw-r--r-- 1 root root 500M ... split-size02
-rw-r--r-- 1 root root 500M ... split-size03
-rw-r--r-- 1 root root 500M ... split-size04
-rw-r--r-- 1 root root 500M ... split-size05
-rw-r--r-- 1 root root 444M ... split-size06
2.3 Merge multiple files
To recombine split pieces, concatenate them in their original order with cat and redirect standard output to a new file.
# merge two split files
cat split-size01 split-size02 > two-file-merge
ls -lh two-file-merge
-rw-r--r-- 1 root root 1000M two-file-merge
While merging large files still incurs performance costs, it can be useful when needed.
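When every piece is merged back, it is worth confirming that the result matches the original byte for byte. A minimal sketch, assuming GNU split and a made-up 1 MiB sample of random bytes standing in for a compressed log; sample.bin, merged.bin, and the part- prefix are hypothetical names:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory; remove "$workdir" by hand when you are done.
workdir=$(mktemp -d)
cd "$workdir"

# 1 MiB of random bytes as a stand-in for a binary or compressed log.
head -c 1048576 /dev/urandom > sample.bin

# 300 KiB chunks: three full pieces plus a smaller final one.
split -b 300K -d sample.bin part-
piece_count=$(ls part-* | wc -l | tr -d ' ')

# The shell expands part-* in sorted order (part-00, part-01, ...),
# which is the order in which split wrote the pieces.
cat part-* > merged.bin

# cmp -s is silent and only reports through its exit status.
if cmp -s sample.bin merged.bin; then
    merge_status="identical"
else
    merge_status="different"
fi

echo "pieces: $piece_count"
echo "merge check: $merge_status"
```

For files too large to keep alongside a second full copy, comparing checksums (for example with sha256sum) recorded before the split and after the merge gives the same assurance.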
Source: https://cloud.tencent.com/developer/article/1576576
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
