How to Efficiently Split and Merge Large Log Files on Linux
When log files grow massive, traditional tools like vim, cat, grep, and awk become slow and memory‑hungry, but Linux’s split command lets you divide a huge file by line count or size, process the pieces individually, and later recombine them, dramatically improving analysis efficiency.
In daily work, analyzing large log files with tools such as vim, cat, grep, and awk quickly becomes painful, for four reasons:
Slow execution: an editor such as vim has to load the file before it can display anything, and even streaming tools such as grep and awk must read the whole file from disk on every pass.
Excessive resource usage: a tool that loads the whole file can need memory on the order of the file size, so a 4 GB log can demand roughly 4 GB of RAM, and larger files need more.
Filtered output is hard to reuse: every new pipeline has to rescan the entire file.
File transfer is difficult: sending the full-size file consumes a lot of bandwidth.
Big-data offline frameworks such as Hadoop can handle these scenarios, but they require lengthy computation and custom MapReduce jobs, which adds complexity. Hadoop's core idea is to split a large file into many small ones and process them in parallel. On a single machine, Linux's simple split utility gives you the same divide-then-process approach.
The split command supports two ways to divide a file:
By line count using the -l option.
By size using the -b option.
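As a minimal sketch of the two modes, the commands below build a small throwaway log and split it both ways. The directory name split-demo, the file sample.log, and the log content are made up for illustration, and GNU coreutils split is assumed (the -d numeric-suffix flag and -C are GNU options). The last command also shows -C (--line-bytes), a size-based variant that is not covered in this article but keeps whole lines together, which -b does not do.

```shell
# Scratch directory (hypothetical name) so no real logs are touched.
mkdir -p split-demo

# Build a 1000-line sample log with made-up content.
seq 1 1000 | sed 's/^/GET \/index.html request /' > split-demo/sample.log

# Split by line count: 300 lines per piece, numeric suffixes (-d).
split -l 300 -d split-demo/sample.log split-demo/by-line-

# Split by size: 10 KiB per piece; a piece may end in the middle of a line.
split -b 10K -d split-demo/sample.log split-demo/by-size-

# Size-limited split that never breaks a line (GNU -C / --line-bytes).
split -C 10K -d split-demo/sample.log split-demo/by-wholeline-

# Inspect the results.
wc -l split-demo/by-line-*
ls -l split-demo/by-size-* split-demo/by-wholeline-*
```

For log analysis, -l or -C is usually the safer choice, because a piece produced by -b can start or end with a partial record.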
2.1 Split by line count
Example: split a 3.4 GB log file into pieces of 50 000 lines each, naming the pieces split-line with numeric suffixes.
# source file size
[root@VM_3_50_centos split]# ls -l 2020011702-www.happylauliu.cn.gz -h
-rw-r--r-- 1 root root 3.4G 1月 17 09:42 2020011702-www.happylauliu.cn.gz
# split by lines
[root@~]# split -l 50000 -d --verbose 2020011702-www.happylauliu.cn.gz split-line
Creating file "split-line00"
Creating file "split-line01"
...
Creating file "split-line9171"
# verify line count
[root@VM_3_50_centos split]# wc -l split-line00
50000 split-line00
...
[root@VM_3_50_centos split]# wc -l split-line9171
1020 split-line9171
# check file size
[root@VM_3_50_centos split]# ls -lh split-line0[0-9]
-rw-r--r-- 1 root root 14M 1月 17 16:54 split-line00
...
Each piece is about 14 MB, making subsequent analysis easier, though the number of files increases.
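Once a log is in line-aligned pieces, each piece can be analyzed on its own and the partial results combined. The sketch below is one possible way to do that, not the original author's workflow: the directory piece-demo, the file app.log, and the ERROR/INFO content are hypothetical stand-ins for a real log.

```shell
# Scratch directory and sample data (both hypothetical).
mkdir -p piece-demo
seq 1 1000 | awk '{ level = ($1 % 10 == 0) ? "ERROR" : "INFO"; print level, "request", $1 }' \
    > piece-demo/app.log

# Split into 250-line pieces so every piece holds only whole lines.
split -l 250 -d piece-demo/app.log piece-demo/part-

# Count matching lines per piece, then add the per-piece counts together.
total=0
for piece in piece-demo/part-*; do
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps the loop going under "set -e".
    n=$(grep -c '^ERROR' "$piece" || true)
    echo "$piece: $n"
    total=$((total + n))
done
echo "total ERROR lines: $total"
```

Because the pieces are independent, the loop body can also be handed to separate processes or machines; the per-piece counts only need to be summed at the end.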
2.2 Split by size
Alternatively, split by size using -b. The following example splits the same file into 500 MB chunks.
# split by size
[root@~]# split -b 500M -d --verbose 2020011702-www.happylauliu.cn.gz split-size
Creating file "split-size00"
Creating file "split-size01"
...
# list resulting files
[root@VM_3_50_centos split]# ls -lh split-size0*
-rw-r--r-- 1 root root 500M 1月 17 17:03 split-size00
-rw-r--r-- 1 root root 500M 1月 17 17:03 split-size01
...
-rw-r--r-- 1 root root 444M 1月 17 17:04 split-size06
2.3 Merge multiple files
To recombine split parts, concatenate them with cat and redirect the output to a new file:
# merge two parts
[root@VM_3_50_centos split]# cat split-size01 split-size02 >two-file-merge
# check merged size
[root@VM_3_50_centos split]# ls -lh two-file-merge
-rw-r--r-- 1 root root 1000M 1月 17 17:20 two-file-merge
Merging large files still incurs a performance cost, but it can be done whenever the full file is needed.
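To rebuild the complete file rather than just two parts, all pieces can be concatenated in suffix order and the result compared with the original. The following is a small sketch under assumptions: the directory merge-demo and the file names are hypothetical, and GNU split is assumed, whose generated suffixes sort in creation order so that a shell glob lists the pieces in the right sequence.

```shell
# Scratch directory and a sample file (hypothetical names and content).
mkdir -p merge-demo
seq 1 5000 > merge-demo/original.log

# Split into 4 KiB chunks with numeric suffixes.
split -b 4K -d merge-demo/original.log merge-demo/chunk-

# The glob expands in sorted order, which matches the order split wrote the chunks.
cat merge-demo/chunk-* > merge-demo/restored.log

# cmp -s is silent and exits 0 only when the two files are byte-identical.
if cmp -s merge-demo/original.log merge-demo/restored.log; then
    echo "restored file is identical to the original"
else
    echo "restored file differs from the original"
fi
```

When the original is no longer available for comparison, recording a checksum (for example with sha256sum) before splitting and checking it after the merge serves the same purpose.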
Source: https://cloud.tencent.com/developer/article/1576576
MaGe Linux Operations
