
Essential Linux & Java Debugging Tools Every Engineer Should Know

This guide compiles a comprehensive set of Linux commands and Java diagnostic utilities—including tail, grep, awk, find, tsar, btrace, Greys, jps, jstack, jmap, and more—providing practical examples and code snippets to help engineers quickly troubleshoot and monitor system and JVM issues.

Efficient Ops

1. Introduction

Hard problems come up regularly in daily work, and the right tools make a big difference in solving them. This note serves both as a personal reference and as a shared resource for colleagues.

2. Linux Command Tools

2.1 tail

tail -f is most often used to follow a log in real time.

tail -300f shopbase.log   # show last 300 lines and follow

2.2 grep

grep forest f.txt                     # search in a single file
grep forest f.txt cpf.txt             # search in multiple files
grep 'log' /home/admin -r -n          # recursive search with line numbers
cat f.txt | grep -i shopbase          # case-insensitive
grep 'shopbase' /home/admin -r -n --include=*.{vm,java}   # only search files with these extensions
grep 'shopbase' /home/admin -r -n --exclude=*.{vm,java}   # skip files with these extensions
seq 10 | grep 5 -A 3                  # show 3 lines after the match
seq 10 | grep 5 -B 3                  # show 3 lines before the match
seq 10 | grep 5 -C 3                  # show 3 lines before and after the match
cat f.txt | grep -c 'SHOPBASE'        # count matching lines
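Note that grep -c reports the number of matching lines, not the number of individual matches. The sketch below contrasts the two on inline sample text; the function names count_matching_lines and count_matches are made up for this illustration.

```shell
# Count lines that contain the pattern (what grep -c reports).
count_matching_lines() {
    grep -c "$1"
}

# Count every individual match: -o prints each match on its own line,
# so counting those lines gives the number of matches.
count_matches() {
    grep -o "$1" | wc -l | tr -d ' '
}

printf 'shop shop\nbase\nshop\n' | count_matching_lines shop   # 2 lines match
printf 'shop shop\nbase\nshop\n' | count_matches shop          # 3 matches in total
```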

2.3 awk

Basic usage:

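awk '{print $4,$6}' f.txt                          # print fields 4 and 6
awk '{print NR,$0}' f.txt cpf.txt                  # prefix each line with the overall line number
awk '{print FNR,$0}' f.txt cpf.txt                 # prefix each line with its line number within the file
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt        # also print the file name
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt   # file name, both counters, and the last field
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'       # use ":" as the field separator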

Pattern matching:

awk '/ldb/ {print}' f.txt                    # lines that match ldb
awk '!/ldb/ {print}' f.txt                   # lines that do not match ldb
awk '/ldb/ && /LISTEN/ {print}' f.txt        # lines that match both ldb and LISTEN
awk '$5 ~ /ldb/ {print}' f.txt               # lines whose column 5 matches ldb

Built-in variables:

NR: number of records (lines) read so far across all input files.
FNR: record number within the current file.
NF: number of fields in the current record.
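The difference between NR and FNR only shows up when awk reads more than one file. A minimal sketch, using two throwaway files created just for this illustration (the function name nr_fnr_demo and the file contents are assumptions, not part of the original examples):

```shell
# Print NR, FNR and NF for every line of two small temporary files.
nr_fnr_demo() {
    dir=$(mktemp -d)
    printf 'a\nb\n' > "$dir/one.txt"          # two lines, one field each
    printf 'c d\ne f g\n' > "$dir/two.txt"    # two lines, 2 and 3 fields
    # NR keeps counting across files; FNR restarts at 1 for each file.
    awk '{print "NR=" NR, "FNR=" FNR, "NF=" NF}' "$dir/one.txt" "$dir/two.txt"
    rm -rf "$dir"
}

nr_fnr_demo
# NR=1 FNR=1 NF=1
# NR=2 FNR=2 NF=1
# NR=3 FNR=1 NF=2
# NR=4 FNR=2 NF=3
```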

2.4 find

sudo -u admin find /home/admin /tmp /usr -name '*.log'   # search multiple directories
find . -iname '*.txt'                                    # case-insensitive name
find . -type d                                           # list directories
find /usr -type l                                        # list symbolic links
find /usr -type l -name "z*" -ls                         # detailed info on matching links
find /home/admin -size +250000k                          # files larger than 250000 KiB (roughly 250 MB)
find /home/admin -perm 777 -exec ls -l {} \;             # find by permission
find /home/admin -atime -1                               # accessed within the last day
find /home/admin -ctime -1                               # status changed within the last day
find /home/admin -mtime -1                               # modified within the last day
find /home/admin -amin -1                                # accessed within the last minute
find /home/admin -cmin -1                                # status changed within the last minute
find /home/admin -mmin -1                                # modified within the last minute

2.5 pgm

Batch query logs from vm-shopbase:

pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'

2.6 tsar

Alibaba’s open‑source collection tool for historical and real‑time system metrics.

tsar               # view recent day metrics
tsar --live        # real‑time metrics (default 5‑second refresh)
tsar -d 20161218   # view data of a specific day (up to ~4 months)
tsar --mem
tsar --load
tsar --cpu   # each of these can be combined with -d for a daily view

2.7 top

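Use top together with other commands to investigate JVM issues.

ps -ef | grep java
top -H -p <pid>    # per-thread view of the given process

Convert the ID of the suspicious thread from decimal to hexadecimal, then look for that value (shown as nid=0x...) in the jstack output for detailed analysis.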
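The decimal-to-hex step can be done with printf. A small sketch, where tid_to_nid is a hypothetical helper name and 2830 is an arbitrary example thread ID:

```shell
# Convert a decimal thread ID (as shown by top -H) into the hex form
# that jstack prints in its "nid=0x..." field.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

tid_to_nid 2830    # prints 0xb0e

# Possible follow-up on a machine with a running JVM (not executed here):
#   jstack <pid> | grep -A 20 "nid=$(tid_to_nid <tid>) "
```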

2.8 Other useful commands

netstat -nat|awk '{print $6}'|sort|uniq -c|sort -rn   # show connection states
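To see what this pipeline produces without a live system, the same counting steps can be run on canned input. The sketch below is an assumption-laden illustration: the sample text imitates typical netstat -nat output, count_states is a made-up name, and unlike the one-liner above it filters on lines starting with tcp so the two header lines are not counted.

```shell
# Count TCP connections per state from netstat-style input on stdin.
count_states() {
    awk '$1 ~ /^tcp/ {print $6}' | sort | uniq -c | sort -rn
}

# Hypothetical sample resembling `netstat -nat` output.
sample='Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.5:8080 10.0.0.9:51234 ESTABLISHED
tcp 0 0 10.0.0.5:8080 10.0.0.9:51235 ESTABLISHED
tcp 0 0 10.0.0.5:8080 10.0.0.9:51236 TIME_WAIT'

printf '%s\n' "$sample" | count_states    # the most frequent state (ESTABLISHED, 2) comes first
```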

3. Troubleshooting Tools

3.1 btrace

btrace is a powerful production‑environment tracing tool.

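// Probe the calls made inside ArrayList.add; once the list's size exceeds 479, print the probe point and a stack trace that shows the caller.
@OnMethod(clazz = "java.util.ArrayList", method="add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))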
public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
    if (getInt(field("java.util.ArrayList", "size"), instance) > 479) {
        println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ", method:" + method + ", size:" + getInt(field("java.util.ArrayList", "size"), instance));
        jstack();
        println();
        println("===========================");
        println();
    }
}

// Print the arguments and the return value each time C2CApplyerServiceImpl.nav returns.
@OnMethod(clazz = "com.xxxxxxx.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl", method="nav", location = @Location(value = Kind.RETURN))
public static void mt(long userId, int current, int relation, String check, String redirectUrl, @Return AnyType result) {
    println("parameter# userId:" + userId + ", current:" + current + ", relation:" + relation + ", check:" + check + ", redirectUrl:" + redirectUrl + ", result:" + result);
}

More details at https://github.com/btraceio/btrace .

3.2 Greys

Features that overlap with btrace:

sc -df xxx           # show class details, source location, and classloader
trace class method   # display a breakdown of the method's execution time

3.3 javOSize

Allows on‑the‑fly bytecode modification for quick logging.

3.4 JProfiler

Previously used for many issues; now Greys and btrace cover most cases.

https://www.ej-technologies.com/products/jprofiler/overview.html

4. Java “Seven Swords” (Key Tools)

4.1 jps

sudo -u admin /opt/xxxxx/java/bin/jps -mlvV

4.2 jstack

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815

4.3 jinfo

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815

4.4 jmap

Two main uses:

Inspect heap:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815

Dump heap:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815

Combine with zprofiler and btrace for deeper analysis.

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10

4.5 jstat

sudo -u admin /opt/xxx/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000

4.6 jdb

Used for pre‑release debugging; attach to remote JVM on port 8000:

sudo -u admin /opt/taobao/java/bin/jdb -attach 8000

4.7 CLHSDB

sudo -u admin /opt/taobao/java/bin/java -classpath /opt/XXXXX/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB

More details at http://rednaxelafx.iteye.com/blog/1847971 .

5. Other Useful Utilities

5.1 dmesg

When a Java process disappears without a trace, dmesg can reveal whether the OOM killer terminated it.

sudo dmesg | grep -i kill | less

Look for entries that mention oom-killer, such as:

[6710782.021013] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[6710782.070639] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Killed process 215701, UID 679, (java) total-vm:11017300kB, anon-rss:7152432kB, file-rss:1232kB

The OOM killer steps in when the system (or, as in the log above, a memory cgroup) is about to run out of memory: it selects the highest-scoring process and terminates it to protect the machine.

Convert dmesg timestamps to real time:

# 12288812.926194 is the dmesg timestamp to convert
date -d "1970-01-01 UTC $(echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ')+12288812.926194"|bc) seconds"
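The arithmetic behind this conversion is: boot time = current time - uptime, and event time = boot time + dmesg timestamp. A minimal sketch of that calculation, with the inputs passed in explicitly so it can be checked with fixed numbers (dmesg_to_epoch is a hypothetical helper name, and awk is used here instead of bc purely as a convenience):

```shell
# $1: current time in epoch seconds
# $2: system uptime in seconds (first field of /proc/uptime)
# $3: dmesg timestamp in seconds since boot
# Prints the event time in whole epoch seconds.
dmesg_to_epoch() {
    awk -v now="$1" -v up="$2" -v ts="$3" \
        'BEGIN { printf "%.0f\n", int(now - up + ts) }'
}

dmesg_to_epoch 1000000 5000.5 200.25    # prints 995199

# Possible use on a Linux host with GNU date (not executed here):
#   date -d "@$(dmesg_to_epoch "$(date +%s)" "$(cut -d' ' -f1 /proc/uptime)" 6710784.707978)"
```

The result is approximate, since the kernel's timestamps can drift from wall-clock time on machines that have been suspended.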

6. New Skill – RateLimiter

To finely control QPS (e.g., limit to 400 requests per second), use Guava’s RateLimiter. See details at http://ifeve.com/guava-ratelimite .

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
