Linux Crisis Tools
Brendan Gregg posted the following list of 'crisis tools' which you should install on your Linux servers by default (so they are available when an incident happens).
Package | Provides | Notes |
---|---|---|
procps | ps(1), vmstat(8), uptime(1), top(1) | basic stats |
util-linux | dmesg(1), lsblk(1), lscpu(1) | system log, device info |
sysstat | iostat(1), mpstat(1), pidstat(1), sar(1) | device stats |
iproute2 | ip(8), ss(8), nstat(8), tc(8) | preferred net tools |
numactl | numastat(8) | NUMA stats |
tcpdump | tcpdump(8) | Network sniffer |
linux-tools-common linux-tools-$(uname -r) | perf(1), turbostat(8) | profiler and PMU stats |
bpfcc-tools (bcc) | opensnoop(8), execsnoop(8), runqlat(8), softirqs(8), hardirqs(8), ext4slower(8), ext4dist(8), biotop(8), biosnoop(8), biolatency(8), tcptop(8), tcplife(8), trace(8), argdist(8), funccount(8), profile(8), etc. | canned eBPF tools[1] |
bpftrace | bpftrace, basic versions of opensnoop(8), execsnoop(8), runqlat(8), biosnoop(8), etc. | eBPF scripting[1] |
trace-cmd | trace-cmd(1) | Ftrace CLI |
nicstat | nicstat(1) | net device stats |
ethtool | ethtool(8) | net device info |
tiptop | tiptop(1) | PMU/PMC top |
cpuid | cpuid(1) | CPU details |
msr-tools | rdmsr(8), wrmsr(8) | CPU digging |