performance checklist
January 24, 2017
🔗Performance checklists for SREs
🔗Linux Perf Analysis in 60s
- uptime
load averages
- dmesg -T | tail
kernel errors
- vmstat 1
overall stats by time
- mpstat -P ALL 1
CPU balance
- pidstat 1
process usage
- iostat -xz 1
disk I/O
- free -m
memory usage
- sar -n DEV 1
network I/O
- sar -n TCP,ETCP 1
TCP stats
- top
check overview
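
The ten commands above can be strung into a single pass. This is a minimal sketch, not part of the original checklist: `run_step` and `sixty_second_check` are hypothetical helper names, and the sample count of 5 (plus `top -b -n 1`) is an assumption added so the interval commands exit on their own instead of running until interrupted.

```shell
#!/bin/sh
# Run one checklist step if its tool is installed; otherwise note the gap.
# Usage: run_step "description" command [args...]
run_step() {
    desc=$1
    shift
    printf '== %s: %s\n' "$desc" "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '   (skipped: %s not installed)\n' "$1"
    fi
}

# Walk the 60-second checklist in order, 5 one-second samples per tool.
sixty_second_check() {
    run_step "load averages"   uptime
    run_step "kernel errors"   sh -c 'dmesg -T | tail'
    run_step "overall stats"   vmstat 1 5
    run_step "CPU balance"     mpstat -P ALL 1 5
    run_step "process usage"   pidstat 1 5
    run_step "disk I/O"        iostat -xz 1 5
    run_step "memory usage"    free -m
    run_step "network I/O"     sar -n DEV 1 5
    run_step "TCP stats"       sar -n TCP,ETCP 1 5
    run_step "overview"        top -b -n 1
}
```

Source the file and call `sixty_second_check`. Tools from the sysstat package (`mpstat`, `pidstat`, `iostat`, `sar`) are reported as skipped when they are not installed, which is itself worth knowing before an incident.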
🔗Linux Disk Checklist
- iostat -xz 1
any disk I/O? if not, stop looking
- vmstat 1
is this swapping? or, high sys time?
- df -h
are file systems nearly full?
- ext4slower 10
(zfs*, xfs*, etc.) slow file system I/O?
- bioslower 10
if so, check disks
- ext4dist 1
check distribution and rate
- biolatency 1
if interesting, check disks
- cat /sys/devices/…/ioerr_cnt
(if available) errors
- smartctl -l error /dev/sda1
(if available) errors
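
For the `df -h` step, "nearly full" can be made mechanical. The sketch below is an illustration only: `nearly_full` is a hypothetical helper, the 90% default threshold is an assumption, and it reads `df -P` (the POSIX output format) rather than `df -h` so each file system stays on one line.

```shell
#!/bin/sh
# Print "mountpoint use%" for every file system at or above the threshold.
# Reads `df -P` output on stdin; the threshold is a percentage (default 90).
# Caveat: mount points containing spaces are not handled.
nearly_full() {
    threshold=${1:-90}
    awk -v t="$threshold" '
        NR > 1 {
            use = $5            # Capacity column, e.g. "95%"
            sub(/%/, "", use)
            if (use + 0 >= t + 0) print $6, use "%"
        }'
}
```

Typical use is `df -P | nearly_full 90`; no output means nothing is at or above the threshold.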
🔗Linux Network Checklist
- sar -n DEV,EDEV 1
at interface limits? or use nicstat
- sar -n TCP,ETCP 1
active/passive load, retransmit rate
- cat /etc/resolv.conf
it's always DNS
- mpstat -P ALL 1
high kernel time? single hot CPU?
- tcpretrans
what are the retransmits? state?
- tcpconnect
connecting to anything unexpected?
- tcpaccept
unexpected workload?
- netstat -rnv
any inefficient routes?
- check firewall config
anything blocking/throttling?
- netstat -s
play 252 metric pickup
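
For the `cat /etc/resolv.conf` step, what usually matters is which resolvers are configured. A small sketch that pulls out just those entries (`list_nameservers` is a hypothetical helper name, and the optional path argument exists only to make it easy to try on a copy of the file):

```shell
#!/bin/sh
# Print the address from each "nameserver" line of a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no path is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running `list_nameservers` with no argument reads the live file. The glibc resolver uses at most three `nameserver` entries, so any addresses listed after the third are ignored.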
🔗Linux CPU Checklist
- uptime
load averages
- vmstat 1
system-wide utilization, run queue length
- mpstat -P ALL 1
CPU balance
- pidstat 1
per-process CPU
- CPU flame graph
CPU profiling
- CPU subsecond offset heat map
look for gaps
- perf stat -a -- sleep 10
IPC, LLC hit ratio
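
IPC is instructions retired divided by CPU cycles. `perf stat` normally prints it alongside the instructions counter, but it can also be derived from the machine-readable output. The sketch below assumes `perf stat -x,` CSV output in which the first field is the counter value and the third is the event name; `ipc_from_perf_csv` is a hypothetical helper, and event names can differ on some systems (for example hybrid CPUs), so treat the matching as an assumption.

```shell
#!/bin/sh
# Compute instructions-per-cycle from `perf stat -x,` CSV on stdin.
# Prints "n/a" when the cycles counter is missing or not a positive number.
ipc_from_perf_csv() {
    awk -F, '
        $3 ~ /^instructions/ { insns = $1 }
        $3 ~ /^cycles/       { cycles = $1 }
        END {
            if (cycles + 0 > 0) printf "%.2f\n", insns / cycles
            else print "n/a"
        }'
}
```

One way to feed it: `perf stat -a -x, -e instructions,cycles -- sleep 10 2>&1 | ipc_from_perf_csv` (perf writes its counters to stderr, hence the redirect).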
See also: https://nbari.com/post/observability-tools/
{{< youtube zxCWXNigDpA>}}
Thanks to Brendan Gregg for all this info: http://www.brendangregg.com/ and http://www.brendangregg.com/blog/2016-05-04/srecon2016-perf-checklists-for-sres.html
The Realities of the Job of Delivering Reliability
{{< youtube Lf4RwlOdppg>}}