Performance checklist
January 24, 2017
Performance checklist for SREs
Linux Perf Analysis in 60s
- uptime → load averages
- dmesg -T | tail → kernel errors
- vmstat 1 → overall stats by time
- mpstat -P ALL 1 → CPU balance
- pidstat 1 → process usage
- iostat -xz 1 → disk I/O
- free -m → memory usage
- sar -n DEV 1 → network I/O
- sar -n TCP,ETCP 1 → TCP stats
- top → check overview
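If you want to capture the whole pass in one go, the checklist above can be sketched as a small script. This is only a sketch: the function names `run_check` and `sixty_second_checklist` are made up for illustration, the sample count of 5 is an arbitrary choice so the interval-based tools terminate, and `top -b -n 1` assumes the procps version of `top`.

```shell
#!/bin/sh
# Hypothetical helper: run one check, or note that the tool is missing.
run_check() {
    label=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        printf '== %s: skipped (%s not installed)\n' "$label" "$1"
        return 0
    fi
    printf '== %s: %s\n' "$label" "$*"
    "$@" 2>&1
}

# The 60-second checklist, with a count of 5 samples (an assumption)
# added to the interval-based tools so that each one exits on its own.
sixty_second_checklist() {
    run_check "load averages" uptime
    run_check "kernel errors" sh -c 'dmesg -T | tail'
    run_check "overall stats by time" vmstat 1 5
    run_check "CPU balance" mpstat -P ALL 1 5
    run_check "process usage" pidstat 1 5
    run_check "disk I/O" iostat -xz 1 5
    run_check "memory usage" free -m
    run_check "network I/O" sar -n DEV 1 5
    run_check "TCP stats" sar -n TCP,ETCP 1 5
    run_check "overview" top -b -n 1
}
```

Calling `sixty_second_checklist > report.txt` runs the whole list. Note that `dmesg` may need root on systems that restrict kernel log access.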
Linux Disk Checklist
- iostat -xz 1 → any disk I/O? if not, stop looking
- vmstat 1 → is this swapping? or, high sys time?
- df -h → are file systems nearly full?
- ext4slower 10 → (zfs*, xfs*, etc.) slow file system I/O?
- bioslower 10 → if so, check disks
- ext4dist 1 → check distribution and rate
- biolatency 1 → if interesting, check disks
- cat /sys/devices/…/ioerr_cnt → (if available) errors
- smartctl -l error /dev/sda1 → (if available) errors
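The "are file systems nearly full?" step is easy to automate. A minimal sketch, assuming the POSIX `df -P` output format (capacity in column 5, mount point in column 6); the function name `nearly_full` and the default threshold of 90% are assumptions, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print mount points at or above a use% threshold.
# Reads `df -P` output on stdin; the threshold defaults to 90 (an assumption).
nearly_full() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= limit) print $6, $5
        }'
}
```

Usage would look like `df -P | nearly_full 90`, which prints one line per file system that is at least 90% full.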
Linux Network Checklist
- sar -n DEV,EDEV 1 → at interface limits? or use nicstat
- sar -n TCP,ETCP 1 → active/passive load, retransmit rate
- cat /etc/resolv.conf → it's always DNS
- mpstat -P ALL 1 → high kernel time? single hot CPU?
- tcpretrans → what are the retransmits? state?
- tcpconnect → connecting to anything unexpected?
- tcpaccept → unexpected workload?
- netstat -rnv → any inefficient routes?
- check firewall config → anything blocking/throttling?
- netstat -s → play 252 metric pickup
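For a rough retransmit figure without `sar`, the `Tcp:` counters in `/proc/net/snmp` can be used. A minimal sketch, assuming a Linux system where that file exposes the `OutSegs` and `RetransSegs` columns; the function name `retrans_pct` is made up here, and the result is a since-boot ratio, not the per-second rate that `sar -n ETCP 1` reports.

```shell
#!/bin/sh
# Hypothetical helper: retransmitted segments as a percentage of segments sent.
# Reads /proc/net/snmp-style text on stdin (counters are cumulative since boot).
retrans_pct() {
    awk '
        $1 == "Tcp:" && !seen {       # first Tcp: line holds the column names
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" {                # second Tcp: line holds the values
            sent = $(col["OutSegs"])
            retrans = $(col["RetransSegs"])
            if (sent + 0 > 0) printf "%.2f\n", 100 * retrans / sent
            else print "0.00"
        }'
}
```

Usage would be `retrans_pct < /proc/net/snmp`. To approximate a rate, take two readings some seconds apart and compare the counters yourself.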
Linux CPU Checklist
- uptime → load averages
- vmstat 1 → system-wide utilization, run q length
- mpstat -P ALL 1 → CPU balance
- pidstat 1 → per-process CPU
- CPU flame graph → CPU profiling
- CPU subsecond offset heat map → look for gaps
- perf stat -a -- sleep 10 → IPC, LLC hit ratio
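Load averages are easier to read relative to the number of CPUs. A minimal sketch, assuming the `/proc/loadavg` layout on Linux (1-minute load in the first field); the function name `load_per_cpu` is hypothetical. Keep in mind that Linux load averages also count tasks in uninterruptible sleep, so a high value is not necessarily CPU demand.

```shell
#!/bin/sh
# Hypothetical helper: 1-minute load average divided by the CPU count.
# Takes the CPU count as $1 and /proc/loadavg-style text on stdin.
load_per_cpu() {
    awk -v cpus="$1" '{ printf "%.2f\n", $1 / cpus }'
}
```

Usage would be `load_per_cpu "$(getconf _NPROCESSORS_ONLN)" < /proc/loadavg`. A value well above 1.00 suggests it is worth moving on to `vmstat 1` and `mpstat -P ALL 1`.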
https://nbari.com/post/observability-tools/
{{< youtube zxCWXNigDpA>}}
Thanks to Brendan Gregg for all this info: http://www.brendangregg.com/ and http://www.brendangregg.com/blog/2016-05-04/srecon2016-perf-checklists-for-sres.html
The Realities of the Job of Delivering Reliability
{{< youtube Lf4RwlOdppg>}}