
A getting started guide for vmstat, iostat, mpstat and pidstat.

vmstat

Once I figured out how vmstat works, I found I could almost no longer live without it, it’s that good. Fortunately it’s available everywhere, so I don’t have to. There are many versions around, both for Linux and Unix.

vmstat does a little bit of everything: memory, processes, I/O, swap, disks, CPU.

The basic syntax is:

vmstat [options] <interval> <count>

vmstat prints a new update every interval seconds, stopping after count updates. If no count is supplied, vmstat continues until killed. If no interval is given, only a single line is printed.

# 1 update per second, 5 updates
-> vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff  cache    si   so    bi    bo   in   cs us sy id wa st
 4  0      0 4098084 375068 3462944    0    0    34    25  340  320  8  3 89  0  0
 1  0      0 4098184 375068 3463264    0    0     0     0 1263 3198  4  3 93  0  0
 0  0      0 4098436 375068 3462924    0    0     0     0 1141 2999  3  2 95  0  0
 0  0      0 4097736 375076 3462916    0    0     0     0 2166 4414  5  4 92  0  0
 0  0      0 4096476 375076 3462916    0    0     0     0 1103 3130  3  3 94  0  0

The values in the very first line are averages since boot, so it usually stands out as different from the rest.

All values, with the exception of memory and processes, are average rates since the previous update. Memory and processes (the first six columns) are always instantaneous values. Memory is shown in KB, but -S changes the unit, e.g. -S M for MB.

If your version of vmstat supports -w, use it; it makes the table wider and easier to read.
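
For example, combining both (a quick sketch, assuming your vmstat supports these flags):

# wide output, memory in MB, 1 update per second, 5 updates
-> vmstat -w -S M 1 5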

So what does each column mean?

column   meaning                                         unit
r        number of processes running                     -
b        number of processes in uninterruptible sleep    -
swpd     swap memory used                                K
free     amount of idle memory                           K
buff     memory used for buffers                         K
cache    memory used for cache                           K
si       memory swapped in from disk per second          K/s
so       memory swapped out to disk per second           K/s
bi       blocks received from a block device per second  blocks/s
bo       blocks sent to a block device per second        blocks/s
in       interrupts per second                           1/s
cs       context switches per second                     1/s
us       CPU user time                                   % of total
sy       CPU system time                                 % of total
id       CPU idle time                                   % of total
wa       CPU time waiting for I/O                        % of total
st       CPU time stolen from the virtual machine        % of total

A few guidelines for reading the values:

  • check si and so first for swapping activity; this is the big performance killer
  • check that memory allocation makes sense for the workload
  • a consistently high us indicates a possibly CPU-bound load
  • a consistently high wa indicates a possibly I/O-bound load
  • if this is a VM and st (steal) stays high for long, the host system could be overloaded or the VM under-provisioned

-d gets us per-device I/O counters since boot. With -p <PART> we get the same for a specific partition (see the example after the table below).

-> vmstat -d 1
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
sda   154968   4762 5471678   52614 123981  67290 4139296   68873      0     45
sda   154968   4762 5471678   52614 123985  67305 4139440   68876      0     45
sda   187412   5295 5735630   57902 126252  67704 5888256   80177      0     51
...

Columns are read like this:

column   meaning
total    total number of reads / writes
merged   merged requests (N requests grouped into one I/O)
sectors  number of sectors read / written
ms       milliseconds spent reading / writing
cur      I/O operations currently in progress
sec      seconds spent doing I/O
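
And the per-partition view from -p, a quick sketch (sda1 is just a placeholder, pick one of your own partitions):

# per-partition I/O counters, 1 update per second
-> vmstat -p sda1 1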

We can also access a detailed slab cache utilization report with -m, but it may require root permissions to work.

-> vmstat -m
Cache                            Num  Total   Size  Pages
ecryptfs_inode_cache            1544   1564    960     17
ecryptfs_file_cache              256   1024     16    256
ecryptfs_auth_tok_list_item       19     76    832     19
nf_conntrack_expect                0      0    216     18
nf_conntrack                     101    200    320     25
fuse_request                      80     80    400     20
fuse_inode                        42     76    832     19
ext4_groupinfo_4k               1716   1736    144     28
ext4_inode_cache              325004 327150   1080     30
ext4_allocation_context          128    128    128     32
ext4_io_end                      256    320     64     64
ext4_extent_status            156920 156978     40    102
mbcache                          294    438     56     73
...

And the columns are:

column  meaning
Cache   the cache name
Num     number of currently active objects
Total   total number of available objects
Size    size of each object
Pages   number of pages with at least one active object

vmstat also has these one-shot views:

  • -s memory and event counters since boot
  • -D summary statistics for all disks
  • -f number of forks since boot
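
For instance:

# one-shot reports, no interval needed
-> vmstat -s
-> vmstat -D
-> vmstat -f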

iostat

iostat shows combined CPU and disk I/O statistics. As with vmstat, it works with an interval and a count.

# one update per second
-> iostat 1
Linux 4.19.8-arch1-1-ARCH (ix) 	12/24/2018 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.14    0.03    5.66    0.03    0.00   82.14

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               4.41        29.11        49.44    3409859    5790640

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.56    0.00    3.79    0.00    0.00   90.66

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
...

Again, the first report shows averages since boot (we can omit it with -y), while the following ones show rates since the previous interval.

We can choose to see only the CPU with -c, or only the disks with -d. -x shows extended I/O statistics, -m changes units to MB, and -h switches to an easier to read (for humans) format.
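
Putting a few of those together (a sketch; sda is just an example device):

# extended disk-only statistics for sda, in MB, skipping the since-boot report
-> iostat -d -x -m -y sda 1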

mpstat

mpstat shows (more) detailed CPU statistics: we get vmstat’s counters plus a few more. Same deal with the interval and count.

# 1 update per second
-> mpstat 1
Linux 4.19.8-arch1-1-ARCH (ix) 	12/24/2018 	_x86_64_	(4 CPU)

06:48:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:48:42 PM  all    7.85    0.00    3.04    0.00    0.25    0.51    0.00    0.00    0.00   88.35
06:48:43 PM  all   10.08    0.00    4.79    0.00    1.01    0.00    0.00    0.00    0.00   84.13
06:48:44 PM  all    7.25    0.00    5.25    0.00    0.75    0.50    0.00    0.00    0.00   86.25
06:48:45 PM  all    9.07    0.00    3.78    0.25    1.01    0.50    0.00    0.00    0.00   85.39

The first line is, yet again, averages since boot.

We can choose to only show activity for some CPUs with -P, e.g. -P 1,3.

There are additional views: -I ALL shows all CPU interrupts, and -n shows CPU statistics grouped by NUMA node (use -N to indicate which nodes to report). There is even a JSON output with -o JSON.
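
A couple of quick sketches:

# only CPUs 1 and 3, 1 update per second
-> mpstat -P 1,3 1
# all interrupt statistics
-> mpstat -I ALL 1
# a single JSON-formatted report
-> mpstat -o JSON 1 1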

mpstat shows general CPU utilization; to track an individual process there is pidstat.

pidstat

This one has some similarities with top. Every interval, pidstat prints which processes were running. We can also track particular processes with -p. Threads are shown with -t.

# track process CPU activity, thread view
-> pidstat -t -p 25809 1
Linux 4.19.8-arch1-1-ARCH (ix) 	12/29/2018 	_x86_64_	(4 CPU)

11:00:58 PM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
11:00:58 PM   980     25809         -    0.22    0.07    0.00    0.00    0.28     1  mysqld
11:00:58 PM   980         -     25809    0.00    0.00    0.00    0.00    0.00     1  |__mysqld
11:00:58 PM   980         -     25811    0.00    0.00    0.00    0.00    0.00     0  |__mysqld
11:00:58 PM   980         -     25812    0.00    0.00    0.00    0.00    0.00     3  |__mysqld
11:00:58 PM   980         -     25820    0.00    0.00    0.00    0.00    0.00     1  |__mysqld
11:00:58 PM   980         -     25834    0.02    0.01    0.00    0.01    0.02     3  |__mysqld
11:00:58 PM   980         -     25835    0.02    0.00    0.00    0.01    0.02     3  |__mysqld
11:00:58 PM   980         -     25836    0.01    0.00    0.00    0.01    0.02     2  |__mysqld
11:00:58 PM   980         -     25837    0.01    0.00    0.00    0.01    0.01     0  |__mysqld

-r shows memory per process, which gives a quick way to check for memory leaks.

# track process memory activity
-> pidstat -r -p 27766 1
Linux 4.19.8-arch1-1-ARCH (ix) 	12/30/2018 	_x86_64_	(4 CPU)

09:42:59 PM   UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
09:43:00 PM  1000     27766      0.00      0.00  689680   17128   0.15  mysqld
09:43:01 PM  1000     27766      0.00      0.00  689680   17128   0.15  mysqld
09:43:02 PM  1000     27766      0.00      0.00  689680   17128   0.15  mysqld
09:43:03 PM  1000     27766      0.00      0.00  689680   17128   0.15  mysqld
Average:     1000     27766      0.00      0.00  689680   17128   0.15  mysqld

pidstat can also start a program with -e and show its activity during its execution.
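
A minimal sketch, assuming a sysstat version recent enough to support -e (big.log is just a placeholder file):

# launch gzip under pidstat, reporting its CPU usage every second while it runs
-> pidstat -u 1 -e gzip big.log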

Besides CPU, pidstat can also show per-process I/O activity.

# per-process I/O
-> pidstat -d 1 5
Linux 4.20.0-arch1-1-ARCH (ix) 	01/06/2019 	_x86_64_	(4 CPU)

04:04:53 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:04:54 PM  1000     14423    276.00 197888.00      0.00       1  dd
04:04:55 PM  1000     14423      0.00 218880.00      0.00       0  dd
04:04:56 PM  1000     14423      0.00 224664.00      0.00       0  dd
04:04:57 PM  1000     14423      0.00 230428.00      0.00       0  dd
04:04:59 PM  1000     14423      0.00 192520.00      0.00       0  dd
Average:     1000     14423     46.00 207763.33      0.00       0  dd

If pidstat prints negative values, it means it doesn’t have enough permissions to access the process. Try switching to the user running the process (or to root, which should always work).

pidstat supports all these views:

  • -d I/O statistics
  • -u CPU statistics
  • -R process priority
  • -r memory utilization
  • -s stack utilization
  • -t thread statistics
  • -w task switching
  • -v file descriptors
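
For example (1234 is a placeholder PID):

# context switches per second, per process
-> pidstat -w 1
# threads and open file descriptors for one process
-> pidstat -v -p 1234 1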

Putting It All Together

These programs complement each other, but how would they all work together? Let’s say we want to find out why a machine is running slow.

The first thing we can check is vmstat:

-> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0 12 773184 3782540 161528 3176304    0    0  1184 38944 2888 9877 20  5  1 74  0
 2  9 773184 3745748 161528 3213272    0    0  2660 36868 2876 7426 18  7  1 73  0
 0 11 773184 3630836 161540 3326616    0    0  2688 43052 2959 7064 23  5  1 70  0
 2 11 771648 2297092 161792 4873416    0    0  2672 40960 3034 7547 23  6  2 69  0
 1 11 771648 2255008 161796 4915000    0    0  2088 38912 2939 7370 24  6  2 67  0
 0 11 771648 2142616 161796 4992940    0    0  2148 38916 2976 8061 31  6  1 63  0
 1 10 771648 2100532 161804 5034252   20    0  1608 41012 3919 9665 34 11  1 54  0
 0 11 819776 149836 157540 6949188    0    0  1920 12292 2702 5527  6  4  0 90  0

What can we see? A consistently high number of blocked processes (b), memory utilization going up, no swapping… that part is good, and the disks are being written to heavily (bo).

Hmm, a consistently high CPU wait (wa). These are indications of an I/O bottleneck: the CPUs are waiting on disk requests.

What does mpstat say?

-> mpstat 1
Linux 4.20.0-arch1-1-ARCH (ix) 	01/06/2019 	_x86_64_	(4 CPU)

01:34:49 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:34:50 PM  all   24.54    0.00    4.22   68.87    1.06    1.06    0.00    0.00    0.00    0.26
01:34:51 PM  all   19.05    0.00    3.44   74.87    1.06    0.53    0.00    0.00    0.00    1.06
01:34:52 PM  all   19.37    0.00    3.14   74.61    0.79    1.05    0.00    0.00    0.00    1.05
01:34:54 PM  all   22.08    0.00    3.12   62.86    0.52    1.04    0.00    0.00    0.00   10.39

There it is again: a high iowait. There is something funny going on with the disks.

Let’s try iostat:

-> iostat -x -y 1
Linux 4.20.0-arch1-1-ARCH (ix) 	01/06/2019 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00    0.00

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda             54.00    0.00   2532.00      0.00     0.00     0.00   0.00   0.00    0.57    0.00   0.02    46.89     0.00   0.06   0.30
dm-0            54.00    0.00   2532.00      0.00     0.00     0.00   0.00   0.00    0.50    0.00   0.03    46.89     0.00   0.24   1.30
sdd              1.00  347.00      4.00  39512.00     0.00     0.00   0.00   0.00   53.00    5.42   0.99     4.00   113.87   2.86  99.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.90    0.00    6.49   68.31    0.00    1.30

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda             69.00    0.00   2636.00      0.00     2.00     0.00   2.82   0.00    0.58    0.00   0.06    38.20     0.00   0.29   2.00
dm-0            71.00    0.00   2636.00      0.00     0.00     0.00   0.00   0.00    1.03    0.00   0.07    37.13     0.00   0.32   2.30
sdd              0.00  340.00      0.00  38672.00     0.00     0.00   0.00   0.00    0.00    5.84   0.99     0.00   113.74   2.92  99.30

Quite a lot of writes on the sdd device. Even worse, %util is reaching 100%: the device is saturated. (%util is only meaningful for single spinning disks; RAID arrays and SSDs can serve requests in parallel, so they have different capacities.)

So the problem seems to be related to sdd.

Can we check what processes are doing I/O?

-> sudo pidstat -d 1
Linux 4.20.0-arch1-1-ARCH (ix) 	01/06/2019 	_x86_64_	(4 CPU)

01:36:25 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
01:36:26 PM     0      2546      3.81      0.00      0.00     100  kworker/u8:3+flush-8:48
01:36:26 PM     0      2799      0.00     38.10      0.00      48  jbd2/sdd1-8
01:36:26 PM   980      3435   1474.29  36323.81      0.00       0  mysqld

01:36:26 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
01:36:27 PM     0      2546      0.00      0.00      0.00      99  kworker/u8:3+flush-8:48
01:36:27 PM   980      3435   2524.00  35096.00      0.00       0  mysqld

We find a mysql server, a kernel thread flushing data and jbd2 (the filesystem journal).

Aha! We found what has been causing all that troublesome I/O. To continue investigating we would need to get into the database, but this post is getting way too long already. If you are curious, the root cause was some bad queries that were creating temporary tables.

What’s Next?

How about some long-term analysis? Something that can work unattended and produce reports? Sounds nice?

Good, I guess the next article will be about sar.

So long.

Tomas