By Ron Peterson
Good systems administrators log stuff. Lots of stuff. A lot of the information we collect consists of time series data: a set of numerical values associated with a sequence of discrete time values.
There are any number of tools to help the diligent sysadmin monitor this data visually as it is collected. A good many of them are built using Tobias Oetiker's excellent RRDTool. Some noteworthy examples include Cacti, Cricket, and Smokeping. There are many others.
That's all well and good as long as you know what you want to monitor. However, sometimes you'd just like to do some quick ad hoc visualization. As you might surmise, most Linux systems provide a myriad of visualization tools (Grace and GRI come to mind). In this article, I'll introduce you to Gnuplot, focusing specifically on how to plot time series data.
Gnuplot without data is like gravy without potatoes. So before we get to the gravy, let's make some potatoes. Let's say for the sake of argument, or at least for the purpose of giving the rest of the article something to talk about, I include the following line in my system's crontab file:
*/1 * * * * root /bin/cat /proc/loadavg 2>&1 | /usr/bin/logger -p local3.info -t CRON-LOADAVG
If you're like me, and have configured your system's syslog.conf as follows:
...then you will find all local3 facility messages in their own special file. Because we're telling 'logger' to tag all of our load average data, it will be easy to extract this information from the rest of our logfile clutter. A simple 'grep CRON-LOADAVG /var/log/cron.log > load.dat.1' should do nicely. This will give us a file that looks like so:
Mar 19 00:30:02 ahost CRON-LOADAVG: 0.40 0.78 1.19 11/296 3690
Mar 19 00:31:01 ahost CRON-LOADAVG: 3.54 1.55 1.41 4/311 3997
Mar 19 00:32:01 ahost CRON-LOADAVG: 2.68 1.59 1.43 2/278 4142
...
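For reference, the five values logged from /proc/loadavg are the 1-, 5-, and 15-minute load averages, the number of currently runnable scheduling entities over the total, and the most recently assigned PID. Here is a minimal sketch of the grep step, run against a small sample so it can be tried without waiting for cron. The file names 'demo-cron.log' and 'demo-load.dat.1' and the unrelated log line are made up for illustration.

```shell
# Hypothetical sample log: two tagged lines from the excerpt above plus
# one unrelated line (invented for illustration).
cat > demo-cron.log <<'EOF'
Mar 19 00:30:02 ahost CRON-LOADAVG: 0.40 0.78 1.19 11/296 3690
Mar 19 00:30:05 ahost some-other-tag: unrelated message
Mar 19 00:31:01 ahost CRON-LOADAVG: 3.54 1.55 1.41 4/311 3997
EOF

# Keep only the lines carrying our logger tag.
grep CRON-LOADAVG demo-cron.log > demo-load.dat.1

# Split the first matching line into named fields.
read -r mon day hms host tag one five fifteen procs lastpid < demo-load.dat.1
echo "1-min load at $hms: $one"
```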
Now let's extract just the data we want:
cat load.dat.1 | tr -s ' ' ' ' | cut -d' ' -f1,2,3,6 > load.dat.2
The translate command 'tr' squishes multiple spaces into a single space, so that we can expect more consistent behaviour from the 'cut' command. For the sample lines above, 'tr' is superfluous, but I think it's a good habit nonetheless: syslog traditionally pads single-digit days of the month with an extra space ('Mar  9'), which would otherwise shift the field numbers that 'cut' sees. With any luck, our data now looks something like:
Mar 19 00:30:02 0.40
Mar 19 00:31:01 3.54
Mar 19 00:32:01 2.68
...
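To see what squeezing spaces buys us, here is a small sketch using a hypothetical log line whose day of the month is a single digit padded with an extra space. The sample line is an assumption modeled on the data above, not taken from a real log.

```shell
# Hypothetical log line with a space-padded single-digit day ("Mar  9").
line='Mar  9 00:30:02 ahost CRON-LOADAVG: 0.40 0.78 1.19 11/296 3690'

# Without tr, the doubled space creates an empty field and shifts the rest.
without_tr=$(printf '%s\n' "$line" | cut -d' ' -f1,2,3,6)

# With tr -s, runs of spaces collapse and field 6 is the 1-minute load again.
with_tr=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f1,2,3,6)

printf 'without tr: %s\n' "$without_tr"
printf 'with tr:    %s\n' "$with_tr"
```

Without 'tr', field six turns out to be the logger tag rather than the load value.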
That's almost perfect. Unfortunately, our gnuplot example will expect two space-delimited columns of input, so we need to replace the spaces delimiting our timestamp components with some other character, like a hyphen.
perl -pe 's/(.*?)\s(.*?)\s(.*)/$1-$2-$3/;' load.dat.2 > load.dat.3
This isn't a Perl article, so I won't bore you with the details of what this command is doing. In the interest of pedagogy though, I think it's helpful to illustrate how sausages are sometimes made; even if it does make me look like a butcher. Our data now looks like:
Mar-19-00:30:02 0.40
Mar-19-00:31:01 3.54
Mar-19-00:32:01 2.68
...
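The grep, tr, cut, and perl steps can also be collapsed into a single pass. What follows is a sketch that swaps in awk for the commands above; it is an alternative, not something the rest of the article depends on, and the 'demo-' file names are placeholders.

```shell
# Sample input copied from the log excerpt earlier in the article.
cat > demo-load.dat.1 <<'EOF'
Mar 19 00:30:02 ahost CRON-LOADAVG: 0.40 0.78 1.19 11/296 3690
Mar 19 00:31:01 ahost CRON-LOADAVG: 3.54 1.55 1.41 4/311 3997
Mar 19 00:32:01 ahost CRON-LOADAVG: 2.68 1.59 1.43 2/278 4142
EOF

# awk splits on runs of whitespace by default, so no tr is needed.
# Join month, day, and time with hyphens, then print the 1-minute load.
awk '/CRON-LOADAVG/ { print $1 "-" $2 "-" $3, $6 }' demo-load.dat.1 > demo-load.dat.3

cat demo-load.dat.3
```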
Now it's time for the gravy. First I'll give you a taste, and then I'll explain the recipe. Create a file with the following contents, excluding the line numbers. Call it 'plot-load.conf'. Edit the date range on line six to include the extents of your data.
 1  set terminal png size 1200,800
 2  set xdata time
 3  set timefmt "%b-%d-%H:%M:%S"
 4  set output "load.png"
 5  # time range must be in same format as data file
 6  set xrange ["Mar-25-00:00:00":"Mar-26-00:00:00"]
 7  set yrange [0:50]
 8  set grid
 9  set xlabel "Date\nTime"
10  set ylabel "Load"
11  set title "Load Averages"
12  set key left box
13  plot "load.dat.3" using 1:2 index 0 title "ahost" with lines
If you run the following command, you should end up with a file called 'load.png'. Use your favorite image viewer to take a look. Hopefully nothing too alarming shows up.
cat plot-load.conf | gnuplot
The first line of our gnuplot command file says to create a PNG file, and gives its dimensions. PNG is only one of a myriad possible output formats. The second line says our X axis represents time data. The third line uses standard date format specification (see 'man date') to indicate what our data file's timestamp data looks like. We must use the same format in line six, where we indicate our graph's start time and end time. You can omit this, but I find it's useful to anchor the endpoints, particularly when plotting multiple data sources in a single graph. Line seven sets the plot limits of our Y axis.
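Rather than editing the date range by hand, the endpoints can be read from the data file itself. This is a sketch under the assumption that the file is already in chronological order and laid out like 'load.dat.3'. The names 'demo-load.dat.3', 'demo-plot.conf', and 'demo-load.png' are placeholders, and gnuplot is only invoked if it is installed.

```shell
# Sample data in the same two-column layout as load.dat.3.
cat > demo-load.dat.3 <<'EOF'
Mar-19-00:30:02 0.40
Mar-19-00:31:01 3.54
Mar-19-00:32:01 2.68
EOF

# First and last timestamps become the X axis endpoints
# (assumes the file is already in chronological order).
first=$(head -n 1 demo-load.dat.3 | cut -d' ' -f1)
last=$(tail -n 1 demo-load.dat.3 | cut -d' ' -f1)

# Write a gnuplot command file with the range filled in.
cat > demo-plot.conf <<EOF
set terminal png size 1200,800
set xdata time
set timefmt "%b-%d-%H:%M:%S"
set output "demo-load.png"
set xrange ["$first":"$last"]
set grid
plot "demo-load.dat.3" using 1:2 title "ahost" with lines
EOF

# Render only if gnuplot is available on this machine.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot demo-plot.conf || echo "gnuplot could not render the plot" >&2
fi
```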
Line 13 deserves a little bit of extra attention. The name of our data source comes first. The 'using 1:2' bit means to extract data from columns one and two of our data source. The 'index 0' bit means to use the first data set in the file. Data sets are delimited by pairs of blank records. Our file was simple. It only comprised col1 and col2 of data set zero in the following pseudo data file.
# data set zero
col1 col2 col3 col4
col1 col2 col3 col4
col1 col2 col3 col4


# data set one
col1 col2 col3 col4
col1 col2 col3 col4
col1 col2 col3 col4
col1 col2 col3 col4


# data set two
col1 col2 col3 col4
col1 col2 col3 col4
col1 col2 col3 col4
Assuming we had multiple data sets in a single file (perhaps we want to compare load averages from multiple hosts), one way we could combine this data into a single graph would be to expand our line 13 as follows:
plot "load.dat.3" using 1:2 index 0 title "ahost" with lines, \
     "load.dat.3" using 1:2 index 1 title "bhost" with lines, \
     "load.dat.3" using 1:2 index 2 title "chost" with lines
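One way to build such a multi-set file is to concatenate per-host files, following each with two blank lines. This is a sketch, assuming hypothetical per-host files named 'demo-ahost.dat', 'demo-bhost.dat', and 'demo-chost.dat'; the load values in them are invented for illustration.

```shell
# Hypothetical per-host data files (values invented for illustration).
printf 'Mar-19-00:30:02 0.40\nMar-19-00:31:01 3.54\n' > demo-ahost.dat
printf 'Mar-19-00:30:02 1.10\nMar-19-00:31:01 0.95\n' > demo-bhost.dat
printf 'Mar-19-00:30:02 0.05\nMar-19-00:31:01 0.20\n' > demo-chost.dat

# Append each file followed by two blank lines; gnuplot treats each
# block as a separate data set, addressed by "index 0", "index 1", ...
: > demo-combined.dat
for f in demo-ahost.dat demo-bhost.dat demo-chost.dat; do
    cat "$f" >> demo-combined.dat
    printf '\n\n' >> demo-combined.dat
done

# Count the blocks: runs of non-blank lines separated by blank ones.
sets=$(awk 'NF { if (!in_set) n++; in_set = 1; next } { in_set = 0 } END { print n + 0 }' demo-combined.dat)
echo "data sets: $sets"
```

The order of the files in the loop determines which index each host ends up with.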
Potatoes are nice, but as Trotsky once noted, they are "the classic symbol of poverty". Knowing how to quickly whip up some time series plots is useful, but Gnuplot is capable of far more than I've even hinted at in this article. Hopefully I've managed to whet your appetite to learn even more.
Ron Peterson is a Network & Systems Manager at Mount Holyoke College in the happy hills of western Massachusetts. He enjoys lecturing his three small children about the maleficent influence of proprietary media codecs while they watch Homestar Runner cartoons together.