Posted on

nrpe or nsca alternative for nagios checks

Don’t like running nrpe or nsca but do like Ansible’s approach?
Don’t like receiving around 20 Nagios alerts from every server when there is an issue such as a network outage?

Take the commands from nrpe.cfg that form each Nagios check and put them into a shell script, ordered so that the most important or most useful checks come first, e.g.

#!/bin/bash
# Each result is a single line: exitcode|interval in minutes|check name|detail
RESULTFILE=/usr/local/nagios/var/remote-checks/$HOSTNAME

CHECK="My process"
if ! /usr/local/nagios/libexec/check_process -n myprocess; then
    ssh mynagiosserver "echo -e 2\|10\|$HOSTNAME \|myprocess is unavailable, this is what needs to be done >$RESULTFILE"
    exit
fi

# ... add all your other checks here
# Check disk space on sda1
CHECK="$CHECK Disk space"
if ! /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/sda1; then
    ssh mynagiosserver "echo -e 2\|10\|$HOSTNAME disk space alert\|Free disk space on sda1 >$RESULTFILE"
    exit
fi

## Else all is OK
ssh mynagiosserver "echo -e 0\|10\|OK\|Checks completed OK: $CHECK >$RESULTFILE"
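
As a variation, the result line can be built locally and piped to the server, which avoids having to escape the | characters for the remote shell. This is only a sketch: format_result and send_result are names I made up for illustration, and the field order is the exitcode|interval|checkname|checkdetail layout that the script above writes.

```shell
#!/bin/bash
# Hypothetical helper: format_result EXITCODE INTERVAL NAME DETAIL
# Prints one result line in the exitcode|interval|checkname|checkdetail layout.
format_result() {
    printf '%s|%s|%s|%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: send a result line to the monitoring server.
# "cat" on the far side writes whatever arrives on stdin to the result file.
# Defined here for illustration only; it is not called in this example.
send_result() {
    format_result "$@" | ssh mynagiosserver "cat >/usr/local/nagios/var/remote-checks/$HOSTNAME"
}

# Print the line that would be sent for a failed disk check
format_result 2 10 "$HOSTNAME disk space alert" "Free disk space on sda1"
```

In the check script, each ssh line would then become a call such as send_result 2 10 "$HOSTNAME disk space alert" "Free disk space on sda1".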

If you prefer, you can build up a string of all the failed checks rather than stopping at the first failure. Start with an empty CHECK and append to it after each check that fails, e.g.

if [[ $? -ne 0 ]]; then
    CHECK="$CHECK Disk space alert"
fi

and at the end of the checks, report a failure if anything was appended:

if [[ -n $CHECK ]]; then
    ssh mynagiosserver "echo -e 2\|10\|$HOSTNAME $CHECK\|One or more checks failed >/usr/local/nagios/var/remote-checks/$HOSTNAME"
fi
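
To avoid repeating the same if block after every check, the accumulation can be wrapped in a small function. The sketch below assumes a helper called run_check (my name, not part of Nagios) and uses true and false as stand-ins for real plugins; it prints the result line instead of sending it over ssh so it can be tried anywhere.

```shell
#!/bin/bash
# FAILED collects the names of the checks that returned non-zero.
FAILED=""

# Hypothetical helper: run_check NAME COMMAND [ARGS...]
# Runs the command and appends NAME to FAILED when it fails.
run_check() {
    local name="$1"
    shift
    if ! "$@" >/dev/null 2>&1; then
        FAILED="$FAILED $name;"
    fi
}

# "true" and "false" stand in for real plugins such as check_disk
run_check "My process" true
run_check "Disk space" false
run_check "Load" false

# Build one result line covering every check that was run
if [ -n "$FAILED" ]; then
    echo "2|10|$HOSTNAME|Failed checks:$FAILED"
else
    echo "0|10|OK|Checks completed OK"
fi
```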

For the ssh to work, you will need to set up ssh keys. There are plenty of guides, just look for ssh-keygen and ssh-copy-id.

You will want the checks to execute regularly.
I put this in the crontab for the nagios user:


# m h dom mon dow command
*/10 * * * * /usr/local/nagios/checks/10minutechecks.sh

You now have a single file per server on your monitoring server, in /usr/local/nagios/var/remote-checks/.

Alternatives to ssh
- using a shared mount point from nfs, samba, etc and copying the check result file there instead of using ssh
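
With a shared mount, the script writes the result file itself. A minimal sketch, assuming a helper named write_result (hypothetical) and a scratch directory standing in for the mount point: it writes to a temporary file first and then renames it, so the reader should not normally see a half-written result.

```shell
#!/bin/bash
# Hypothetical helper: write_result DIR HOST LINE
# Writes LINE to DIR/HOST via a temporary file in the same directory.
write_result() {
    local dir="$1"
    local host="$2"
    local line="$3"
    local tmp
    tmp=$(mktemp "$dir/.$host.XXXXXX") || return 1
    # mktemp creates the file readable by its owner only; open it up
    # so that the nagios user on the monitoring server can read it
    chmod 644 "$tmp"
    printf '%s\n' "$line" >"$tmp" && mv "$tmp" "$dir/$host"
}

# Example, using a scratch directory in place of the shared mount
SHARE=$(mktemp -d)
write_result "$SHARE" "myhost" "0|10|OK|Checks completed OK"
cat "$SHARE/myhost"
```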

Since we already use Nagios, I set up a Nagios check on the server to read the check file.


define service{
use generic-service
host_name myhost
service_description passive checks
check_interval 10
check_command check_passive!myhost
notifications_enabled 1
}

define command {
command_name check_passive
command_line $USER1$/check_passive.pl $ARG1$
}

and create /usr/local/nagios/libexec/check_passive.pl

#!/usr/bin/perl
# Nagios plugin: report the result a remote host wrote to its check file
use strict;
use warnings;

if (@ARGV != 1) {
    print "Specify name of file to check.\n";
    exit(3);    # UNKNOWN
}
my $file    = $ARGV[0];
my $datadir = "/usr/local/nagios/var/remote-checks/";

open(my $fh, "<", "$datadir$file") or do {
    print "ERROR: Cannot open check file for $file: $!\n";
    exit(2);
};
my $epoch_timestamp = (stat($fh))[9];                    # last modification time
my $timestamp       = localtime($epoch_timestamp);
my $fileage         = (-M "$datadir$file") * 24 * 60;    # age of the file in minutes
#print "fileage is $fileage \n";
my @log = <$fh>;
close $fh;

# The first line holds exitcode|interval|checkname|checkdetail
my ($exitcode, $interval, $checkname, $checkdetail) = split /\|/, ($log[0] // '');
#print $exitcode,$interval,$checkname,$checkdetail;
if (!defined $checkdetail) {
    print "ERROR: Check file for $file is empty or malformed\n";
    exit(3);
}
chomp $checkdetail;

if ($fileage > $interval) {
    print "ERROR: No check report received since $timestamp for $file\n";
    exit(2);
}
if ($exitcode != 0) {
    print "ERROR: $file check $checkname failed, $checkdetail\n";
    exit($exitcode);
}
print "$file $checkname - $checkdetail\n";
exit(0);
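
The decision the plugin makes is simple enough to try by hand before wiring it into Nagios. The following is a rough shell sketch of the same logic, not a replacement for the Perl plugin: evaluate is a made-up name, and the file age is passed in as a number of minutes rather than read from the file system.

```shell
#!/bin/bash
# Hypothetical helper: evaluate LINE AGE_MINUTES
# LINE is exitcode|interval|checkname|checkdetail and AGE_MINUTES is how old
# the result file is. Prints a status message and returns the Nagios code.
evaluate() {
    local age="$2"
    local exitcode interval checkname checkdetail
    # Split the line on | into its four fields
    IFS='|' read -r exitcode interval checkname checkdetail <<EOF
$1
EOF
    # A stale file means the remote host has stopped reporting
    if [ "$age" -gt "$interval" ]; then
        echo "ERROR: result is $age minutes old, expected one every $interval"
        return 2
    fi
    if [ "$exitcode" -ne 0 ]; then
        echo "ERROR: check $checkname failed, $checkdetail"
        return "$exitcode"
    fi
    echo "$checkname - $checkdetail"
    return 0
}

# A fresh OK result, a fresh failure, and a stale OK result
evaluate "0|10|OK|Checks completed OK" 3 || echo "-> exit code $?"
evaluate "2|10|myhost disk space alert|Free disk space on sda1" 3 || echo "-> exit code $?"
evaluate "0|10|OK|Checks completed OK" 25 || echo "-> exit code $?"
```

The third call shows why the interval travels with the result: an OK that is older than its own interval is treated as a failure.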

If this approach works well in a trial, you can modify your configuration management tools to automatically create these scripts instead of creating nrpe.cfg.

Posted on

watching logs with perl

When you have a log file with lots of unpredictably formatted entries, it can be difficult to come up with a nice grok filter to parse it. This is frustrating if you only want a small amount of data from a very big log.
Below is a quickly written Perl script to watch a log file and print out a summary of the number of times a GET request, indicated with ‘q=’, was made for each 10 minute interval.
This gave me a nice summary.log which looked like this:

hits-per-10m=35,server=myservername,logdate=201701221032,year=2017,month=01,date=22,hour=10,minute=32,second=45
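
Because every value is a key=value pair separated by commas, a line like this is easy to pull apart again with standard tools. A small sketch, assuming a helper I have called get_field (hypothetical) and the sample line above:

```shell
#!/bin/bash
# Hypothetical helper: get_field LINE KEY
# Prints the value for KEY from a comma separated key=value line.
get_field() {
    # One pair per line, then keep only the pair that starts with KEY=
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

line='hits-per-10m=35,server=myservername,logdate=201701221032,year=2017,month=01,date=22,hour=10,minute=32,second=45'
get_field "$line" hits-per-10m
get_field "$line" server
```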


#!/usr/bin/perl
# Watch a log and write key-value pairs to a file

use File::Tail;
use URI::Escape ;

$file = File::Tail->new("/var/log/myfile.log");
%months = qw( Jan 1 Feb 2 Mar 3 Apr 4 May 5 Jun 6 Jul 7 Aug 8 Sep 9 Oct 10 Nov 11 Dec 12);
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime();
$year=$year+1900;
$hold = int($min / 10);
$count=0;
$lasthour = 99;

while (defined(my $line= $file->read)) {
$line=~s/\s+/ /g;
@parts= split /[ ]/,$line;
# in my log, I could parse the values from the data line as follows
$server=$parts[3];
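# zero-pad month and date so that logdate comes out as YYYYMMDDhhmm
$month=sprintf("%02d",$months{$parts[0]});
$date=sprintf("%02d",$parts[1]);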
($hour,$min,$sec) = split /\:/, $parts[2];
($ipaddr,$port) = split /\:/,$parts[5];
# here I want to count the number of values for each 10 minute interval
# the problem with logs is that there may not be a log line for every minute
## so I use $hold to indicate whether to hold the data or write it to a file
## And in case the hour has changed since the last log entry, this is also checked
if ($lasthour != $hour) {
$hold = int($min / 10);
}
$lasthour = $hour;
# $bucket is the 10 minute interval (0 to 5) that this entry falls in.
# If we have reached or passed the interval we were holding for,
# write out the count and hold until the next interval starts.
$bucket = int($min / 10);
if ($hold <= $bucket) {
    &printCount ;
    $hold = $bucket + 1;
}
$lastmin = $min;
# check if the log entry has the value you are counting
# in this case I was looking for the search term which followed 'q='
if ($line =~ /GET/ && $line =~ /[&?]q=/) {
    ($junk,$q) = split /[&?]q=/, $line;
    ($q, $junk) = split /[& ]/, $q;
    $q = uri_unescape($q);
    # remove any commas because I will be using them as separators
    $q =~ s/,/ /g;
    # count only the requests that carried a search term
    $count++;
}
} # end while

sub printCount {
    # print the time and the count of searches for the 10 minute period
    open (my $out, ">>", "/var/log/summary.log") or die "Cannot open /var/log/summary.log: $!";
    print $out "hits-per-10m=$count,server=$server,logdate=$year$month$date$hour$min,year=$year,month=$month,date=$date,hour=$hour,minute=$min,second=$sec\n";
    close $out;
    $count = 0;
}
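
The hold logic is easier to follow when it is run on its own against a handful of minute values. The sketch below is a simplified shell simulation, not part of the Perl script: simulate is a made-up name, it stays within a single hour, it treats every entry as a hit, and it expects minutes as plain integers without leading zeros.

```shell
#!/bin/bash
# Hypothetical helper: simulate HOLD MINUTE [MINUTE...]
# Prints a "flush" line whenever the count would be written to the summary.
simulate() {
    local hold="$1"
    shift
    local count=0 min bucket
    for min in "$@"; do
        bucket=$(( min / 10 ))          # 10 minute interval, 0 to 5
        if [ "$hold" -le "$bucket" ]; then
            echo "flush at minute $min: count=$count"
            count=0
            hold=$(( bucket + 1 ))      # hold until the next interval
        fi
        count=$(( count + 1 ))
    done
}

# Started at minute 32, so the initial hold is 3
simulate 3 32 35 38 41 44 52
```

The first flush happens straight away with a count of 0, because the starting interval matches the initial hold. After that, a flush only happens on the first entry of a later interval, and it reports the entries seen since the previous flush.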