nrpe or nsca alternative for nagios checks

Don’t like running nrpe or nsca but do like Ansible’s approach?
Don’t like receiving around 20 Nagios alerts from every server when there is an issue such as a network outage?

Take the commands from nrpe.cfg that form each Nagios check and put them into a shell script.
Put them in order, with the most important or most useful to know first,
e.g.

#!/bin/bash
# Each result is one line: exit code|max age in minutes|check name|check detail
CHECK="My process"
if ! /usr/local/nagios/libexec/check_process -n myprocess; then
    ssh mynagiosserver "echo -e 2\|10\|$HOSTNAME \|myprocess is unavailable, this is what needs to be done >/usr/local/nagios/var/remote-checks/$HOSTNAME"
    exit
fi

# ... add all your other checks here
# Check disk space on sda1
CHECK="$CHECK, Disk space"
if ! /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/sda1; then
    ssh mynagiosserver "echo -e 2\|10\|$HOSTNAME disk space alert\|Free disk space on sda1 >/usr/local/nagios/var/remote-checks/$HOSTNAME"
    exit
fi

# Otherwise all is OK
ssh mynagiosserver "echo -e 0\|10\|OK\|Checks completed OK: $CHECK >/usr/local/nagios/var/remote-checks/$HOSTNAME"
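
The repeated ssh and echo lines above could be factored into a pair of helper functions. This is only a sketch, assuming bash: the names format_result and send_result, and the two variables at the top, are made up for illustration and are not part of Nagios. Sending the line on ssh's standard input and letting a remote cat write the file means the pipe characters no longer need escaping.

```shell
#!/bin/bash
# Hypothetical helpers; names and defaults are illustrative only.
NAGIOS_SERVER="mynagiosserver"
RESULT_DIR="/usr/local/nagios/var/remote-checks"

# Build one result line: exit code|max age in minutes|check name|check detail
format_result() {
    printf '%s|%s|%s|%s\n' "$1" "$2" "$3" "$4"
}

# Send the line over ssh; the remote cat writes its stdin to this host's file.
send_result() {
    format_result "$@" | ssh "$NAGIOS_SERVER" "cat > '$RESULT_DIR/$HOSTNAME'"
}

# Example use inside the check script:
#   send_result 2 10 "$HOSTNAME" "myprocess is unavailable"
```

Defining the functions has no effect on its own; nothing is sent until send_result is called.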

If you prefer, you can build up a string of all the failed checks, rather than stopping after the first failure. Collect the failures in their own variable, which starts out empty,
e.g.

FAILED=""
if ! /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/sda1; then
    FAILED="$FAILED Disk space alert"
fi

and at the end of the checks

if [[ -n "$FAILED" ]]; then
    ssh mynagiosserver "echo -e 2\|10\|$HOSTNAME\|Failed checks:$FAILED >/usr/local/nagios/var/remote-checks/$HOSTNAME"
fi
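
Collecting the failures can also be wrapped in a small function so that each check becomes a single line. The following is a sketch under assumptions, not a finished script: run_check, summary_line and the FAILED variable are hypothetical names, and every failure is reported as exit code 2, as in the examples above.

```shell
#!/bin/bash
# Hypothetical helpers for collecting failures; names are illustrative.
FAILED=""

# Run a check command; remember its label if it exits non-zero.
run_check() {
    local label=$1
    shift
    if ! "$@" >/dev/null 2>&1; then
        # Add a ", " separator only when something is already recorded.
        FAILED="${FAILED:+$FAILED, }$label"
    fi
}

# Print one result line in the exitcode|interval|name|detail layout.
summary_line() {
    if [[ -n $FAILED ]]; then
        printf '2|10|%s|Failed checks: %s\n' "$1" "$FAILED"
    else
        printf '0|10|OK|All checks completed OK\n'
    fi
}

# Example use:
#   run_check "Disk space" /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/sda1
#   summary_line "$HOSTNAME"
```

The line printed by summary_line can then be sent to the monitoring server in the same way as the other results.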

For the ssh to work, you will need to set up ssh keys. There are plenty of guides; just look for ssh-keygen and ssh-copy-id.

You will want the checks to execute regularly.
I put this in the crontab for the nagios user:


# m h dom mon dow command
*/10 * * * * /usr/local/nagios/checks/10minutechecks.sh

You now have a single file per server on your monitoring server, holding that server's latest check result.

Alternatives to ssh
- using a shared mount point from nfs, samba, etc and copying the check result file there instead of using ssh
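
With a shared mount, writing the result could look something like the sketch below. The function name write_result and the mount point /mnt/nagios-checks are assumptions for illustration. The line is written to a temporary file first and then renamed, which on most filesystems means the reader should never see a half-written file.

```shell
#!/bin/bash
# Hypothetical helper: write a result line into a shared directory.
# Arguments: directory, host name, result line.
write_result() {
    local dir=$1 host=$2 line=$3 tmp
    # Create the temporary file in the target directory so that the
    # final mv is a rename on the same filesystem, not a copy.
    tmp=$(mktemp "$dir/.$host.XXXXXX") || return 1
    printf '%s\n' "$line" > "$tmp" || { rm -f "$tmp"; return 1; }
    chmod 644 "$tmp"    # mktemp creates the file readable by its owner only
    mv -f "$tmp" "$dir/$host"
}

# Example use, with /mnt/nagios-checks as an assumed mount point:
#   write_result /mnt/nagios-checks "$HOSTNAME" "0|10|OK|Checks completed OK"
```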

Since we already use Nagios, I set up a Nagios check on the server to read the check file.


define service{
    use                     generic-service
    host_name               myhost
    service_description     passive checks
    check_interval          10
    check_command           check_passive!myhost
    notifications_enabled   1
}

define command {
    command_name    check_passive
    command_line    $USER1$/check_passive.pl $ARG1$
}

and create /usr/local/nagios/libexec/check_passive.pl

#!/usr/bin/perl
# Nagios plugin: report the result a remote host wrote to its check file.
use strict;
use warnings;

if (@ARGV != 1) {
    print "Specify name of file to check.\n";
    exit(3);    # UNKNOWN, so a misconfigured check does not show as OK
}
my $file    = $ARGV[0];
my $datadir = "/usr/local/nagios/var/remote-checks/";
my $path    = "$datadir$file";

open(my $fh, "<", $path) or do {
    print "ERROR: Cannot open check file for $file: $!\n";
    exit(2);
};
my $timestamp = localtime((stat($fh))[9]);    # last modification time
my $fileage   = (-M $path) * 24 * 60;         # file age in minutes
my @log       = <$fh>;
close($fh);

if (!@log) {
    print "UNKNOWN: Check file for $file is empty\n";
    exit(3);
}
chomp(my $first = $log[0]);
# File format: exitcode|interval in minutes|check name|check detail
my ($exitcode, $interval, $checkname, $checkdetail) = split /\|/, $first, 4;
if ($fileage > $interval) {
    print "ERROR: No check report received since $timestamp for $file\n";
    exit(2);
}
if ($exitcode != 0) {
    print "ERROR: $file check $checkname failed, $checkdetail\n";
    exit($exitcode);
}
print "$file $checkname - $checkdetail\n";
exit(0);
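
To try out the file format by hand before wiring it into Nagios, the same logic can be sketched in bash. This is an illustration, not a replacement for the plugin: the function name check_result_file is hypothetical, and it assumes GNU stat (the -c %Y option prints the modification time as seconds since the epoch).

```shell
#!/bin/bash
# Hypothetical shell version of the reader; assumes GNU coreutils stat.
# Argument: full path of a check result file.
check_result_file() {
    local file=$1 code interval name detail now mtime age
    if [[ ! -r $file ]]; then
        echo "UNKNOWN: cannot read $file"
        return 3
    fi
    # First line layout: exitcode|interval in minutes|check name|check detail
    IFS='|' read -r code interval name detail < "$file"
    if [[ ! $code =~ ^[0-3]$ || ! $interval =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: malformed result line in $file"
        return 3
    fi
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")
    age=$(( (now - mtime) / 60 ))    # whole minutes since the last report
    if (( age > interval )); then
        echo "ERROR: no check report received for $age minutes"
        return 2
    fi
    if (( code != 0 )); then
        echo "ERROR: $name failed, $detail"
        return "$code"
    fi
    echo "OK: $name - $detail"
    return 0
}

# Example use:
#   check_result_file /usr/local/nagios/var/remote-checks/myhost; echo "exit code $?"
```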

If this approach works well in a trial, you can modify your configuration management tools to automatically create these scripts instead of creating nrpe.cfg.