Host object in Nagios marked as down, check_ping shows network unreachable



On a Nagios 4.x installation, a particular host was marked as DOWN.

Yet the remote host was up and working fine. So why did Nagios consider it down?

Taking a closer look at the host definition showed the following configuration:

define host{
        use                     remote-host
        hostgroups              group1,group2
        host_name               www.example.com
}

Note: www.example.com is obviously an anonymized placeholder.

If you've been a long-time Nagios user, the first thing that catches your eye is the missing "address" directive in this host definition. The official Nagios object definition documentation marks the address directive as required.

However, there's a catch with the address directive: it can actually be omitted. In this situation, the host_name is used as the address. The same documentation mentions:

Note: If you do not specify an address directive in a host definition, the name of the host will be used as its address. A word of caution about doing this, however - if DNS fails, most of your service checks will fail because the plugins will be unable to resolve the host name.

That means that www.example.com will be used as the host's address. To manually check what happens in the background of a host check, we need to look at the "remote-host" template:

# remote-host template
define host{
  name                  remote-host      ; The name of this host template
  use                   generic-host     ; This template inherits other values from the generic-host template
  check_period          24x7             ; Hosts are checked around the clock
  check_interval        3                ; Actively check the host every 3 minutes (with the default interval_length of 60)
  retry_interval        1                ; Schedule host check retries at 1 minute intervals
  max_check_attempts    2                ; Check each host up to 2 times before it is considered DOWN
  check_command         check-host-alive ; Default command to check whether a host is alive
  notification_period   24x7             ; Notifications are sent around the clock
[...]
  register              0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

The check_command of this host template is set to check-host-alive. Let's find out what this command does by looking at its command definition:

# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}

The check-host-alive command uses the check_ping monitoring plugin in the background to determine whether the remote host is up or down. Executing the plugin manually should return the same result that Nagios shows in the user interface:

root@nagios:~# /usr/lib/nagios/plugins/check_ping -H www.example.com -w 3000.0,80% -c 5000.0,100% -p 5
CRITICAL - Network Unreachable (www.example.com)

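Indeed, the same "Network Unreachable" error is shown. But why does this happen? Resolving the target name in DNS shows why: www.example.com resolves to both IPv4 and IPv6 addresses. Today's servers often prefer IPv6 by default when a name has an IPv6 address, unless configured otherwise or IPv6 is disabled completely. If the monitoring server has no usable IPv6 route to the target, which appears to be the case here, the ping fails with "Network Unreachable" even though the host is reachable over IPv4.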
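To confirm that a name resolves to both address families, the DNS answers can be counted per family. The following is a minimal sketch, not part of the original setup: count_families is a hypothetical helper name, and the usage line assumes the dig utility is installed on the monitoring server.

```shell
#!/bin/sh
# Hypothetical helper: read one DNS answer per line on stdin and count
# how many are IPv4 and how many are IPv6 addresses.
count_families() {
    awk '
        $1 ~ /:/         { v6++; next }   # a colon only appears in IPv6 literals
        $1 ~ /^[0-9.]+$/ { v4++ }         # digits and dots only: IPv4 literal
        END { printf "ipv4=%d ipv6=%d\n", v4, v6 }
    '
}

# Usage on the monitoring server (www.example.com is the placeholder name):
#   { dig +short A www.example.com; dig +short AAAA www.example.com; } | count_families
```

If the ipv6 count is greater than zero, the name has at least one AAAA record, and a client that prefers IPv6 may try that address first. Lines that are not address literals, such as CNAME targets printed by dig, are ignored by the helper.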

To force the check to use IPv4, pass the -4 parameter to the check_ping plugin:

root@nagios:~# /usr/lib/nagios/plugins/check_ping -H www.example.com -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 1.16 ms|rta=1.160000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0

With IPv4 enforced, the ping check succeeds.
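To see at a glance which address family fails, the plugin can be run once per family. This is a rough sketch under assumptions: compare_families is a hypothetical helper, the thresholds are copied from the check-host-alive command above, and it relies on check_ping accepting -6 as the IPv6 counterpart of -4.

```shell
#!/bin/sh
# Hypothetical helper: run a check_ping-style plugin once with -4 and once
# with -6 and print the plugin exit status for each address family.
# Plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
compare_families() {
    plugin=$1
    host=$2
    for flag in -4 -6; do
        "$plugin" -H "$host" -w 3000.0,80% -c 5000.0,100% -p 5 "$flag" >/dev/null 2>&1
        printf '%s exit=%s\n' "$flag" "$?"
    done
}

# Usage on the monitoring server (plugin path as used in this article):
#   compare_families /usr/lib/nagios/plugins/check_ping www.example.com
```

An exit status of 0 for -4 combined with a non-zero status for -6 would match the behaviour described here: the host answers over IPv4 but cannot be reached over IPv6 from the monitoring server.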

The solution for Nagios in this situation? Either add the IPv4 address to the host definition using the address directive, or adjust the check-host-alive command by appending -4 to its command_line.
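Both options could look roughly like the following in the object configuration. This is a sketch only: 192.0.2.10 is a documentation placeholder, not the real address of the host, and only one of the two changes is needed.

```
# Option 1: pin the host to its IPv4 address
# (192.0.2.10 is a placeholder, replace it with the real IPv4 address)
define host{
        use                     remote-host
        hostgroups              group1,group2
        host_name               www.example.com
        address                 192.0.2.10
}

# Option 2: force IPv4 in the host check command
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5 -4
}
```

Keep in mind that the second option changes the check for every host using check-host-alive, so a host that is reachable only over IPv6 would then be reported as down. The first option affects just this one host, and it also makes the host check independent of DNS resolution.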

