Keepalived is a neat piece of software for configuring and managing floating IP addresses, also known as VIPs (virtual IPs), using VRRP. I've been using Keepalived for a decade and have rarely run into issues. But recently I ran into a very strange case on RHEL 9 systems.
The Keepalived package can be installed directly from the package repositories:
ck@rhel9 ~$ sudo dnf install keepalived
The package installs a default/example configuration file under /etc/keepalived/keepalived.conf.
After I edited the configuration file (adjusted vrrp_instance, removed the virtual_server snippets, added track scripts) and started Keepalived, the VIP (192.168.11.82) was correctly added as a virtual IP on the ens192 NIC:
ck@rhel9 ~$ sudo systemctl start keepalived
ck@rhel9 ~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:be:05:d6 brd ff:ff:ff:ff:ff:ff
altname enp11s0
inet 192.168.10.227/23 brd 192.168.10.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
inet 192.168.11.82/32 scope global ens192
valid_lft forever preferred_lft forever
However, when I tried to ping the VIP from another machine, there was no response:
ck@anothermachine ~$ ping 192.168.11.82
PING 192.168.11.82 (192.168.11.82) 56(84) bytes of data.
^C
--- 192.168.11.82 ping statistics ---
77 packets transmitted, 0 received, 100% packet loss, time 77850ms
To make this even more interesting, a tcpdump on the Keepalived master server showed the incoming echo requests, but no replies going out:
ck@rhel9 ~$ sudo tcpdump -nn -i any host 192.168.11.82
[...]
11:11:44.130697 ens192 In IP 192.168.10.192 > 192.168.11.82: ICMP echo request, id 28, seq 1, length 64
11:11:45.163493 ens192 In IP 192.168.10.192 > 192.168.11.82: ICMP echo request, id 28, seq 2, length 64
11:11:46.187492 ens192 In IP 192.168.10.192 > 192.168.11.82: ICMP echo request, id 28, seq 3, length 64
11:11:47.211438 ens192 In IP 192.168.10.192 > 192.168.11.82: ICMP echo request, id 28, seq 4, length 64
[...]
What's going on here?
At first I suspected a routing or ARP issue. Maybe the network itself was routing the VIP to the wrong destination. To test this, I stopped Keepalived on both the master and backup servers and configured the VIP manually:
ck@rhel9 ~$ sudo ip addr add 192.168.11.82/32 dev ens192
There was literally no difference in the way the VIP was configured on ens192:
ck@rhel9 ~$ ip a
[...]
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:be:05:d6 brd ff:ff:ff:ff:ff:ff
altname enp11s0
inet 192.168.10.227/23 brd 192.168.10.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
inet 192.168.11.82/32 scope global ens192
valid_lft forever preferred_lft forever
But this time the ping worked from the network; the VIP responded!
That ruled out a problem with a central firewall or with the Linux network stack. The problem had to be within Keepalived. But what exactly?
I removed the manually added VIP and started Keepalived again. The VIP was automatically configured by Keepalived - and again did not respond to any connection attempts.
This time I shifted my focus to the Keepalived logs in syslog:
ck@rhel9 ~$ tail -f /var/log/messages
[...]
Jun 5 11:11:27 rhel9 Keepalived_vrrp[9825]: VRRP_Script(chk_tinyproxy) succeeded
Jun 5 11:11:27 rhel9 Keepalived_vrrp[9825]: (VI_1) Changing effective priority from 100 to 103
Jun 5 11:11:31 rhel9 Keepalived_vrrp[9825]: (VI_1) Receive advertisement timeout
Jun 5 11:11:31 rhel9 Keepalived_vrrp[9825]: (VI_1) Entering MASTER STATE
Jun 5 11:11:31 rhel9 Keepalived_vrrp[9825]: (VI_1) setting firewall drop rule
Jun 5 11:11:31 rhel9 Keepalived_vrrp[9825]: (VI_1) setting VIPs.
[...]
One line caught my attention: setting firewall drop rule. I double-checked with all my Keepalived setups on Debian systems and this log entry never ever showed up. What's this? Where is this coming from?
As it turns out, this log entry (and the corresponding dynamically created firewall drop rule) is caused by the vrrp_strict option. A comment by Quentin Armitage describes perfectly what's happening here:
vrrp_strict means Enforce strict VRRP protocol compliance. RFC5798 states (for a VRRP instance in master state): (650) - MUST accept packets addressed to the IPvX address(es) associated with the virtual router if it is the IPvX address owner or if Accept_Mode is True. Otherwise, MUST NOT accept these packets.
You have enabled vrrp_strict and accept is not set and the vrrp instance is not the address owner (priority not 255), so packets addressed to the virtual IP addresses are dropped. keepalived currently uses the firewall (iptables or nftables) to drop the packets.
A quick check of my Keepalived configuration confirmed that vrrp_strict was indeed set:
ck@rhel9 ~$ sudo head /etc/keepalived/keepalived.conf
! Configuration file for keepalived
global_defs {
router_id rhel9
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
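Note that head only shows the first ten lines of the file, so it only works here because global_defs sits at the very top. A grep for the keyword is a more robust check. The following is a minimal sketch; the function name has_vrrp_strict is made up for this example, and the default path is the one the package uses:

```shell
# Print every active (non-comment) vrrp_strict line of a Keepalived config,
# prefixed with its line number. Exit status is 0 if at least one was found.
# Usage: has_vrrp_strict [FILE]
has_vrrp_strict() {
    conf="${1:-/etc/keepalived/keepalived.conf}"
    # Only lines whose first token is vrrp_strict match. Lines starting
    # with '!' or '#' are comments in keepalived.conf and are skipped.
    grep -nE '^[[:space:]]*vrrp_strict([[:space:]]|$)' "$conf"
}
```

Called as `has_vrrp_strict && echo "strict mode is enabled"`, it gives a quick yes/no answer before Keepalived is started.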
The option comes straight from the example configuration installed by the package, which also contains the following vrrp_instance snippet:
ck@rhel9 ~$ cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.200.16
192.168.200.17
192.168.200.18
}
}
[...]
With a priority of 100 (not 255, so the instance is not the address owner) and no accept option, this configuration runs into exactly the problem described above as soon as vrrp_strict is enabled, which it is in the example config.
After I removed vrrp_strict from the Keepalived config and restarted Keepalived, the VIP finally responded!
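Removing vrrp_strict is one way out. Going by the explanation quoted above, another should be to keep strict mode and set the accept keyword on the instance, so that a master which is not the address owner is still allowed to answer traffic to the VIP. The fragment below is an untested sketch of that variant, not what was done on these servers. The instance name, interface and VIP are taken from the output above; virtual_router_id and priority are simply the values from the packaged example.

```
global_defs {
    router_id rhel9
    vrrp_strict              # strict VRRP compliance stays enabled
}

vrrp_instance VI_1 {
    interface ens192
    virtual_router_id 51     # value from the packaged example
    priority 100             # not 255, so this node is not the address owner
    advert_int 1
    accept                   # accept packets addressed to the VIP anyway
    virtual_ipaddress {
        192.168.11.82
    }
}
```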
A comparison with the default configuration file in the Debian package showed that the same example configuration file is used there. The keepalived.conf in the RPM and DEB packages gives a helpful overview of the configuration possibilities. But if you use this example configuration 1:1 and just swap in your own IP addresses, you will run into the same problem I did.
It's better to clear the default configuration file and add your own configuration snippets to it.
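What such a from-scratch configuration could look like is sketched below. Treat it as a starting point under stated assumptions, not as the exact configuration used here. The instance name, script name, interface and VIP come from the output earlier in this post. The check command, the interval and the state are assumptions; the weight of 3 is inferred from the priority change from 100 to 103 in the log, and virtual_router_id is the value from the packaged example.

```
global_defs {
    router_id rhel9
    # no vrrp_strict here, so no firewall drop rule for the VIP
}

vrrp_script chk_tinyproxy {
    script "/usr/bin/pgrep tinyproxy"   # assumption: any command that exits 0 when healthy
    interval 2                          # assumption: run the check every 2 seconds
    weight 3                            # assumption, inferred from 100 -> 103 in the log
}

vrrp_instance VI_1 {
    state BACKUP                        # assumption
    interface ens192
    virtual_router_id 51                # example value, must match on both nodes
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.11.82
    }
    track_script {
        chk_tinyproxy
    }
}
```

Recent Keepalived versions can check a file for obvious errors with keepalived -t before the service is restarted.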