Keepalived virtual IP addresses lost after systemd update


Published on May 12th 2020 - Listed in Linux SystemD Network


After an Ansible playbook was run on a load balancer running keepalived with additional virtual IP addresses (VIPs), all VIPs were suddenly gone and unreachable. How is that possible? What exactly did the playbook do? Not much, as it turned out. The Ansible playbook did what it was supposed to do: it applied some base configuration (as on all servers) and at the end ran a system update using (safe-) upgrade. And right after this playbook task, the VIPs were gone. Down.

Interestingly, there was no failover to the secondary load balancer - which indicates that keepalived's VRRP communication still worked.
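For context, such a VIP is typically defined in a vrrp_instance block in keepalived.conf. The following is only a sketch: the instance name, router ID and priorities are assumptions, with the interface and VIP taken from the ip a outputs shown further below.

```
# Hypothetical keepalived.conf excerpt - instance name, router ID and
# priority are assumptions; interface and VIP match the outputs below.
vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 150
    advert_int 1
    virtual_ipaddress {
        192.168.22.140/32    # the VIP that disappeared
    }
}
```

Keepalived adds the addresses from virtual_ipaddress on top of the interface's static address, which is why they show up as additional inet entries in ip a.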

Logs point to systemd

A check of dmesg on the affected load balancer revealed something interesting in the events: systemd!

[Tue May 12 13:51:06 2020] systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[Tue May 12 13:51:06 2020] systemd[1]: Detected virtualization vmware.
[Tue May 12 13:51:06 2020] systemd[1]: Detected architecture x86-64.
[Tue May 12 13:51:06 2020] systemd[1]: Stopping Journal Service...
[Tue May 12 13:51:06 2020] systemd-journald[10688]: Received SIGTERM from PID 1 (systemd).
[Tue May 12 13:51:06 2020] systemd[1]: Stopped Journal Service.
[Tue May 12 13:51:06 2020] systemd[1]: Starting Journal Service...
[Tue May 12 13:51:06 2020] systemd[1]: Started Journal Service.

Was the systemd package updated? A verification in apt's history log confirmed it:

root@loadbalancer1:~# cat /var/log/apt/history.log

Start-Date: 2020-05-12  13:50:58
Requested-By: ansible (1001)
Install: linux-modules-extra-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic), linux-headers-4.15.0-99:amd64 (4.15.0-99.100, automatic), linux-modules-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic), linux-headers-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic), linux-image-4.15.0-99-generic:amd64 (4.15.0-99.100, automatic)
Upgrade: linux-headers-generic:amd64 (4.15.0.96.87, 4.15.0.99.89), python-samba:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), libldap-2.4-2:amd64 (2.4.45+dfsg-1ubuntu1.4, 2.4.45+dfsg-1ubuntu1.5), libwbclient0:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), libsystemd0:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), linux-image-generic:amd64 (4.15.0.96.87, 4.15.0.99.89), udev:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), libudev1:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), samba-libs:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), samba-common:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), systemd-sysv:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), libldap-common:amd64 (2.4.45+dfsg-1ubuntu1.4, 2.4.45+dfsg-1ubuntu1.5), libpam-systemd:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), systemd:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), libsmbclient:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), smbclient:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), samba-common-bin:amd64 (2:4.7.6+dfsg~ubuntu-0ubuntu2.15, 2:4.7.6+dfsg~ubuntu-0ubuntu2.16), sosreport:amd64 (3.9-1ubuntu0.18.04.2, 3.9-1ubuntu0.18.04.3), libmysqlclient20:amd64 (5.7.29-0ubuntu0.18.04.1, 5.7.30-0ubuntu0.18.04.1), libnss-systemd:amd64 (237-3ubuntu10.39, 237-3ubuntu10.40), linux-firmware:amd64 (1.173.17, 1.173.18), linux-generic:amd64 (4.15.0.96.87, 4.15.0.99.89)
Remove: linux-modules-extra-4.15.0-74-generic:amd64 (4.15.0-74.84)
End-Date: 2020-05-12  13:52:51

From the gathered information it looked as if systemd (or its udev part) had wiped the VIPs off the system.

Confirming the theory

As there are a couple of such setups around, it did not take long to find a similar load balancer in the same state, in a test environment, on which to reproduce the issue. Maybe a manual package update would reveal more information, too?

Before the packages were updated, the current versions were gathered:

root@anotherlb:~# apt-show-versions -u | grep systemd
libnss-systemd:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
libpam-systemd:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
libsystemd0:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
systemd:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40
systemd-sysv:amd64/bionic-updates 237-3ubuntu10.33 upgradeable to 237-3ubuntu10.40

And the VIPs were showing up just fine in the ip a output:

root@anotherlb:~# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8d:fb:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.22.141/25 brd 192.168.22.255 scope global ens192
       valid_lft forever preferred_lft forever
    inet 192.168.22.140/32 scope global ens192       <<<< this is the VIP
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8d:fb45/64 scope link
       valid_lft forever preferred_lft forever

Let's do the package update, selecting only the systemd packages:

root@anotherlb:~# apt-get install systemd systemd-sysv
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-4.15.0-62 linux-headers-4.15.0-62-generic linux-image-4.15.0-62-generic linux-modules-4.15.0-62-generic
  linux-modules-extra-4.15.0-62-generic
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libnss-systemd libpam-systemd libsystemd0
Suggested packages:
  systemd-container
The following packages will be upgraded:
  libnss-systemd libpam-systemd libsystemd0 systemd systemd-sysv
5 upgraded, 0 newly installed, 0 to remove and 83 not upgraded.
Need to get 3,346 kB of archives.
After this operation, 57.3 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 libnss-systemd amd64 237-3ubuntu10.40 [104 kB]
Get:2 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 systemd-sysv amd64 237-3ubuntu10.40 [14.4 kB]
Get:3 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 libpam-systemd amd64 237-3ubuntu10.40 [107 kB]
Get:4 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 systemd amd64 237-3ubuntu10.40 [2,913 kB]                                               
Get:5 http://ch.archive.ubuntu.com/ubuntu bionic-updates/main amd64 libsystemd0 amd64 237-3ubuntu10.40 [207 kB]                                             
Fetched 3,346 kB in 9s (392 kB/s)                                                                                                                           
(Reading database ... 141281 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_237-3ubuntu10.40_amd64.deb ...
Unpacking libnss-systemd:amd64 (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../systemd-sysv_237-3ubuntu10.40_amd64.deb ...
Unpacking systemd-sysv (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../libpam-systemd_237-3ubuntu10.40_amd64.deb ...
Unpacking libpam-systemd:amd64 (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../systemd_237-3ubuntu10.40_amd64.deb ...
Unpacking systemd (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Preparing to unpack .../libsystemd0_237-3ubuntu10.40_amd64.deb ...
Unpacking libsystemd0:amd64 (237-3ubuntu10.40) over (237-3ubuntu10.33) ...
Setting up libsystemd0:amd64 (237-3ubuntu10.40) ...
Setting up systemd (237-3ubuntu10.40) ...
Failed to try-restart systemd-resolved.service: Unit systemd-resolved.service is masked.
Setting up libnss-systemd:amd64 (237-3ubuntu10.40) ...
Setting up systemd-sysv (237-3ubuntu10.40) ...
Setting up libpam-systemd:amd64 (237-3ubuntu10.40) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for dbus (1.12.2-1ubuntu1.1) ...
Processing triggers for ureadahead (0.100.0-21) ...

Right after the "Setting up systemd" step, a restart of the systemd-resolved service was attempted. On this particular machine, however, this service is masked (on purpose!).

root@anotherlb:~# systemctl list-unit-files | grep resolved
systemd-resolved.service                                         masked

The big question now is: is the masked systemd-resolved unit to blame, or would this happen anyway after a systemd and udev restart?

Peeking into systemd Ubuntu package

The current systemd source package for Ubuntu Bionic (18.04) can be downloaded, and the extracted systemd_237-3ubuntu10.40.debian.tar.xz revealed an interesting part in the systemd.postinst file, starting on line 42:

 42 # Enable resolved by default on new installs installs and upgrades
 43 if dpkg --compare-versions "$2" lt "234-1ubuntu2~"; then
 44     systemctl enable systemd-resolved.service || true
 45 fi

and later on:

149 # skip daemon-reexec and try-restarts during shutdown to avoid hitting LP: #1803391
150 if [ -n "$2" ] && [ "$(systemctl is-system-running)" != "stopping" ]; then
151     _systemctl daemon-reexec || true
152     # don't restart logind; this can be done again once this gets implemented:
153     # https://github.com/systemd/systemd/issues/1163
154     _systemctl try-restart systemd-networkd.service || true
155     _systemctl try-restart systemd-resolved.service || true
156     _systemctl try-restart systemd-timesyncd.service || true
157     _systemctl try-restart systemd-journald.service || true
158 fi

So basically the systemd postinst script tries to enable and later restart (try-restart) the systemd-resolved service. If that fails, it simply returns true. This can be executed manually to observe the behaviour:

root@anotherlb:~# systemctl try-restart systemd-resolved.service || true
Failed to try-restart systemd-resolved.service: Unit systemd-resolved.service is masked.
root@anotherlb:~# echo $?
0

Even though systemd-resolved could not be restarted (because the service is masked), the exit code is 0. For the package installation/upgrade this means: all good, continue.
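The effect of the "|| true" pattern can be reproduced with any failing command; a minimal shell sketch (the try_restart function is a made-up stand-in, not part of the real postinst):

```shell
# Minimal demo of the '|| true' pattern from systemd.postinst:
# the failing command's exit code is swallowed and execution continues.
try_restart() { return 1; }    # stand-in for a failing 'systemctl try-restart'

try_restart || true            # the failure is masked here...
echo "exit code: $?"           # ...so $? is 0 and the script carries on
```

This prints "exit code: 0", which is exactly why the masked systemd-resolved unit does not abort the package upgrade.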

So far so good. It must be something else then. What about the service restarted just before it, systemd-networkd? That is an enabled service, so its restart should work out of the box, right?

root@anotherlb:~# systemctl try-restart systemd-networkd.service || true

As soon as this command was fired, the VIPs were gone again (they were being continuously pinged from another terminal session)! ip a confirmed that the VIP was gone:

root@anotherlb:~# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8d:fb:45 brd ff:ff:ff:ff:ff:ff
    inet 192.168.22.141/25 brd 192.168.22.255 scope global ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe8d:fb45/64 scope link
       valid_lft forever preferred_lft forever

It's a systemd bug

With that information at hand, I wanted to report an Ubuntu bug. But it turns out the bug already exists: LP Bug #1815101! The bug was confirmed in February 2020, but so far no fix is available.

Workaround: Restart keepalived right after a systemd update and the VIPs are back again.
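Since the update came in via Ansible in the first place, one way to apply this workaround is to restart keepalived whenever the upgrade task reports a change. The following playbook sketch is hypothetical (host group, task and handler names are assumptions, not from the original playbook):

```yaml
# Hypothetical sketch, not the original playbook: run the safe upgrade
# and restart keepalived afterwards so the VIPs are re-added immediately.
- hosts: loadbalancers
  tasks:
    - name: Run system update (safe upgrade)
      ansible.builtin.apt:
        upgrade: safe
        update_cache: yes
      notify: Restart keepalived

  handlers:
    - name: Restart keepalived
      ansible.builtin.service:
        name: keepalived
        state: restarted
```

The handler only fires when the apt task actually changed something, so keepalived is left alone on runs without package updates.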

