To install a power backup device, intended to keep supplying power during an outage, the power feed had to be cut and rewired to pass through this new backup device.
This meant cutting the power feed to my house. That's annoying, but as long as a power loss can be scheduled and its duration is foreseeable, this is OK.
While the work on the power feed and the setup of the backup device went as planned, some other things didn't - to my big surprise, I have to admit.
Obviously one of the first things I noticed once all the devices had started up again was a missing link to the Internet. Even after a few minutes, both the router (Fritz!Box 4040 with OpenWRT installed) and the fibre modem were happily blinking their LEDs - but there was still no communication.
Interestingly, my machine obtained an IP address via DHCP from the OpenWRT router, and I was able to ping the router and even access OpenWRT's LuCI user interface. However, no ping from my (LAN) machine reached the Internet. Yet inside LuCI, under Network -> Diagnostics, a ping to the Internet (here with 1.1.1.1 as destination) happily got responses.
Which means:
For an unknown reason, all requests coming from the LAN and passing through the router (the default gateway) were unable to reach the Internet. Did the router stop routing from LAN to the next hop on the WAN side? Was some kind of NAT setup broken? The existing OpenWRT configuration had been running fine for several months, which raised even more question marks.
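A couple of quick checks from a LAN machine already narrow such a problem down. A minimal sketch, assuming 192.168.1.1 as the router's LAN address (a placeholder, not necessarily my actual setup):

# Can we reach the default gateway (the OpenWRT router)?
ping -c 3 192.168.1.1

# Can we reach a public IP? Using an IP address rules out DNS as the culprit.
ping -c 3 1.1.1.1

# Where does the path stop? With broken LAN-to-WAN forwarding or NAT,
# the trace typically dies right after the first hop.
traceroute -n 1.1.1.1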
As I could not spot any configuration problem in OpenWRT, I decided to try a factory reset of the settings to rule out a hardware problem. In LuCI this can be done under System -> Backup / Flash Firmware.
The "Perform reset" button clears all configuration and resets OpenWRT to its factory/default settings.
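As a side note, the same reset can also be triggered from an SSH shell on the router; a minimal sketch using OpenWRT's firstboot command, which wipes the writable overlay and with it all settings:

# Reset OpenWRT to factory defaults and reboot
firstboot -y && reboot now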
Once the configuration reset was completed, access to the Internet immediately worked again!
At least with this I could rule out a hardware defect, but I had to manually reconfigure the router: its network address, DHCP ranges and port forwardings.
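Instead of clicking everything together again in LuCI, most of this can also be scripted with UCI over SSH. A rough sketch with made-up example values (addresses, DHCP range and the forwarded service are placeholders, not my actual configuration):

# LAN address of the router
uci set network.lan.ipaddr='192.168.10.1'

# DHCP range on the LAN: start at .100, hand out up to 50 leases
uci set dhcp.lan.start='100'
uci set dhcp.lan.limit='50'

# Example port forwarding: WAN tcp/443 to an internal web server
uci add firewall redirect
uci set firewall.@redirect[-1].name='https-to-webserver'
uci set firewall.@redirect[-1].src='wan'
uci set firewall.@redirect[-1].dest='lan'
uci set firewall.@redirect[-1].proto='tcp'
uci set firewall.@redirect[-1].src_dport='443'
uci set firewall.@redirect[-1].dest_ip='192.168.10.20'
uci set firewall.@redirect[-1].dest_port='443'
uci set firewall.@redirect[-1].target='DNAT'

# Save and apply
uci commit
/etc/init.d/network restart
/etc/init.d/dnsmasq restart
/etc/init.d/firewall restart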
Note: I also tried to restore a configuration backup - with the very interesting result that access to the Internet broke again. This looks as if some kind of broken configuration had already been active prior to the router reboot - and only took effect once rebooted. If someone from OpenWRT wants to dig deeper into this, I still have the configuration backup.
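Should anyone want to hunt down the offending setting, comparing the (broken) backup with the freshly generated configuration is probably the quickest way. A small sketch, assuming the backup archive was saved as /tmp/backup.tar.gz:

# Unpack the OpenWRT backup (a tar.gz of the configuration files)
mkdir /tmp/oldconfig
tar -xzf /tmp/backup.tar.gz -C /tmp/oldconfig

# Compare the old UCI config files against the current (working) ones
diff -r /tmp/oldconfig/etc/config /etc/config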
Once access to the Internet worked again, I ran into the next unexpected problem.
Next up was the network printer, which could no longer be reached - even inside the otherwise working LAN. The printer (Brother MFC-9330CDW) had been properly shut down prior to the power cut to avoid it running into a bad situation (e.g. printing while the power is cut). Yet, once started up again, there was no network connectivity at all - no ping, no response.
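Before suspecting the printer itself, a quick check from another LAN machine shows whether the printer responds on layer 2 at all. A minimal sketch, with 192.168.10.30 as an assumed printer address:

# Ping the printer
ping -c 3 192.168.10.30

# Check the neighbour (ARP) table: a missing or FAILED entry means the
# printer does not even answer on the wire - time to look at cabling
# and switches rather than at the printer's network settings.
ip neigh show to 192.168.10.30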
After verifying the RJ45 cables and following them, I was led to the office switch: a small and (so far) reliable unmanaged 8-port Gigabit switch from Zyxel (model GS-108B).
When I touched the switch, I almost burnt my hand. It was so hot I couldn't even pick it up. I immediately unplugged the power and let it cool down. Once it had cooled down to a "normal" temperature, I connected everything again and - voilà - connectivity to the printer was working again.
Was it a one-off? Unfortunately not! A few hours later the switch's temperature rose again and it stopped working once more (this time also cutting the network connection of my main workstation). The power cut was definitely not well received by this switch - it is now marked as defective.
But we're not done yet.
One of the reasons to install a power backup device for the house is my server room and Infiniroot's lab environment. All servers and computers had been powered down prior to the power cut. Once the power came back, the primary lab server started up on its own (as configured in the BIOS) - yet, network-wise, the lab server was not reachable.
While I was still struggling with the first two problems mentioned above, I assumed the unreachable lab server was related to them. But even after fixing those two, the lab server was still down - network-wise.
A display screen connected to the server revealed why:
Windows Boot Manager?
This server runs on Debian, hence I expected GRUB to start and, in case of a problem, to hang there. Yet here we were, with a Microsoft Windows trying to boot.
This turned out to be an error on my side. Or rather, something I had forgotten. After replacing an SSD in this lab server (which runs an mdadm RAID), I forgot to clear the boot sectors of the replacement SSD. It still contained data from a previous Windows installation. As the EFI/BIOS detected a (somewhat) working EFI partition with a Windows Boot Manager on it, it tried to load this as the first boot option.
After changing the first boot option to boot from another SSD, the Debian OS started successfully; the SSD in question was then cleared and added back into the mdadm RAID mirror.
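For completeness, clearing such a leftover disk and re-adding it to the array roughly looks like the following sketch - /dev/sda, /dev/sdb and /dev/md0 are example names, not the exact devices used here:

# Wipe all filesystem, RAID and partition table signatures from the replacement SSD
wipefs -a /dev/sdb

# Alternatively (or additionally) destroy the GPT and MBR structures
sgdisk --zap-all /dev/sdb

# Copy the partition layout from the existing RAID member,
# then add the partition back into the mdadm mirror
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm --manage /dev/md0 --add /dev/sdb1

# Watch the rebuild progress
cat /proc/mdstat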
For the past 16 months I've had the same public IPv4 address - a de facto static IP address. It's no wonder that (public) services hosted in our lab environment point to this IPv4 address. Although we also use Dynamic DNS records, there are still some hard-wired configurations such as VPN tunnels, firewall whitelisting entries or monitoring configurations.
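Tracking down every place where the old address was hard-wired is mostly a grep exercise. A small sketch, using 203.0.113.10 as a stand-in for the old public IPv4 address:

# Find configuration files on a host still referencing the old public IP
grep -rn "203.0.113.10" /etc/ 2>/dev/null

Running the same search on the monitoring and firewall hosts helps to catch whitelisting entries as well.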
Getting all of this working again required quite some effort. One annoying thing I ran into was a VPN tunnel using WireGuard: on the remote endpoint I was able to set the new IP (or the Dynamic DNS entry), but whenever I restarted the service (using systemctl restart wg-quick@wg0), the endpoint IP was reset to the previous IP address.
This turned out to be caused by the SaveConfig option being enabled in this WireGuard configuration: when set, wg-quick writes the runtime state of the interface (including the resolved endpoint IP) back to the config file on shutdown, overwriting manual edits. A fix and an explanation for this can be found in this Serverfault answer.
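A minimal sketch of the relevant parts of /etc/wireguard/wg0.conf with the option disabled - keys, addresses and the hostname are placeholders:

[Interface]
# Do not overwrite this file with the runtime state on service stop,
# otherwise manual edits (like a new endpoint) get clobbered again
SaveConfig = false
PrivateKey = <private-key>
Address = 10.0.0.1/24

[Peer]
PublicKey = <peer-public-key>
# The Dynamic DNS name (resolved at tunnel start) instead of a hard-coded IP
Endpoint = vpn.example.com:51820
AllowedIPs = 10.0.0.2/32

With SaveConfig disabled (or the line removed entirely), a systemctl restart wg-quick@wg0 should pick up an edited Endpoint as expected.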
That was definitely not my day. However, all problems are now resolved. At least for now...