A (planned) power cut with annoying consequences and a dead-alike network




To install a power backup device that keeps supplying electricity during a power outage, the power feed to the house had to be cut and rewired to pass through the new device.

Work on the power feed

As mentioned above, this task meant cutting the power feed to my house. That is annoying, but as long as a power cut can be scheduled and its duration is foreseeable, it is OK.

While the work on the power feed and the setup of the backup device went as planned, some other things didn't, much to my surprise.

OpenWRT router stopped routing

Obviously, one of the first things I noticed once all the devices had started up again was the missing link to the Internet. Even after a few minutes, with both the router (Fritz!Box 4040 with OpenWRT installed) and the fibre modem happily blinking their LEDs, there was still no communication.

Interestingly, I was able to get an IP address via DHCP from the OpenWRT router, ping it, and even access LuCI, the OpenWRT web interface. However, no ping from my (LAN) machine to the Internet got through. Yet inside LuCI, under Network -> Diagnostics, a ping to the Internet (here with 1.1.1.1 as destination) happily responded.

Which means:

  • The router is able to ping something on the Internet, using its WAN interface
  • Hence the modem (connected on WAN) is also working
  • The request coming from my machine (LAN) is not forwarded/routed to the Internet (WAN)

For an unknown reason, requests coming from the LAN and passing through the router (the default gateway) were unable to reach the Internet. Did the router stop routing from LAN to the next hop on WAN? Was some kind of NAT setup broken? The existing OpenWRT configuration had been running successfully for several months, which left me with even more question marks.
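The three observations above boil down to a small decision table. The following Python sketch is only an illustration of that reasoning; the function name `diagnose` and the wording of the results are made up for this example and are not part of OpenWRT or LuCI.

```python
def diagnose(lan_to_router: bool, router_to_wan: bool, lan_to_wan: bool) -> str:
    """Map three ping results to the most likely failing segment.

    lan_to_router: ping from a LAN machine to the router succeeded
    router_to_wan: ping from the router itself to an Internet host succeeded
    lan_to_wan:    ping from a LAN machine to an Internet host succeeded
    """
    if not lan_to_router:
        return "LAN link or router unreachable"
    if not router_to_wan:
        return "WAN side down (modem, uplink or ISP)"
    if not lan_to_wan:
        return "router not forwarding LAN traffic (routing, NAT or firewall)"
    return "no fault found"


# The situation described above: the router is reachable and can ping
# 1.1.1.1 itself, but the LAN machine cannot reach the Internet.
print(diagnose(lan_to_router=True, router_to_wan=True, lan_to_wan=False))
```

The order of the checks matters: each test only makes sense if the segment before it is known to work.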

As I could not spot any configuration problem in OpenWRT, I decided to try a factory reset of the settings to rule out a hardware problem. This can be done under System -> Backup / Flash Firmware.


The "Perform reset" button clears all configuration and resets OpenWRT to its factory/default settings.

Once the configuration reset was completed, access to the Internet immediately worked again!

At least this ruled out a hardware defect, but I had to manually reconfigure the router: its network address, DHCP ranges and port forwards.

Note: I also tried to restore a configuration backup, with the very interesting result that access to the Internet was broken again. This suggests that the saved configuration already contained something broken before the power cut, but that it only took effect once the router rebooted. If someone from OpenWRT wants to dig deeper into this, I still have the configuration backup.

Once access to the Internet worked again, I ran into the next unexpected problem.

Overheating office switch

Next up was the network printer, which was no longer reachable - inside the otherwise working LAN. The printer (Brother MFC-9330CDW) had been properly shut down prior to the power cut to avoid it running into a bad situation (e.g. printing while the power is cut). Yet, once started up, no network connectivity! No ping, no response at all.

After verifying the RJ45 cables and following them, I was led to the office switch: a small but (so far) reliable unmanaged 8-port Gigabit switch from Zyxel (product GS-108B).

When I touched the switch, I almost burnt my hand. The switch was so hot, I couldn't even lift it. I immediately unplugged the power and let it cool down. Once it had cooled down to a "normal" temperature, I connected everything again and - voilà - connectivity to the printer was working again.

Was it a one-off? Unfortunately not! A few hours later, the switch's temperature rose again and the switch stopped working once more (this time also cutting the network connection of my main workstation). The power cut was definitely not well received by this switch, which is now marked as defective.

But we're not done yet.

LAB server not booting

One of the reasons to install a power backup device for the house is my server room and Infiniroot's lab environment. All servers and computers were powered down prior to the power cut. Once the power came back, the primary lab server started on its own (as configured in the BIOS) - yet network-wise the lab server was not reachable.

While I was still struggling with the first two problems, I assumed the unreachable lab server was related to them. But after fixing the first two, the lab server was still down - network-wise.

A display screen connected to the server revealed why:

Windows Boot Manager? 

This server runs Debian, so in case of a problem I expected GRUB to start and hang there. Yet here we were, with MS Windows trying to boot.

This turned out to be my own mistake, or rather something I forgot. After replacing an SSD on this lab server (which runs an mdadm RAID), I forgot to clear the boot sectors of the replacement SSD, which still contained data from a previous Windows installation. As the EFI/BIOS detected a (somewhat) working EFI partition containing a Windows Boot Manager, it tried to load it as the first boot option.

After changing the first boot option to boot from another SSD, Debian started successfully, and the SSD in question was wiped and added back into the mdadm RAID mirror.
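A stale boot entry like this can be spotted before the next reboot by looking at the firmware boot order, for example with efibootmgr on a running Linux system. The Python sketch below only parses text in the usual efibootmgr output layout; the helper name `first_boot_entry` and the sample output are invented for illustration and are not taken from the lab server.

```python
import re


def first_boot_entry(efibootmgr_output: str):
    """Return the label of the first entry in BootOrder, or None if not found."""
    order = re.search(r"^BootOrder:\s*([0-9A-Fa-f,]+)", efibootmgr_output, re.MULTILINE)
    if not order:
        return None
    first = order.group(1).split(",")[0].upper()
    for line in efibootmgr_output.splitlines():
        entry = re.match(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$", line)
        if entry and entry.group(1).upper() == first:
            # Some efibootmgr versions append the device path after a tab.
            return entry.group(2).split("\t")[0].strip()
    return None


# Invented sample output, resembling what `efibootmgr` prints.
SAMPLE = """BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0000,0001
Boot0000* Windows Boot Manager
Boot0001* debian
"""

label = first_boot_entry(SAMPLE)
if label and "windows" in label.lower():
    print(f"Unexpected first boot entry: {label}")
```

On a real machine the text would come from running efibootmgr (as root) instead of the `SAMPLE` string.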

Internet IPv4 change

In the past 16 months I've had the same public IPv4 address - a de facto static IP address. It's no wonder that (public) services hosted in our lab environment point to this IPv4. Although we also use Dynamic DNS records, there are still some hard-wired configurations such as VPN tunnels, firewall whitelisting entries or monitoring configurations.

It took quite some effort to get everything going again. One annoying thing I ran into was a VPN tunnel using WireGuard. On the remote endpoint I was able to set the new IP (or the Dynamic DNS entry), but whenever I restarted the service (using systemctl restart wg-quick@wg0), the endpoint IP was reset to the previous IP address.

This turned out to be due to the SaveConfig option, which was enabled in the tunnel's configuration: with SaveConfig set to true, wg-quick writes the running state of the interface back to the configuration file on shutdown, overwriting manual edits. A fix and an explanation for this can be found in this Serverfault answer.
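Whether a tunnel is affected can be checked by looking for SaveConfig in the [Interface] section of its wg-quick configuration. The following Python sketch does that on the text of a configuration file; the function name `saveconfig_enabled`, the host name and the placeholder keys in the sample are assumptions made for this example.

```python
def saveconfig_enabled(conf_text: str) -> bool:
    """Return True if SaveConfig is set to true in the [Interface] section."""
    in_interface = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            in_interface = line.lower() == "[interface]"
            continue
        if in_interface and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "saveconfig":
                return value.lower() == "true"
    return False


# Invented sample configuration with placeholder values.
SAMPLE = """[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = <redacted>
SaveConfig = true

[Peer]
PublicKey = <redacted>
Endpoint = vpn.example.net:51820
AllowedIPs = 10.0.0.2/32
"""

print(saveconfig_enabled(SAMPLE))
```

If the check returns True, one way around the problem is to stop the interface first (systemctl stop wg-quick@wg0), edit the file, and then start it again; another is to set SaveConfig to false.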

That was definitely not my day. However all problems are now resolved. At least for now...

