
Icinga2 graphing with InfluxDB and Grafana
Friday - Dec 8th 2017

As you can imagine from my previous post (icinga2-classicui shows Whoops error on new Icinga 2.8 install) I'm currently setting up a renewed monitoring architecture running on Ubuntu 16.04 Xenial.

To my big surprise PNP4Nagios doesn't run anymore on Xenial: Xenial ships with PHP 7.0, but PNP4Nagios does not (yet) run with PHP 7.0 (see this issue on github). There are surely ways around it, like installing special PHP repositories to install older PHP versions, but I decided to test another graphing method: Grafana as user interface (grapher) and InfluxDB as database.

Note in advance: Getting your graphs shown in Grafana certainly requires much more work than with PNP4Nagios, especially if you heavily rely on NRPE checks like I do.

Part I: Preparation of InfluxDB

Let's install InfluxDB right from the Ubuntu repos:

root@inf-mon02-t:~# apt-get install influxdb influxdb-client

After the installation, influxdb's daemon (influxd) is launched automatically and is listening:

root@inf-mon02-t:~# netstat -lntup | grep influx
tcp        0      0 127.0.0.1:8091          0.0.0.0:*               LISTEN      25027/influxd
tcp6       0      0 :::8083                 :::*                    LISTEN      25027/influxd
tcp6       0      0 :::8086                 :::*                    LISTEN      25027/influxd
tcp6       0      0 :::8088                 :::*                    LISTEN      25027/influxd

Data will be stored in /var/lib/influxdb:

root@inf-mon02-t:~# ll /var/lib/influxdb/
total 16
drwxr-xr-x 3 influxdb influxdb 4096 Dec  8 12:11 data
drwx------ 2 influxdb influxdb 4096 Dec  8 12:11 hh
drwxr-xr-x 3 influxdb influxdb 4096 Dec  8 12:11 meta
drwx------ 3 influxdb influxdb 4096 Dec  8 12:11 wal

InfluxDB can now be accessed using the InfluxDB client (influx):

root@inf-mon02-t:~# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.10.0
InfluxDB shell 0.10.0
> SHOW DATABASES
name: databases
---------------
name
_internal

> quit

Let's prepare a database for Icinga2 and create a database user for it:

root@inf-mon02-t:~# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.10.0
InfluxDB shell 0.10.0
> CREATE DATABASE icinga2
> SHOW DATABASES
name: databases
---------------
name
_internal
icinga2

> CREATE USER icinga2 WITH PASSWORD 'mysupersecretpassword'
> GRANT ALL ON icinga2 TO icinga2
> SHOW GRANTS FOR icinga2
database        privilege
icinga2         ALL PRIVILEGES

> quit
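
Note: Out of the box InfluxDB typically ships with HTTP authentication disabled, so the user and the grant above only take effect once authentication is enabled. A minimal sketch of the relevant part of /etc/influxdb/influxdb.conf (section layout may differ slightly between InfluxDB versions):

[http]
  # require username/password for reads and writes over the HTTP API
  auth-enabled = true

root@inf-mon02-t:~# service influxdb restart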


Part II: Tell Icinga2 to use InfluxDB for performance data

The following Icinga2 features need to be enabled:

root@inf-mon02-t:~# icinga2 feature enable influxdb
root@inf-mon02-t:~# icinga2 feature enable perfdata

This requires a restart of Icinga2:

root@inf-mon02-t:~# service icinga2 restart

Verify that the features are really enabled:

root@inf-mon02-t:~# icinga2 feature list
Disabled features: debuglog elasticsearch gelf graphite ido-mysql livestatus opentsdb syslog
Enabled features: api checker command compatlog influxdb mainlog notification perfdata statusdata

In theory that should already be enough. From the official Icinga2 documentation:

"By default the InfluxdbWriter feature expects the InfluxDB daemon to listen at 127.0.0.1 on port 8086."

An important note here: The default database Icinga2 wants to write into is called "icinga2".
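
If you want Icinga2 to authenticate against InfluxDB (or change host, port or database name), this is done in the feature's config file. A rough sketch of /etc/icinga2/features-available/influxdb.conf - the exact attributes may vary slightly between Icinga2 versions, the values below match this setup:

object InfluxdbWriter "influxdb" {
  host = "127.0.0.1"
  port = 8086
  database = "icinga2"
  username = "icinga2"
  password = "mysupersecretpassword"
  host_template = {
    measurement = "$host.check_command$"
    tags = {
      hostname = "$host.name$"
    }
  }
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
    }
  }
}

After changing the file, restart Icinga2 again (service icinga2 restart).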

Now let's check if InfluxDB is really receiving data from Icinga2.

Note: InfluxDB doesn't use the term "table" like classical SQL databases do. The equivalent in InfluxDB terminology is called a "measurement", which is "comparable" to a table. It's my first day with InfluxDB, so this is very likely not technically correct, but it's helpful to understand the query syntax.

root@inf-mon02-t:~# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.10.0
InfluxDB shell 0.10.0
> USE icinga2
Using database icinga2
> SHOW MEASUREMENTS
name: measurements
------------------
name
apt
disk
hostalive
http
icinga
load
ping4
ping6
procs
ssh
swap
users

> SELECT * FROM ping4
name: ping4
-----------
time                    hostname        metric  service value
1512732076000000000     inf-jira01-t    pl      ping4   0
1512732076000000000     inf-jira01-t    rta     ping4   0.003722
1512732076000000000     inf-mon02-t     pl      ping4   0
1512732076000000000     inf-mon02-t     rta     ping4   4.1e-05
1512732138000000000     inf-mon02-t     rta     ping4   4.6e-05
1512732138000000000     inf-jira01-t    rta     ping4   0.00366
1512732138000000000     inf-mon02-t     pl      ping4   0
1512732138000000000     inf-jira01-t    pl      ping4   0
1512732200000000000     inf-mon02-t     pl      ping4   0
1512732200000000000     inf-mon02-t     rta     ping4   5.5e-05
1512732200000000000     inf-jira01-t    rta     ping4   0.003685
1512732200000000000     inf-jira01-t    pl      ping4   0
1512732262000000000     inf-jira01-t    rta     ping4   0.003784
1512732262000000000     inf-mon02-t     pl      ping4   0

> quit

Yes, there's data in there!

But all this collected data is only helpful if we can create graphs from it. Grafana is able to connect to an InfluxDB and create graphs from the data, so let's do this!

 

Part III: The fight to connect Grafana to InfluxDB

The installation of Grafana is very simple as Xenial already has Grafana packages in its repos.

root@inf-mon02-t:~# apt-get install grafana grafana-data

Grafana is listening on port tcp/3000:

root@inf-mon02-t:~# netstat -lntup|grep grafana
tcp6       0      0 :::3000                 :::*                    LISTEN      27579/grafana

This is the listener port of the Grafana-internal webserver. This is how Grafana looks in the browser:

Grafana Login 

The default user and password are admin/admin, and after the login the default (and still empty) dashboard is shown:

Grafana empty dashboard 

To add the InfluxDB as a data source, click on Data Sources -> Add new:

Grafana add data source

Set a name for this new data source. I called it "InfluxDB-Icinga2".
On Ubuntu 16.04 Xenial the installed InfluxDB (0.10.0, see above) still speaks the 0.9.x API, so set "Type" to "InfluxDB 0.9.x".
In "Http settings" set the URL to InfluxDB's API port ("http://localhost:8086") and the access to "direct".
Grafana requires a user and password to connect to InfluxDB. Use the icinga2 user which was created above.

Grafana add new data source

Once the data source was added, I clicked on "Test Connection" and got the following error in Chrome:

Unknown error
InfluxDB Error: Cannot read property 'message' of null

In Firefox I got a slightly different error:

Unknown error
InfluxDB Error: err.data is null

Grafana data source error err data is null 

I wasn't able to find a solution to this, but it led me to believe that a lot of bugs have been fixed since the Grafana version for Xenial (2.6.0) was released.
The current version is 4.6.2 so I decided to install this one. But first, Grafana (from the Ubuntu repos) needs to be uninstalled again:

root@inf-mon02-t:~# apt-get remove grafana grafana-data
root@inf-mon02-t:~# apt-get purge grafana grafana-data
root@inf-mon02-t:~# rm -rf /usr/share/grafana

I downloaded the current release's Debian package and installed it:

root@inf-mon02-t:~# wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_4.6.2_amd64.deb
root@inf-mon02-t:~# sudo dpkg -i grafana_4.6.2_amd64.deb

At the end of the installation of the deb package, the following notes show up:

### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
 sudo /bin/systemctl daemon-reload
 sudo /bin/systemctl enable grafana-server
### You can start grafana-server by executing
 sudo /bin/systemctl start grafana-server

Hmm... these steps could have been handled by the deb package, too.
So let's do this:

root@inf-mon02-t:~# sudo /bin/systemctl daemon-reload
root@inf-mon02-t:~# sudo /bin/systemctl enable grafana-server
Synchronizing state of grafana-server.service with SysV init with /lib/systemd/systemd-sysv-install...
Executing /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink from /etc/systemd/system/multi-user.target.wants/grafana-server.service to /usr/lib/systemd/system/grafana-server.service.
root@inf-mon02-t:~# sudo /bin/systemctl start grafana-server

And yes, Grafana is started:

root@inf-mon02-t:~# netstat -lntup| grep grafana
tcp6       0      0 :::3000                 :::*                    LISTEN      29922/grafana-serve

Back to the login with the browser. Wow - the interface has really changed a lot:

Grafana 4.6.2 dashboard

The new data source can now be added by clicking on the Grafana icon (top left) and then on "Data Sources".

Grafana menu 

Grafana data source

Oh... looks like the previously added InfluxDB data source is still here (because I didn't delete Grafana's own database in /var/lib/grafana before).

I selected this data source and clicked on "Save & Test". Argh. Another error:

Network Error: undefined(undefined)

Grafana data source error network error

Jeez, what now!? Eventually I came across this issue on Github and there user eighteen14k mentioned:

Oh this is just silly.

"Access: direct" - browser attempts to verify "data source"
"Access: proxy" - grafana verifies "data source".

That's the inverse of what I would expect.

What?! Seriously? Well I tried it and set the connection to "proxy" instead of "direct" and boom:

Grafana data source working 

OMG! Seriously, this is, as eighteen14k said it, just silly!

Now that the connection from Grafana to InfluxDB is finally working, a new Dashboard needs to be added in Grafana.

 

Part IV: Create the Icinga2 dashboard

Luckily a dashboard is already prepared and ready to use from the Icinga2 team and can be downloaded here: https://grafana.com/dashboards/381.
The json config from this dashboard (ID 381) can be downloaded here: https://grafana.com/api/dashboards/381/revisions/1/download.
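Alternatively the JSON can be fetched on the command line (the output filename below is just an example):

root@inf-mon02-t:~# wget https://grafana.com/api/dashboards/381/revisions/1/download -O icinga2-dashboard-381.json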
The content can then be pasted directly into Grafana, when importing a dashboard:

Grafana import dashboard 

Grafana import dashboard 

Grafana import dashboard

And fiiiinally, the graphs are showing up:

Grafana Icinga2 Dashboard

Note: For NRPE checks you will have to adapt the graphs, because this performance data is stored in the "nrpe" measurement.
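
To find the right series for such a graph, the tags of the nrpe measurement can be inspected in the influx client. The service name in the last query is just a placeholder; use whatever SHOW TAG VALUES returns on your setup:

root@inf-mon02-t:~# influx
> USE icinga2
> SHOW TAG VALUES FROM nrpe WITH KEY = "service"
> SELECT * FROM nrpe WHERE "service" = 'nrpe_disk' LIMIT 5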

 

icinga2-classicui shows Whoops error on new Icinga 2.8 install
Friday - Dec 8th 2017

So it seems that icinga2-classicui disappeared from the official installation documentation (see screenshot from the table of contents) and was replaced by icingaweb2:

Icinga 2 documentation lacks classicui

But that doesn't mean that icingaweb2 is necessarily better than the classicui was.

Yes, it has a newer design, reacts faster and offers live data (e.g. time since last status change counts dynamically up). But I still feel "at home" with the classic UI and I find myself working quicker with the classicui. Maybe it's just out of habit, but old habits die hard.

Icinga2 Icingaweb2

Anyway, since the classicui was removed from the documentation, there's something important missing now. When you install icinga2-classicui and you want to launch it in your browser, you see the following error message:

Icinga2 classicui whoops error

Ouch! Erm, I mean Whoops! The reason for this is that the webserver user (www-data on Debian/Ubuntu) is unable to read the contents of /var/cache/icinga2/objects.cache. This can be found out if the classicui is configured to do some logging:

root@inf-mon02-t:~# grep log /etc/icinga2-classicui/cgi.cfg
use_logging=1
cgi_log_file=/var/log/icinga2/icinga-cgi.log
cgi_log_rotation_method=d
cgi_log_archive_path=/var/log/icinga/gui
log_file=/var/log/icinga2/compat/icinga.log
log_rotation_method=h
log_archive_path=/var/log/icinga2/compat/archives

Note I enabled "use_logging" and set the log path to /var/log/icinga2/icinga-cgi.log. Make sure www-data (your webserver user) is able to write that file:

root@inf-mon02-t:~# touch /var/log/icinga2/icinga-cgi.log
root@inf-mon02-t:~# chown www-data /var/log/icinga2/icinga-cgi.log

The next time you access the classicui, this error will be logged:

[1512728139] Error: Cannot open config file '/var/cache/icinga2/objects.cache' for reading: Permission denied

As I mentioned above, www-data must be able to read the Icinga2 objects cache file. By default (from the Icinga2 package installation), the permissions are like this:

root@inf-mon02-t:/var/cache# ll
total 36
drwxr-xr-x 3 root   root   4096 Dec  7 11:57 apache2
drwxr-xr-x 2 root   root   4096 Aug 31 23:30 apparmor
drwxr-xr-x 3 root   root   4096 Dec  7 14:18 apt
drwxr-xr-x 2 root   root   4096 Dec  8 07:25 apt-show-versions
drwxr-xr-x 3 root   root   4096 Dec  7 11:56 dbconfig-common
drwxr-xr-x 2 root   root   4096 Dec  7 14:18 debconf
drwxr-x--- 2 nagios nagios 4096 Dec  8 11:14 icinga2
drwx------ 2 root   root   4096 Dec  7 13:29 ldconfig
drwxr-xr-x 2 root   root   4096 Nov 16 13:39 samba

root@inf-mon02-t:/var/cache# ll icinga2/
total 952
-rw------- 1 nagios nagios 867562 Dec  8 11:04 icinga2.debug
-rw------- 1 nagios nagios   6221 Dec  8 11:04 icinga2.vars
-rw-r--r-- 1 nagios nagios  70613 Dec  8 11:04 objects.cache
-rw-r--r-- 1 nagios nagios  23117 Dec  8 11:15 status.dat

I checked my old documentation of Icinga2 installations and indeed found the missing part: www-data needs to be added to the nagios group (the nagios user and group are used by the icinga2 package installation). This step is no longer in the official Icinga2 documentation because the documentation now only covers icingaweb2. To fix it:

root@inf-mon02-t:~# usermod -a -G nagios www-data
root@inf-mon02-t:~# service apache2 restart
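
The group membership can be verified afterwards as a quick sanity check (the numeric IDs will differ on your system):

root@inf-mon02-t:~# id www-data
uid=33(www-data) gid=33(www-data) groups=33(www-data),999(nagios)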

Much better now:

Icinga2 classicui fixed 

 

How to install Dell OpenManage 9.x and racadm on Ubuntu 16.04
Wednesday - Dec 6th 2017

In the past (a couple of years back) it was somewhat annoying to install Dell's OpenManage software on Linux servers. But Dell obviously has improved the situation by releasing the software through apt repositories.
In short, everything is nicely documented on http://linux.dell.com/repo/community/openmanage/ but I modified some steps and, for the sake of completeness, I'm showing it here, too.

Add the apt repository as a new source:

root@r720:~# sudo echo "deb http://linux.dell.com/repo/community/openmanage/902/xenial xenial main" > /etc/apt/sources.list.d/linux.dell.com.sources.list

Note: At the time of this writing, version 9.0.2 was the newest available version, hence the "902" in the URL.

Add the apt key from the Dell repos:

root@r720:~# sudo gpg --keyserver pool.sks-keyservers.net --recv-key 1285491434D8786F
root@r720:~# gpg -a --export 1285491434D8786F | sudo apt-key add -

Update apt and install srvadmin:

root@r720:~# sudo apt-get update
root@r720:~# sudo apt-get install srvadmin-all

This also installs the racadm tool to access an integrated iDRAC card. As I'm on a Dell PowerEdge R720 server with an iDRAC7 card here, the package srvadmin-idracadm7 is relevant to me.

To check if racadm works correctly and is able to talk to the iDRAC card, launch this command:

root@r720:~# sudo racadm getsysinfo

RAC Information:

RAC Date/Time           = Wed Dec  6 08:37:32 2017

Firmware Version        = 2.50.50.50
Firmware Build          = 33
Last Firmware Update    = 12/05/2017 10:31:09
Hardware Version        = 0.01
MAC Address             = C8:1F:66:01:00:FF

Common settings:
Register DNS RAC Name   = 0
[...]

To be able to communicate with OpenManage, the main service (dataeng) needs to be started:

root@r720:~# sudo service dataeng start

This launches the following processes:

root     26802  4.5  0.0 888880 27376 ?    Ssl  09:39   0:01 /opt/dell/srvadmin/sbin/dsm_sa_datamgrd
root     26897  0.0  0.0 684076 17660 ?    Ss   09:39   0:00  \_ /opt/dell/srvadmin/sbin/dsm_sa_datamgrd
root     26873  0.0  0.0 228992  7756 ?    Ssl  09:39   0:00 /opt/dell/srvadmin/sbin/dsm_sa_eventmgrd
root     26892  0.5  0.0 373844  9688 ?    Ssl  09:39   0:00 /opt/dell/srvadmin/sbin/dsm_sa_snmpd

To test omreport, launch this command:

root@r720:~# sudo /opt/dell/srvadmin/bin/omreport system summary
sh: 1: /bin/rpm: not found
System Summary

------------------
Software Profile
------------------
Systems Management
Name                        : Server Administrator
Version                     : 9.0.2
Description                 : Systems Management Software

Operating System
Name                        : Linux
Version                     : Kernel 4.4.0-101-generic (x86_64)
System Time                 : Wed Dec  6 09:41:48 2017
System Bootup Time          : Tue Dec  5 18:22:29 2017
[...]

Now that omreport works, we can use the monitoring plugin check_openmanage to monitor the server's hardware health (and integrate it in Nagios/Icinga):

root@r720:~# sudo wget http://folk.uio.no/trondham/software/check_openmanage-3.7.12/check_openmanage -O /usr/lib/nagios/plugins/check_openmanage
root@r720:~# sudo chmod 755 /usr/lib/nagios/plugins/check_openmanage

root@r720:~# sudo /usr/lib/nagios/plugins/check_openmanage -a -i
[servicetag] ESM log content: 20 critical, 0 non-critical, 13 ok
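
If the hardware health should be checked through NRPE from the monitoring server, a command definition along these lines can be added on the monitored host (file name and command name are just examples):

# /etc/nagios/nrpe.d/openmanage.cfg
command[check_openmanage]=/usr/lib/nagios/plugins/check_openmanage

root@r720:~# sudo service nagios-nrpe-server restart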


 

How to install a custom Android ROM on Sony Xperia Z2
Tuesday - Nov 28th 2017

The Sony Xperia Z2 phone is still a very good phone, although it has some years on its back. I bought my Xperia Z2 exactly three years ago, in November 2014. It is still a speedy phone, still looks good and still has a good battery life. But of course there must be something negative: The Android OS on that phone never got past Android 6.0.1 (Marshmallow). That's the latest update my phone ever received from Sony, including security patches (Android security patch level: May 1, 2016). What a shame.

A lot of problems have recently become public, some of them pretty bad. For starters there's the Bluetooth vulnerability (BlueBorne) which lets your phone be hacked within a couple of seconds using nothing but a Bluetooth connection. Then there's also the recently discovered WPA2 vulnerability in wireless networks. This needs to be fixed on the client side (WLAN client), too.

So security alone is already one big reason to update your Sony Xperia Z2 to a newer version. But Sony (and most other Android phone vendors) would rather have you buy a new phone than release updates for old(er) phones. Hey, we're talking about 3 years here. Since when is that too old?

There's a way around that though. It is possible to install a custom Android ROM and ditch the stock (=default) Android ROM from Sony. The tricky part however is to unlock the bootloader first, to be able to install a so-called recovery ROM (imagine it like a computer BIOS) which then is used to install a custom ROM.

I've already done a couple of howtos for other Android devices (see https://www.claudiokuenzler.com/Android) so this step-by-step guide follows the same procedure as the others.

***Step by step guide how to install a custom Android ROM on Sony Xperia Z2 (Model: D6503)***

1. Warnings/Read before doing anything
Warning: Installing a custom recovery image on the Sony Xperia Z2 will void the warranty.
A custom Android ROM is not officially supported by Sony and there can be bugs.
You will lose all your data on the phone. Make a backup of your pictures, videos, etc.
You're doing this at your own risk, this tutorial is only showing how it is possible. I'm not encouraging anyone to take these steps. You're responsible for this and in case something breaks then you're on your own. If you're scared now then close this page, shut down your computer, leave the house, go on the street, stop the next car and ask the driver if he knows how to install a custom Android ROM on your Sony Xperia Z2.

 

2. Get the unlock code from Sony
One of the main reasons why I bought the Sony Xperia Z2 in the first place was Sony's commitment to Android and to allowing the user to unlock the bootloader. This can be done on the official Sony website, using this link: https://developer.sonymobile.com/unlockbootloader/unlock-yourboot-loader/

Select your device, then enter your e-mail address and accept the terms and conditions:

Sony unlock bootloader Xperia Z2 

 Sony unlock bootloader Xperia Z2

On the next step you need to enter your IMEI.
You can show your device's unique IMEI by opening the dialer and entering: *#*#7378423#*#* . This will open up the service menu (be patient, it took quite a while to show up on my device).
Tap on Service info -> Configuration. This will show you the IMEI and also another important piece of information: the rooting status. This tells you whether your device supports an unlocked bootloader. On my device "Bootloader unlock allowed" is set to "Yes", so I can continue.

Get IMEI from Sony Xperia Z2 

Enter the IMEI on the Sony website, check the checkboxes, click on Submit.

Sony unlock bootloader Xperia Z2

After this you should get an e-mail with the instructions and code to unlock the bootloader.

 

3. Unlock the phone in fastboot mode
The e-mail from Sony contains a code which needs to be entered in the "fastboot mode". That's kind of a low-level operating system (like BIOS) before the operating system (Android) starts.

To be able to talk to the phone in fastboot mode, you need to have the Android SDK installed. I chose to use my Linux VM for this and installed the necessary Android packages:

mintvm ~ # apt-get install android-tools-adb android-tools-fastboot
Reading package lists... Done
Building dependency tree       
Reading state information... Done
android-tools-adb is already the newest version.
The following NEW packages will be installed:
  android-tools-fastboot
0 upgraded, 1 newly installed, 0 to remove and 15 not upgraded.
Need to get 46.6 kB of archives.
After this operation, 158 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ubuntu.ethz.ch/ubuntu/ trusty/universe android-tools-fastboot amd64 4.2.2+git20130218-3ubuntu23 [46.6 kB]
Fetched 46.6 kB in 1s (31.2 kB/s)           
Selecting previously unselected package android-tools-fastboot.
(Reading database ... 173302 files and directories currently installed.)
Preparing to unpack .../android-tools-fastboot_4.2.2+git20130218-3ubuntu23_amd64.deb ...
Unpacking android-tools-fastboot (4.2.2+git20130218-3ubuntu23) ...
Setting up android-tools-fastboot (4.2.2+git20130218-3ubuntu23) ...

Turn off your Xperia Z2 phone if it is still powered on.
Now connect the USB-cable to your computer. Don't plug the micro-USB end into the phone yet!

The tricky part is to get your phone into fastboot mode, which requires a combo:
On your Xperia Z2, press the Volume up button at the same time as you connect the other end of the USB-cable.

The phone screen went completely black - no way to tell that something is running - however the phone is detected on my Linux machine (once I forwarded the detected Sony Ericsson Mobile S1Boot Fastboot device to my VM):


[  756.364924] usb 1-1: new high-speed USB device number 8 using ehci-pci
[  756.624431] usb 1-1: New USB device found, idVendor=0fce, idProduct=0dde
[  756.624437] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  756.624440] usb 1-1: Product: S1Boot Fastboot
[  756.624442] usb 1-1: Manufacturer: Sony Mobile Communications AB
[  756.624444] usb 1-1: SerialNumber: BH90XXXXXX

The fastboot command is now able to detect the device:

mintvm ~ # fastboot devices
BH906S2Q16    fastboot

With the fastboot command the bootloader can now be unlocked with the code received by Sony:

mintvm ~ # fastboot -i 0x0fce oem unlock 0x25A696721AECXXXX
...
OKAY [  1.493s]
finished. total time: 1.493s

0x25A.... is obviously the code I received in the mail from Sony. It might be a different code for your device (might depend on the IMEI number).

Done. This means the phone's bootloader is unlocked now!

 

4. Install/flash a newer boot image (includes recovery ROM)
The next step is to install a recovery ROM to be able to backup your device and install a custom Android ROM. This is also done with the fastboot command, but we first need to download a recovery ROM which is made for this device. Important to know here: The Xperia Z2's code name is sirius.

Important Note: Although I found the TWRP Recovery ROM for this phone (on https://eu.dl.twrp.me/sirius/), this recovery never worked - I was unable to boot into the recovery mode.
Whatever I tried, I was unable to boot into TWRP recovery mode. I tried all kinds of combos, to no avail. I even booted the phone into the stock ROM, went through the phone setup, enabled USB debugging and connected to the phone with adb to reboot it into recovery (using "adb reboot recovery"), but even this didn't help! Then I came across https://forum.xda-developers.com/showthread.php?t=2769341&page=3 which points to installing an advanced kernel (https://forum.xda-developers.com/xperia-z2/development/kernel-advanced-stock-kernel-multirom-t3005894). This kernel includes a newer/different boot image.

So before a recovery ROM can be installed, we first need to install a newer boot image kernel on the phone. I downloaded Z2_ADVstockKernel_v2.zip (from https://www.androidfilehost.com/?w=files&flid=52180 or see mirror here: https://www.claudiokuenzler.com/downloads/SonyXperiaZ2/), unzipped it and there is a boot.img inside. This "boot.img" file can now be installed (flashed) on the phone's boot partition (still in fastboot mode):

mintvm ~ # fastboot flash boot boot.img
sending 'boot' (11312 KB)...
OKAY [  1.424s]
writing 'boot'...
OKAY [  0.770s]
finished. total time: 2.194s

Then launch a reboot of the phone:

mintvm ~ # fastboot reboot
rebooting...

finished. total time: 0.003s

After this I noticed two things:

1) For the first time ever the notification LED turned pink (it was red before) - which is often mentioned in howtos as the signal to boot into recovery mode
2) The "normal" boot didn't work anymore. The phone rebooted after the Sony logo.

So when I saw the pink LED, I pressed [Volume Down] and kept pressing on the button until - finally - recovery mode was launched:

Xperia Z2 booted into recovery 

 

5. Download custom Android ROM and Google Apps
Now it's time to download your new custom Android ROM of choice for your Sony Xperia Z2. When I was doing these steps on my phone, I chose "Resurrection Remix OS" with Android 7.1. The relevant zip file for the Xperia Z2 (remember, the phone's code name is sirius) can be downloaded at: https://sourceforge.net/projects/resurrectionremix/files/sirius/ (or in case link is gone, try here: https://www.claudiokuenzler.com/downloads/SonyXperiaZ2/).

You also need to install the Google Apps; without them you won't be able to properly use Resurrection Remix OS (Android) once it has booted.
Download them from http://opengapps.org/ and select Platform: ARM, Android: 7.1, Variant: full.


6. Data wiping and installation of custom ROM in recovery mode
In CWM (which is short for ClockWorkMod's recovery ROM) I enabled the USB Mass Storage device (mounts and storage -> mount USB storage) and transferred the Resurrection Remix zip and Gapps zip onto the phone:

Transfer custom Android ROM into Recovery  

After the transfer I went back to the main menu in CWM and selected "wipe data/factory reset":

Wipe Data Wipe Data

Followed by "wipe cache partition": 

Wipe Cache Wipe Cache

Finally it's time to install the RR and Gapps! Select "install zip" -> choose zip from storage/sdcard1:

Installing Resurrection Remix OS on Sony Xperia Z2 Installing Resurrection Remix OS on Sony Xperia Z2 Installing Resurrection Remix OS on Sony Xperia Z2

Installing Resurrection Remix OS on Sony Xperia Z2 Installing Resurrection Remix OS on Sony Xperia Z2 Installing Resurrection Remix OS on Sony Xperia Z2

Installing Open Gapps on Sony Xperia Z2 Installing Open Gapps on Sony Xperia Z2 Installing Open Gapps on Sony Xperia Z2

Installing Open Gapps on Sony Xperia Z2 Installing Open Gapps on Sony Xperia Z2 Installing Open Gapps on Sony Xperia Z2

Now the big moment: After both zip files were successfully installed, reboot the device...

 

7. Enjoy
And finally, after a very long first boot time of several minutes, the Android setup screen appeared! Hurraaaayy!

Booting custom Android on Sony Xperia Z2 Booting custom Android on Sony Xperia Z2 Booting custom Android on Sony Xperia Z2

 

Can't kill a non-numeric process ID in check_vmware_esx plugin
Monday - Nov 27th 2017

If you get the following UNKNOWN error back from the check_vmware_esx.pl plugin, something went wrong with the previous session to the server and the session file needs to be deleted.

Can't kill a non-numeric process ID at /usr/lib/nagios/plugins/check_vmware_esx.pl line 1747.

Cant kill a non-numeric process ID check_vmware_esx 

In Icinga2 these session files can by default be found in /var/spool/icinga2/tmp:

 # ls -la /var/spool/icinga2/tmp/ | grep session
-rw------- 1 nagios nagios  177 Mar  9  2017 192.168.12.66_session
-rw-r--r-- 1 nagios nagios    6 Mar 17  2017 192.168.12.66_session_locked
-rw------- 1 nagios nagios  177 Feb 28  2017 192.168.12.67_session
-rw-r--r-- 1 nagios nagios    6 Mar 17  2017 192.168.12.67_session_locked
-rw------- 1 nagios nagios  177 Nov 27 07:00 192.168.12.86_session
-rw-r--r-- 1 nagios nagios    6 Nov 27 07:00 192.168.12.86_session_locked
-rw------- 1 nagios nagios  177 Nov 27 07:00 192.168.12.87_session
-rw------- 1 nagios nagios  177 Nov 27 07:00 192.168.12.88_session
-rw------- 1 nagios nagios  177 Oct 14  2016 192.168.12.89_session
-rw-r--r-- 1 nagios nagios    5 Oct 14  2016 192.168.12.89_session_locked
-rw------- 1 nagios nagios  177 Nov 27 07:00 192.168.8.123_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.66_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.67_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.68_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.69_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.70_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.84_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.85_session
-rw-r--r-- 1 nagios nagios    6 Nov 27 07:00 192.168.8.85_session_locked
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.86_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.96_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.8.97_session
-rw------- 1 nagios nagios  176 Nov 27 07:00 192.168.9.66_session

Delete the files for the ESXi hosts which cause problems. This will force the plugin to create a new session file on the next run.
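
For example, if the checks against 192.168.12.66 are the ones returning the error:

 # rm /var/spool/icinga2/tmp/192.168.12.66_session /var/spool/icinga2/tmp/192.168.12.66_session_locked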

 

Monitoring multiple Varnish instances
Tuesday - Nov 14th 2017

To monitor Varnish performance I've been using check_varnish.py for quite some time now. It uses the varnishadm command in the background to get all kinds of values; for example number of hits, number of misses, number of requests etc.

Varnish also allows running multiple Varnish processes in parallel, as long as:

- The work directory is different. This is handled by assigning a different work directory name (-n name) to the process. This setting is better known as "instance name", although work directory would be the technically correct name.

- The listener ports are different. This applies to both the http listener (-a) and the management listener (-T).

The problem with check_varnish.py? It doesn't support the -n parameter to query a certain Varnish process. Or better said: It didn't support the -n parameter. I modified the plugin and created a pull request for the upstream/original plugin.

With the modifications, the plugin check_varnish.py is now able to monitor multiple Varnish processes/instances. And it stays backward compatible with single Varnish processes launched without the -n parameter:

# ./check_varnish.py -f MAIN.cache_miss -n varnish-test
VARNISH OK - MAIN.cache_miss is 683744
| 'MAIN.cache_miss'=683744
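
For comparison, the same counter can also be read directly with varnishstat, using the same instance name (assuming the instance is called varnish-test as above):

# varnishstat -n varnish-test -1 -f MAIN.cache_miss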

 

Wetek Play/Openelec Box update to LibreElec 8.2 features tvheadend wizard
Sunday - Nov 12th 2017

It's been quite some time since I last fiddled around with my Wetek Openelec Box (which is the same device as a Wetek Play 1, just differently branded). I've kept the original OpenElec 6.95.3 on it, but I thought it's time to update.

I downloaded LibreELEC-WeTek_Play.arm-8.2.0.tar from https://libreelec.tv/downloads/, placed the tar file into the "Update" folder of the Wetek Openelec device's Samba share and then rebooted the device. This is one way - probably the easiest - to update the device.

Openelec Shares

After the reboot the device powered up, detected a new tar update file and started to run the upgrade. After a while Kodi booted up and I had to do my configurations again. I also needed to set up the channels in TVHeadend again... I remember last time I had to do this was quite a pain (see "TVHeadend Mutex Scan Settings for Cablecom and Thurcom (Switzerland)"). Well first I needed to uninstall the existing TVHeadend application and install the newest one. Due to the switch from OpenElec to LibreElec the repositories changed and this was not correctly updated.

Once I was in the browser configuring TVHeadend (default on port 9981), something new caught my eye: A wizard!

TVHeadend setup wizard  TVheadend setup wizard

TVheadend setup wizard tvheadend setup wizard

tvheadend setup wizard TVheadend setup wizard

tvheadend setup wizard tvheadend setup wizard

Looks like TVHeadend setup got much easier now!

Note: Be careful with the authentication step! As you can see I added a whole local range (192.168.1.0/24) into the "Allowed network" field. It turned out that the Tvheadend HTSP Client (the client application connecting to TVHeadend Server) is connecting via localhost to TVHeadend. So make sure you also add a user for localhost connection - or change "Allowed network" to "0.0.0.0/0". You need to add the user credentials into the Tvheadend HTSP Client settings, too.

Note 2: I also decided to update my second Wetek device, a Wetek Play 2, today. This one is a little bit different as it runs on Android and Kodi on top of it. The upgrade procedure happens through an OTA update using the WeOS app inside Android. After the upgrade to the latest WeOS 3.2 with Kodi 17.4, the "back" button of my remote control didn't work anymore. One of the most important buttons! It turned out that there seems to be a bug in Kodi 17.x causing this problem. A workaround is to disable the "Joystick Support" for the peripheral devices. To do this: Home Screen -> Add-ons -> My add-ons  -> Peripheral libraries -> Select "Joystick Support" and disable the add-on.

 

Confused ElasticSearch refuses to insert data due to mapping conflict
Tuesday - Oct 31st 2017

The biggest and best reason to run an ELK stack is that you have one big database (oh no, I just wrote it: big data...) for all kinds of logs. All kinds of filters in Kibana let you find exactly what you need (once you've figured out how) and let you create nice graphs for statistical or monitoring purposes.

But some negative points - or better said, potential conflicts - may cross your path, too. I'm talking about mapping conflicts.

Let's assume you have the following log message arriving in Logstash and then sent to ElasticSearch:

"message": "{\"time\":\"2017-10-31T12:13:36.194Z\",\"tags\":[\"foul\",\"penalty\",\"home\"],\"action\":\"referee.decision\",\"data\":{\"team_id\":404,\"player_id\":652020}}\r",

The message runs through a json filter in Logstash in order to split up the fields. By default, Logstash automatically recognizes the "time" field as a "date", because of its ISO8601 format. From https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html:

"ISO8601 - should parse any valid ISO8601 timestamp, such as 2011-04-19T03:44:01.103Z"

But now another message from another application arrives:

"message": "{\"action\":\"lap.completed\",\"time\":\"01:51:560\",\"data\":{\"car_id\":23,\"pilot_id\":60}}\r",

In this message the "time" field is used for the amount of time to complete the race lap (1 minute, 51 seconds, 560ms). That's definitely not a date. But because the index is in this case the same, ElasticSearch gets confused about the mapping.
This can also be seen in Kibana under Management -> Index Patterns:

Kibana mapping conflict 

In the details of the field "time" one can see that in almost every daily index the "time" field was seen as a "date". But in certain day indexes it was seen as "long":

Kibana field conflict 
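
To verify the conflict outside of Kibana, the mapping of the "time" field can also be compared between two daily indices using the field mapping API (host, port and the exact index names depend on your setup):

curl -s 'http://localhost:9200/docker-2017.10.30,docker-2017.10.31/_mapping/field/time?pretty'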

ElasticSearch doesn't like such mapping conflicts at all and refuses to insert the received message. Error messages will show up in the ElasticSearch log:

[2017-10-31T13:09:44,836][DEBUG][o.e.a.b.TransportShardBulkAction] [ES02] [docker-2017.10.31][0] failed to execute bulk item (index) BulkShardRequest [[docker-2017.10.31][0]] containing [index {[docker-2017.10.31][docker][AV9yVmKDj3U_Ft3cxfu2], source[{"source_host":"somehost","data":{"player_id":1325124,"team_id":52},"level":6,"created":"2017-10-24T12:06:39.663803227Z","message":"{\"time\":\"2017-10-31T12:09:44.791Z\",\"tags\":[\"foul\",\"penalty\",\"guest\"],\"action\":\"referee.decision\",\"data\":{\"team_id\":52,\"player_id\":1325124}}\r","type":"docker","version":"1.1","tags":["foul","penalty","guest"]"protocol":0,"@timestamp":"2017-10-31T12:09:44.791Z","host":"docker01","@version":"1","action":"referee.decision","time":"2017-10-31T12:09:44.791Z"}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [time]
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:298) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:468) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:591) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:396) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:373) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:277) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:530) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:507) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.prepareIndexOperationOnPrimary(TransportShardBulkAction.java:458) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:466) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:146) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:115) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:70) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:975) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:944) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:345) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:270) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:924) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:921) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:151) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1659) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:933) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction.access$500(TransportReplicationAction.java:92) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:291) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:266) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:248) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:644) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.0.jar:5.6.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.NumberFormatException: For input string: "2017-10-31T12:09:44.791Z"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043) ~[?:?]
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) ~[?:?]
    at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_144]
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:187) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:737) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$7.parse(NumberFieldMapper.java:709) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:1072) ~[elasticsearch-5.6.0.jar:5.6.0]
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:287) ~[elasticsearch-5.6.0.jar:5.6.0]
    ... 36 more

How can this be solved? There are of course several possibilities:

- The proper but almost impossible way: All applications writing into the same index must have a defined structure and common understanding of field names. If you can get all developers thinking the same way you're either working for the perfect company with a perfect documentation system or you're a dictator.

- Resolve the conflict: You could run all the indexes where "time" was seen as a "long" value, again through Logstash. You could add a mutate filter to force a type "date" on the "time" field. This will help for the messages using "time" as a date field, but not for other kinds of messages. You could then run the "long" indexes through Logstash again and kind of re-index the fields. Good luck...

- In my case I chose a different solution, which is by far not perfect. But I figured that all relevant messages in that index which contain a "time" field indeed use it as a date. As this is also handled by the automatically created "@timestamp" field, I simply decided to drop the field "time" in Logstash:

filter {
    if [type] == "docker" {
    [...]
        mutate { remove_field => [ "time" ] }
    }
}

Right after this and a restart of Logstash, the following log entries appeared in ElasticSearch:

[2017-10-31T13:12:19,386][INFO ][o.e.c.m.MetaDataMappingService] [ES02] [docker-2017.10.31/kpW-7vceQWCQgza3lGK6Dg] update_mapping [docker]
[2017-10-31T13:12:28,584][INFO ][o.e.c.m.MetaDataMappingService] [ES02] [docker-2017.10.31/kpW-7vceQWCQgza3lGK6Dg] update_mapping [docker]
[2017-10-31T13:12:39,458][INFO ][o.e.c.m.MetaDataMappingService] [ES02] [docker-2017.10.31/kpW-7vceQWCQgza3lGK6Dg] update_mapping [docker]
[2017-10-31T13:13:41,338][INFO ][o.e.c.m.MetaDataMappingService] [ES02] [docker-2017.10.31/kpW-7vceQWCQgza3lGK6Dg] update_mapping [docker]

And the log entries were added into the ElasticSearch index again.

PS: Let me know in the comments if this can be handled in a better way.

 

XT Commerce http status 500 without any PHP errors
Tuesday - Oct 31st 2017

While I was trying to figure out why an XT Commerce shop didn't run anymore after a server migration (it returned a HTTP status 500 without any errors), I came across the following important information (source):

Inside the document root is a folder "export". Inside this folder there's a file called "_error_reporting.admin". In order to see PHP errors this file needs to be renamed to "_error_reporting.all" so the application is allowed to display any PHP errors; otherwise simply nothing will show up:

root@webserver /var/www/shop/export # mv _error_reporting.admin _error_reporting.all

Without renaming this file there won't be any PHP errors - neither in the browser nor in log files.
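
In addition, PHP itself must be allowed to display errors. Depending on the setup this is configured in php.ini or in the vhost's PHP settings, roughly like this (only for debugging, turn it off again on production systems):

display_errors = On
error_reporting = E_ALL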

Right after this (and having display_errors = on) there was finally a PHP error message showing up:

Warning: require_once(/var/www/shop/admin/includes/magnalister/php/lib/MagnaConnector.php): failed to open stream: No such file or directory in /var/www/shop/magnaCallback.php on line 653 Fatal error: require_once(): Failed opening required '/var/www/shop/admin/includes/magnalister/php/lib/MagnaConnector.php' (include_path='.:/opt/plesk/php/5.3/share/pear') in /var/www/shop/magnaCallback.php on line 653

Turned out to be a missing file. /var/www/shop/admin/includes/magnalister/php/lib/MagnaConnector.php didn't exist anymore on this server.

 

Automatically cleaning up archived WAL files on a PostgreSQL server
Friday - Oct 27th 2017

It's been a couple of weeks since I set up a PostgreSQL replication and added it to our monitoring system (see How to monitor a PostgreSQL replication) and it has been running smoothly so far. But in the past few days a disk usage warning popped up.

Although the databases themselves only use around 10GB of disk space, the WAL files (especially the archived WAL files) eat 63GB!

This is because by default the archived WAL files are kept forever if "archive_mode" is set to on in the PostgreSQL config:

archive_mode = on        # enables archiving; off, on, or always
archive_command = 'cp %p /var/lib/postgresql/9.6/main/archive/%f'

I thought the solution was easy: just disable archive_mode on the master and enable it on the replica (a hot standby). NOPE! I was watching the replica as the WAL files rotated through (I have wal_keep_segments = 32), but no files were created in the archive directory.

A look at an older mail from February 2014 in the PostgreSQL mailing list reveals:

"It works fine, only the server will not generate WAL while it is in recovery.  As soon as you promote the standby, it will archive ist WALs."

A hot_standby replica server is basically ALWAYS running in recovery, which means that the "archive_command" will never run on it. Lesson 1 learned: Cleaning up must be done on the master server.
Note: This is only true for hot_standby; it may be different for other kinds of replication modes.

To clean up the archived WAL files, there's a special command pg_archivecleanup. The program can be added into the recovery.conf on a standby server (not hot_standby!) or used as standalone command:

pg_archivecleanup [option...] archivelocation oldestkeptwalfile
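
Used from recovery.conf on such a standby, it would look roughly like this (with the archive path from this setup; %r is replaced by PostgreSQL with the name of the oldest WAL file that must still be kept):

archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/9.6/main/archive %r'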

I decided to go with the standalone command and built a wrapper around it. This resulted in a shell script, walarchivecleanup.sh. The script allows different options and is able to dynamically look up the "oldestkeptwalfile" based on a maximum age parameter (-a). A specific "oldestkeptwalfile" can also be given (-f).

Example:

# ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 14 -d
pg_archivecleanup: keep WAL file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B6" and later
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000E6"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B1"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000B0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000056"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000008F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000006F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000BC"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000A2"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000B6"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000A4"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000004F"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000D0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000004E"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000F1"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B5"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000070"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000001C"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000002000000B4"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/000000010000000200000039"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000E0"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/0000000100000001000000FD"
pg_archivecleanup: removing file "/var/lib/postgresql/9.6/main/archive/00000001000000020000003E"
[...]

General information and usage:

$ ./walarchivecleanup.sh
./walarchivecleanup.sh (c) 2017 Claudio Kuenzler
This script helps to clean up archived WAL logs on a PostgreSQL master server using the pg_archivecleanup command.
Please note that WAL archiving currently only works on a master server (as of 9.6).
---------------------
Options:
  -p         Path to the archived WAL logs (e.g. /var/lib/postgresql/9.6/main/archive)
  -a         Age of archived logs to keep (days), anything older will be deleted
  -f         Specify a certain archived WAL file, anything older than this file will be deleted
             Note: If you use -f, it will override -a parameter
  -c         Full path to pg_archivecleanup command (if not found in $PATH)
  -d         Show debug information
  -n         Dry run (simulation only)
---------------------
Usage: ./walarchivecleanup.sh -p archivepath -a age (days) [-d debug] [-f archivefile] [-c path_to_pg_archivecleanup]
Example 1: ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 10
Example 2: ./walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -f 00000001000000010000001E
---------------------
Cronjob example: 00 03 * * * /root/scripts/walarchivecleanup.sh -p /var/lib/postgresql/9.6/main/archive -a 14

The script is now published on Github and can be found here: https://github.com/Napsty/scripts/blob/master/pgsql/walarchivecleanup.sh. Enjoy!

 

