Ignore systemd log warning Failed to reset devices.list: Operation not permitted in OSSEC
Tuesday - Jul 31st 2018 - by - (0 comments)

Since I migrated a server environment from Debian 7 (Wheezy) to 9 (Stretch), I have constantly been receiving alert e-mails like the following from OSSEC:

OSSEC HIDS Notification.
2018 Jul 29 09:42:18

Received From: (container3)>/var/log/syslog
Rule: 1002 fired (level 2) -> "Unknown problem somewhere in the system."
Portion of the log(s):

Jul 29 09:42:17 container3 systemd[1]: apt-daily.service: Failed to reset devices.list: Operation not permitted


The following systemd timers caused these log entries:

  • apt-daily.timer
  • phpsessionclean.timer
  • systemd-tmpfiles-clean.timer

There may be even more, depending on what is installed.
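To find out which units trigger the warning on a particular container, the syslog entries can be grouped by unit name. This is a minimal sketch, not part of OSSEC: the function name `devices_list_units` is made up for this example, and it assumes the message format shown in the alert above (`systemd[1]: <unit>: Failed to reset devices.list`).

```bash
#!/usr/bin/env bash
# devices_list_units (hypothetical helper): count which systemd units logged
# "Failed to reset devices.list". Reads the files given as arguments, or
# stdin when no file is given; prints "<count> <unit>", most frequent first.
devices_list_units() {
  grep -hF 'Failed to reset devices.list' "$@" \
    | sed -n 's/.*systemd\[1]: \([^:]*\): Failed to reset devices\.list.*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

For example, `devices_list_units /var/log/syslog /var/log/syslog.1` inside a container prints one line per unit. Compressed rotated logs (`*.gz`) would have to go through `zcat` first.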

These logs were found in all LXC containers of the new environment and were caused by this:

"Unprivileged containers cannot modify the devices cgroup configuration."

(found on https://github.com/lxc/lxd/issues/2004)

Yes, that makes sense and is actually expected behaviour. Ideally systemd would detect "I am running inside an unprivileged container; I cannot modify my own cgroup settings" and log something less alarming, but for now there is no "fix" for this problem.

Anyway, I wanted OSSEC to ignore such log entries. On the OSSEC server I edited /var/ossec/rules/local_rules.xml and added the following rule:

  <!-- Added rule by Claudio: Ignore systemd warnings "Failed to reset devices.list" -->
  <rule id="100101" level="0">
    <if_sid>1002</if_sid>
    <match>Failed to reset devices.list</match>
    <description>Ignore systemd warnings "Failed to reset devices.list" inside containers.</description>
  </rule>

The rule id is the unique ID of your own rule. To make sure you're not reusing an existing number, use an ID between 100000 and 109999; this range is reserved for "user defined rules". The level of 0 means that matching events never raise an alert.
The if_sid field limits the rule to events that were already matched by the given parent rule. In the mail alert above you can see which rule fired: 1002. That's the generic rule that scans syslog entries for certain error keywords.
The match field contains the pattern to look for. Note that <match> is a simple string match (OSSEC uses <regex> for regular expressions); in this case I simply entered the full phrase "Failed to reset devices.list".
Finally, the description field holds a description of the rule.

After an OSSEC server restart, the alerts were gone.


LXC container in network reachable, but cannot ping between host and container
Friday - Jul 27th 2018 - by - (0 comments)

In the past I've already had some connectivity issues with LXC (see Network connectivity problems when running LXC (with veth) in VMware VM). But today I experienced another kind of problem on an LXC installation on physical servers running Ubuntu 16.04 Xenial.

While network connectivity worked fine from other networks (outside of this LXC host), I was unable to ping between the LXC host and the container.

root@container:~# ping
PING ( 56(84) bytes of data.
From icmp_seq=1 Destination Host Unreachable
From icmp_seq=2 Destination Host Unreachable
From icmp_seq=3 Destination Host Unreachable
From icmp_seq=4 Destination Host Unreachable
From icmp_seq=5 Destination Host Unreachable
From icmp_seq=6 Destination Host Unreachable
--- ping statistics ---
9 packets transmitted, 0 received, +6 errors, 100% packet loss, time 8040ms

root@host:~# ping
PING ( 56(84) bytes of data.
From icmp_seq=1 Destination Host Unreachable
From icmp_seq=2 Destination Host Unreachable
From icmp_seq=3 Destination Host Unreachable
--- ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3999ms

Both host and container are in the same network range and are using the network's central gateway:

root@host:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                UG    0      0        0 virbr0
                                                U     0      0        0 virbr0

root@container:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
                                                UG    0      0        0 eth0
                                                U     0      0        0 eth0

Of course the container uses the host's virbr0 as its network link:

root@host:~# cat /var/lib/lxc/container/config  | grep network
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.ipv4 =
lxc.network.ipv4.gateway =
lxc.network.hwaddr = 54:52:10:66:12:15

Then I remembered that I have a small test server at home with a very similar setup:

  • The LXC host is running directly on physical hardware
  • The host's primary interface is re-used as virbr0 (minor difference here: at home it's a single eth0, on this setup it's a bonding interface, bond0)
  • The OS versions do not differ too much (home: Debian 8, this setup: Ubuntu 16.04)
  • The LXC version is the same (2.0.x)
  • The host and the containers run in the same local network range
  • Both the host and the containers use the central gateway (firewall) as default gateway

But there is one huge difference: at home, the host and the container can ping each other; on this setup (as mentioned above), they can't.

The first thing I checked was the virtual bridge configuration. Simply listing the bridges already revealed a big difference. At home:


root@homehost ~ # brctl show
bridge name    bridge id        STP enabled    interfaces
virbr0        8000.1c1b0d6523df    no        eth0

This setup:

root@host:~# brctl show
bridge name    bridge id        STP enabled    interfaces
lxdbr0        8000.000000000000    no       
virbr0        8000.a0369ff4d626    no        bond0

Even though several containers are running on this host, none of them shows up as an interface of this bridge!

I then compared the container network configs at home and on this setup. At home:


root@homehost ~ # cat /var/lib/lxc/invoicing/config | grep network
# networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.ipv4 =
lxc.network.hwaddr = 54:52:00:15:01:73
lxc.network.veth.pair = veth0-container
lxc.network.ipv4.gateway =

This setup (again the same output as above):

root@host:~# cat /var/lib/lxc/container/config  | grep network
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.ipv4 =
lxc.network.ipv4.gateway =
lxc.network.hwaddr = 54:52:10:66:12:15

The network type is macvlan on this setup. This is because I had basically copied the network config from another LXC host in this environment, with one difference: that other LXC host was virtual (running in VMware), not physical. There, lxc.network.type was set to macvlan because of the connectivity problems mentioned in the article Network connectivity problems when running LXC (with veth) in VMware VM.
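For context, macvlan is known for not delivering traffic between the parent interface (here virbr0, which holds the host's IP) and the macvlan endpoints stacked on top of it, which would explain why only host-to-container pings failed. Switching to veth boils down to changing `lxc.network.type` and dropping the macvlan-only `lxc.network.macvlan.mode` line. A minimal sketch of that edit, assuming GNU sed and the legacy `lxc.network.*` keys used in the configs above; `lxc_macvlan_to_veth` is a hypothetical helper name. The container has to be restarted (for example with `lxc-stop` and `lxc-start`) before the change takes effect.

```bash
#!/usr/bin/env bash
# lxc_macvlan_to_veth (hypothetical helper): switch one container config from
# macvlan to veth networking. Assumes GNU sed (-i) and the legacy
# "lxc.network.*" keys; a backup is written to <config>.bak first.
lxc_macvlan_to_veth() {
  local config="$1"
  cp -p "$config" "$config.bak" || return 1
  # 1) macvlan -> veth in lxc.network.type, 2) drop the macvlan-only mode line
  sed -i \
    -e 's/^\(lxc\.network\.type[[:space:]]*=[[:space:]]*\)macvlan[[:space:]]*$/\1veth/' \
    -e '/^lxc\.network\.macvlan\.mode[[:space:]]*=/d' \
    "$config"
}
```

Usage: `lxc_macvlan_to_veth /var/lib/lxc/container/config`. The home config additionally sets `lxc.network.veth.pair`, which is optional.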

As soon as I switched lxc.network.type to veth, the container and the host could ping each other, too. And now the container shows up in brctl:

root@host:~# brctl show
bridge name    bridge id        STP enabled    interfaces
lxdbr0        8000.000000000000    no       
virbr0        8000.a0369ff4d626    no        bond0

TL;DR: On LXC hosts running on physical servers/hardware, use veth interfaces. On LXC hosts that are themselves virtual machines (inside VMware, for example), use macvlan interfaces (once again, see Network connectivity problems when running LXC (with veth) in VMware VM).
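To check which network type each container on a host uses, a small loop over the container configs is enough. A sketch, assuming the default LXC path /var/lib/lxc and one network per container (only the first `lxc.network.type` line of each config is reported); `lxc_network_types` is a hypothetical name.

```bash
#!/usr/bin/env bash
# lxc_network_types (hypothetical helper): print "<container> <network type>"
# for every container config below an LXC path (default: /var/lib/lxc).
# Only the first lxc.network.type line of each config is considered.
lxc_network_types() {
  local lxcpath="${1:-/var/lib/lxc}" config name type
  for config in "$lxcpath"/*/config; do
    [ -f "$config" ] || continue          # glob did not match anything
    name="$(basename "$(dirname "$config")")"
    type="$(awk -F'=' '$1 ~ /^[ \t]*lxc\.network\.type[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$config")"
    printf '%s %s\n' "$name" "${type:-unknown}"
  done
}
```

On a physical host, `lxc_network_types | grep macvlan` would list the containers that are candidates for switching to veth.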


Fix table increment counter in MariaDB or MySQL after manual row deletion
Tuesday - Jul 24th 2018 - by - (0 comments)

I recently upgraded this tech blog from PHP 5.6 to 7.0 and stumbled (again) across some old mysql_* functions. These were removed in PHP 7.0 and need to be replaced by either PDO or MySQLi (see Changing from PHP's mysql to myqli - what to look at).

While I fixed most of the code, I forgot the admin part of my blog. Before a new article is inserted into the database, its content runs through a function that escapes special characters: mysqli_real_escape_string(). From the documentation:

"Escapes special characters in a string for use in an SQL statement, taking into account the current charset of the connection"

The old mysql_real_escape_string() could be called with just the string:

# OLD PHP < 7
$iContent = mysql_real_escape_string($iContent);

But (almost) all procedural mysqli functions also require the connection variable (here $connect):

# NEW PHP >= 7
$iContent = mysqli_real_escape_string($connect, $iContent);

Long story short: the article content was not inserted into the database. Once I had fixed the code, I deleted my failed attempts from the table and manually updated the article ID to avoid a gap between articles. However, this left the table's auto increment counter ahead of the actual IDs.

To illustrate, here is the latest article ID:

MariaDB [claudiokuenzler]> select newsid from news order by newsid desc limit 0,1;
+--------+
| newsid |
+--------+
|    790 |
+--------+
1 row in set (0.00 sec)

Yet the auto increment counter was already at 793 for the next insert (I had manually deleted 2 entries):

|            793 |
1 row in set (0.00 sec)
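The query that produced the counter value isn't shown here. One way to read it, assuming MariaDB or MySQL with access to information_schema, is the `AUTO_INCREMENT` column of `information_schema.TABLES`; `SHOW TABLE STATUS LIKE 'news'` shows the same value in its `Auto_increment` column. The two shell helpers below are hypothetical: one prints that query, the other computes how many IDs the next insert would skip.

```bash
#!/usr/bin/env bash
# auto_increment_query (hypothetical helper): print a query that reads the
# AUTO_INCREMENT counter of one table. Names are NOT escaped, so only pass
# trusted schema/table names.
auto_increment_query() {
  printf "SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s';\n" "$1" "$2"
}

# auto_increment_gap (hypothetical helper): number of IDs the next insert
# would skip, given MAX(id) and the current AUTO_INCREMENT value.
auto_increment_gap() {
  local max_id="$1" next_id="$2"
  echo $(( next_id - (max_id + 1) ))
}
```

For example, `auto_increment_query claudiokuenzler news | mysql -N` prints the counter, and `auto_increment_gap 790 793` prints 2, matching the two deleted entries. On MySQL 8.0 the information_schema value may be cached; see the `information_schema_stats_expiry` variable.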

Of course I wanted to fix this immediately, and luckily I came across this Stack Overflow question where user Anshul gave a very good and quick explanation:

Further, in order to reset the AUTO_INCREMENT count, you can immediately issue the following statement.
For MySQLs it will reset the value to MAX(id) + 1.

So I did that:

MariaDB [claudiokuenzler]> ALTER TABLE news AUTO_INCREMENT = 1;
Query OK, 788 rows affected (0.01 sec)            
Records: 788  Duplicates: 0  Warnings: 0

And how did this affect the increment counter?

|            791 |
1 row in set (0.00 sec)

Yes! The next insert will get ID 791. Hurray.


Retrieving a value from XML document in Linux Bash
Tuesday - Jul 24th 2018 - by - (0 comments)

A few months ago I wrote about "Automatic SLA reporting from Icinga and push into Confluence page". Since then the script has run on the 1st of every month and automatically updated the relevant pages in our Confluence wiki. So far so good, but sometimes I ran into problems calculating last month's availability. On some occasions the JSON output contained a number too big to be handled as JSON (see step #4 in the article mentioned), so I turned to the CSV output as an alternative.

Yesterday I added the ability to retrieve the availability stats for a service group (instead of a fixed host and one of its services). The problem: the CSV output does not contain the average stats of the service group, only the individual stats of each service in the group!
The HTML output shows the averages of all services in its last row:

Icinga 2 ClassicUI Availability Stats Service Group

Now compare that with the full CSV output of the same availability report:

'dbserver';'SAP DB Processes INSTANCE';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP CCMS INSTANCE: DB Current State';'319978';'50.576%';'50.576%';'312685';'49.424%';'49.424%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP CCMS INSTANCE: Log Space';'319978';'50.576%';'50.576%';'312685';'49.424%';'49.424%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP Dispwork INSTANCE';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP MessageServer INSTANCE';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'TCP Port 3200 (GROUPNAME_DVEB)';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';

You get the problem: how can I get the average stats for the whole service group from this? That's the whole point of the grouped stats.

XML to the rescue! The same report in XML format shows the averaged stats of all services (I cut the non-relevant output):

<?xml version="1.0" encoding="utf-8"?>
<servicegroup name="GROUPNAME">
<host name="dbserver">

But how can I get the value of the field "average_percent_time_ok_known"? I was already thinking of some complicated sed command when I came across xml_grep. This command does basically the same as grep, but is specialized for XML documents. With the parameter --text_only it prints only the text content of the matched element. xml_grep is part of the xml-twig-tools package, which can easily be installed:

$ sudo apt-get install xml-twig-tools

The full command to retrieve the wanted value from the Icinga availability stats:

$ curl -s -u "${icingauser}:${icingapass}" "http://icinga.example.com/cgi-bin/icinga2-classicui/avail.cgi?show_log_entries=&servicegroup=GROUPNAME&timeperiod=lastmonth&rpttimeperiod=24x7&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringnotrunning=yes&includesoftstates=no&initialassumedservicestate=6&backtrack=8&content_type=xmloutput&xmloutput" | xml_grep "average_percent_time_ok_known" --text_only

Quick and painless (for the brain).
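Where xml-twig-tools cannot be installed, the "complicated sed command" can actually stay fairly simple, as long as the element sits on a single line. A crude sketch, not a real XML parser; `xml_first_value` is a hypothetical name:

```bash
#!/usr/bin/env bash
# xml_first_value (hypothetical helper): print the text of the first
# <TAG>...</TAG> element read from stdin. Only works when opening and
# closing tag are on the same line and TAG contains no regex characters.
xml_first_value() {
  local tag="$1"
  sed -n "s:.*<${tag}>\([^<]*\)</${tag}>.*:\1:p" | head -n 1
}
```

Usage: pipe the same curl call into `xml_first_value average_percent_time_ok_known` instead of xml_grep. If libxml2-utils is installed, `xmllint --xpath 'string(//average_percent_time_ok_known)' -` is a more robust alternative.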


Installing check_vmware_esx.pl and VMware Perl SDK 6.7 on Ubuntu 16.04 Xenial
Friday - Jul 20th 2018 - by - (0 comments)

As I'm preparing to put a new highly available Icinga 2 cluster into production, I reached the step of migrating the monitoring plugins. While most of the plugins are easily migrated, some depend on third-party modules.

Today's head-scratcher was the migration of "check_vmware_esx", because it relies on the VMware Perl SDK. I've already had my share of experiences and troubleshooting with it in the past, so I knew what I was getting into... For reference see:

I just spent several hours figuring out which versions work well together (new OS, newer Perl version and modules, new VMware SDK, new ESXi versions). Follow the steps below to install the check_vmware_esx plugin on Ubuntu 16.04 (Xenial); trust me, it will save you a lot of effort.

1. Install pre-requirements
Both the VMware Perl SDK and the check_vmware_esx plugin require some Perl modules. Install the following:

root@xenial:~# apt-get install libssl-dev libxml-libxml-simple-perl libsoap-lite-perl libdata-dump-perl libuuid-perl libdata-uuid-libuuid-perl libuuid-tiny-perl libarchive-zip-perl libcrypt-ssleay-perl libclass-methodmaker-perl libtime-duration-perl

2. Download the SDK
The VMware Perl SDK can be downloaded from here: https://code.vmware.com/web/sdk/67/vsphere-perl. If the link no longer works, use your favourite search engine to find it. As of this writing the newest available version is 6.7; I downloaded VMware-vSphere-Perl-SDK-6.7.0-8156551.x86_64.tar.gz.

Note: In the past (a couple of years ago) it was not advisable to use a VMware Perl SDK newer than 5.5. SDK 6.0 was bloody slow and even the plugin maintainer, Martin Fuerstenau, called it "pretty little thing of bull..t.". This no longer applies, as 6.7 seems to run rather fast.

3. Unpack and install the SDK
Note: The SDK needs to be installed as root (or use sudo).

root@xenial:~# tar -xzf VMware-vSphere-Perl-SDK-6.7.0-8156551.x86_64.tar.gz
root@xenial:~# cd vmware-vsphere-cli-distrib/

Now launch the installation of the Perl SDK. Note: This can take quite some time, especially the CPAN installations. Grab a coffee...

root@xenial:~/vmware-vsphere-cli-distrib# ./vmware-install.pl
Creating a new vSphere CLI installer database using the tar4 format.
Installing vSphere CLI 6.7.0 build-8156551 for Linux.
You must read and accept the vSphere CLI End User License Agreement to
Press enter to display it.


Do you accept? (yes/no) yes
Thank you.
warning: vSphere CLI requires Perldoc.
Please install perldoc.
WARNING: The http_proxy environment variable is not set. If your system is
using a proxy for Internet access, you must set the http_proxy environment
variable .
If your system has direct Internet access, you can ignore this warning .
WARNING: The ftp_proxy environment variable is not set.  If your system is
using a proxy for Internet access, you must set the ftp_proxy environment
variable .
If your system has direct Internet access, you can ignore this warning .
Please wait while configuring CPAN ...
Can't locate Module/Build.pm in @INC (you may need to install the Module::Build module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .).
BEGIN failed--compilation aborted.
Below mentioned modules with their version needed to be installed,
these modules are available in your system but vCLI need specific
version to run properly
Module: ExtUtils::MakeMaker, Version: 6.96
Module: Module::Build, Version: 0.4205
Module: Net::FTP, Version: 2.77
Do you want to continue? (yes/no) yes

Please wait while configuring perl modules using CPAN ...
CPAN is downloading and installing pre-requisite Perl module "Path::Class" .
CPAN is downloading and installing pre-requisite Perl module "Socket6 " .
CPAN is downloading and installing pre-requisite Perl module "Text::Template" .
CPAN is downloading and installing pre-requisite Perl module "IO::Socket::INET6" .
CPAN is downloading and installing pre-requisite Perl module "Net::INET6Glue" .
In which directory do you want to install the executable files?
[/usr/bin]  [Enter]
Please wait while copying vSphere CLI files...
The installation of vSphere CLI 6.7.0 build-8156551 for Linux completed
successfully. You can decide to remove this software from your system at any
time by invoking the following command:
This installer has successfully installed both vSphere CLI and the vSphere SDK
for Perl.
The following Perl modules were found on the system but may be too old to work
with vSphere CLI:
Time::Piece 1.31 or newer
Try::Tiny 0.28 or newer
UUID 0.27 or newer
XML::NamespaceSupport 1.12 or newer
XML::LibXML::Common 2.0129 or newer
XML::LibXML 2.0129 or newer
LWP 6.26 or newer
LWP::Protocol::https 6.07 or newer
--the VMware team

4. Test the Perl SDK
You can launch the following command (which came from the SDK installation) to test if the SDK works:

root@xenial:~# /usr/lib/vmware-vcli/apps/general/connect.pl --server myesxhost
Enter username: root
Enter password:
Connection Successful
Server Time : 2018-07-20T13:17:48.861076Z

That looks good! Connection was successful!

Note: With an older SDK (I tested 5.5) the error message "Server version unavailable" appeared, and no matter what I tried, I couldn't fix it this time.

5. The monitoring plugin check_vmware_esx.pl
Let's proceed with the monitoring plugin! I suggest cloning the whole GitHub repository, because the plugin file alone is not enough.

root@xenial:~# cd /tmp
root@xenial:/tmp# git clone https://github.com/BaldMansMojo/check_vmware_esx.git

Then copy the plugin to your other plugins and give it the correct permissions:

root@xenial:/tmp# cp /tmp/check_vmware_esx/check_vmware_esx.pl /usr/lib/nagios/plugins/check_vmware_esx.pl
root@xenial:/tmp# chmod 755 /usr/lib/nagios/plugins/check_vmware_esx.pl

The plugin comes with a few Perl modules which it requires to run correctly.
To keep them separate from the system Perl modules, I created a dedicated folder and copied them into it:

root@xenial:/tmp# mkdir -p /opt/check_vmware_esx/modules
root@xenial:/tmp# cp check_vmware_esx/modules/* /opt/check_vmware_esx/modules/

The plugin now needs to know where to find its own modules. The default is just "modules", a path relative to the current working directory:

root@xenial:/tmp# grep "use lib" /usr/lib/nagios/plugins/check_vmware_esx.pl
use lib "modules";
#use lib "/usr/lib/nagios/vmware/modules";

So set the correct path (/opt/check_vmware_esx/modules):

root@xenial:/tmp# sed -i "/^use lib/s/modules/\/opt\/check_vmware_esx\/modules/" /usr/lib/nagios/plugins/check_vmware_esx.pl

So it is now:

root@xenial:/tmp# grep "use lib" /usr/lib/nagios/plugins/check_vmware_esx.pl
use lib "/opt/check_vmware_esx/modules";
#use lib "/usr/lib/nagios/vmware/modules";
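The sed command above has to escape every slash in the path. Using a different delimiter avoids that and makes the edit easier to reuse. A sketch with the hypothetical helper `set_plugin_lib`, assuming GNU sed:

```bash
#!/usr/bin/env bash
# set_plugin_lib (hypothetical helper): point the active "use lib" line of a
# Perl plugin at a module directory. "|" is used as sed delimiter, so the
# path needs no escaping but must not contain "|", "&" or "\".
# Commented lines ("#use lib ...") are untouched because of the ^ anchor.
set_plugin_lib() {
  local plugin="$1" libdir="$2"
  sed -i "s|^use lib \".*\";|use lib \"${libdir}\";|" "$plugin"
}
```

`set_plugin_lib /usr/lib/nagios/plugins/check_vmware_esx.pl /opt/check_vmware_esx/modules` gives the same result as above and can safely be re-run, since it replaces whatever path the active line currently holds.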

6. Launch the plugin
Lean back and enjoy.

root@xenial:~# /usr/lib/nagios/plugins/check_vmware_esx.pl -H myesxhost -u root -p secret --select=runtime
OK: 0/21 VMs suspended - 1/21 VMs powered off - 20/21 VMs powered on - overallstatus=green - connection state=connected - All 125 health checks are GREEN: CPU (2x), power (14x), Processors (20x), voltage (32x), system (1x), System (1x), Platform Alert (1x), temperature (28x), other (25x), Memory (1x) - 0 config issues  - 0 config issues ignored
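Icinga 2 derives the service state from the plugin's exit code, not from the text output, so it is worth looking at the exit code when testing a plugin manually. A tiny hypothetical helper that maps the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) to their names:

```bash
#!/usr/bin/env bash
# plugin_state (hypothetical helper): translate a monitoring plugin's exit
# code into the Nagios/Icinga state name.
plugin_state() {
  case "$1" in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    3) echo UNKNOWN ;;
    *) echo "INVALID($1)" ;;
  esac
}
```

Usage: `/usr/lib/nagios/plugins/check_vmware_esx.pl -H myesxhost -u root -p secret --select=runtime; plugin_state $?`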

7. Icinga 2's ITL config
To work with Icinga 2 right away, the plugin either needs to be renamed to "/usr/lib/nagios/plugins/check_vmware_esx", or a symlink must be created:

root@xenial:~# ln -s /usr/lib/nagios/plugins/check_vmware_esx.pl /usr/lib/nagios/plugins/check_vmware_esx

The reason is that Icinga 2's ITL (Icinga Template Library) defines the "vmware-esx-*" commands to use the plugin check_vmware_esx (without the .pl extension):

root@xenial:~# grep check_vmware_esx /usr/share/icinga2/include/ -rni
/usr/share/icinga2/include/plugins-contrib.d/vmware.conf:25:    command = [ PluginContribDir + "/check_vmware_esx" ]
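When this step is part of a provisioning script, the symlink should only be created if nothing exists under that name yet, so a re-run doesn't fail. A sketch with the hypothetical helper `ensure_plugin_link`, assuming the plugin sits in the directory PluginContribDir points to (/usr/lib/nagios/plugins in this setup):

```bash
#!/usr/bin/env bash
# ensure_plugin_link (hypothetical helper): make "<plugin without .pl>"
# resolve to the .pl file, creating a symlink only if the name is free.
ensure_plugin_link() {
  local plugin="$1" link="${1%.pl}"
  if [ -e "$link" ] || [ -L "$link" ]; then
    echo "exists: $link"
  else
    ln -s "$plugin" "$link" && echo "created: $link -> $plugin"
  fi
}
```

Usage: `ensure_plugin_link /usr/lib/nagios/plugins/check_vmware_esx.pl`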

That's it. You're welcome.


Freeing up disk space from CouchDB - do not forget the views!
Thursday - Jul 19th 2018 - by - (0 comments)

In the past few weeks I've seen a steady increase in disk usage on a CouchDB cluster I'm managing:

CouchDB Disk Usage steadily increases 

Time to free up some disk space! I already knew about the "compaction" mechanism (comparable to the "vacuum" process in PostgreSQL), which frees disk space by removing old revisions of data. But when I ran "compact" on the database using the most disk space, it didn't really help.

Before I ran compact on the DB, its file size on disk was 10846902016 bytes:

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559532-g1AAAAHTeJzLYWBg4...",
 "sizes": {
  "file": 10846902016,
  "external": 3690681900,
  "active": 5355688382
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690681900
 },
 "doc_del_count": 46,
 "doc_count": 13559486,
 "disk_size": 10846902016,
 "disk_format_version": 6,
 "data_size": 5355688382,
 "compact_running": true,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

Running compact:

# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact

After this, I was able to see the status of the database compaction processes in the Fauxton UI:

CouchDB Database Compaction Progress

But once the compaction was completed, I found the disk size didn't change:

# curl -q -s localhost:5984/bigdb
{
 "db_name": "bigdb",
 "update_seq": "13559692-g1AAA...",
 "sizes": {
  "file": 10851612416,
  "external": 3690734442,
  "active": 5355768390
 },
 "purge_seq": 0,
 "other": {
  "data_size": 3690734442
 },
 "doc_del_count": 46,
 "doc_count": 13559646,
 "disk_size": 10851612416,
 "disk_format_version": 6,
 "data_size": 5355768390,
 "compact_running": false,
 "cluster": {
  "q": 8,
  "n": 3,
  "w": 2,
  "r": 2
 },
 "instance_start_time": "0"
}

Even worse: the file size had actually grown a little. The opposite of what I expected!

I then checked at the file system level where most of the disk space was being used and came across the following folders:

root@st-cdb01-p:/var/lib/couchdb# du -ksh shards/
15G    shards/

root@st-cdb01-p:/var/lib/couchdb# du -ksh .shards/
30G    .shards/

Note the dot in the second folder (.shards). According to the documentation, the ".shards" folder contains view indexes, not databases. So I manually checked the size of a view in the Fauxton UI:

CouchDB View Size 

Woah! Comparing "Actual data size (bytes): 1,444,291,661" with "Data size on disk (bytes): 19,648,534,600", I was pretty sure I had found the bad guy.
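To put a number on "how much of this file is waste", the share of the file that is not live data can be computed as (disk size - active size) / disk size. I believe CouchDB's compaction daemon compares a very similar ratio against its db_fragmentation and view_fragmentation thresholds, but treat that as an assumption and check the documentation of your CouchDB version. A sketch with the hypothetical helper `fragmentation_pct`:

```bash
#!/usr/bin/env bash
# fragmentation_pct (hypothetical helper): share of a file that is not live
# data, (disk_size - active_size) / disk_size, printed with one decimal.
# Integer arithmetic in per mille, so the last digit is truncated.
fragmentation_pct() {
  local disk_size="$1" active_size="$2" permille
  permille=$(( (disk_size - active_size) * 1000 / disk_size ))
  printf '%d.%d\n' $(( permille / 10 )) $(( permille % 10 ))
}
```

With the values from this article, `fragmentation_pct 10846902016 5355688382` (the bigdb database before compaction) prints 50.6, while `fragmentation_pct 19648534600 1444291661` (the view above) prints 92.6.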

Compaction can also be run on views. View indexes are compacted per design document (in this case "stats", which can also be seen in the UI screenshot above):

root@couchdb:~# curl -q -s -H "Content-Type: application/json" -X POST localhost:5984/bigdb/_compact/stats

The compaction processes and their current progress can also be checked in the UI:

CouchDB View Compaction Progress

Once all of these processes had completed, 20 GB of disk space had been freed!

CouchDB Disk Usage after Views Compaction 

The change can also be seen in Fauxton:

CouchDB View Size after compaction

Some additional questions about compaction, and their answers:

How did the compaction affect the cluster?
I ran the compaction on node 1 of a two-node cluster. I could not see an immediate change in disk usage on the second node; I had to run the same compaction commands on node 2 to free disk space there, too.

Shouldn't auto compaction do this job?
That's what I thought, too. I verified that automatic compaction is enabled, which seems to be the default (Ubuntu 16.04, CouchDB 2.1):

root@couchdb:~# grep "\[daemons\]" -A 10 /opt/couchdb/etc/default.ini
index_server={couch_index_server, start_link, []}
external_manager={couch_external_manager, start_link, []}
query_servers={couch_proc_manager, start_link, []}
vhosts={couch_httpd_vhost, start_link, []}
httpd={couch_httpd, start_link, []}
uuids={couch_uuids, start, []}
auth_cache={couch_auth_cache, start_link, []}
os_daemons={couch_os_daemons, start_link, []}
compaction_daemon={couch_compaction_daemon, start_link, []}

The compaction_daemon is enabled and configured:

root@couchdb:~# grep "\[compaction_daemon\]" -A 8 /opt/couchdb/etc/default.ini  
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 300
; If a database or view index file is smaller then this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation therefore it's not worth to compact them.
min_file_size = 131072

root@couchdb:~# grep "\[compactions\]" -A 78 /opt/couchdb/etc/default.ini  | egrep -v "^;"
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "50%"}, {from, "00:00"}, {to, "04:00"}, {parallel_view_compaction, true}]

Note: I changed view_fragmentation from the default 60% to 50% and added the "from" and "to" timeslot.

So auto compaction should have been doing its job. According to the logs, the compaction daemon did indeed run (on databases and views), but nothing was freed.

TL;DR of this article?
Do not forget to compact your db views, too! Check their sizes (either in the UI or via CLI) and you should be able to determine where your disk space is getting wasted.

How can I make sure to run compact on all relevant databases and views?
For this purpose I created a script called compact_couchdb.sh. It iterates over all databases of the addressed CouchDB, detects the views in each database, and compacts each database and each of its views.
The script can be found here (on Github): https://github.com/Napsty/scripts/blob/master/couchdb/compact_couchdb.sh
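For a single database, the compaction calls from this article can also be generated as a dry run and reviewed before they are executed. A sketch with the hypothetical helper `compact_commands`, which only prints curl commands. The final `_view_cleanup` call is an addition not used in this article: it removes view index files that no longer belong to any current design document, which may also free space under .shards.

```bash
#!/usr/bin/env bash
# compact_commands (hypothetical helper): print (not run) the curl calls that
# compact one database, the views of the given design documents, and then
# clean up index files of outdated view definitions.
compact_commands() {
  local base="$1" db="$2" ddoc
  shift 2
  printf 'curl -s -H "Content-Type: application/json" -X POST %s/%s/_compact\n' "$base" "$db"
  for ddoc in "$@"; do
    printf 'curl -s -H "Content-Type: application/json" -X POST %s/%s/_compact/%s\n' "$base" "$db" "$ddoc"
  done
  printf 'curl -s -H "Content-Type: application/json" -X POST %s/%s/_view_cleanup\n' "$base" "$db"
}
```

`compact_commands localhost:5984 bigdb stats` prints three commands; after reviewing them, the output can be piped to `sh`. In my cluster this had to be repeated on each node.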


Monitoring memory usage of a LXC container (comparing 1.x vs 2.x)
Wednesday - Jul 18th 2018 - by - (0 comments)

Monitoring Linux Containers (LXC, also known as system containers to distinguish them from the Docker world) is as important as monitoring the LXC host itself. But the resource usage seen from inside a container is sometimes "unreal".

Back in 2013, when I started the monitoring plugin check_lxc, the only way to really check the memory usage of a container was to check the current cgroup values of the container:

root@lxchost:~# /usr/lib/nagios/plugins/check_lxc.sh -n container1 -t mem
LXC container1 OK - Used Memory: 6187 MB|mem=6488358912B;0;0;0;0

In the background of check_lxc.sh, the cgroup values of container1 are read. But why so complicated and not just run a classic check_mem.pl inside the container?

To answer that question, take a look at the following picture:

LXC 1 Memory Usage 

Focus on the memory usage; both the LXC host (top), running LXC 1.x, and the LXC container (bottom) show the exact same values.

Or to see it in text form:

root@lxchost:~# free -m
             total       used       free     shared    buffers     cached
Mem:         32176      30911       1264        119       1855      21921
-/+ buffers/cache:       7135      25041
Swap:         3814        165       3649

root@container1:~# free -m
             total       used       free     shared    buffers     cached
Mem:         32176      30911       1264        119       1855      21921
-/+ buffers/cache:       7135      25041
Swap:         3814        165       3649

The container simply sees the host's values. But according to cgroups, the container itself only uses 6187 MB, not 7135 MB.

That's why you should run check_lxc on the host to get more accurate memory usage figures for the containers.

Until recently.

Now that I'm working on a new LXC environment on Debian Stretch, there's a newer LXC version (LXC 2.x). Something immediately caught my eye the first time I ran (h)top:

LXC 2.x container memory usage 

Focus again on the memory usage. This time the LXC host (top) and the LXC container (bottom) show different values. True, the (CPU) load and the swap usage are still the same on both host and container, but it's a start!

Doing the same check with free:

root@lxchost:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          64421       21419         361         179       42639       42173
Swap:         15258         668       14590

root@container1:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          64421        1038       62770         179         612       62770
Swap:         15258         668       14590

Note: These are different hosts and different containers than the values seen above from LXC 1.x.

Both host and container show the total capacity of memory and swap, but the used column clearly differs.
But don't be fooled: the available memory (last column) shown in the container is misleading. The container cannot know about the other containers running beside it and is therefore unaware of other memory consumers.

What about check_lxc in that case?

root@lxchost:~# /usr/lib/nagios/plugins/check_lxc.sh -n container1 -t mem
LXC container1 OK - Used Memory: 1596 MB|mem=1673646080B;0;0;0;0

The host reports that the container uses 1596 MB, which is close to the container's own view of 1038 MB (used) + 612 MB (buff/cache) = 1650 MB.
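The comparison is simple arithmetic. The check_lxc perfdata value is in bytes, and dividing it by 1024 * 1024 gives the 1596 MB shown in the plugin output. A small Python sketch using the numbers from above (the helper name perfdata_to_mib is made up for this example):

```python
def perfdata_to_mib(perfdata):
    """Convert a perfdata item like 'mem=1673646080B;0;0;0;0' to MiB."""
    _label, rest = perfdata.split("=", 1)
    value = rest.split(";", 1)[0]
    number = value[:-1]
    if not value.endswith("B") or not number.isdigit():
        raise ValueError("expected a plain byte value, got %r" % value)
    return int(number) / 2**20


# Values taken from the check_lxc and free -m outputs above.
host_view = perfdata_to_mib("mem=1673646080B;0;0;0;0")  # about 1596.1 MiB
container_view = 1038 + 612  # used + buff/cache inside the container
print(f"host: {host_view:.0f} MB, container: {container_view} MB, "
      f"difference: {container_view - host_view:.0f} MB")
```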

The big question now: Can check_mem.pl be used inside the container and give accurate alerts?

root@container1:~# ./check_mem.pl -u -w 90 -c 95
OK - 2.6% (1693488 kB) used.|TOTAL=65967908KB;;;; USED=1693488KB;59371117;62669512;; FREE=64274420KB;;;; CACHES=443236KB;;;;

The answer is no: check_mem.pl (as of today) bases its calculation on the same values shown by free above, including the host's total memory (TOTAL=65967908KB in the perfdata corresponds to the host's 64421 MB). As long as these values are partly incorrect, the container's resource consumption (disk, memory, CPU) should still be monitored on the host.

If you create a script/plugin which only checks the "used" value, though, you're probably good to go.

But let's focus on the good news: When you're logged into the container and you run (h)top you now see the (more or less) correct memory consumption of the container. That's already a big improvement and really helpful.


Creating a persistent volume in Rancher 2.0 from a NFS share
Wednesday - Jun 27th 2018 - by - (0 comments)

I'm currently building a new Docker environment. For almost two years I've been successfully running Docker containers in Rancher 1.x, see some related posts:

Now that Rancher 2.0 has been released, it's definitely worth seeing what can be achieved with it. Something that caught my eye while clicking through the new user interface was the "persistent storage" section. As it turns out, the new Docker environment I'm building needs some Docker containers which require an external file system (NFS share) mounted from a central NFS server. As you can read in my post "The Docker Dilemma: Benefits and risks going into production with Docker", I'm not a fan of mounting local volumes from the Docker host into the container (mainly for security reasons), but mounting a network file system, like an NFS share, is less of a security risk. But let's call a spade a spade: a Docker container, or rather the application running inside it, should be built with cloud-readiness in mind. This means there shouldn't be any fixed mounts of (internal) file servers (-> object storage through HTTP calls is the future). Anyway, in the short term this environment still requires that particular NFS share to be mounted into some of the containers.

Back to the topic: Rancher 2.0 comes with a cluster-wide storage solution. Many storage drivers (volume plugins) are ready to use, including "NFS Share". Here's how it works.

1. Add the NFS share as persistent volume on the Kubernetes cluster

At the Kubernetes cluster level (here mh-gamma-stage), open the top menu entry "Storage" and select "Persistent Volumes".

Click on "Add Volume" and the following form will be shown:

Rancher 2.0 NFS Persistent Storage 

Give the new volume a meaningful name; here I chose nfs-gamma-stage.
Select the correct volume plugin; here "NFS Share" (self-explanatory).
I defined a volume capacity of 500GB here, but it doesn't actually matter, as the NFS server defines the capacity (see later in this article).
Path is the export path from the NFS server (see /etc/exports on the NFS server).
Server is the IP address or DNS name of the NFS server.
You can also choose whether the volume should be read-only or not.
Hit the "Save" button and the volume will show up as "Available":

Rancher 2.0 NFS Share as persistent volume 
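Under the hood this corresponds roughly to a Kubernetes PersistentVolume with an NFS volume source. I haven't verified that Rancher generates exactly this object, so treat the following Python sketch as an approximation. The server name and export path are hypothetical, and "500Gi" stands in for the 500GB entered in the form. kubectl also accepts JSON manifests.

```python
import json


def nfs_persistent_volume(name, server, path, capacity="500Gi", read_only=False):
    """Build a PersistentVolume manifest backed by an NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # Used for matching claims; Kubernetes doesn't enforce it for NFS,
            # the NFS server determines the real size.
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path, "readOnly": read_only},
        },
    }


# Hypothetical server and export path; use the values from your /etc/exports.
pv = nfs_persistent_volume("nfs-gamma-stage", "nfs.example.com", "/srv/nfs/gamma")
print(json.dumps(pv, indent=2))
```

The printed manifest could be applied with `python3 make_pv.py | kubectl apply -f -`.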

2. Create project and namespace

At the Kubernetes cluster level, create a project and a namespace inside that project, if you haven't already.

3. Claim the persistent volume in the project

The previously created volume can now be claimed inside a project. Enter the project where you want to claim the volume (here: "Gamma" project).

In the tab "Workloads", select the navigation tab "Volumes".

Rancher 2.0 persistent volume from NFS share 

Click on the "Add Volume" button and the "Add Volume Claim" form will show up:

Rancher 2.0 NFS Share as persistent volume

Name: Enter a meaningful name; it can even be the same name as at the Kubernetes cluster level.
Namespace: Select the namespace in which the volume should be seen.
Source: Select "Use an existing persistent volume".
Persistent Volume: Select the volume created before.

Click on "Create" and the volume will then show up as "Bound":

NFS Share in Rancher 2.0  
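In plain Kubernetes terms, claiming an existing volume roughly means creating a PersistentVolumeClaim that names the volume explicitly. A hedged Python sketch follows. The namespace "gamma" is hypothetical, and the requested size must not exceed the capacity of the PersistentVolume.

```python
import json


def nfs_volume_claim(name, namespace, volume_name, size="500Gi"):
    """Build a PersistentVolumeClaim that binds to one existing PV by name."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Empty storage class: don't let a default StorageClass
            # provision a new volume instead of using the existing one.
            "storageClassName": "",
            # Bind explicitly to the PV created at cluster level.
            "volumeName": volume_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# "gamma" is a hypothetical namespace name.
pvc = nfs_volume_claim("nfs-gamma-stage", "gamma", "nfs-gamma-stage")
print(json.dumps(pvc, indent=2))
```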

4. Deploy a new workload with the volume attached

So far so good, but now we want to have some containers with this persistent volume! Change to the tab "Workloads" and click on the "Deploy" button to deploy a new workload (and therefore container/s):

NFS Share in Rancher 2.0

I chose a commonly used image "ubuntu:xenial" and scrolled down to the "Volumes" configuration.

Rancher 2.0 NFS Volume attach to Docker container 

Here I selected the persistent volume I created before and also chose two mount points.
In this example the persistent volume (ergo the NFS share) will be mounted twice:
- /mnt will be used as mount point within the container to mount the whole volume. This will be mounted read-only.
- /logs will be used as mount point within the container to mount a subfolder (logs) of the volume. This will be mounted with read-write permissions.

So this is actually pretty useful: The same volume can be used for multiple mount points. It's not necessary to create several volumes and then mount each volume separately into the container. Saves a lot of work!
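The double mount maps to two volumeMounts entries pointing at the same volume, one read-only and one using subPath. Below is a simplified Python sketch of the equivalent spec. It uses a bare Pod instead of the workload object Rancher creates, and the names "nfs-test" and "gamma" are hypothetical. `sleep infinity` only keeps the Ubuntu container running so that a shell can be opened in it.

```python
import json


def pod_with_nfs_mounts(name, namespace, claim_name):
    """Build a Pod mounting one claim twice: whole volume ro, 'logs' subfolder rw."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [{
                "name": name,
                "image": "ubuntu:xenial",
                # Keep the container running so a shell can be opened in it.
                "command": ["sleep", "infinity"],
                "volumeMounts": [
                    # Whole volume, read-only.
                    {"name": "nfs", "mountPath": "/mnt", "readOnly": True},
                    # Only the "logs" subfolder, read-write.
                    {"name": "nfs", "mountPath": "/logs", "subPath": "logs"},
                ],
            }],
            "volumes": [
                {"name": "nfs", "persistentVolumeClaim": {"claimName": claim_name}},
            ],
        },
    }


pod = pod_with_nfs_mounts("nfs-test", "gamma", "nfs-gamma-stage")
print(json.dumps(pod, indent=2))
```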

After this, deploy the workload.

5. Inside the container

Once the workload is deployed (indicated by the green dots), you can open a shell in a container and verify that the volumes were mounted:

Rancher 2.0 Volume in Container

So far so good! Several containers were able to write into the volume at the same time (where read-write was given).

But what about the volume sizing? As shown above, I set a capacity of 500GB in the user interface, but the NFS share in the container clearly shows a size of 95GB.
When we increased the size of the NFS share on the NFS server, the new size was immediately visible inside the container. So the capacity set in the Rancher UI seems to be informational rather than a restriction (not sure though).
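To check the mounts and their sizes from inside a container with a script rather than by eye, something like the following Python sketch could be used. It assumes the NFS mounts show up in /proc/mounts with a file system type starting with "nfs". Sizes are reported in GiB, the same unit df -h uses.

```python
import os


def nfs_mounts(mounts_text):
    """Extract NFS entries from /proc/mounts-formatted text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        device, mountpoint, fstype, options = fields[:4]
        found.append({
            "device": device,
            "mountpoint": mountpoint,
            "fstype": fstype,
            "read_only": "ro" in options.split(","),
        })
    return found


def size_gib(mountpoint):
    """Total size of the file system behind mountpoint in GiB."""
    st = os.statvfs(mountpoint)
    return st.f_blocks * st.f_frsize / 2**30


def report(path="/proc/mounts"):
    """Print every NFS mount with its mode, source and total size."""
    with open(path) as f:
        for m in nfs_mounts(f.read()):
            mode = "ro" if m["read_only"] else "rw"
            print(f"{m['mountpoint']} ({m['fstype']}, {mode}) {m['device']} "
                  f"{size_gib(m['mountpoint']):.0f}G")
```

Running report() inside the container lists the NFS mounts. If the share is resized on the NFS server, the printed size should change as well.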


Search field in Firefox is gone, how to get it back
Friday - Jun 15th 2018 - by - (0 comments)

In recent Firefox versions I noticed that the search field next to the address bar has disappeared. While you can still enter keywords in the address bar and use it as a search field, you can no longer switch the search engine on the fly, which was pretty handy at times.

Search field in Firefox gone 

But the search field can be made visible again. It simply requires a quick change of Firefox's settings.

Open a new tab, type "about:config" in the address bar and press Enter. Accept the warning and promise to be careful.

In the search field (inside the about:config tab), enter: "browser.search.widget.inNavBar".

Firefox change config to show search field again 

As you can see, the value is set to "False". Double-click the line of the preference and the value changes to "True" (and the text becomes bold). The search field also magically re-appears next to the address bar:

Firefox showing search field again
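If you need the same setting on several machines, the preference can also be placed in a user.js file in the Firefox profile folder, which Firefox reads at startup. Below is a hedged Python sketch. You have to look up the profile folder path yourself ("Profile Folder" in about:support). Keep in mind that a preference set in user.js is re-applied on every start, so it overrides later changes made in about:config.

```python
from pathlib import Path

PREF_LINE = 'user_pref("browser.search.widget.inNavBar", true);'


def enable_search_widget(profile_dir):
    """Append the search field preference to <profile>/user.js if it's missing."""
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if PREF_LINE not in existing:
        with user_js.open("a") as f:
            # Don't glue the new line onto an unterminated last line.
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write(PREF_LINE + "\n")
    return user_js
```

Run it while Firefox is closed; the setting takes effect on the next start.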


Ansible: Detect and differ between LXC containers and hosts
Wednesday - Jun 13th 2018 - by - (0 comments)

While looking for a way to handle certain tasks differently inside an LXC container and on a (physical) host, I first tried to use Ansible variables based on hardware facts.

But the problem, as you might know, is that LXC containers basically see the same hardware as the host because they use the same kernel (there is no hardware virtualization layer in between).
Note: That's what makes containers much faster than VMs, just sayin'.

So checking for hardware will not work, as both host and container see the same:

$ ansible host -m setup | grep ansible_system_vendor
        "ansible_system_vendor": "HP",

$ ansible container -m setup | grep ansible_system_vendor
        "ansible_system_vendor": "HP",

When I looked through all available variables coming from "-m setup", I stumbled across ansible_virtualization_role at the end of the output. Looks interesting!

$ ansible host -m setup | grep ansible_virtualization_role
        "ansible_virtualization_role": "host",

$ ansible container -m setup | grep ansible_virtualization_role
        "ansible_virtualization_role": "guest",


