
How to use kubectl on a Rancher 2 managed Kubernetes cluster
Friday - Sep 21st 2018 - by - (0 comments)

The major change from Rancher 1.x to 2.x is that Rancher now uses Kubernetes exclusively in the background, instead of offering a choice of multiple orchestration engines (Cattle, Kubernetes, Mesos, Swarm). Rancher pushed its own orchestration engine "Cattle" in Rancher 1.x, but now only Kubernetes is left.

Another big difference between Rancher 1.x and 2.x is (as of now, using Rancher 2.0.8) that the Rancher user interface and the API are sometimes not enough. To use the full capabilities of the Kubernetes cluster, it is sometimes necessary to talk directly to the underlying Kubernetes engine. This comes up often in the Rancher forums.

The easiest way to start "kubectl" is to select a cluster in the user interface and then simply click on the button "Launch kubectl":

Rancher 2: Launch kubectl 

This opens up a shell window inside the browser. Kubectl is automatically started and connected with the selected cluster:

Rancher 2 kubectl shell in browser

However, the browser shell has some major limitations (e.g. copy/pasting). It is very helpful for quick checks and verifications, but for deeper analysis it can be a pain. It is also possible to use kubectl from your own machine and connect to the cluster, even when it is managed by Rancher. And this is what this article is about.

First you need to install kubectl on your machine. To do so, follow the official documentation "Install and Set Up kubectl", which explains it in a straightforward way. There are packages ready for almost every OS/distribution.

On my workstation I currently run Linux Mint 18.3, which is based on Ubuntu 16.04 (Xenial):

ckadm@mintp ~ $ cat /etc/*release* /etc/upstream-release/*
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=18.3
DISTRIB_CODENAME=sylvia
DISTRIB_DESCRIPTION="Linux Mint 18.3 Sylvia"
NAME="Linux Mint"
VERSION="18.3 (Sylvia)"
ID=linuxmint
ID_LIKE=ubuntu
PRETTY_NAME="Linux Mint 18.3"
VERSION_ID="18.3"
HOME_URL="http://www.linuxmint.com/"
SUPPORT_URL="http://forums.linuxmint.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/linuxmint/"
VERSION_CODENAME=sylvia
UBUNTU_CODENAME=xenial
cat: /etc/upstream-release: Is a directory
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"

To install kubectl on this Ubuntu 16.04 (Xenial) derivative, the following steps are sufficient:

ckadm@mintp ~ $ sudo apt-get update && sudo apt-get install -y apt-transport-https
ckadm@mintp ~ $ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
ckadm@mintp ~ $ sudo touch /etc/apt/sources.list.d/kubernetes.list
ckadm@mintp ~ $ echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
ckadm@mintp ~ $ sudo apt-get update
ckadm@mintp ~ $ sudo apt-get install -y kubectl

 The kubectl command can now be used:

ckadm@mintp ~ $ kubectl version
Client Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.0-rc.1", GitCommit:"3e4aee86dfaf933f03e052859c0a1f52704d4fef", GitTreeState:"clean", BuildDate:"2018-09-18T21:08:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

So far so good: the client works, but without a config it tries to reach localhost:8080. So how to connect to the cluster?

Remember the button "Launch kubectl" from above? There's a second button next to it: Kubeconfig File. Click on this button and a config in YAML format appears in the browser:

Rancher kubectl config  

Copy the content starting with "apiVersion" until the end. Note that at the end of the config file the "contexts" are configured.

This is because the Rancher cluster itself serves as a Kubernetes Federation cluster. Basically this means that the Kubernetes cluster running the Rancher application itself is kind of a "parent" cluster. All other clusters are connected to this parent cluster and are talked to using contexts (a bit like SNMPv3 contexts if you know about them). Edit: See edit note at the end of the article.
The advantage is clearly that you have one cluster to manage all the other clusters. But there's a downside: Kubernetes Federation is not yet considered mature. From the official documentation:

"Maturity: The federation project is relatively new and is not very mature. Not all resources are available and many are still alpha. Issue 88 enumerates known issues with the system that the team is busy solving."

The referenced issue 88 itself still has a lot of open tasks and problems.

Back to the topic: Copy the config content from the browser and save it into your user's kubectl config folder (which is located at $HOME/.kube or ~/.kube) as "config" file. You might need to create the folder first.

ckadm@mintp ~ $ mkdir ~/.kube
ckadm@mintp ~ $ vi .kube/config
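
As a small sketch of the same step in script form: the function below copies a saved kubeconfig into place and restricts its permissions, on the assumption that the file downloaded from Rancher carries credentials and should not be world-readable. The function name save_kubeconfig and the KUBE_DIR variable are made up for this example; only mkdir, cp and chmod are used.

```shell
# Target folder for the kubectl config; defaults to ~/.kube.
# KUBE_DIR is a hypothetical override, handy for a dry run.
KUBE_DIR="${KUBE_DIR:-$HOME/.kube}"

save_kubeconfig() {
    # $1: file holding the YAML copied from the Rancher UI
    mkdir -p "$KUBE_DIR"            # -p: no error if the folder already exists
    chmod 700 "$KUBE_DIR"           # only the owner may enter the folder
    cp "$1" "$KUBE_DIR/config"
    chmod 600 "$KUBE_DIR/config"    # only the owner may read or write the file
}
```

Called as save_kubeconfig ~/Downloads/rancher.yaml (a placeholder path), this ends up with the same ~/.kube/config as the manual steps above.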

You can now launch kubectl commands:

ckadm@mintp ~ $ kubectl get all
Unable to connect to the server: x509: certificate signed by unknown authority

Oh! What's this? This error shows up because the certificates used to connect to the cluster created by Rancher are self-signed. kubectl plays it safe and doesn't let you connect. There's a parameter to disable the certificate validation check, with the caveat that the identity of the server is then no longer verified:

ckadm@mintp ~ $ kubectl get all --insecure-skip-tls-verify=true
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.43.0.1    <none>        443/TCP   29d

Here we go, that's the same output as from the kubectl command launched in the browser shell.

From now on you're able to quickly connect to your Kubernetes cluster created/managed by Rancher and investigate and get more information, for example details about a pod:

ckadm@mintp ~ $ kubectl get pod importer-84484c757b-gbqcm --namespace gamma --insecure-skip-tls-verify=true
NAME                        READY   STATUS    RESTARTS   AGE
importer-84484c757b-gbqcm   1/1     Running   0          5h

Edit: A few hours after I published this article, I stumbled across a post in the Rancher forums which essentially asks for Kubernetes Federation in Rancher 2. The request was declined for the same reason I wrote above: Federation is not mature enough. So this would mean Rancher 2.x does in fact NOT use Federation. Unfortunately the documentation does not describe how exactly this "parent-child clustering" is set up in the background.

 

Adapt Roundcube managesieve plugin to dynamically lookup sieve host
Friday - Sep 7th 2018 - by - (0 comments)

I've been using Roundcube webmail since 2009, starting with a very early release (0.2.1). And I still think it's the best open source webmail project available.

On a very particular mail server setup using dedicated mailbox servers yet centralized and highly available mail proxies, I came across a problem with Roundcube's "managesieve" plugin.

To explain the setup a bit: Public IMAP/POP3/SMTP listeners are configured on central/HA mail proxies using Postfix transport maps for internal relaying and SASL authentication with a central MySQL database. Nginx is used as IMAP/POP3 reverse proxy. On the same host(s) Roundcube is installed using Nginx+PHP-FPM.

The IMAP and SMTP connections of course work fine with a "localhost" connection. IMAP connects to localhost, which is the Nginx reverse proxy, which in turn forwards the IMAP login to the mailbox server (dynamic lookup from the central MySQL database). The same goes for SMTP: connect to localhost, where Postfix listens and authenticates with SASL.

But Sieve is a different story. It has its own listener (by default tcp/4190) and its own protocol. Something which Nginx is not able to proxy. Hence I got the following error when I tried to access the "Filter" settings in Roundcube:

Roundcube managesieve error 

An error occured. Unable to connect to managesieve server.

Well yes, makes sense because there is no sieve listening on localhost. But the problem is, the managesieve plugin only supports a single entry as sieve host in the config:

 // managesieve server address, default is localhost.
// Replacement variables supported in host name:
// %h - user's IMAP hostname
// %n - http hostname ($_SERVER['SERVER_NAME'])
// %d - domain (http hostname without the first part)
// For example %n = mail.domain.tld, %d = domain.tld
$config['managesieve_host'] = 'localhost';

None of the possible values would help me in this case. Even %h, which looked promising, points to localhost again in the end. So I dug through the source code and found the "connect" function in lib/Roundcube/rcube_sieve_engine.php (see source code in public repo):

    /**
     * Connect to configured managesieve server
     *
     * @param string $username User login
     * @param string $password User password
     *
     * @return int Connection status: 0 on success, >0 on failure
     */
    public function connect($username, $password)
    {
        // Get connection parameters
        $host = $this->rc->config->get('managesieve_host', 'localhost');
        $port = $this->rc->config->get('managesieve_port');
        $tls  = $this->rc->config->get('managesieve_usetls', false);

        $host = rcube_utils::parse_host($host);
        $host = rcube_utils::idn_to_ascii($host);

        // remove tls:// prefix, set TLS flag
        if (($host = preg_replace('|^tls://|i', '', $host, 1, $cnt)) && $cnt) {
            $tls = true;
        }

        if (empty($port)) {
            $port = getservbyname('sieve', 'tcp') ?: self::PORT;
        }

        $plugin = $this->rc->plugins->exec_hook('managesieve_connect', array(
            'user'      => $username,
            'password'  => $password,
            'host'      => $host,
            'port'      => $port,
            'usetls'    => $tls,
            'auth_type' => $this->rc->config->get('managesieve_auth_type'),
            'disabled'  => $this->rc->config->get('managesieve_disabled_extensions'),
            'debug'     => $this->rc->config->get('managesieve_debug', false),
            'auth_cid'  => $this->rc->config->get('managesieve_auth_cid'),
            'auth_pw'   => $this->rc->config->get('managesieve_auth_pw'),
            'socket_options' => $this->rc->config->get('managesieve_conn_options')
        ));
[...]

The relevant part is the $host variable. It reads the value of "managesieve_host" from the config file and falls back to "localhost". To use a dynamic lookup of the managesieve host, I modified the code:

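// Get connection parameters
//$host = $this->rc->config->get('managesieve_host', 'localhost'); // this is the default
// Infiniroot added dynamic lookup of managesieve_host:
// Fall back to the configured host if the lookup returns no row
$resultip = $this->rc->config->get('managesieve_host', 'localhost');
// The domain is everything after the last "@" of the login
$domain = substr(strrchr($username, "@"), 1);
$dbh = mysqli_connect("dbhost", "dbuser", "dbpass", "dbname") or die ('I cannot connect to the database because: ' . mysqli_connect_error());
// Escape the value, as it originates from user input
$domain = mysqli_real_escape_string($dbh, $domain);
$anfrage = mysqli_query($dbh, "SELECT targetserver FROM transport_maps WHERE domain = '$domain' LIMIT 1");
while ($row = mysqli_fetch_assoc($anfrage)) {
    // The array key must match the selected column
    $resultip = $row['targetserver'];
}
$host = $resultip;
// End Infiniroot modifications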

This will now make a lookup in the central database based on the user login (which is an e-mail address). The domain name is taken from the user's e-mail address, looked up in the transport table (the same table which is also used by Postfix for relaying mails to the target mailbox server) and the resulting IP address is returned as new $host value. From there, the managesieve plugin does what it does and connects.

In Roundcube, the result is a success:

Roundcube Managesieve Dynamic Sieve Host Lookup 

PS: As you can see from the code comments above, the provider is www.infiniroot.com ;-)

 

Install a newer Valgrind version on Ubuntu 14.04 using alternatives
Monday - Sep 3rd 2018 - by - (0 comments)

Although Valgrind is part of the default Ubuntu repositories, the packaged version can sometimes lag behind. In this case a developer required a newer version of Valgrind on an Ubuntu 14.04 server.

The installed version (from the official repos) is 3.10.1:

# dpkg -l|grep valgrind | awk '{print $2" "$3}'
valgrind 1:3.10.1-1ubuntu3~14.5

# valgrind --version
valgrind-3.10.1

 The current release (as of this writing) is 3.13.0. So let's get this new version on board! Luckily this is pretty easy on Debian based systems (like Ubuntu) when using "alternatives".

First download the new release, unpack it, and change into the unpacked folder:

$ wget ftp://sourceware.org/pub/valgrind/valgrind-3.13.0.tar.bz2
$ tar -xjf valgrind-3.13.0.tar.bz2
$ cd valgrind-3.13.0/

Compile the source code:

$ ./configure
$ make

Install the newly compiled files. By default (using ./configure without any parameters) this will install the valgrind binary in /usr/local/bin:

$ sudo make install

At this moment we have two different installations of Valgrind on the system:

# whereis valgrind
valgrind: /usr/bin/valgrind.bin /usr/bin/valgrind /usr/lib/valgrind /usr/bin/X11/valgrind.bin /usr/bin/X11/valgrind /usr/local/bin/valgrind /usr/local/lib/valgrind /usr/include/valgrind /usr/share/man/man1/valgrind.1.gz

As you can see, the first valgrind appearing in the list is /usr/bin/valgrind, somewhat later /usr/local/bin/valgrind is in the list. Now let's tell the system to use an "alternative installation" (hence the "alternatives" word) of Valgrind:

$ sudo update-alternatives --install /usr/bin/valgrind valgrind /usr/local/bin/valgrind 1 --force
update-alternatives: using /usr/local/bin/valgrind to provide /usr/bin/valgrind (valgrind) in auto mode

This command tells Ubuntu to use an alternative for /usr/bin/valgrind - it should now use the binary found in path /usr/local/bin/valgrind.
To explain this on a file level:

$ ll /usr/bin/valgrind
lrwxrwxrwx 1 root root 26 Sep  3 09:34 /usr/bin/valgrind -> /etc/alternatives/valgrind

/usr/bin/valgrind is now a symlink to /etc/alternatives/valgrind

$ ll /etc/alternatives/valgrind
lrwxrwxrwx 1 root root 23 Sep  3 09:34 /etc/alternatives/valgrind -> /usr/local/bin/valgrind

And /etc/alternatives/valgrind is itself another symlink to the final destination /usr/local/bin/valgrind. From now on, the system uses the new Valgrind version:

$ valgrind --version
valgrind-3.13.0
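
The two-step symlink chain can also be followed in one go with readlink -f, which resolves every link until it reaches a regular file. The sketch below rebuilds the chain in a scratch directory so that it can be tried without touching the system; the directory layout under $demo is only a stand-in for the real paths. On the real system, readlink -f /usr/bin/valgrind or update-alternatives --display valgrind should give the same information.

```shell
# Scratch directory that mimics the three locations involved
demo="$(mktemp -d)"
mkdir -p "$demo/usr/bin" "$demo/etc/alternatives" "$demo/usr/local/bin"

# Stand-in for the freshly compiled binary
: > "$demo/usr/local/bin/valgrind"

# Same chain as created by update-alternatives:
# /usr/bin/valgrind -> /etc/alternatives/valgrind -> /usr/local/bin/valgrind
ln -s "$demo/usr/local/bin/valgrind" "$demo/etc/alternatives/valgrind"
ln -s "$demo/etc/alternatives/valgrind" "$demo/usr/bin/valgrind"

# readlink -f follows all links and prints the final target
resolved="$(readlink -f "$demo/usr/bin/valgrind")"
echo "$resolved"
```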

 

Install/Upgrade cmake 3.12.1 on Ubuntu 14.04 using alternatives
Monday - Sep 3rd 2018 - by - (0 comments)

In a previous article, I described how it's possible to Install/Upgrade cmake 3.10.1 in Ubuntu 14.04 using alternatives.

Since then a couple of new versions were released and the same procedure can still be used to install cmake 3.12.1.

Download and compile:

$ wget http://www.cmake.org/files/v3.12/cmake-3.12.1.tar.gz
$ tar -xvzf cmake-3.12.1.tar.gz
$ cd cmake-3.12.1/
$ ./configure
$ make

Make's install command installs cmake by default in /usr/local/bin/cmake. Shared files are installed into /usr/local/share/cmake-3.12.

Now it's time to create a backup, in case you need to roll back to the old version:

$ /usr/local/bin/cmake --version
cmake version 3.10.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

$ sudo cp -p /usr/local/bin/cmake{,.3.10.1}

$ ll /usr/local/bin/cmake*
-rwxr-xr-x 1 root root 16509675 Dez 22  2017 /usr/local/bin/cmake
-rwxr-xr-x 1 root root 16509675 Dez 22  2017 /usr/local/bin/cmake.3.10.1

To install (copy) the binary and libraries to the new destination, run:

$ sudo make install

If you haven't already installed a newer cmake installation, run the following command to tell Ubuntu that the cmake command is now being replaced by an alternative installation:

$ sudo update-alternatives --install /usr/bin/cmake cmake /usr/local/bin/cmake 1 --force

If you already have a custom cmake version installed (in my case I still had the 3.10.1 version active), the update-alternatives command is not necessary.
The make install command will replace the existing binary in /usr/local/bin/cmake. This can be verified using:

$ cmake --version
cmake version 3.12.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

 

Change check source in an Icinga 2 distributed master-master setup
Tuesday - Aug 21st 2018 - by - (0 comments)

In my new Icinga 2 architecture I run a distributed setup using a master-master configuration. The two master nodes reside in different data centers but are connected through the internal LAN. Almost all host and service objects are within the "master" zone, and both master nodes (called icinga1 and icinga2) are used as endpoints for this master zone.

root@icinga1:~# cat /etc/icinga2/zones.conf
object Endpoint "icinga1" {
  host = "icinga1"
}

object Endpoint "icinga2" {
  host = "icinga2"
}

object Zone "master" {
    endpoints = [ "icinga1", "icinga2" ]
}

object Zone "global-templates" {
    global = true
}

object Zone "director-global" {
    global = true
}

Icinga automatically distributes checks across both endpoints, therefore balancing the checks. Sometimes the checks are executed on icinga1, sometimes on icinga2. For most of the checks, this turned out to be ok.
But I came across certain checks where I needed to specifically tell Icinga from where/on which node the check must be executed. In this scenario I needed to ping the interface of the central firewall to determine differences in latency between the two locations.

Icinga 2 master-master setup 

In my previous Icinga setup I used a master-satellite setup to "balance" the checks based on the physical location of the servers to achieve a "different view" of both locations. But in the master-master setup, this is balanced and the graphs contain mixed results over both locations.

So the question is: How can I force a check to be executed on a certain node?

First I tried to create two additional zones called "locationa" and "locationb" and assigned endpoint "icinga1" to "locationa" and endpoint "icinga2" to "locationb" in zones.conf:

object Zone "locationa" {
    endpoints = [ "icinga1" ]
}

object Zone "locationb" {
    endpoints = [ "icinga2" ]
}

And then moved the two service objects into the new zone folders (/etc/icinga2/zones.d/locationa and /etc/icinga2/zones.d/locationb).
But a check config showed that this didn't work and resulted in the following error:

# /etc/init.d/icinga2 checkconfig
 * checking Icinga2 configuration                                                                                     
information/cli: Icinga application loader (version: r2.8.2-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: icinga1
critical/config: Error: Endpoint 'icinga2' is in more than one zone.
Location: in /etc/icinga2/zones.conf: 5:1-5:30
/etc/icinga2/zones.conf(3): }
/etc/icinga2/zones.conf(4):
/etc/icinga2/zones.conf(5): object Endpoint "icinga2" {
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/zones.conf(6):   host = "icinga2"
/etc/icinga2/zones.conf(7): }

critical/config: Error: Endpoint 'icinga1' is in more than one zone.
Location: in /etc/icinga2/zones.conf: 1:0-1:29
/etc/icinga2/zones.conf(1): object Endpoint "icinga1" {
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/zones.conf(2):   host = "icinga1"
/etc/icinga2/zones.conf(3): }

critical/config: 2 errors
 * checking Icinga2 configuration. Check '/var/log/icinga2/startup.log' for details.

So back to step one and I started from scratch: RTFM. And indeed, I came across this: "Pin Checks in a Zone".

In case you want to pin specific checks to their endpoints in a given zone you’ll need to use the command_endpoint attribute. This is reasonable if you want to execute a local disk check in the master Zone on a specific endpoint then.

Wow. That sounds like exactly what I need. So I added "command_endpoint" to the two config files:

# cat /etc/icinga2/zones.d/master/network/FW/firewall-locationa.conf
object Host "firewall-locationa" {
  import "dummy-host"
}

# check ping
object Service "PING FW Interface VLAN X" {
  command_endpoint = "icinga1"
  import "generic-service"
  host_name = "firewall-locationa"
  check_command = "ping"
  vars.ping_address = "192.168.99.1"
}

# cat /etc/icinga2/zones.d/master/network/FW/firewall-locationb.conf
object Host "firewall-locationb" {
  import "dummy-host"
}

# check ping
object Service "PING FW Interface VLAN X" {
  command_endpoint = "icinga2"
  import "generic-service"
  host_name = "firewall-locationb"
  check_command = "ping"
  vars.ping_address = "192.168.99.1"
}

Check config didn't report any errors, so I went ahead.

The check "PING FW Interface VLAN X" on host "firewall-locationb" worked immediately and I could see "check source" was set to "icinga2" in the UI.
But the same check on "firewall-locationa" resulted in an UNKNOWN state and output: Endpoint does not accept commands.

But this is actually quite easy to fix. The "command_endpoint" attribute uses the Icinga 2 API in the background. Because the node icinga2 is actually a slave (although this is called a master-master setup, the second master is set up like a satellite, simply receiving all configs), it is already configured to accept commands in the API feature:

root@icinga2:~# cat /etc/icinga2/features-enabled/api.conf
/**
 * The API listener is used for distributed monitoring setups.
 */
object ApiListener "api" {
  accept_config = true
  accept_commands = true
}

But this line (accept_commands) was missing on node icinga1. Once I added it and restarted Icinga 2, the check for host "firewall-locationa" was working too.

With these configs I now have the same ping check running to the same destination but from two different sources. Thanks to the graphs of the ping checks I can now see the differences in RTA and packet loss.

 

Another way to append a text in sed using ampersand
Wednesday - Aug 15th 2018 - by - (0 comments)

I love such situations when I accidentally stumble across something which turns out to be cool and pretty useful!

I wanted to replace an umlaut (ö) in a text file with the HTML equivalent (&ouml;):

# cat /tmp/xxx.html
This is a text containing an ö umlaut.
Because in German we use ä ö ü.

For this I wanted to use a simple sed command:

# cat /tmp/xxx.html | sed "s/ö/&ouml;/g"
This is a text containing an öouml; umlaut.
Because in German we use ä öouml; ü.

As you can see above, instead of all ö's being replaced, 'ouml;' was appended to each of them.

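Turns out that the ampersand (&) has a special meaning in sed: in the replacement part it stands for the whole matched text. In this case the match (ö) was put back and "ouml;" followed it, which effectively "appends characters after the found element".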

Practical example:

# cat /tmp/xxx.html | sed "s/text/& I wrote myself/g"
This is a text I wrote myself containing an ö umlaut.
Because in German we use ä ö ü.

Can be quite handy actually!

To achieve my original goal (replace ö with &ouml;), the special ampersand character needs to be escaped:

# cat /tmp/xxx.html | sed "s/ö/\&ouml;/g"
This is a text containing an &ouml; umlaut.
Because in German we use ä &ouml; ü.
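
To put the two behaviours side by side, here is a minimal sketch that runs on a sample line instead of /tmp/xxx.html. The variable names are just for this example. The third command shows that the ampersand is not limited to appending: the matched text can be placed anywhere in the replacement.

```shell
# Sample line standing in for the content of /tmp/xxx.html
line='This is a text containing an ö umlaut.'

# Unescaped &: stands for the matched text, so "ö" is kept and "ouml;" follows it
appended="$(printf '%s\n' "$line" | sed 's/ö/&ouml;/g')"

# Escaped \&: a literal ampersand, which produces the HTML entity
entity="$(printf '%s\n' "$line" | sed 's/ö/\&ouml;/g')"

# & in the middle of the replacement: wraps the match in brackets
wrapped="$(printf 'text\n' | sed 's/text/[&]/')"

printf '%s\n%s\n%s\n' "$appended" "$entity" "$wrapped"
```

Single quotes are used around the sed expressions so that the shell leaves the backslash alone.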


 

Ignore systemd log warning Failed to reset devices.list: Operation not permitted in OSSEC
Tuesday - Jul 31st 2018 - by - (0 comments)

Since I migrated a server environment from Debian 7 (Wheezy) to 9 (Stretch), I have constantly been receiving the following kind of alert e-mails from OSSEC:

OSSEC HIDS Notification.
2018 Jul 29 09:42:18

Received From: (container3) 10.10.1.103->/var/log/syslog
Rule: 1002 fired (level 2) -> "Unknown problem somewhere in the system."
Portion of the log(s):

Jul 29 09:42:17 container3 systemd[1]: apt-daily.service: Failed to reset devices.list: Operation not permitted

 --END OF NOTIFICATION

The following systemd timers caused these log entries:

  • apt-daily.timer
  • phpsessionclean.timer
  • systemd-tmpfiles-clean.timer

Maybe there are even more, depending what is installed.

These logs were found in all LXC containers of the new environment and were caused by this:

"Unprivileged containers cannot modify the devices cgroup configuration."

(found on https://github.com/lxc/lxd/issues/2004)

Yes, that makes sense and is actually expected behaviour. systemd should arguably be able to detect "I am running inside an unprivileged container; I cannot modify my own cgroup settings" and log something different, but for now there is no "fix" for this problem.

Anyway, I wanted OSSEC to ignore such log entries. On the OSSEC server I adapted /var/ossec/rules/local_rules.xml and added the following rule:

  <!-- Added rule by Claudio: Ignore systemd warnings "Failed to reset devices.list" -->
  <rule id="100101" level="0">
    <if_sid>1002</if_sid>
    <match>Failed to reset devices.list</match>
    <description>Ignore systemd warnings "Failed to reset devices.list" inside containers.</description>
  </rule>

The rule id is a unique ID of your own rule. To make sure you're not using an already used number, you have to use an ID between 100000 and 109999. This range is reserved for "user defined rules".
The if_sid field checks which rule actually created the alert. In the mail alert above you can see which rule was fired: 1002. That's the general rule to grep through syslogs and search for certain regular expressions.
Then in the match field you enter the pattern to search for. In this case I simply entered the fixed text "Failed to reset devices.list".
And finally in the description field you enter the description of that rule.

After an OSSEC server restart, the alerts were gone.

 

LXC container in network reachable, but cannot ping between host and container
Friday - Jul 27th 2018 - by - (0 comments)

In the past I've already had some connectivity issues with LXC (see Network connectivity problems when running LXC (with veth) in VMware VM). But today I experienced another kind of problem on an LXC installation on physical servers running Ubuntu 16.04 Xenial.

While network connectivity worked fine from other networks (outside of this LXC host), I was unable to ping between the LXC host and the container.

root@container:~# ping 10.166.102.10
PING 10.166.102.10 (10.166.102.10) 56(84) bytes of data.
From 10.166.102.15 icmp_seq=1 Destination Host Unreachable
From 10.166.102.15 icmp_seq=2 Destination Host Unreachable
From 10.166.102.15 icmp_seq=3 Destination Host Unreachable
From 10.166.102.15 icmp_seq=4 Destination Host Unreachable
From 10.166.102.15 icmp_seq=5 Destination Host Unreachable
From 10.166.102.15 icmp_seq=6 Destination Host Unreachable
^C
--- 10.166.102.10 ping statistics ---
9 packets transmitted, 0 received, +6 errors, 100% packet loss, time 8040ms

root@host:~# ping 10.166.102.15
PING 10.166.102.15 (10.166.102.15) 56(84) bytes of data.
From 10.166.102.10 icmp_seq=1 Destination Host Unreachable
From 10.166.102.10 icmp_seq=2 Destination Host Unreachable
From 10.166.102.10 icmp_seq=3 Destination Host Unreachable
^C
--- 10.166.102.15 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3999ms

Both host and container are in the same network range and are using the network's central gateway:

root@host:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.166.102.1    0.0.0.0         UG    0      0        0 virbr0
10.166.102.0    0.0.0.0         255.255.255.192 U     0      0        0 virbr0

root@container:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.166.102.1    0.0.0.0         UG    0      0        0 eth0
10.166.102.0    0.0.0.0         255.255.255.192 U     0      0        0 eth0

Of course the container is using the host's virbr0 as network link:

root@host:~# cat /var/lib/lxc/container/config  | grep network
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.ipv4 = 10.166.102.15/26
lxc.network.ipv4.gateway = 10.166.102.1
lxc.network.hwaddr = 54:52:10:66:12:15

Now I remembered that at home I had a small test-server running which has the same specs as in this setup:

  • The LXC host is running directly on physical hardware
  • The host's primary interface is being re-used as virbr0 (minor difference here: at home it's a single eth0, on this setup it's a bonding interface bond0)
  • The OS versions do not differ too much (home: Debian 8, this setup: Ubuntu 16.04)
  • The LXC version is the same (2.0.x)
  • The host and the containers run in the same local network range
  • Both the host and the containers use the central gateway (firewall) as default gateway

But there is one huge difference: At home the pings between the host and the container work, on this setup (as mentioned above) this doesn't work.

The first thing I checked was the virtual bridge settings. And by basically just showing the virbr0 I saw a big difference:

Home:

root@homehost ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.1c1b0d6523df       no              eth0
                                                        veth0-container
                                                        veth0-container2
                                                        veth0-container3
                                                        veth0-container4

This setup:

root@host:~# brctl show
bridge name     bridge id               STP enabled     interfaces
lxdbr0          8000.000000000000       no
virbr0          8000.a0369ff4d626       no              bond0

Even though several containers are running on this host, they don't show up as listed interfaces under this bridge!

I compared the container network config at home and on this setup and found this:

Home:

root@homehost ~ # cat /var/lib/lxc/invoicing/config | grep network
# networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.ipv4 = 192.168.77.173/24
lxc.network.hwaddr = 54:52:00:15:01:73
lxc.network.veth.pair = veth0-container
lxc.network.ipv4.gateway = 192.168.77.1

This setup (again the same output as above):

root@host:~# cat /var/lib/lxc/container/config  | grep network
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = virbr0
lxc.network.ipv4 = 10.166.102.15/26
lxc.network.ipv4.gateway = 10.166.102.1
lxc.network.hwaddr = 54:52:10:66:12:15

The network type is macvlan on this setup. This is because I basically copied the network config from another LXC host in this environment, with the difference that this other LXC host is virtual (running in VMware) and not physical. There, lxc.network.type was set to macvlan because of the connectivity problems mentioned in the article Network connectivity problems when running LXC (with veth) in VMware VM.

As soon as I switched the network.type to veth, the container and the host could ping each other, too. And now the container shows up in brctl:

root@host:~# brctl show
bridge name     bridge id               STP enabled     interfaces
lxdbr0          8000.000000000000       no
virbr0          8000.a0369ff4d626       no              bond0
                                                        veth0F7MCH
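
For reference, the config change itself could be scripted roughly as follows. This is only a sketch on a temporary copy: the file behind $cfg is a shortened stand-in for /var/lib/lxc/container/config, and removing the macvlan.mode line is my assumption that the option no longer applies once the type is veth. The container needs a restart afterwards for the change to take effect.

```shell
# Shortened stand-in for the container config shown above
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = virbr0
EOF

# Drop the macvlan-only option and switch the network type to veth.
# The result goes to a new file, so the original stays available as a backup.
sed -e '/^lxc\.network\.macvlan\.mode/d' \
    -e 's/^lxc\.network\.type = macvlan$/lxc.network.type = veth/' \
    "$cfg" > "$cfg.veth"

cat "$cfg.veth"
```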

TL;DR: On LXC hosts running on physical servers/hardware, use veth interfaces. On LXC hosts running themselves as a virtualized host (inside VMware for example), use macvlan interfaces (once again, see Network connectivity problems when running LXC (with veth) in VMware VM).

 

Fix table increment counter in MariaDB or MySQL after manual row deletion
Tuesday - Jul 24th 2018 - by - (0 comments)

I recently upgraded this tech blog from PHP 5.6 to 7.0 and stumbled (again) across some old mysql* functions. These were removed in PHP 7.0 and needed to be replaced by either PDO or MySQLi (see Changing from PHP's mysql to myqli - what to look at).

While I fixed most of the code, I forgot the admin part of my blog. Before a new article is inserted into the database, the content/text runs through a function to escape special characters: mysqli_real_escape_string(). From the documentation:

"Escapes special characters in a string for use in an SQL statement, taking into account the current charset of the connection"

The old mysql_real_escape_string() could simply be called with a single variable:

# OLD PHP < 7
$iContent = mysql_real_escape_string($iContent);

But (almost all) mysqli functions require the mysqli connection variable (here $connect), too:

# NEW PHP >= 7
$iContent = mysqli_real_escape_string($connect, $iContent);

Long story short: The content was not inserted into the database and once I fixed the code, I had to delete my prior attempts in the table and manually update the article ID to not leave a gap in between articles. This had a negative impact on the table's auto increment counter.

To better show that, I retrieve the latest article ID:

MariaDB [claudiokuenzler]> select newsid from news order by newsid desc limit 0,1;
+--------+
| newsid |
+--------+
|    790 |
+--------+
1 row in set (0.00 sec)

Yet the auto increment counter was already at 793 for the next insert (I manually deleted 2 entries):

MariaDB [claudiokuenzler]> SELECT `AUTO_INCREMENT` FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'claudiokuenzler' AND TABLE_NAME = 'news';
+----------------+
| AUTO_INCREMENT |
+----------------+
|            793 |
+----------------+
1 row in set (0.00 sec)

Of course I wanted to fix this immediately. Luckily I came across a Stack Overflow question where user Anshul gave a very good and quick explanation:

Further, in order to reset the AUTO_INCREMENT count, you can immediately issue the following statement.
ALTER TABLE `users` AUTO_INCREMENT = 1;
For MySQLs it will reset the value to MAX(id) + 1.

So I did that:

MariaDB [claudiokuenzler]> ALTER TABLE news AUTO_INCREMENT = 1;
Query OK, 788 rows affected (0.01 sec)            
Records: 788  Duplicates: 0  Warnings: 0

And how did this affect the increment counter?

MariaDB [claudiokuenzler]> SELECT `AUTO_INCREMENT` FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'claudiokuenzler' AND TABLE_NAME = 'news';
+----------------+
| AUTO_INCREMENT |
+----------------+
|            791 |
+----------------+
1 row in set (0.00 sec)

Yes! The next insert will have the next ID of 791. Hurray.
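To make the counter behaviour easier to follow, here is a small toy model in Python. It is only a sketch of what was observed above (deleting rows does not move the counter back, and setting the counter to a value that is too low ends up at MAX(id) + 1); the class and method names are made up for illustration and this is not how MariaDB implements it.

```python
class AutoIncrementTable:
    """Toy model of a table with an AUTO_INCREMENT primary key (hypothetical)."""

    def __init__(self):
        self.ids = set()
        self.counter = 1  # the ID the next insert will receive

    def insert(self):
        new_id = self.counter
        self.ids.add(new_id)
        self.counter += 1
        return new_id

    def delete(self, row_id):
        # Deleting a row never moves the counter backwards.
        self.ids.discard(row_id)

    def alter_auto_increment(self, value):
        # Models the behaviour seen above: a value that is not higher than
        # the highest existing ID is raised to MAX(id) + 1.
        highest = max(self.ids, default=0)
        self.counter = max(value, highest + 1)


def demo():
    table = AutoIncrementTable()
    for _ in range(792):          # IDs 1..792 get handed out
        table.insert()
    table.delete(791)             # the two manually deleted attempts
    table.delete(792)
    before = table.counter        # still points past the deleted rows
    table.alter_auto_increment(1)
    after = table.counter         # back to highest remaining ID + 1
    return before, after


print(demo())
```

With the numbers from this article, the model returns 793 before and 791 after the reset, matching the two INFORMATION_SCHEMA queries.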

 

Retrieving a value from XML document in Linux Bash
Tuesday - Jul 24th 2018 - by - (0 comments)

A few months ago I wrote about "Automatic SLA reporting from Icinga and push into Confluence page". Since then the script has been running on the 1st of every month and automatically updates the relevant pages in our Confluence wiki. So far so good, but I sometimes ran into problems with the calculation of last month's availability. On some occasions the JSON output contained a number too big to be handled as JSON (see step #4 in the article mentioned), so I turned to the CSV output as an alternative.

Yesterday I added the possibility to retrieve the availability stats for a service group (instead of a fixed host and one of its services). The problem: the CSV output does not contain the average stats of the service group, only the individual stats of each service in the group!
The HTML output, on the other hand, shows the averages of all services in its last row:

Icinga 2 ClassicUI Availability Stats Service Group

Now compare this with the full CSV output of the same availability report:

'SERVICEGROUP GROUPNAME HOST_STATE_BREAKDOWNS';
'host_name';'time_up_scheduled';'percent_time_up_scheduled';'percent_known_time_up_scheduled';'time_up_unscheduled';'percent_time_up_unscheduled';'percent_known_time_up_unscheduled';'total_time_up';'percent_total_time_up';'percent_known_time_up';'time_down_scheduled';'percent_time_down_scheduled';'percent_known_time_down_scheduled';'time_down_unscheduled';'percent_time_down_unscheduled';'percent_known_time_down_unscheduled';'total_time_down';'percent_total_time_down';'percent_known_time_down';'time_unreachable_scheduled';'percent_time_unreachable_scheduled';'percent_known_time_unreachable_scheduled';'time_unreachable_unscheduled';'percent_time_unreachable_unscheduled';'percent_known_time_unreachable_unscheduled';'total_time_unreachable';'percent_total_time_unreachable';'percent_known_time_unreachable';'time_undetermined_not_running';'percent_time_undetermined_not_running';'time_undetermined_no_data';'percent_time_undetermined_no_data';'total_time_undetermined';'percent_total_time_undetermined';
'dbserver';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'SERVICEGROUP GROUPNAME SERVICE_STATE_BREAKDOWNS';
'host_name';'service_description';'time_ok_scheduled';'percent_time_ok_scheduled';'percent_known_time_ok_scheduled';'time_ok_unscheduled';'percent_time_ok_unscheduled';'percent_known_time_ok_unscheduled';'total_time_ok';'percent_total_time_ok';'percent_known_time_ok';'time_warning_scheduled';'percent_time_warning_scheduled';'percent_known_time_warning_scheduled';'time_warning_unscheduled';'percent_time_warning_unscheduled';'percent_known_time_warning_unscheduled';'total_time_warning';'percent_total_time_warning';'percent_known_time_warning';'time_unknown_scheduled';'percent_time_unknown_scheduled';'percent_known_time_unknown_scheduled';'time_unknown_unscheduled';'percent_time_unknown_unscheduled';'percent_known_time_unknown_unscheduled';'total_time_unknown';'percent_total_time_unknown';'percent_known_time_unknown';'time_critical_scheduled';'percent_time_critical_scheduled';'percent_known_time_critical_scheduled';'time_critical_unscheduled';'percent_time_critical_unscheduled';'percent_known_time_critical_unscheduled';'total_time_critical';'percent_total_time_critical';'percent_known_time_critical';'time_undetermined_not_running';'percent_time_undetermined_not_running';'time_undetermined_no_data';'percent_time_undetermined_no_data';'total_time_undetermined';'percent_total_time_undetermined';
'dbserver';'SAP DB Processes INSTANCE';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP CCMS INSTANCE: DB Current State';'319978';'50.576%';'50.576%';'312685';'49.424%';'49.424%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP CCMS INSTANCE: Log Space';'319978';'50.576%';'50.576%';'312685';'49.424%';'49.424%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP Dispwork INSTANCE';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'SAP MessageServer INSTANCE';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';
'appserver';'TCP Port 3200 (GROUPNAME_DVEB)';'0';'0.000%';'0.000%';'632663';'100.000%';'100.000%';'632663';'100.000%';'100.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0.000%';'0';'0.000%';'0';'0.000%';'0';'0.000%';

You get the problem: How can I get the average stats for the whole service group here? That's the whole point of the grouped stats.
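As an aside, the per-service rows could in principle be averaged by hand. The following Python sketch does that for the column percent_known_time_ok. It is not what my script does, and it assumes that the average in the HTML report is a plain arithmetic mean over all services, which I did not verify. The sample uses two rows from the report above, reduced to a few columns; the function name is made up.

```python
import csv
import io

# Abbreviated sample: rows from the report above, reduced to a few columns.
SAMPLE_CSV = """\
'SERVICEGROUP GROUPNAME SERVICE_STATE_BREAKDOWNS';
'host_name';'service_description';'total_time_ok';'percent_total_time_ok';'percent_known_time_ok';
'dbserver';'SAP DB Processes INSTANCE';'632663';'100.000%';'100.000%';
'appserver';'SAP Dispwork INSTANCE';'632663';'100.000%';'100.000%';
"""


def average_service_column(report, column="percent_known_time_ok"):
    """Return the arithmetic mean of one percentage column over all service rows."""
    rows = csv.reader(io.StringIO(report), delimiter=";", quotechar="'")
    header = None
    values = []
    for row in rows:
        if row[:2] == ["host_name", "service_description"]:
            header = row      # header line of the service breakdown section
        elif header is not None and len(row) == len(header):
            record = dict(zip(header, row))
            values.append(float(record[column].rstrip("%")))
        else:
            header = None     # section titles and host breakdown rows are skipped
    if not values:
        raise ValueError("no service rows found")
    return sum(values) / len(values)


print(average_service_column(SAMPLE_CSV))
```

This works, but it means re-implementing a calculation that Icinga already does itself.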

XML to the rescue! The same report in XML format shows the averaged stats of all services (I cut the non-relevant output):

<?xml version="1.0" encoding="utf-8"?>
<servicegroup_availability>
<servicegroup name="GROUPNAME">
<hosts>
<host name="dbserver">
[...]
<all_services_average>
<average_percent_time_ok>100.000</average_percent_time_ok>
<average_percent_time_ok_known>100.000</average_percent_time_ok_known>
<average_percent_time_warning>0.000</average_percent_time_warning>
<average_percent_time_warning_known>0.000</average_percent_time_warning_known>
<average_percent_time_unknown>0.000</average_percent_time_unknown>
<average_percent_time_unknown_known>0.000</average_percent_time_unknown_known>
<average_percent_time_critical>0.000</average_percent_time_critical>
<average_percent_time_critical_known>0.000</average_percent_time_critical_known>
<average_percent_time_indeterminate>0.000</average_percent_time_indeterminate>
</all_services_average>
</services>
</servicegroup>
</servicegroup_availability>

But how can I get the value of the field "average_percent_time_ok_known"? I was already thinking of some complicated sed command when I came across the command xml_grep. It basically does the same as grep, but is specialized for XML documents. With the parameter --text_only it returns only the text value of the matched element. xml_grep is part of the xml-twig-tools package, which can easily be installed:

$ sudo apt-get install xml-twig-tools

The full command to retrieve the wanted value from the Icinga availability stats:

$ curl -s -u "${icingauser}:${icingapass}" "http://icinga.example.com/cgi-bin/icinga2-classicui/avail.cgi?show_log_entries=&servicegroup=GROUPNAME&timeperiod=lastmonth&rpttimeperiod=24x7&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringnotrunning=yes&includesoftstates=no&initialassumedservicestate=6&backtrack=8&content_type=xmloutput&xmloutput" | xml_grep "average_percent_time_ok_known" --text_only
100.000

Quick and painless (for the brain).
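If xml-twig-tools is not available on a machine, roughly the same lookup can be sketched with Python's standard library. This is an alternative, not what my script uses. The sample document is a trimmed, well-formed stand-in for the report above (the real output is cut here, so the exact nesting is an assumption), and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the report; the exact nesting is an
# assumption because the output shown above is cut.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<servicegroup_availability>
<servicegroup name="GROUPNAME">
<all_services_average>
<average_percent_time_ok>100.000</average_percent_time_ok>
<average_percent_time_ok_known>100.000</average_percent_time_ok_known>
</all_services_average>
</servicegroup>
</servicegroup_availability>
"""


def xml_texts(document, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(document)
    return [(element.text or "").strip() for element in root.iter(tag)]


print(xml_texts(SAMPLE_XML, "average_percent_time_ok_known"))
```

In a script, the document would come from the curl output instead of SAMPLE_XML, for example by reading sys.stdin.buffer.read() at the end of the pipe.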

 

