
How to retrieve values from json output in bash
Friday - Apr 29th 2016

Sure, one could fetch the JSON output with curl and then do all kinds of grepping, sedding and awking. But there is actually a cool way to access the JSON objects and get the value directly. The magic is done by jshon.

Fortunately this command-line program doesn't need to be compiled manually; on Ubuntu it can be installed from the official repositories:

sudo apt-get install jshon

Now let's do some json'ing. Let's take the Elasticsearch cluster stats as an example. As you may know, you can get the current ES cluster statistics and information from the following URL: http://your.es.node:9200/_cluster/stats (for human-readable output you can use http://your.es.node:9200/_cluster/stats?human&pretty).

The json output is separated into several objects. You can use jshon to list them:

$ curl "http://elasticsearch-dev.nine.nzz.ch:9200/_cluster/stats" -s | jshon -k
timestamp
cluster_name
status
indices
nodes

As you can see above, I accessed the ES URL directly and piped the output to jshon. The -k parameter returns a list of keys; in this case we got five keys.

The value I am looking for is within the "indices" key, so I first display the full value of that key with the -e parameter (which returns the JSON value):

$ curl "http://elasticsearch-dev.nine.nzz.ch:9200/_cluster/stats" -s | jshon -e indices
{
 "count": 13,
 "shards": {
  "total": 61,
  "primaries": 61,
  "replication": 0.0,
  "index": {
   "shards": {
    "min": 1,
    "max": 5,
    "avg": 4.6923076923076925
   },
   "primaries": {
    "min": 1,
    "max": 5,
    "avg": 4.6923076923076925
   },
   "replication": {
    "min": 0.0,
    "max": 0.0,
    "avg": 0.0
   }
  }
 },
 "docs": {
  "count": 935249,
  "deleted": 196434
 },
 "store": {
  "size": "11.9gb",
  "size_in_bytes": 12860502682,
  "throttle_time": "3.7h",
  "throttle_time_in_millis": 13374212
 },
 "fielddata": {
  "memory_size": "7.9mb",
  "memory_size_in_bytes": 8328212,
  "evictions": 0
 },
 "filter_cache": {
  "memory_size": "78.1mb",
  "memory_size_in_bytes": 81991016,
  "evictions": 72989502
 },
 "id_cache": {
  "memory_size": "0b",
  "memory_size_in_bytes": 0
 },
 "completion": {
  "size": "0b",
  "size_in_bytes": 0
 },
 "segments": {
  "count": 453,
  "memory": "17.9mb",
  "memory_in_bytes": 18873642,
  "index_writer_memory": "0b",
  "index_writer_memory_in_bytes": 0,
  "index_writer_max_memory": "190.1mb",
  "index_writer_max_memory_in_bytes": 199421333,
  "version_map_memory": "0b",
  "version_map_memory_in_bytes": 0,
  "fixed_bit_set": "3.2mb",
  "fixed_bit_set_memory_in_bytes": 3448160
 },
 "percolate": {
  "total": 0,
  "get_time": "0s",
  "time_in_millis": 0,
  "current": 0,
  "memory_size_in_bytes": -1,
  "memory_size": "-1b",
  "queries": 0
 }
}

So this is the full output, but not the single value I actually wanted. To get there, the subkeys within the "indices" key can be listed and used the same way:

$ curl "http://elasticsearch-dev.nine.nzz.ch:9200/_cluster/stats" -s | jshon -e indices -k
percolate
shards
count
store
docs
fielddata
filter_cache
id_cache
completion
segments

Now we have a nice list of the subkeys, which we can access directly with an additional -e:

$ curl "http://elasticsearch-dev.nine.nzz.ch:9200/_cluster/stats" -s | jshon -e indices -e store
{
 "size": "11.9gb",
 "size_in_bytes": 12860502682,
 "throttle_time": "3.7h",
 "throttle_time_in_millis": 13374212
}

And the single value I wanted from the start can finally be retrieved by adding yet another -e parameter:

$ curl "http://elasticsearch-dev.nine.nzz.ch:9200/_cluster/stats" -s | jshon -e indices -e store -e "size_in_bytes"
12860502682
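If jshon isn't at hand, the same nested lookup can be sketched with python3's json module (present on most systems). The JSON below is a trimmed, hypothetical stand-in for the _cluster/stats output, not real cluster data:

```shell
# Stand-in JSON (trimmed): only the path we want to extract is included.
json='{"indices": {"store": {"size_in_bytes": 12860502682}}}'

# Walk the same key chain (indices -> store -> size_in_bytes) in python3.
echo "$json" | python3 -c 'import json,sys; print(json.load(sys.stdin)["indices"]["store"]["size_in_bytes"])'
```

In a real script, the `echo "$json"` would simply be replaced by the curl call shown above.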

 

Amazon S3 bucket and Internet Explorer 11: HTTP 400 error because of umlaut
Friday - Apr 29th 2016

There was a strange phenomenon this week at work when trying to access a certain PDF element from an AWS S3 bucket containing a German umlaut (ü) in the filename.

While it worked correctly in Chrome (50) and Firefox (45) (an XML response showed up correctly, as expected in this case), it never worked in Internet Explorer (11).

After opening the debug/developer tools (F12 in IE) I saw the following response, which wasn't displayed in the browser itself:


<Error>
 <Code>InvalidArgument</Code>
 <Message>Header value cannot be represented using ISO-8859-1.</Message>
 <ArgumentName>response-content-disposition</ArgumentName>
 <ArgumentValue>attachment; filename=Eat_My_Zn?ni.pdf</ArgumentValue>
 <RequestId>3E7CFF47D4A6BF23</RequestId>
 <HostId>IfEECJc5hXFcmZXshxPKKJg/GCw+1gaWiyc+wXvvlmNTCJ7t3A8YrKARXI0tLnK0QkvAKnuNJBE=</HostId>
</Error>

So Internet Explorer 11 seems to have problems encoding the umlaut from ISO-8859-1 into UTF-8 (although this setting is activated in Internet Options -> Advanced -> International).

When writing ue instead of ü, it worked.
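A possible server-side workaround (a sketch, not what we did at the time): RFC 5987/6266 allow a percent-encoded UTF-8 filename via the filename* parameter of Content-Disposition, which sidesteps the ISO-8859-1 restriction on header values:

```shell
# Build an RFC 5987-style header value by percent-encoding the UTF-8 filename.
python3 -c "
from urllib.parse import quote
print(\"attachment; filename*=UTF-8''\" + quote('Eat_My_Znüni.pdf'))
"
```

Whether the affected IE11 versions honor filename* would have to be tested; this is only a sketch of the encoding itself.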

Update May 3rd 2016:
My colleague just found out that this only happens in recent versions of IE11. On Internet Explorer 11 version 11.0.9600.16521 it was working. Another non-working version is 11.0.9600.17278, and the IE version I tested with was 11.0.9600.18282.

 

Did you know Skype understands sed? Replace previous line with sed command!
Wednesday - Apr 27th 2016

Just found out something funny: Skype understands sed! I was pretty surprised when my sed command worked in Skype and changed a word in my previously written line.

Check this out:

(Screenshots: Skype replacing a word with a sed command)

Tested on Skype 4.3.0.37 on Linux Mint 17.3. That just made my day ^_^.
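For reference, what Skype picked up is the classic sed substitution syntax. The same expression in a real shell:

```shell
# s/old/new/ rewrites the first match in each input line.
echo "the cat sat" | sed 's/cat/dog/'
```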

 

Avoid Ansible writing several times the same line in /etc/sudoers
Monday - Apr 25th 2016

When I ran an Ansible playbook which added an entry in /etc/sudoers for a user "developer", I came across something rather ugly: the line was written every time the playbook ran, resulting in an /etc/sudoers full of duplicate entries for the developer user.

The relevant task of the playbook looked like this:

  - name: DEVS - Add sudoers entry for developer
    lineinfile: "dest=/etc/sudoers
                regexp='^developer ALL=(www-data)'
                insertafter='^# User privilege specification'
                line='developer ALL=(www-data) NOPASSWD: /bin/bash'
                state=present"

So what I want to achieve with this is the following:

In /etc/sudoers search for a line starting with "developer ALL=(www-data)".
If this line is not found, write the line "developer ALL=(www-data) NOPASSWD: /bin/bash" into /etc/sudoers, right after the line starting with "# User privilege specification".

To my big surprise, every time I ran the playbook it executed the task and reported it as "changed", meaning the entry was added again. In the end /etc/sudoers contained several identical lines of "developer ALL=(www-data) NOPASSWD: /bin/bash". That's not what I wanted to achieve.

After some troubleshooting it turned out that the regular expression "developer ALL=(www-data)" didn't work because of the parentheses ( ). As Ansible uses Python in the background, I checked the Python regular expression documentation and searched for an escape character. No big surprise here: the escape character is a backslash (\), as it is in Perl's regular expressions, too.

I adapted the playbook to escape the brackets:

  - name: DEVS - Add sudoers entry for developer
    lineinfile: "dest=/etc/sudoers
                regexp='^developer ALL=\(www-data\)'
                insertafter='^# User privilege specification'
                line='developer ALL=(www-data) NOPASSWD: /bin/bash'
                state=present"

But now I ran into a YAML syntax error:

ERROR! Syntax Error while loading YAML.

The error appears to have been in 'myplaybook.yaml': line 57, column 36, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

    lineinfile: "dest=/etc/sudoers
                regexp='^developer ALL=\(www-data\)'
                                   ^ here

So the backslash escape didn't work because YAML didn't like it. Finally I came across the answer: wrap each character I want to escape in a character class. This is how it's done:

  - name: DEVS - Add sudoers entry for developer
    lineinfile: "dest=/etc/sudoers
                regexp='^developer ALL=[(]www-data[)]'
                insertafter='^# User privilege specification'
                line='developer ALL=(www-data) NOPASSWD: /bin/bash'
                state=present"

The playbook now runs through and the sudoers entry only exists once.
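A quick sanity check with the Python re module (which Ansible relies on under the hood) shows that the character-class form matches the same line as the backslash-escaped form, so [(] and [)] are a safe, YAML-friendly substitute:

```shell
# Test both regex variants against the sudoers line; both should print True.
python3 -c '
import re
line = "developer ALL=(www-data) NOPASSWD: /bin/bash"
print(bool(re.search(r"^developer ALL=\(www-data\)", line)))
print(bool(re.search(r"^developer ALL=[(]www-data[)]", line)))
'
```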

 

Disable autostart of lxcbr0 on Ubuntu 16.04 xenial
Friday - Apr 22nd 2016

One of the major changes on Ubuntu 16.04 (Xenial) is the integration of LXD using LXC 2.0 in the background.
Although I love LXC, I'm kind of annoyed by the fact that it comes as part of the "base" installation. If I want LXD/LXC, I'll install it manually.

Because LXD is installed by default, the LXC bridge (lxcbr0) is now started automatically:

root@xenial:~# ifconfig
ens160    Link encap:Ethernet  HWaddr 00:50:56:99:37:c4 
          inet addr:10.10.10.10  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fe99:37c4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:918 errors:0 dropped:1 overruns:0 frame:0
          TX packets:441 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:78692 (78.6 KB)  TX bytes:92428 (92.4 KB)

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lxcbr0    Link encap:Ethernet  HWaddr 86:dc:ff:84:80:20 
          inet addr:10.0.3.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::84dc:ffff:fe84:8020/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:570 (570.0 B)

To disable the automatic start of the virtual bridge lxcbr0, adapt /etc/default/lxc-net and set USE_LXC_BRIDGE to false:

root@xenial:~# sed -i "/USE_LXC_BRIDGE/s/true/false/g" /etc/default/lxc-net

This disables the automatic start of the virtual bridge at boot time.
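If you want to see exactly what that sed will change before touching the real config, you can run it against a temporary copy first. The file name and contents below are a minimal stand-in for /etc/default/lxc-net:

```shell
# Create a stand-in file with the relevant setting.
printf 'USE_LXC_BRIDGE="true"\n' > /tmp/lxc-net.test

# Same substitution as used on the real file.
sed -i '/USE_LXC_BRIDGE/s/true/false/g' /tmp/lxc-net.test

# Verify the result: the value should now read "false".
cat /tmp/lxc-net.test
```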

To turn off and delete the bridge during runtime:

root@xenial:~# ifconfig lxcbr0 down
root@xenial:~# brctl delbr lxcbr0

 

check_disk on Ubuntu 16.04 xenial reports DISK CRITICAL: / not found
Friday - Apr 22nd 2016

Recently I wrote about how an Ubuntu xenial container can be created on an Ubuntu trusty host. As I've started creating new xenial containers like this over the last few days and adding them to Icinga 2 monitoring, I was pretty surprised when check_disk wasn't working.

Instead of showing the usage of a given partition (with the -p parameter), it reported that the partition could not be found:

root@xenial:~# /usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /
DISK CRITICAL: / not found

At first I suspected a problem in check_disk itself, because xenial now uses the package monitoring-plugins instead of nagios-plugins:

root@xenial:~# apt-cache search nagios-plugins
nagios-plugins - transitional dummy package (nagios-plugins to monitoring-plugins)
[...]

As I mentioned above, this xenial container runs on a trusty host. I copied the check_disk plugin from within the container onto the trusty host and ran it:

root@trusty:/tmp# ./check_disk -w 10% -c 5% -p /
DISK OK - free space: / 2412 MB (55% inode=57%);| /=1897MB;4107;4335;0;4564

So it works there. Where's the difference? With strace I was able to find out the following:

root@xenial:~# strace /usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /
execve("./check_disk_trusty", ["./check_disk_trusty", "-w", "10%", "-c", "5%", "-p", "/"], [/* 36 vars */]) = 0
brk(NULL)                               = 0x7f4e00293000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dff4df000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=18670, ...}) = 0
mmap(NULL, 18670, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4dff4da000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360`\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=138744, ...}) = 0
mmap(NULL, 2212904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4dfee8f000
mprotect(0x7f4dfeea7000, 2093056, PROT_NONE) = 0
mmap(0x7f4dff0a6000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f4dff0a6000
mmap(0x7f4dff0a8000, 13352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4dff0a8000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1864888, ...}) = 0
mmap(NULL, 3967488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4dfeac6000
mprotect(0x7f4dfec86000, 2093056, PROT_NONE) = 0
mmap(0x7f4dfee85000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f4dfee85000
mmap(0x7f4dfee8b000, 14848, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4dfee8b000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dff4d9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dff4d8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4dff4d7000
arch_prctl(ARCH_SET_FS, 0x7f4dff4d8700) = 0
mprotect(0x7f4dfee85000, 16384, PROT_READ) = 0
mprotect(0x7f4dff0a6000, 4096, PROT_READ) = 0
mprotect(0x7f4dff4e1000, 4096, PROT_READ) = 0
mprotect(0x7f4dff2d1000, 4096, PROT_READ) = 0
munmap(0x7f4dff4da000, 18670)           = 0
set_tid_address(0x7f4dff4d89d0)         = 17605
set_robust_list(0x7f4dff4d89e0, 24)     = 0
rt_sigaction(SIGRTMIN, {0x7f4dfee94b90, [], SA_RESTORER|SA_SIGINFO, 0x7f4dfeea03d0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f4dfee94c20, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f4dfeea03d0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL)                               = 0x7f4e00293000
brk(0x7f4e002b4000)                     = 0x7f4e002b4000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2981280, ...}) = 0
mmap(NULL, 2981280, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4dfe7ee000
close(3)                                = 0
open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 3
futex(0x7f4dfee8c068, FUTEX_WAKE_PRIVATE, 2147483647) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "", 4096)                       = 0
close(3)                                = 0
stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "", 4096)                       = 0
close(3)                                = 0
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2995
read(3, "", 4096)                       = 0
close(3)                                = 0
open("/usr/share/locale/en_US/LC_MESSAGES/nagios-plugins.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/nagios-plugins.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en_US/LC_MESSAGES/nagios-plugins.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en/LC_MESSAGES/nagios-plugins.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 17), ...}) = 0
write(1, "DISK CRITICAL: / not found\n", 27DISK CRITICAL: / not found
) = 27
exit_group(2)                           = ?
+++ exited with 2 +++

The relevant part is this one:

open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 3
futex(0x7f4dfee8c068, FUTEX_WAKE_PRIVATE, 2147483647) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "", 4096)                       = 0
close(3)                                = 0
stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/etc/mtab", O_RDONLY|O_CLOEXEC)   = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "", 4096)                       = 0
close(3)                                = 0

The file /etc/mtab is used by check_disk to get the values. However this file is empty in the xenial container:

root@xenial:~# ls -la /etc/mtab
-rw-r--r-- 1 root root 0 Apr 22 07:09 /etc/mtab

root@xenial:~# cat /etc/mtab
root@xenial:~#

A solution to this is to symlink /etc/mtab to /proc/mounts, which contains the same information:

root@xenial:~# test -f /etc/mtab && rm /etc/mtab; ln -s /proc/mounts /etc/mtab
rm: remove regular empty file '/etc/mtab'? y

root@xenial:~# cat /etc/mtab
rootfs / rootfs rw 0 0
/dev/vglxc/xenial / ext4 rw,relatime,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /sys/fs/pstore pstore rw,relatime 0 0
devpts /dev/lxc/console devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /dev/lxc/tty1 devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /dev/lxc/tty2 devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /dev/lxc/tty3 devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /dev/lxc/tty4 devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=666 0 0
devpts /dev/ptmx devpts rw,relatime,gid=5,mode=620,ptmxmode=666 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /run tmpfs rw,nosuid,noexec,relatime,size=611212k,mode=755 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0

And now the check_disk plugin works again:

root@xenial:~# /usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /
DISK OK - free space: / 1060 MB (57% inode=77%);| /=773MB;1755;1853;0;1951
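The mechanism behind the fix can be sketched in a temporary directory (all paths below are stand-ins): an empty regular file yields no mount information, while a symlink to a populated file (standing in for /proc/mounts) does.

```shell
tmp=$(mktemp -d)

# Stand-in for /proc/mounts with one mount line.
printf 'rootfs / rootfs rw 0 0\n' > "$tmp/mounts"

# Empty file, like the broken container's /etc/mtab.
: > "$tmp/mtab"

# Replace the empty file with a symlink, as done for /etc/mtab above.
rm "$tmp/mtab"
ln -s "$tmp/mounts" "$tmp/mtab"

# Reading through the symlink now returns the mount information.
cat "$tmp/mtab"
```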

 

Downtime happens fast. Especially if NIC deletes your domain by accident.
Monday - Apr 18th 2016

As you can see on my LinkedIn profile, I work for one of the leading news corporations in Switzerland. For a news portal, you can imagine how important the domain is. And what if this domain is suddenly deleted? This is exactly what happened last Friday, April 15th 2016. But let's start at the beginning.

At 2.14pm our satellite monitoring running in the AWS cloud (which gives us an outside view of our web services) reported a failed HTTP check of our main domain. And not just an HTTP 500 or something like that - it was the following error:

Name or service not known HTTP CRITICAL - Unable to open TCP socket

When I got this alert (by both e-mail and SMS) I immediately knew something was off. "Name or service not known" indicates that the domain name could not be resolved. I first suspected a problem with the AWS DNS servers, maybe a resolving issue on their side. I logged onto the satellite server and verified the DNS resolving issue - only to find that other domains resolved without a hiccup. What the hell...?!
A few minutes later (the delay was most likely caused by the domain's TTL), we received alerts on our internal systems as well. Now we were in deep trouble.

A whois of our domain no longer showed any DNS nameservers, so I suspected a problem at our domain registrar (Gandi). Maybe someone had deleted the DNS servers from the domain? But when I logged into our account, the DNS servers were there; no modification had been made. I called Gandi to ask for help figuring out what was going on with our domain - but they assured me that the DNS configuration seemed correct and they couldn't explain why the domain wasn't working.

After Gandi's response, I decided to call SWITCH, the registry operator (also called the NIC, network information center) for domains ending in .ch (Switzerland) and .li (Principality of Liechtenstein). That was at exactly 2.59pm. In a few short sentences I explained our domain problem to the first-level support, and he asked me to hold on while he checked with the responsible team (which I know is just a few feet away; I visited their offices back in 2012). A few minutes later he was back and explained to me that our domain was blocked - probably because of malware (those were his words). I should contact SWITCH's security team by e-mail; he couldn't give me any additional information. I sent the mail, explaining the situation as briefly as possible and asking for an immediate call back to explain what was going on. That was at 3.06pm. I didn't get a call back.

At 3.15pm I called again, reached the same guy as before and demanded to speak directly to the security team or to a supervisor. That didn't work; the excuse was that they don't have a direct phone number. My ass. Our company is completely down (e-mails as well) and I'm being held idle on the phone... At least, at my request, he went to see his colleagues from the security team again. A few minutes later he was back on the phone and told me that the domain would be reactivated shortly. But still no answer to my question: "But why? What happened?!" I was told the security team would contact me.

At 3.29pm we received first recovery alerts. A whois command showed the DNS nameservers again. But of course this is only a direct whois call on the central servers - DNS cache servers at the big providers have "deleted" our domain. It'll take more than a few minutes to get the domain "back in".

At 4.18pm I got information from a colleague who has a direct contact at SWITCH and was able to talk to him. It turned out that a human mistake had happened and our domain was accidentally deleted. It took until 4.40pm before we saw normal incoming traffic again.

Besides the downtime which was costly, avoidable and, as you can imagine, hectic, there are a few facts which still anger me:

1) Communication disaster. Until today, nobody ever called or mailed me back and (technically) explained to me what happened.

2) Technically in shape? What kind of official registry operator/network information center just deletes a domain by "error"? What are your monitoring tools? Is there no prevention and verification before "accidentally" deleting a domain? Can anyone working at SWITCH just delete a domain without validation? Let's say you "accidentally" delete a domain like SBB.ch (the Swiss Federal Railways) - oh congratz, you've just brought a huge part of Switzerland's transportation system down.

3) Lies - sweet, sweet lies: SWITCH told my colleague that they "found the problem ourselves at around 3pm". Remember the time when I called and sent an e-mail? Be at least honest and acknowledge the end user had to report you've made a mistake.

Later that day, SWITCH posted a "sorry" on Twitter: "nzz.ch is back online. We're sorry for the erroneous manipulation on our side!".

SWITCH NIC deleted domain

Interestingly, on the very same date this "accident" happened to our domain, the Swiss government released a public document (http://www.bakom.admin.ch/themen/internet/00468/04167/index.html?lang=en) stating:

"Technical management of the .ch domain in relation to the global internet domain name system is being provided by Switch until 2017"

and:

"On 15th April 2016, OFCOM launched a public invitation to tender to award the management mandate for .ch domain names (registry function)."

So after 2017 a new private or public organization will take over the registry function currently held by SWITCH. After last Friday, I welcome this very much.

 

Does an Ubuntu 16.04 (xenial) container run on a 14.04 (trusty) host?
Monday - Apr 18th 2016

The official release of Ubuntu 16.04 xenial is just three days away, and after running apt-get upgrade the current state seems to be final:

# cat  /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"

Until recently the DISTRIB_DESCRIPTION was: "Ubuntu Xenial Xerus (development branch)".

Time to test some xenial containers but on a host running 14.04. Will it work?
Short answer: yes. Long answer: no. Detailed answer: yes.

The Ubuntu lxc template in 14.04 accepts a --release parameter. Using it, I selected the xenial release:

# lxcname=redis01
# lxc-create -n ${lxcname} -B lvm --vgname=vglxc --fstype=ext4 --fssize=10G -t ubuntu -- --release xenial

This worked surprisingly well; the container was listed and could be started afterwards:

# lxc-ls -f | grep redis
redis01        RUNNING  192.168.253.65             -     YES

However, within the container no services were started (not even the SSH server):

root@redis01:~# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       279  0.0  0.0  21224  3828 ?        S    11:21   0:00 /bin/bash
root       340  0.0  0.0  37364  3348 ?        R+   11:21   0:00  \_ ps auxf
root         1  0.0  0.0  36536  1940 ?        S    09:28   0:00 /sbin/init

root@redis01:~# netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name

The reason SSH didn't start up automatically seems to be a missing /etc/resolv.conf. The created xenial container uses resolvconf (as the host does, too), but the symlink points to nowhere:

root@redis01:/etc# ls -la |grep resolv
drwxr-xr-x  5 root root    4096 Apr 18 09:22 resolvconf
lrwxrwxrwx  1 root root      29 Apr 18 09:10 resolv.conf -> ../run/resolvconf/resolv.conf
root@onl-redis01-t:/etc# cat /run/resolvconf/resolv.conf
cat: /run/resolvconf/resolv.conf: No such file or directory
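The same condition can be detected programmatically: a symlink that exists as a link but whose target does not. Below is a sketch in a temporary directory (standing in for the container's /etc), not the real container paths:

```shell
tmp=$(mktemp -d)

# Recreate the broken situation: a symlink pointing at a non-existent target.
ln -s "$tmp/run/resolvconf/resolv.conf" "$tmp/resolv.conf"

# -L: it is a symlink; ! -e: the target does not exist -> dangling.
if [ -L "$tmp/resolv.conf" ] && [ ! -e "$tmp/resolv.conf" ]; then
    echo "dangling symlink"
fi
```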

After I manually copied the same resolv.conf file from the LXC host (where it is also generated by resolvconf) into the container, I was able to start SSH:

root@redis01:~# service ssh start
 * Starting OpenBSD Secure Shell server sshd                                                                  [ OK ]

root@redis01:~# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      587/sshd       
tcp6       0      0 :::22                   :::*                    LISTEN      587/sshd       

But the much bigger problem is the fact that xenial uses systemd as its new init system.
As seen in the process output above, PID 1 is /sbin/init. Yet systemd must run as PID 1; if it isn't, nothing will work. From https://www.freedesktop.org/software/systemd/man/systemd.html:

systemd is a system and service manager for Linux operating systems. When run as first process on boot (as PID 1), it acts as init system that brings up and maintains userspace services.

Not even a reboot works in such a case:

root@redis01:~# reboot
Failed to connect to bus: No such file or directory
Failed to talk to init daemon.

In general it's not a good idea to mix different init systems in such a scenario. As a container is nothing more than an isolated process (but a process in the end), it should run under the same init system as its host.

Ubuntu 14.04 runs with the upstart init system, so I tried to install it within the container, too:

root@redis01:~# apt-get install upstart
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following additional packages will be installed:
  dbus libcap-ng0 libcgmanager0 libdbus-1-3 libdrm2 libnih-dbus1 libplymouth4 mountall plymouth
  plymouth-theme-ubuntu-text
Suggested packages:
  dbus-user-session | dbus-x11 desktop-base plymouth-themes graphviz bash-completion upstart-monitor
The following NEW packages will be installed:
  dbus libcap-ng0 libcgmanager0 libdbus-1-3 libdrm2 libnih-dbus1 libplymouth4 mountall plymouth
  plymouth-theme-ubuntu-text upstart
0 upgraded, 11 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,039 kB of archives.
After this operation, 4,127 kB of additional disk space will be used.
Do you want to continue? [Y/n] y

Then I installed the upstart-sysv package (sysvinit compatible), which also removes systemd-sysv:

root@redis01:~# apt-get install upstart-sysv
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following packages will be REMOVED:
  systemd-sysv
The following NEW packages will be installed:
  upstart-sysv
0 upgraded, 1 newly installed, 1 to remove and 0 not upgraded.
Need to get 39.6 kB of archives.
After this operation, 90.1 kB of additional disk space will be used.
Do you want to continue? [Y/n] y

Finally, I removed and purged systemd out of the OS:

root@redis01:~# apt-get remove --purge --auto-remove systemd
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following packages will be REMOVED:
  systemd*
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 19.2 MB disk space will be freed.
Do you want to continue? [Y/n] y

Not surprisingly, a reboot didn't work, so I had to turn off the container with lxc-stop and relaunch it manually:

root@redis01:~# reboot
shutdown: Unable to shutdown system

root@lxc-host:~# lxc-stop -n redis01

root@lxc-host:~# lxc-ls -f | grep redis
redis01        STOPPED  -                          -     YES    

root@lxc-host:~# lxc-start -n redis01 -d

root@lxc-host:~# lxc-ls -f | grep redis
redis01        RUNNING  192.168.253.65             -     YES       

Now the big question: Will the xenial container run with the upstart init system?
And yes - the services were started correctly this time!

root@lxc-host:~# lxc-attach -n redis01

root@redis01:~# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      338/nrpe       
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      336/sshd       
tcp6       0      0 :::5666                 :::*                    LISTEN      338/nrpe       
tcp6       0      0 :::22                   :::*                    LISTEN      336/sshd       


root@redis01:~# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       499  0.0  0.0  21232  3852 ?        S    11:43   0:00 /bin/bash
root       568  0.0  0.0  37364  3352 ?        R+   11:45   0:00  \_ ps auxf
root         1  0.0  0.0  43408  4260 ?        Ss   11:42   0:00 /sbin/init
root        72  0.0  0.0  29884   296 ?        S    11:42   0:00 upstart-socket-bridge --daemon
root        93  0.0  0.0  29948   268 ?        S    11:42   0:00 upstart-udev-bridge --daemon
message+   122  0.0  0.0  42768  2668 ?        Ss   11:42   0:00 dbus-daemon --system --fork
root       125  0.0  0.0  29900   296 ?        S    11:42   0:00 upstart-file-bridge --daemon
root       127  0.0  0.0  41588  3012 ?        Ss   11:42   0:00 /lib/systemd/systemd-udevd --daemon
syslog     228  0.0  0.0 256396  2684 ?        Ssl  11:42   0:00 rsyslogd
root       291  0.0  0.0  12844  1960 lxc/tty4 Ss+  11:42   0:00 /sbin/getty -8 38400 tty4
root       294  0.0  0.0  12844  1848 lxc/tty2 Ss+  11:42   0:00 /sbin/getty -8 38400 tty2
root       295  0.0  0.0  12844  1860 lxc/tty3 Ss+  11:42   0:00 /sbin/getty -8 38400 tty3
root       336  0.0  0.1  65612  6216 ?        Ss   11:42   0:00 /usr/sbin/sshd -D
root       337  0.0  0.0  26068  2440 ?        Ss   11:42   0:00 cron
nagios     338  0.0  0.0  24056  2420 ?        Ss   11:42   0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
root       456  0.0  0.0  65408  4328 ?        Ss   11:42   0:00 /usr/lib/postfix/sbin/master
postfix    460  0.0  0.0  67476  4424 ?        S    11:42   0:00  \_ pickup -l -t unix -u -c
postfix    461  0.0  0.0  67524  4460 ?        S    11:42   0:00  \_ qmgr -l -t unix -u
root       494  0.0  0.0  12844  1844 lxc/console Ss+ 11:42   0:00 /sbin/getty -8 38400 console
root       495  0.0  0.0  12844  1824 lxc/tty1 Ss+  11:42   0:00 /sbin/getty -8 38400 tty1

To sum it all up: If you want to run xenial containers on a trusty host, make sure you use the same init system in the container as on the host.
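If in doubt, the init system actually in use can be checked by looking at PID 1, both on the host and inside the container. A minimal sketch (the container name redis01 is the one from this article; under upstart and sysvinit PID 1 reports itself as "init", under systemd as "systemd"):

```shell
#!/bin/sh
# Print the name of the process running as PID 1; works without root.
ps -p 1 -o comm=

# Inside the container (run this on the LXC host):
# lxc-attach -n redis01 -- ps -p 1 -o comm=
```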

 

Create pure-ftpd global message/banner/greeting on Ubuntu
Friday - Apr 15th 2016 - by - (0 comments)

pure-ftpd on Ubuntu (14.04 in my case) doesn't use a single configuration file for its settings. Instead, a wrapper script reads each parameter from a separate config file. This is somewhat documented here: https://help.ubuntu.com/community/PureFTP. However, I couldn't find any hint on how to activate a global banner or login message that is displayed when an FTP user logs in. Sure, there's the .banner file in the FTP user's home directory. The official doc says:


    ------------------------ DISPLAYING BANNERS ------------------------

If a '.banner' file is located in the 'ftp' user home directory (or in the
root directory of a virtual server, see below), it will be printed when the
client logs in. Put a nice ASCII-art logo with your name in that file.

This file shouldn't be larger than 4000 bytes, or it won't be displayed.

In each directory, you may also have a '.message' file. Its content will be
printed when a client enters the directory. Such a file can contain important
information ("Don't download version 1.7, it's broken!") .


But I want a general banner for everyone and don't want to symlink a .banner file every time a new FTP user is created.

There is, however, the possibility to load a global banner with the -F flag. It's not called a banner or message but a Fortune Cookie File:

 A funny random message can be displayed in the initial login banner. The
random cookies are extracted from a text file, in the standard "fortune"
format. If you installed the "fortune" package, you should have a directory
(usually /usr/share/fortune) with binary files (xxxx.dat) and text files
(without the .dat extension) . To use Pure-FTPd cookies, just add the name
of a text file to the '-F' option. For instance:

/usr/local/sbin/pure-ftpd -F /usr/share/fortune/zippy

But how does one tell pure-ftpd to use this flag now? The answer is in the wrapper script (/usr/sbin/pure-ftpd-wrapper):

$ cat /usr/sbin/pure-ftpd-wrapper
[...]
my %conf = ('AllowAnonymousFXP' => ['-W'],
            'AllowDotFiles' => ['-z'],
[...]
            'FortunesFile' => ['-F %s', \&parse_filename],
            'FSCharset' => ['-8 %s', \&parse_string],
[...]
            );


As you can see, the wrapper script is already prepared for a separate config file called "FortunesFile". The exact file name must be used, or it won't work.
So I created this FortunesFile with the path to the file containing the actual banner message:

# echo "/etc/pure-ftpd/Banner.txt" > /etc/pure-ftpd/conf/FortunesFile

And in the banner file itself, I added the wanted general information:

$ cat /etc/pure-ftpd/Banner.txt
This FTP server is read only. And private, too.

Files older than 30 days will be automatically deleted!

********************************************

A restart of pure-ftpd confirmed that the -F flag was now being used:

# /etc/init.d/pure-ftpd restart
Restarting ftp server: Running: /usr/sbin/pure-ftpd -l puredb:/etc/pure-ftpd/pureftpd.pdb -l pam -A -O clf:/var/log/pure-ftpd/transfer.log -F /etc/pure-ftpd/Banner.txt -p 20000:20500 -u 1000 -8 UTF-8 -d -Y 1 -E -U 117:007 -B

And in an FTP client (FileZilla in this case), the message is now shown:

pure-ftpd banner
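The per-file configuration mechanism of the wrapper can be illustrated with a minimal shell sketch. This is not the real pure-ftpd-wrapper (which is a Perl script); it only demonstrates the idea that the file name selects the flag and the file content supplies the value, using the option name and paths from above:

```shell
#!/bin/sh
# One file per option: the file name selects the flag,
# the file content is the value passed to that flag.
confdir=$(mktemp -d)
echo "/etc/pure-ftpd/Banner.txt" > "$confdir/FortunesFile"

flags=""
for f in "$confdir"/*; do
    option=$(basename "$f")
    value=$(head -n1 "$f")
    case "$option" in
        FortunesFile) flags="$flags -F $value" ;;
    esac
done

echo "pure-ftpd$flags"   # → pure-ftpd -F /etc/pure-ftpd/Banner.txt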

 

New version (20160411) of check_esxi_hardware available
Wednesday - Apr 13th 2016 - by - (0 comments)

This is a double-release of the monitoring plugin check_esxi_hardware.

Version 20151111 contains changes from Stefan Roos, who cleaned up unused variables and properly defined the variable hosturl. The release of this version took longer because the pull request on GitHub needed to be adapted first, and because of a problem in the time-universe-continuum (= lack of time).

Version 20160411 contains changes from myself to support minor versions of pywbem.

Until recently, there was always only one official pywbem version out: 0.7.0. A few weeks ago, pywbem 0.8.0 was officially released, followed by a bugfix release; the current stable version is now 0.8.2.
While I was adapting the code to support such minor version numbers, I received an e-mail from a check_esxi_hardware user who reported that the plugin was not working. Debugging revealed that his system (CentOS 5) had a pywbem package versioned 0.7.1, and Python's pkg_resources indeed reported this (never officially released) version 0.7.1. Although initially adapted for the newer 0.8.x releases, the modifications now also work with such (unreal...) 0.7.x subreleases.
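The idea behind the fix can be sketched in shell (the plugin itself is written in Python; this only illustrates reducing a version string to its major.minor part so that a subrelease like 0.7.1 is handled by the same branch as 0.7.0):

```shell
#!/bin/sh
# Reduce version strings to major.minor, so 0.7.1 and 0.7.0
# both map to 0.7, and 0.8.0 / 0.8.2 both map to 0.8.
for version in 0.7.0 0.7.1 0.8.0 0.8.2; do
    major_minor=$(echo "$version" | cut -d. -f1,2)
    echo "$version -> $major_minor"
done
```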

As always, please report any bugs on the github repo.

 

