
Use bash to compare remote cpu load and print lowest value of array
Thursday - Aug 14th 2014

In some cases it might be useful to compare remote load values of different servers and use these values to determine the server with the lowest load. Practical examples would be a provisioning server or a load balancing server.

The current load averages (1min, 5min, 15min) can be read from /proc/loadavg:

cat /proc/loadavg
0.18 0.24 0.20 1/563 28186

For balancing or provisioning purposes, the interesting value is the third one: the load average over the last 15 minutes.

cat /proc/loadavg | awk '{print $3}'
0.20

This is of course also possible through a remote SSH command (don't forget to escape the dollar sign):

ssh root@remoteserver "cat /proc/loadavg | awk '{print \$3}'"
0.05
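
If one of the servers might be unreachable, it is worth adding a connection timeout so a dead host doesn't stall the command (or a whole loop of such commands):

ssh -o ConnectTimeout=5 root@remoteserver "cat /proc/loadavg | awk '{print \$3}'"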

To get the current load average of a bunch of servers and show the server with the lowest CPU load (over the last 15 minutes), the following script can be launched:

for server in server01 server02 server03 server04; do
  case $server in
    server01) load[1]=$(ssh root@$server "cat /proc/loadavg | awk '{print \$3}'");;
    server02) load[2]=$(ssh root@$server "cat /proc/loadavg | awk '{print \$3}'");;
    server03) load[3]=$(ssh root@$server "cat /proc/loadavg | awk '{print \$3}'");;
    server04) load[4]=$(ssh root@$server "cat /proc/loadavg | awk '{print \$3}'");;
  esac
done


echo "${load[*]}" | tr ' ' '\n' | awk 'NR==1{min=$0}NR>1 && $1<min{min=$1;pos=NR}END{print "Server #:"pos,"Load: "min}'
Server #:3 Load: 0.07

This can of course easily be verified:

echo ${load[@]}
0.22 0.36 0.07 0.20

A short explanation of what this script does:

For each server, a remote SSH command is executed to get the current 15min load average. The value is saved into the array "load" at the index matching the server number (I start with server #1, so the array index carries the same number as the server name). After the for loop, the full array "load" is printed, one value per line, and piped into awk. awk compares each value with the lowest value seen so far: if the current value is smaller, the variable "min" is set to the new lowest value and the variable "pos" is set to the position (NR) of that value.
At the end, the result is printed to stdout together with the labels "Server #:" and "Load:".
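
For completeness, the same minimum search can also be done in bash itself; here is a minimal sketch assuming the "load" array from above (awk is still needed for the single floating point comparison, since bash only does integer arithmetic):

min=${load[1]}; pos=1
for i in "${!load[@]}"; do
  # exit status 0 means: current value is smaller than min
  if awk -v a="${load[$i]}" -v b="$min" 'BEGIN{exit !(a<b)}'; then
    min=${load[$i]}
    pos=$i
  fi
done
echo "Server #:$pos Load: $min"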

This of course also works without the case statement (see below), but the case statement may be helpful if additional information needs to be gathered at the same time.

i=1; for server in server01 server02 server03 server04 server05; do
myload[$i]=$(ssh root@$server "cat /proc/loadavg | awk '{print \$3}'")
let i++
done


echo "${myload[*]}" | tr ' ' '\n' | awk 'NR==1{min=$0}NR>1 && $1<min{min=$1;pos=NR}END{print "Server #:"pos,"Load: "min}'
Server #:3 Load: 0.06

Source for this very neat awk comparison: http://stackoverflow.com/questions/16610162/bash-return-position-of-the-smallest-entry-in-an-array

 

GlusterFS bricks should be in a subfolder of a mountpoint
Tuesday - Aug 5th 2014

When I did my first GlusterFS setup (not that long ago) in February 2014, I documented the following steps:

Create new LVM LV (which will be the brick):

lvcreate -n brick1 -L 10G vgdata

Format the LV (I used ext3 back then):

mkfs.ext3 /dev/mapper/vgdata-brick1

Create local mountpoint for the brick LV:

mkdir /srv/glustermnt

Mount the brick LV to the local mountpoint (and create an fstab entry):

mount /dev/mapper/vgdata-brick1 /srv/glustermnt
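
The matching fstab entry could look like this (using the ext3 filesystem from above):

/dev/mapper/vgdata-brick1  /srv/glustermnt  ext3  defaults  0  2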

Create Gluster volume:

gluster volume create myglustervol replica 2 transport tcp node1:/srv/glustermnt node2:/srv/glustermnt
volume create: myglustervol: success: please start the volume to access data

This was on a Debian Wheezy with glusterfs-server 3.4.1.

This seems to have changed: on an Ubuntu 14.04 LTS with glusterfs-server 3.4.2, the same approach failed when I tried to create a volume over three nodes:

gluster volume create myglustervol replica 3 transport tcp node1:/srv/glustermnt node2:/srv/glustermnt node3:/srv/glustermnt
volume create: myglustervol: failed: The brick node1:/srv/glustermnt is a mount point. Please create a sub-directory under the mount point and use that as the brick directory. Or use 'force' at the end of the command if you want to override this behavior.

I came across a mailing list discussion (see this page for the archive) where the same error message was mentioned by the OP. The answer was, to my surprise, that the brick should never have been a direct mount point in the first place - although it had worked:

The brick directory should ideally be a sub-directory of a mount point (and not a mount point directory itself) for ease of administration.
We recently added code to warn about this

So I now created a subfolder within the mount point (on all the other peers, too):
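
mkdir /srv/glustermnt/brick

Then I relaunched the volume create command with the adapted path: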

gluster volume create myglustervol replica 3 transport tcp node1:/srv/glustermnt/brick node2:/srv/glustermnt/brick node3:/srv/glustermnt/brick
volume create: myglustervol: success: please start the volume to access data

Looks better. But I'm still wondering why it worked in February 2014 when the mailing list entry is from May 2013...

 

New version of check_equallogic features snmp connection check
Friday - Jul 25th 2014

The newest version (20140711) of the Nagios/Icinga plugin check_equallogic contains an SNMP connection check. This was requested a lot over the last months, and since I published the plugin on GitHub (see https://github.com/Napsty/check_equallogic), there were even some issues and pull requests opened for it (thanks guys).

But instead of just creating a new check type (like -t snmp), I wanted all checks to use the SNMP connection check automatically. Otherwise every Nagios/Icinga admin would have to define service dependencies, which would complicate configurations. Lame.

So the SNMP connectivity check is defined as a function at the beginning of the plugin which makes an SNMP query and retrieves all the member names of the EqualLogic group. This function is then used in all the checks, so if the SNMP connection fails for whatever reason, all checks return the connection failure. Before that, some of the check types still returned "OK" even though the values from an EqualLogic member couldn't be read.
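
As an illustration, the pattern looks roughly like this (a minimal sketch with placeholder variable and OID names, not the plugin's actual code):

check_snmp_connection() {
  # one SNMP query for the member names of the EqualLogic group
  members=$(snmpwalk -v 2c -c "$community" "$host" "$memberNameOID" 2>/dev/null)
  # if the query fails or returns nothing, exit with a CRITICAL state
  if [ -z "$members" ]; then
    echo "EQUALLOGIC CRITICAL - SNMP connection to $host failed"
    exit 2
  fi
}

# every check type calls the function before doing its own queries
check_snmp_connection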

The plan is also to use the information queried by the SNMP connectivity check as global information for future checks (e.g. to check the values of only one member).

So again, to summarize: the new SNMP connectivity check is built in and you don't need to change your configuration to enable it. Simply replace the plugin with the new version and you're good to go.
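
Upgrading therefore boils down to replacing one file, along these lines (the download URL and plugin directory are assumptions, adjust them to your setup):

cd /usr/lib/nagios/plugins
wget -O check_equallogic https://raw.githubusercontent.com/Napsty/check_equallogic/master/check_equallogic.sh
chmod 755 check_equallogic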

Enjoy.

And I'll enjoy my birthday now. 

 

One year of collected data - a review of temperatures in Zurich
Friday - Jul 25th 2014

Back in August 2013 I wrote about a very hot previous month in Switzerland (July 2013 in Switzerland - a hot month (graph)). The data was collected from a temperature sensor integrated in Icinga (Nagios) monitoring and automatically graphed with an interval of one minute.

Now there's one year of data collected, and July 2013 was a lot hotter than July this year. It's interesting to see that Switzerland can have peaks of more than 30 degrees Celsius (as a 24-hour average!) but also values around 0 degrees, even though the winter of 2013/2014 was pretty mild.

Here's the graphic for temperatures in Zurich, Switzerland from the end of June 2013 to July 25th 2014:

[Graph: temperature in Zurich, Switzerland, 2013-2014]

 

Cannot connect to SSH: Read from socket failed: Connection reset by peer
Wednesday - Jul 23rd 2014

I cloned a new LXC container from an existing one, then tried to connect to the new container through SSH and got this error:

ssh lxc24
Read from socket failed: Connection reset by peer

After logging in through lxc-console, the following errors found in /var/log/auth.log describe the source of the problem pretty clearly:

lxc24 login[1619]: pam_unix(login:session): session closed for user root
lxc24 sshd[1913]: error: Could not load host key: /etc/ssh/ssh_host_rsa_key
lxc24 sshd[1913]: error: Could not load host key: /etc/ssh/ssh_host_dsa_key
lxc24 sshd[1913]: error: Could not load host key: /etc/ssh/ssh_host_ecdsa_key

Somehow the host keys were removed during the clone process. I simply recreated them using:

ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ""
ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -N ""
ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N ""

or, as a quicker alternative, run dpkg-reconfigure openssh-server (thanks Fabien):

dpkg-reconfigure openssh-server

And the SSH login worked again (magic! lol):

ssh lxc24
Linux lxc24 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u1 x86_64


 

Automate Postfix installation in Debian and Ubuntu with debconf
Monday - Jul 21st 2014

Usually a Postfix installation on Debian or Ubuntu Linux is interrupted by interactive configuration questions like this:

apt-get install postfix

[Screenshot: interactive Postfix configuration dialog]

Nowadays, in the age of LXC, this can be annoying if the LXC template contains the installation of the postfix package.

But this can be automated with debconf-set-selections. I added the following lines to the "configure_debian" section of Debian Wheezy's /usr/share/lxc/templates/lxc-debian template and to the "configure_ubuntu" section of Ubuntu 14.04's /usr/share/lxc/templates/lxc-ubuntu template:

echo "postfix postfix/main_mailer_type select smarthost" | chroot $rootfs debconf-set-selections
echo "postfix postfix/mailname string $hostname.localdomain" | chroot $rootfs debconf-set-selections
echo "postfix postfix/relayhost string smtp.localdomain" | chroot $rootfs debconf-set-selections

This "pre-answers" the questions coming up during the Postfix installation and the postfix installation runs through without asking anything:

apt-get install postfix
[...]
Setting up postfix (2.11.0-1) ...
Creating /etc/postfix/dynamicmaps.cf
Adding tcp map entry to /etc/postfix/dynamicmaps.cf
Adding sqlite map entry to /etc/postfix/dynamicmaps.cf
setting myhostname: myhostname
setting alias maps
setting alias database
changing /etc/mailname to myhostname.localdomain
setting myorigin
setting destinations: localhost.localdomain, localhost
setting relayhost: smtp.localdomain
setting mynetworks: 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
setting mailbox_size_limit: 0
setting recipient_delimiter: +
setting inet_interfaces: all

By the way, the settings set by "debconf-set-selections" can be verified or manually edited in /var/cache/debconf/config.dat:

cat /var/cache/debconf/config.dat | grep -B 4 seen
[...]
Name: postfix/mailname
Template: postfix/mailname
Value: myhostname.localdomain
Owners: postfix
Flags: seen
--
Name: postfix/main_mailer_type
Template: postfix/main_mailer_type
Value: smarthost
Owners: postfix
Flags: seen
--
Name: postfix/relayhost
Template: postfix/relayhost
Value: smtp.localdomain
Owners: postfix
Flags: seen
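
Alternatively, the preseeded answers can also be verified with debconf-get-selections from the debconf-utils package:

apt-get install debconf-utils
debconf-get-selections | grep ^postfix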


 

MySQL Galera cluster not starting (failed to open channel)
Monday - Jul 14th 2014

On a Galera Cluster test environment which was previously shut down (two virtual servers on the same physical machine), I got the following error message when I tried to start MySQL on the first cluster node:

/etc/init.d/mysql start
 * Starting MariaDB database server mysqld     [fail]

In the background, the detailed information was logged in syslog:

Jul 14 15:17:07 node1 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jul 14 15:17:07 node1 mysqld_safe: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.iuhkNF' --pid-file='/var/lib/mysql/node1-recover.pid'
Jul 14 15:17:09 node1 mysqld_safe: WSREP: Recovered position cc4fb7ad-e5ab-11e3-8fae-d3fd14daa6a4:391488
Jul 14 15:17:09 node1 mysqld: 140714 15:17:09 [Note] WSREP: wsrep_start_position var submitted: 'cc4fb7ad-e5ab-11e3-8fae-d3fd14daa6a4:391488'
[...]
Jul 14 15:17:13 node1 mysqld: 140714 15:17:13 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50224S), skipping check
Jul 14 15:17:38 node1 /etc/init.d/mysql[11978]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Jul 14 15:17:38 node1 /etc/init.d/mysql[11978]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Jul 14 15:17:38 node1 /etc/init.d/mysql[11978]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")'
Jul 14 15:17:38 node1 /etc/init.d/mysql[11978]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Jul 14 15:17:38 node1 /etc/init.d/mysql[11978]:
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [Note] WSREP: view((empty))
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Jul 14 15:17:42 node1 mysqld: #011 at gcomm/src/pc.cpp:connect():141
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -110 (Connection timed out)
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'Galera Test' at 'gcomm://192.168.41.11,192.168.41.12': -110 (Connection timed out)
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [ERROR] WSREP: gcs connect failed: Connection timed out
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [ERROR] WSREP: wsrep::connect() failed: 7
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [ERROR] Aborting
Jul 14 15:17:42 node1 mysqld:
Jul 14 15:17:42 node1 mysqld: 140714 15:17:42 [Note] WSREP: Service disconnected.
Jul 14 15:17:43 node1 mysqld: 140714 15:17:43 [Note] WSREP: Some threads may fail to exit.
Jul 14 15:17:43 node1 mysqld: 140714 15:17:43 [Note] /usr/sbin/mysqld: Shutdown complete
Jul 14 15:17:43 node1 mysqld:
Jul 14 15:17:43 node1 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

The important information here is the following line:

[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)

When node1 starts MySQL, it tries to join an existing cluster. But because both nodes are currently down, there is no primary component available (see this page for a good and short explanation).

So when a Galera Cluster must be started from "zero" again, the first node must be started with the "wsrep-new-cluster" option (exactly as during the initial setup of a new cluster):

service mysql start --wsrep-new-cluster
 * Starting MariaDB database server mysqld                               [ OK ]
 * Checking for corrupt, not cleanly closed and upgrade needing tables.

In syslog, the following log entries can be found:

Jul 14 15:18:43 node1 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jul 14 15:18:43 node1 mysqld_safe: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.hK64YC' --pid-file='/var/lib/mysql/node1-recover.pid'
Jul 14 15:18:46 node1 mysqld_safe: WSREP: Recovered position cc4fb7ad-e5ab-11e3-8fae-d3fd14daa6a4:391488
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: wsrep_start_position var submitted: 'cc4fb7ad-e5ab-11e3-8fae-d3fd14daa6a4:391488'
[...]
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Start replication
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Setting initial position to cc4fb7ad-e5ab-11e3-8fae-d3fd14daa6a4:391488
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: protonet asio version 0
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: backend: asio
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: GMCast version 0
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: (62a145e9-0b59-11e4-9a2f-c62c46c73c36, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: (62a145e9-0b59-11e4-9a2f-c62c46c73c36, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: EVS version 0
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: PC version 0
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: gcomm: bootstrapping new group 'Galera Test'
[...]
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: gcomm: connected
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Opened channel 'Galera Test'
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
[...]
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] WSREP: Quorum results:
Jul 14 15:18:46 node1 mysqld: #011version    = 3,
Jul 14 15:18:46 node1 mysqld: #011component  = PRIMARY,
Jul 14 15:18:46 node1 mysqld: #011conf_id    = 0,
Jul 14 15:18:46 node1 mysqld: #011members    = 1/1 (joined/total),
[...]
Jul 14 15:18:46 node1 mysqld: 140714 15:18:46 [Note] /usr/sbin/mysqld: ready for connections.
Jul 14 15:18:46 node1 mysqld: Version: '10.0.10-MariaDB-1~trusty-wsrep-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution, wsrep_25.10.r3968
Jul 14 15:18:47 node1 /etc/mysql/debian-start[13061]: Upgrading MySQL tables if necessary.
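
The successful bootstrap can also be verified on the SQL level; the freshly started node should report itself as the primary component (a quick check, assuming local root access to MySQL):

mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+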

The other nodes can be started normally and they will automatically connect to the primary node.
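
For example (a sketch using the node names from above):

# on node2: a plain start is enough, the node joins the running primary component
service mysql start

# on any node: the member count should now reflect all joined nodes
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"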

 

LXC start fails with get_cgroup failed to receive response error
Monday - Jul 14th 2014

After a reboot of a physical test server, two out of five Linux Containers (LXC) didn't start up automatically anymore.

When I manually tried to start them, I got the following error:

lxc-start: command get_cgroup failed to receive response

Although my research on the web pointed me to an AppArmor bug (Ubuntu bug #1295774), I could rule this bug out because the "fixed" AppArmor version was already installed:

dpkg -l | grep appa
ii  apparmor          2.8.95~2430-0ubuntu5  amd64 User-space parser utility for AppArmor
ii  libapparmor-perl  2.8.95~2430-0ubuntu5  amd64 AppArmor library Perl bindings
ii  libapparmor1:amd64 2.8.95~2430-0ubuntu5 amd64 changehat AppArmor library

Interestingly, as mentioned at the beginning, the other LXCs started without problems. I compared the config files and found a difference: the containers that started were using the direct path of a logical volume (LV) as rootfs, while the two that didn't start were using a directory path.
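
The difference looked roughly like this (the paths are hypothetical examples, not the actual container configs):

# container that started fine: the rootfs is a block device
lxc.rootfs = /dev/mapper/vgdata-lxc01

# container that failed: the rootfs is a directory path, which must already be mounted
lxc.rootfs = /var/lib/lxc/lxc02/rootfs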

Turns out... this path was not mounted (I forgot the entry in /etc/fstab). ^^
After mounting the LVs to the expected path, lxc-start worked fine.
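
For the record, the fix was along these lines (hypothetical paths matching the sketch above):

echo "/dev/mapper/vgdata-lxc02 /var/lib/lxc/lxc02/rootfs ext4 defaults 0 2" >> /etc/fstab
mount /var/lib/lxc/lxc02/rootfs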

So the error message "get_cgroup failed to receive response" can also appear if the rootfs is missing or not mounted.

 

Bye Bye Windows XP
Monday - Jul 7th 2014

Microsoft's support for Windows XP already ended in April 2014, but only today did I see this warning message on a virtual machine running Windows XP:

[Screenshot: Windows XP end-of-support warning]

Looks like an official bye bye wave.

 

Presenting new Nagios plugin: check_promise_vtrak
Friday - Jul 4th 2014

I'd like to announce the immediate availability of a new Nagios/Icinga plugin called check_promise_vtrak.pl, a plugin to monitor a Vtrak storage device from Promise.

It is based on the already existing open source plugin check_promise_chassis.pl, written by Barry O'Donovan, from which much helpful information was taken.

Although both plugins do similar checks, check_promise_vtrak.pl was completely rewritten and follows the programming structure (and layout) of check_ibm_ts_tape.pl, a plugin I wrote in the past that also allows separate checks through option parameters and check types.

The plugin page contains the official documentation of the parameters and how to use the plugin. It also links to the corresponding GitHub repository. Yes, this is an invitation to contribute to the plugin, to make it better and to report bugs! At this point I'd like to thank the open source community, especially Barry O'Donovan for his original plugin (check_promise_chassis.pl) and Fabien Huttin for testing several Vtrak devices with the new plugin for me.

 

