Monitor dual storage (RAID) controllers on IBM x3650 M4

Written by Claudio Kuenzler

Published on April 6th 2018 - Listed in Hardware Linux Monitoring

I wanted to monitor the RAID status of an IBM x3650 M4 server with check_raid. I've been using this plugin for years and it supports most software and hardware RAID controllers. I had never had any problems with it (once the required CLI tools for each hardware controller were installed), until today.

Due to a very strange hardware setup, inherited from an ex-colleague, the server turns out to have two different RAID controllers active. 12 physical drives are attached to one controller, 2 physical drives to another.

Once I installed the megacli command (from http://hwraid.le-vert.net/), the plugin correctly identified the physical drives behind /dev/sda:

# /usr/lib/nagios/plugins/check_raid -l
megacli
1 active plugins

# /usr/lib/nagios/plugins/check_raid
WARNING: megacli:[Volumes(1): DISK0.0:Optimal,WriteCache:DISABLED; Devices(12): 11,08,01,03,09,10,04,06,12,07,02,05=Online]

To disable the warning on the disabled WriteCache:

# /usr/lib/nagios/plugins/check_raid --cache-fail=OK
OK: megacli:[Volumes(1): DISK0.0:Optimal,WriteCache:DISABLED; Devices(12): 11,08,01,03,09,10,04,06,12,07,02,05=Online]

But where are the other two physical drives? From my experience with hardware RAID controllers, I was pretty sure that megacli can detect multiple controllers and retrieve the drive information from all of them.
Yet a manual verification using megacli still returned only 12 drives:

# megacli -CfgDsply -aall |grep Physical
Physical Disk Information:
Physical Disk: 0
Physical Sector Size: 512
Physical Disk: 1
Physical Sector Size: 512
Physical Disk: 2
Physical Sector Size: 512
Physical Disk: 3
Physical Sector Size: 512
Physical Disk: 4
Physical Sector Size: 512
Physical Disk: 5
Physical Sector Size: 512
Physical Disk Information:
Physical Disk: 0
Physical Sector Size: 512
Physical Disk: 1
Physical Sector Size: 512
Physical Disk: 2
Physical Sector Size: 512
Physical Disk: 3
Physical Sector Size: 512
Physical Disk: 4
Physical Sector Size: 512
Physical Disk: 5
Physical Sector Size: 512
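
Counting these entries by eye is error-prone. As a minimal sketch, the following helper counts the "Physical Disk: N" lines in megacli output read from stdin. The function name count_physical_disks is made up for illustration and is not part of megacli:

```shell
#!/bin/sh
# Hypothetical helper: count "Physical Disk: N" entries in the output of
# `megacli -CfgDsply -aall`, read from stdin.
# "Physical Disk Information:" headers do not match because the pattern
# requires a digit after the colon.
count_physical_disks() {
    grep -c 'Physical Disk: [0-9]'
}

# Usage on the server itself (requires megacli):
#   megacli -CfgDsply -aall | count_physical_disks
```

If the count is lower than the number of drives physically installed, as it was here, the remaining drives are probably attached to a controller that megacli does not handle.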

Thankfully a colleague, who had recently worked on that particular server, took a screenshot of the storage controller menu during the boot process:

[Screenshot: Two different storage controllers in the same server]

As it turns out, there are two different storage controllers built into that server. One is a MegaRAID controller (ServeRAID M5210) and the other is an MPT controller:

# lspci | grep -i LSI
0a:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire] (rev 03)
14:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
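
To avoid guessing next time, the lspci output can be mapped to the CLI tool each controller is likely to need. This is only a rough sketch: the function name suggest_raid_cli is invented, and the mapping is an assumption based on what worked on this server (megacli for MegaRAID, sas2ircu for SAS-2 Fusion-MPT) plus mpt-status as a guess for older Fusion-MPT cards:

```shell
#!/bin/sh
# Hypothetical helper: read `lspci` lines from stdin and print the CLI tool
# that probably talks to each LSI controller found.
# The mapping is an assumption, not an authoritative compatibility list.
suggest_raid_cli() {
    while IFS= read -r line; do
        case "$line" in
            *MegaRAID*)          echo "megacli" ;;     # MegaRAID family
            *Fusion-MPT*SAS-2*)  echo "sas2ircu" ;;    # newer SAS-2 MPT cards
            *Fusion-MPT*)        echo "mpt-status" ;;  # older MPT cards (guess)
        esac
    done
}

# Usage on the server itself:
#   lspci | grep -i LSI | suggest_raid_cli
```

For the two controllers listed above, this prints sas2ircu followed by megacli.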

No wonder megacli wasn't able to find the drives!

I tried again with "mpt-status" (http://hwraid.le-vert.net/wiki/LSIFusionMPT), but this didn't show any config:

# apt-get install mpt-status

# /usr/sbin/mpt-status -p
Checking for SCSI ID:0
ioctl: No such device

I removed mpt-status again and went on to try the command "sas2ircu" for newer MPT cards. Finally I got some output:

# apt-get install sas2ircu

# sas2ircu LIST
LSI Corporation SAS2 IR Configuration Utility.
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2009-2013 LSI Corporation. All rights reserved.

         Adapter      Vendor Device                       SubSys SubSys
Index    Type          ID      ID    Pci Address          Ven ID Dev ID
----- ------------ ------ ------ -----------------    ------ ------
   0     SAS2004     1000h    70h   00h:0ah:00h:00h      1014h   040eh
SAS2IRCU: Utility Completed Successfully.
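
The Index column is what the other sas2ircu commands expect as their first argument. A small sketch, assuming the table layout shown above stays the same, that extracts the controller indexes from the LIST output. The function name list_sas2_controllers is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the controller index of every adapter row in
# `sas2ircu LIST` output (stdin). An adapter row is assumed to start with a
# number and to have at least 7 whitespace-separated columns, as in the
# table above.
list_sas2_controllers() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 7 { print $1 }'
}

# Usage on the server itself (requires sas2ircu):
#   for idx in $(sas2ircu LIST | list_sas2_controllers); do
#       sas2ircu "$idx" DISPLAY   # show volumes and drives of that controller
#   done
```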

And, hurray, check_raid was now able to read the information from both controllers:

# /usr/lib/nagios/plugins/check_raid -l
megacli
sas2ircu
2 active plugins

# /usr/lib/nagios/plugins/check_raid --cache-fail=OK
OK: megacli:[Volumes(1): DISK0.0:Optimal,WriteCache:DISABLED; Devices(12): 11,08,01,03,09,10,04,06,12,07,02,05=Online]; sas2ircu:[ctrl #0: 1 Vols: Optimal: 2 Drives: Optimal (OPT)::]
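
Since a missing CLI tool silently hides a whole controller from the check, it may be worth verifying that all expected plugins are active. A minimal sketch, assuming the one-name-per-line format of "check_raid -l" shown above. The function name missing_plugins is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `check_raid -l` output from stdin and print every
# expected plugin name (given as arguments) that is NOT listed as active.
# Prints nothing when all expected plugins are present.
missing_plugins() {
    listing=$(cat)
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$listing" | grep -qxF -- "$name" || echo "$name"
    done
}

# Usage on the server itself:
#   /usr/lib/nagios/plugins/check_raid -l | missing_plugins megacli sas2ircu
```

With the first listing in this article (megacli only), this would have printed sas2ircu.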

Update November 15th 2018:

This article helped me again today when the check_raid plugin alerted me to a failed drive.
