smartctl on FreeBSD with CCISS (HP SmartArray) raid: Watch out!
Monday - Nov 11th 2013

Last week I wrote several posts about S.M.A.R.T. checks on FreeBSD. They work and can definitely be used for monitoring on production servers, but there is one issue which needs to be addressed: the drive order used by smartctl (cciss,N) is not necessarily the physical order!

Let's go into some detail. Last week I got an alert from check_smart.pl that a disk in an HP ProLiant DL380 G5 running FreeBSD 9.1 had defective sectors (elements in grown defect list). I verified this manually with the smartctl command:

smartctl -d cciss,0 /dev/ciss0 -a
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2012-12, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ciss0 [cciss_disk_00] [SCSI]: Device open changed type from 'sat,auto' to 'cciss'
Vendor:             HP
[...]
Serial number:      123450000999VE
Device type:        disk
Transport protocol: SAS
Local Time is:      Fri Nov  8 13:59:19 2013 CET
[...]
Elements in grown defect list: 12

Logically, to me, "cciss,0" means the very first disk of the server. So that would be drive slot #1.
I exchanged the drive and ran smartctl again:

smartctl -d cciss,0 /dev/ciss0 -a
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2012-12, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ciss0 [cciss_disk_00] [SCSI]: Device open changed type from 'sat,auto' to 'cciss'
Vendor:             HP
[...]
Serial number:      123450000999VE
Device type:        disk
Transport protocol: SAS
Local Time is:      Fri Nov  8 15:27:33 2013 CET
[...]
Elements in grown defect list: 14

Did you notice that the drive behind cciss,0 still reports the exact same serial number? That means I replaced the wrong disk.
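A check like this can be scripted so that a wrong swap is noticed right away. The following is only a minimal sketch: the function name smart_serial is made up for illustration, and it assumes the serial number shows up on a line starting with "Serial number:" as in the output above.

```sh
#!/bin/sh
# smart_serial (hypothetical helper): reads smartctl output on stdin and
# prints the value of the first "Serial number:" line (case-insensitive).
smart_serial() {
    awk -F': *' 'tolower($0) ~ /^serial number:/ { print $2; exit }'
}

# Possible usage around a drive swap (not executed here):
#   before=$(smartctl -d cciss,0 /dev/ciss0 -i | smart_serial)
#   ... exchange the drive ...
#   after=$(smartctl -d cciss,0 /dev/ciss0 -i | smart_serial)
#   [ "$before" = "$after" ] && echo "Same serial: wrong disk replaced?"
```

If the serial number is identical before and after the exchange, the disk behind that cciss index was not the one that was pulled.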

After some research, I found this archived FreeBSD mailing list article from 2008: http://lists.freebsd.org/pipermail/freebsd-ports/2008-April/048312.html
The author of the post describes the exact same phenomenon on his FreeBSD machine:

The recent incorporation of the FreeBSD CISS SMART support into the
mainstream smartmontools distribution has had some unexpected results on
several HP ProLiant DL380 G3 machines.  I have five DL380/G3s with four
drives each; all have the same symptoms now: querying a given ciss/scsi
target gives results for the wrong drive

It seems the disk numbering was correct before smartmontools 5.38. Unfortunately, FreeBSD does not seem to ship a tool that lists all physical drives: camcontrol devlist only shows the logical drives presented by the RAID controller.

As stupid as it sounds, putting a sticker with the serial number on each drive can help you identify the disk in the physical slots. You can find the serial number of the disk in the smartctl output and match it against the physical drive.
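To prepare such stickers, the serial numbers behind all cciss indexes can be collected in one go. This is a rough sketch under a few assumptions: the function name list_cciss_serials and the SMARTCTL variable are invented for this example, and the number of disks has to be passed in by hand.

```sh
#!/bin/sh
# Assumption: SMARTCTL can be overridden, e.g. with a full path to smartctl.
SMARTCTL=${SMARTCTL:-smartctl}

# list_cciss_serials DEVICE COUNT (hypothetical helper)
# Prints one "cciss,N <serial>" line for N = 0 .. COUNT-1.
# "unknown" is printed when no serial number line is found.
list_cciss_serials() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        serial=$("$SMARTCTL" -d "cciss,$i" -i "$dev" 2>/dev/null |
            awk -F': *' 'tolower($0) ~ /^serial number:/ { print $2; exit }')
        printf 'cciss,%s %s\n' "$i" "${serial:-unknown}"
        i=$((i + 1))
    done
}

# Possible usage on a server with 4 physical disks (not executed here):
#   list_cciss_serials /dev/ciss0 4
```

The output only pairs each cciss index with a serial number; which physical slot a serial belongs to still has to be checked on the drive itself.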

So if you use FreeBSD behind a CCISS (HP SmartArray) RAID controller, be extra careful and don't trust the cciss numbering!

Update, still Nov 11th 2013:
After some replacement tests, it seems that FreeBSD sees the disks in reverse order: cciss,0 is the last disk and cciss,3 the first (in a server with 4 physical disks). If it is always like this, the physical disk can be identified. But what happens if a new disk is inserted? Does everything have to be recounted because disk #5 appears as cciss,0, or will it appear as cciss,4? I have no idea...

Update 2, again Nov 11th 2013:
I just came across the command cciss_vol_status which can be compiled on FreeBSD and Linux from http://sourceforge.net/projects/cciss/files/cciss_vol_status/. So I gave it a shot and installed it:

cd /tmp
fetch http://downloads.sourceforge.net/project/cciss/cciss_vol_status/cciss_vol_status-1.11.tar.gz
tar -xzf cciss_vol_status-1.11.tar.gz
cd cciss_vol_status-1.11
./configure
make
make install

Then I ran the command against the /dev/ciss0 device and at first I was disappointed - again:

cciss_vol_status -s /dev/ciss0
/dev/ciss0: (Smart Array P400) RAID 1 Volume 0 status: OK.
/dev/ciss0: (Smart Array P400) RAID 1 Volume 1 status: OK.

My face brightened up when I tried the verbose option (-V):

cciss_vol_status -V /dev/ciss0
Controller: Smart Array P400
  Board ID: 0x3234103c
  Logical drives: 2
  Running firmware: 5.20
  ROM firmware: 5.20
/dev/ciss0: (Smart Array P400) RAID 1 Volume 0 status: OK.
/dev/ciss0: (Smart Array P400) RAID 1 Volume 1 status: OK.
  Physical drives: 4
   connector 2I box 1 bay 4  HP DG072ABAB3  XXXXXXXX00009732RCV7   HPDD OK
   connector 2I box 1 bay 3  HP DG072BB975  XXXXXXXX00009907Q0VR   HPDC OK
   connector 2I box 1 bay 2  HP DG072BB975  XXXXXXXX00009906P4DN   HPDC OK
   connector 2I box 1 bay 1  HP DG072BB975  XXXXXXXX00009907RPKW   HPDC OK
/dev/ciss0(Smart Array P400:0): Non-Volatile Cache status:
                   Cache configured: Yes
                  Read cache memory: 52 MiB
                 Write cache memory: 156 MiB
                Write cache enabled: Yes

So THIS is exactly what I needed! I can now finally compare the serial number from smartctl output and match it against the correct physical slot. Problem solved! 
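The lookup can also be scripted. Here is a minimal sketch, assuming the physical drive lines look like the ones above ("connector ... bay N ... serial ...") and that smartctl and cciss_vol_status report the serial number in a comparable form, which I have not verified for other controllers. The function name bay_for_serial is made up for this example.

```sh
#!/bin/sh
# bay_for_serial SERIAL (hypothetical helper)
# Reads `cciss_vol_status -V` output on stdin and prints the bay number of
# the first physical drive line that contains SERIAL. Prints nothing when
# no line matches.
bay_for_serial() {
    awk -v serial="$1" '
        serial != "" && $1 == "connector" && index($0, serial) {
            for (i = 1; i < NF; i++)
                if ($i == "bay") { print $(i + 1); exit }
        }'
}

# Possible usage (not executed here), with $serial taken from smartctl:
#   cciss_vol_status -V /dev/ciss0 | bay_for_serial "$serial"
```

The match is a plain substring search on the whole line, so it does not depend on how many words the vendor and model columns contain.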

 


Comments (newest first):

macan wrote on Jun 8th, 2016:
cciss_vol_status is in ports:
/usr/ports/sysutils/cciss_vol_status/

