System freeze caused by defective storage controller ECC memory


Listed in Hardware, Solaris, Unix

A couple of days ago I saw strange behavior on a physical server after a hardware upgrade. The server booted just fine after power-on, but about 5 minutes later the system froze completely. It still answered pings, but nothing else worked - not even the directly attached console.

After the hardware upgrade (additional memory) was reverted, the system booted again, but once more it froze after a couple of minutes.

After the next boot I immediately followed the messages log to look for system-related errors and found these entries appearing just before another freeze:

tail -f /var/adm/messages
Feb 27 20:14:49 myhostname genunix: [ID 936769] sdt0 is /pseudo/sdt@0
Feb 27 20:14:49 myhostname pseudo: [ID 129642] pseudo-device: fssnap0
Feb 27 20:14:49 myhostname genunix: [ID 936769] fssnap0 is /pseudo/fssnap@0
Feb 27 20:15:22 myhostname cpqary3: [ID 823470 kern.notice] NOTICE:  Smart Array P410i Controller
Feb 27 20:15:22 myhostname cpqary3: [ID 823470 kern.notice]  Controller memory ECC error limit exceeded
Feb 27 20:15:22 myhostname cpqary3: [ID 100000 kern.notice]

The cause was that the dedicated memory of the storage controller (Smart Array P410i) was defective and needed to be replaced. Once this was done, the machine booted fine again and has been running smoothly ever since.
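
Since the window before the next freeze is short, it can help to filter the log for the controller driver instead of reading everything that scrolls by. The following is only a minimal sketch: the function name controller_errors is made up for illustration, and it assumes the driver tags its messages with "cpqary3" as in the log excerpt above.

```shell
# Hypothetical helper (name is an assumption, not part of Solaris):
# print only the lines logged by the cpqary3 driver.
# Usage: controller_errors [logfile]
# Without an argument it reads /var/adm/messages.
controller_errors() {
    log="${1:-/var/adm/messages}"
    grep 'cpqary3' "$log"
}

# To watch live instead of searching an existing file, the same filter
# can be attached to tail:
#   tail -f /var/adm/messages | grep cpqary3
```

Called right after boot, this shows whether the controller has already reported ECC errors, without the unrelated genunix and pseudo-device lines.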
