A couple of days ago I saw strange behaviour on a physical server after a hardware upgrade. The server booted just fine after power-on, but around 5 minutes later the system froze completely. It still responded to ping, but nothing else worked, not even the directly attached console.
Once the hardware upgrade (additional memory) was reverted, the system booted again, but after a couple of minutes it froze once more.
After the next boot I quickly followed the messages log to check for system-related errors and found these entries appearing just before another freeze:
tail -f /var/adm/messages
Feb 27 20:14:49 myhostname genunix: [ID 936769 kern.info] sdt0 is /pseudo/sdt@0
Feb 27 20:14:49 myhostname pseudo: [ID 129642 kern.info] pseudo-device: fssnap0
Feb 27 20:14:49 myhostname genunix: [ID 936769 kern.info] fssnap0 is /pseudo/fssnap@0
Feb 27 20:15:22 myhostname cpqary3: [ID 823470 kern.notice] NOTICE: Smart Array P410i Controller
Feb 27 20:15:22 myhostname cpqary3: [ID 823470 kern.notice] Controller memory ECC error limit exceeded
Feb 27 20:15:22 myhostname cpqary3: [ID 100000 kern.notice]
The cause was that the dedicated memory of the storage controller (Smart Array P410i) was defective and needed to be replaced. Once this was done, the machine booted fine again and has been running smoothly ever since.
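When a system freezes within minutes of booting, there is little time to read a scrolling log. A small filter that shows only the controller driver's lines can help. The following is a minimal sketch, assuming the driver tags its messages with `cpqary3:` as in the excerpt above; the function name `controller_msgs` is made up for illustration, and the default path is the `/var/adm/messages` file used earlier.

```shell
# Hypothetical helper: print only the lines written by the cpqary3 driver
# from a messages-style log file.
# Usage: controller_msgs [logfile]   (defaults to /var/adm/messages)
controller_msgs() {
    # A plain fixed pattern keeps this portable across grep implementations.
    grep 'cpqary3:' "${1:-/var/adm/messages}"
}
```

Calling `controller_msgs` right after boot prints what the driver has logged so far. To watch live instead, piping `tail -f /var/adm/messages` through `grep 'cpqary3:'` should have the same effect.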