Today I had a case, where a zfs file system ran full. This wouldn't normally be very interesting, but in this case, the output of df and du were completely showing different values. Why?
Besides the information that the file system is full, I was also told from the user that "some files were already cleaned" but of course without details. Although the user deleted some big files, the filesystem is still shown as full (99%). The following troubleshooting commands were launched on a Solaris OS.
On my own checks with df, zfs and du I noticed a big difference on how the three commands calculated the used disk space:
# df -h | grep polo
datapool/myapp 5.9G 5.8G 75M 99% /opt/myapp
# zfs list -t filesystem | grep myapp
datapool/myapp 5.80G 75.5M 5.80G /opt/myapp
# du -ksh /opt/myapp
715M /opt/myapp
The 'du' command shows it clearly: There are actually only 715MB used. No, there were no zfs snapshots on this filesystem.
After I compared the values a couple of times, I even noticed a slight change in df, where the file system still grew. This fact told me that something must still be written somewhere in /opt/myapp.
Let's use the powerful dtrace command to see what is currently written on this server:
# dtrace -n 'syscall::write:entry { @[fds[arg0].fi_pathname] = count(); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
/devices/pseudo/clone@0:ptm 1
/devices/pseudo/pts@0:1 1
/var/spool/postfix/public/pickup 1
/opt/myapp/logs/myapp.example.com-access_log 6
/opt/myapp/logs/testapp.example.com-access_log 6
/opt/myapp/logs/testapp.example.com-error_log 6
/opt/myapp/logs/alpha.example.com-access_log 6
/opt/myapp/logs/prod.example.com-access_log 6
/opt/myapp/logs/prod.example.com-error_log 6
The above output shows me all the files that were written during I let dtrace run. And here we got some log files in our famous /opt/myapp path.
Let's check, how big the log files are:
# ls -la /opt/myapp/logs
total 6
drwxr-xr-x 2 myapp myapp 2 Oct 17 12:03 .
drwxr-xr-x 11 myapp myapp 15 Oct 18 08:01 ..
Ehm... They're... gone? So that must have been the files that were already 'cleaned up' by the user. But why and who is still writing to non-existant files, therefore blocking the disk inodes to be freed?
Let's again use dtrace to see which process (execname) writes to which file:
# dtrace -n 'syscall::write:entry { printf("%s writes in %s", execname, fds[arg0].fi_pathname);}'
dtrace: description 'syscall::write:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 44639 write:entry httpd.worker writes in
0 44639 write:entry httpd.worker writes in /opt/myapp/logs/testapp.example.com-error_log
0 44639 write:entry httpd.worker writes in /opt/myapp/logs/testapp.example.com-access_log
0 44639 write:entry httpd.worker writes in
0 44639 write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-error_log
0 44639 write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-access_log
0 44639 write:entry httpd.worker writes in /opt/myapp/logs/alpha.example.com-access_log
0 44639 write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-access_log
0 44639 write:entry dtrace writes in /devices/pseudo/pts@0:1
0 44639 write:entry dtrace writes in /devices/pseudo/pts@0:1
0 44639 write:entry dtrace writes in /devices/pseudo/pts@0:1
0 44639 write:entry dtrace writes in /devices/pseudo/pts@0:1
So finally I see that the process 'httpd.worker' is still writing to the filesystem, although the files are not there anymore. The continuous write on the same files block the inodes on the disk, having the same disk space effect as if the files were still there -> using disk space.
As a solution, the responsible process must be reloaded:
# svcadm refresh svc:/network/http:sun-apache2
And, no surprise, df and zfs now show correct values:
# df -h | grep myapp
datapool/myapp 5.9G 766M 5.1G 13% /opt/myapp
# zfs list -t filesystem | grep myapp
datapool/myapp 766M 5.13G 766M /opt/myapp
Note for Linux: As written above, these dtrace commands were launched in Solaris. As of today (18.10.2012) the file descriptor (fds) is not yet available nor working in dtrace4linux. But I wrote to Paul Fox, the responsible guy for dtrace4linux, and he responded me that this will most likely be released in the next ~2 weeks.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera Git GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Influx Internet Java KVM Kibana Kodi Kubernetes LVM LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS Office PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher Rant Redis Roundcube SSL Samba Seafile Security Shell SmartOS Solaris Surveillance Systemd TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder