Analysis with dtrace: Filesystem is full although files were deleted

Today I had a case where a ZFS file system ran full. That alone wouldn't normally be very interesting, but in this case df and du reported completely different values. Why?

Besides the information that the file system was full, the user also told me that "some files were already cleaned" - of course without details. Although the user had deleted some big files, the file system still showed as full (99%). The following troubleshooting commands were run on a Solaris OS.

In my own checks with df, zfs and du I noticed a big difference in how the three commands calculated the used disk space:

# df -h | grep myapp
datapool/myapp      5.9G   5.8G    75M    99%    /opt/myapp

# zfs list -t filesystem | grep myapp
datapool/myapp       5.80G  75.5M  5.80G  /opt/myapp

# du -ksh /opt/myapp
 715M   /opt/myapp

The 'du' command shows it clearly: only 715MB are actually used. And no, there were no ZFS snapshots on this file system.

After comparing the values a couple of times, I even noticed a slight change in df: the used space was still growing. That told me something must still be writing somewhere in /opt/myapp.

Let's use the powerful dtrace command to see what is currently being written to on this server:

# dtrace -n 'syscall::write:entry { @[fds[arg0].fi_pathname] = count(); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
  /devices/pseudo/clone@0:ptm                                       1
  /devices/pseudo/pts@0:1                                           1
  /var/spool/postfix/public/pickup                                  1
  /opt/myapp/logs/myapp.example.com-access_log                      6
  /opt/myapp/logs/testapp.example.com-access_log                    6
  /opt/myapp/logs/testapp.example.com-error_log                     6
  /opt/myapp/logs/alpha.example.com-access_log                      6
  /opt/myapp/logs/prod.example.com-access_log                       6
  /opt/myapp/logs/prod.example.com-error_log                        6
                                                                   16

The above output shows all the files that were written to while dtrace was running. And there they are: some log files in our famous /opt/myapp path.
Let's check how big these log files are:

# ls -la /opt/myapp/logs
total 6
drwxr-xr-x   2 myapp myapp       2 Oct 17 12:03 .
drwxr-xr-x  11 myapp myapp      15 Oct 18 08:01 ..

Ehm... They're... gone? So these must have been the files that were already 'cleaned up' by the user. But who is still writing to these non-existent files, thereby preventing the disk space from being freed?
Let's use dtrace again to see which process (execname) writes to which file:

# dtrace -n 'syscall::write:entry { printf("%s writes in %s", execname, fds[arg0].fi_pathname);}'
dtrace: description 'syscall::write:entry ' matched 1 probe

CPU     ID       FUNCTION:NAME
  0  44639       write:entry httpd.worker writes in
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/testapp.example.com-error_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/testapp.example.com-access_log
  0  44639       write:entry httpd.worker writes in
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-error_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-access_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/alpha.example.com-access_log
  0  44639       write:entry httpd.worker writes in /opt/myapp/logs/prod.example.com-access_log
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1
  0  44639       write:entry dtrace writes in /devices/pseudo/pts@0:1

So finally I can see that the process 'httpd.worker' is still writing to the file system, although the files are no longer there. As long as a process keeps an open file descriptor on a deleted file, the kernel cannot release the file's blocks; the space is still accounted as used, exactly as if the files were still there.
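This "deleted but still open" state is easy to reproduce on any Unix-like system. A minimal sketch, assuming a Linux-style /proc and a made-up /tmp path:

```shell
#!/bin/sh
# Reproduce the symptom: delete a file that a process still has open.
exec 3> /tmp/demo.log                 # open a "log file" on descriptor 3
echo "some log data" >&3              # the process keeps writing to it
rm /tmp/demo.log                      # the "cleanup" the user did
ls /tmp/demo.log 2>/dev/null || echo "gone from the directory tree"
ls -l /proc/$$/fd | grep deleted      # the descriptor still points at it
exec 3>&-                             # only closing it frees the space
```

du walks the directory tree, so it misses such files; df asks the file system for allocated blocks, so it still counts them - hence the gap seen above.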

As a solution, the responsible service must be reloaded so it closes and reopens its log files:

# svcadm refresh svc:/network/http:sun-apache2
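The refresh helps because it makes httpd close and reopen its log descriptors; it is the close that finally lets the kernel drop the blocks. The effect can be sketched generically (illustrative /tmp path, Linux-style /proc):

```shell
#!/bin/sh
exec 3> /tmp/held.log                  # a daemon holding a log file open
rm /tmp/held.log                       # the file is deleted...
ls -l /proc/$$/fd | grep -c deleted    # ...but still held (prints 1)
exec 3>&-                              # the "reload": close the descriptor
ls -l /proc/$$/fd | grep -c deleted    # now it is really gone (prints 0)
```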

And, no surprise, df and zfs now show the correct values:

# df -h | grep myapp
datapool/myapp      5.9G   766M   5.1G    13%    /opt/myapp

# zfs list -t filesystem | grep myapp
datapool/myapp        766M  5.13G   766M  /opt/myapp

Note for Linux: As written above, these dtrace commands were run on Solaris. As of today (18.10.2012) the file descriptor array (fds) is not yet available or working in dtrace4linux. But I wrote to Paul Fox, the maintainer of dtrace4linux, and he replied that this will most likely be released within the next ~2 weeks.
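Until then, Linux can already answer the same question without dtrace: every process's open descriptors are visible under /proc/&lt;pid&gt;/fd, and a link target ending in "(deleted)" marks exactly this situation (lsof users get a similar list with lsof +L1). A minimal sketch:

```shell
#!/bin/sh
# Print, for every process we are allowed to inspect, the open
# descriptors whose target file has been deleted (Linux /proc assumed).
for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
        *" (deleted)") echo "$fd -> $target" ;;
    esac
done
```

Run as root to see all processes; as an ordinary user you only see your own.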

