lxc-destroy command not working when run as cron job

Written by - 0 comments

Published on - Listed in Linux


Been pulling my hair out (what's left of it) because a script which was working fine when I launched it manually, but the very same script did not work when it was run by cron. I narrowed it down to the "lxc-destroy" command which simply does not do anything when run by cron. All this on an Ubuntu 14.04 LTS with LXC 1.0.7.

I was able to reproduce it with a simple cronjob like this:

03 08 * * * lxc-stop -n mycontainer && lxc-destroy -n mycontainer

What happened right after the cronjob ran? The container was correctly stopped - but remained stopped. The container was still showing up in lxc-ls (in state STOPPED).  I suspected there is a problem deleting the logical volume (lxc.rootfs) of the container when lxc-destroy is launched, because a quick "lvs" after the cronjob ran through still showed the LV of the container.

Now I ran the same commands with strace to see more:

41 08 * * * strace -s 1000 -ff -o /var/log/stop.log lxc-stop -n nzzad-app01-test && strace -s 1000 -ff -o /var/log/destroy.log lxc-destroy -n nzzad-app01-test

This actually created several log files (each child process got its own log file):

-rw-r--r-- 1 root      root   2401 Sep 25 08:41 destroy.log.23100
-rw-r--r-- 1 root      root   2401 Sep 25 08:41 destroy.log.23101
-rw-r--r-- 1 root      root   5180 Sep 25 08:41 destroy.log.23102
-rw-r--r-- 1 root      root    376 Sep 25 08:41 destroy.log.23103
-rw-r--r-- 1 root      root  61340 Sep 25 08:41 destroy.log.23099

The relevant part of the parent PID (23099) showed:

stat("/var/lib/lxc/mycontainer/config", {st_mode=S_IFREG|0644, st_size=808, ...}) = 0
pipe([3, 4])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23100
close(4)                                = 0
read(3, "\377\377\377\377", 4)          = 4
close(3)                                = 0
wait4(23100, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23100
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23100, si_status=1, si_utime=0, si_stime=0} ---
socket(PF_LOCAL, SOCK_STREAM, 0)        = 3
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 40) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 110) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0
stat("/var/lib/lxc/mycontainer/config", {st_mode=S_IFREG|0644, st_size=808, ...}) = 0
open("/run/lxc/lock//var/lib/lxc/mycontainer", O_RDWR|O_CREAT, 0600) = 3
fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
pipe([4, 5])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23101
close(5)                                = 0
read(4, "\377\377\377\377", 4)          = 4
close(4)                                = 0
wait4(23101, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23101
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23101, si_status=1, si_utime=0, si_stime=0} ---
socket(PF_LOCAL, SOCK_STREAM, 0)        = 4
connect(4, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 40) = -1 ECONNREFUSED (Connection refused)
connect(4, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/mycontainer/command"}, 110) = -1 ECONNREFUSED (Connection refused)
close(4)                                = 0
open("/var/lib/lxc/mycontainer/lxc_snapshots", O_RDONLY) = -1 ENOENT (No such file or directory)
geteuid()                               = 0
pipe2([4, 5], O_CLOEXEC)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23102
close(5)                                = 0
fcntl(4, F_GETFL)                       = 0 (flags O_RDONLY)
fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8da4aff000
lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(4, "", 4096)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23102, si_status=127, si_utime=0, si_stime=0} ---
close(4)                                = 0
munmap(0x7f8da4aff000, 4096)            = 0
wait4(23102, [{WIFEXITED(s) && WEXITSTATUS(s) == 127}], 0, NULL) = 23102
stat("/dev/vglxc/mycontainer", {st_mode=S_IFBLK|0660, st_rdev=makedev(252, 0), ...}) = 0
open("/sys/dev/block/252:0/dm/uuid", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8da4aff000
read(4, "LVM-AGQfFleAZmtwXaRlf3qusphA68nyKV56qCiz847f5Jd9gR5Em5pzaJdSfo24s4br\n", 4096) = 69
close(4)                                = 0
munmap(0x7f8da4aff000, 4096)            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8da4af2b90) = 23103
wait4(23103, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23103
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23103, si_status=1, si_utime=0, si_stime=0} ---
write(2, "lxc_container: ", 15)         = 15
write(2, "lxccontainer.c: lxcapi_destroy: 2073 ", 37) = 37
write(2, "Error destroying rootfs for mycontainer", 44) = 44
write(2, "\n", 1)                       = 1
fcntl(3, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(3)                                = 0
write(2, "Destroying mycontainer failed\n", 35) = 35
exit_group(1)                           = ?
+++ exited with 1 +++

The more interesting part can be found in the log file of the last child pid (23103) which is responsible for the deletion of the logical volume:

set_robust_list(0x7f8da4af2ba0, 24)     = 0
execve("/usr/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 7 vars */]) = -1 ENOENT (No such file or directory)
execve("/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 7 vars */]) = -1 ENOENT (No such file or directory)
exit_group(1)                           = ?
+++ exited with 1 +++

Now I decided to run the exacte same lxc-destroy command "manually" on the command line, also using strace to log and catch a difference:

strace -s 1000 -ff -o /var/log/manualdestroy.log lxc-destroy -n mycontainer

And indeed! In the log file of the last child process I found this:

set_robust_list(0x7fde2db25ba0, 24)     = 0
execve("/usr/local/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/local/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/usr/bin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = -1 ENOENT (No such file or directory)
execve("/sbin/lvremove", ["lvremove", "-f", "/dev/vglxc/mycontainer"], [/* 34 vars */]) = 0
brk(0)                                  = 0x1a48000

I don't have to be a C programmer to understand what happens. When the command was launched by cron, the chlid process trying to remove the logical volume was looking for the lvremove command in /usr/bin and in /bin. When it didn't find it there, it exited with an error.
When I ran it manually, the same command was not only looking in /usr/bin and /bin, but also in /usr/local/sbin, /usr/local/bin, /usr/sbin and finally in /sbin where it found the lvremove command and continued.

But why is there a difference in which directories commands should be looked for? The answer is PATH.

The $PATH variable of root when run on the cli:

echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Now to see the $PATH variable of cron, I created a cronjob like this:

58 08 * * * echo $PATH > /var/log/path.log

And now let's take a look at /var/log/path.log:

cat /var/log/path.log
/usr/bin:/bin

Here we have the difference - and the reason why lxc-destroy does not work when run by cron.

So the solution for this problem is to define the PATH variable before running lxc-destroy, e.g. like this:

21 09 * * * export PATH=$PATH:/sbin; lxc-stop -n nzzad-app02-test && lxc-destroy -n nzzad-app02-test

Now lxc-destroy ran perfectly.


Add a comment

Show form to leave a comment

Comments (newest first)

No comments yet.