Supervision of an application (daemon process) with monit


Published on February 26th 2016 - Listed in Linux


Today I tried to use supervisor to control and supervise the state of a process - a process which should automatically be restarted in case of a failure.

I installed supervisor, created the config file with the [program:myapp] section and defined the relevant settings. But when I started the process with supervisorctl, I got this:

supervisorctl start myapp
myapp: ERROR (abnormal termination)

==> supervisord.log <==
2016-02-26 10:03:38,336 INFO spawned: 'myapp' with pid 6760
2016-02-26 10:03:38,380 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:40,384 INFO spawned: 'myapp' with pid 6769
2016-02-26 10:03:40,407 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:43,412 INFO spawned: 'myapp' with pid 6778
2016-02-26 10:03:43,435 INFO exited: myapp (exit status 1; not expected)
2016-02-26 10:03:44,436 INFO gave up: myapp entered FATAL state, too many start retries too quickly

What the hell? After some research I came across the following text, funnily enough in the official supervisor documentation (yeah, RTFM, I know):

Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started.

Ah crap. The program is designed to run as a daemon and spawns several subprocesses itself. After several tests and attempted workarounds I ditched supervisor and moved on to monit.

Installation, quick and painless:

apt-get install monit

Then I changed the "check interval" from the default 120 seconds to 30 seconds:

sed -i "s/set daemon 120/set daemon 30/" /etc/monit/monitrc
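The sed call exits successfully even if the file doesn't contain exactly "set daemon 120", so it's worth looking at the result afterwards. A minimal sketch; the function name show_monit_interval is made up for this example and the path defaults to the one used above:

```shell
# Hypothetical helper: print the active "set daemon" line(s) of a monit control file.
# Commented-out lines are skipped because the line must start with "set".
show_monit_interval() {
    grep -E '^[[:space:]]*set[[:space:]]+daemon' "${1:-/etc/monit/monitrc}"
}

# Example: show_monit_interval /etc/monit/monitrc
```

If this still prints 120 (or nothing), the sed pattern didn't match and the file needs to be edited by hand.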

And created a separate config file for the monit http daemon:

cat /etc/monit/conf.d/monit-http
set httpd port 2812 and
  allow localhost

Note: You should add authentication, too!
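One way to add authentication is an additional "allow user:password" line in the httpd block. The following is a sketch only: the helper name write_monit_http and the credentials are placeholders, not what I used on this system.

```shell
# Hypothetical helper: write a monit httpd fragment with basic authentication.
# Usage: write_monit_http <target-file> <user> <password>
write_monit_http() {
    cat > "$1" <<EOF
set httpd port 2812 and
  allow localhost
  allow $2:$3
EOF
    # The file now holds a password, so keep it readable by the owner only.
    chmod 600 "$1"
}

# Example (placeholder credentials, run as root):
# write_monit_http /etc/monit/conf.d/monit-http admin changeme
```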

That's already the basic configuration for monit. Let's continue with the monit check for myapp:

cat /etc/monit/conf.d/myapp
check process myapp matching "/srv/myapp/bin/myapp"
    start program = "/etc/init.d/myapp start"
    stop program = "/etc/init.d/myapp stop"
    if failed host localhost port 8088 protocol http then restart
    if 5 restarts within 5 cycles then timeout

Because this daemon "myapp" doesn't create a PID file when it's started, I used the "matching" option (instead of the "with pidfile" option seen everywhere in the monit documentation). Matching in this case simply checks whether a running process matches the given value ("/srv/myapp/bin/myapp").
"start program" defines which command to execute to start the application.
"stop program" defines which command to execute to stop the application.
The first if condition defines the application-specific check. In this case the application listens on port 8088 and acts as a little web server. If the check fails to access "localhost port 8088" with "protocol http", a restart is initiated.
If 5 restarts within 5 cycles didn't help, monit should time out, which means myapp is no longer monitored.
A cycle in this case is the "check interval" defined in /etc/monit/monitrc by the "set daemon" option. As this was previously set to 30 (seconds), this means 5 cycles = 150 seconds.
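To get a feeling for what monit tests here, the two checks can be approximated by hand. This is a rough sketch and not how monit implements them internally: it assumes pgrep and curl are installed, and the function names are made up for this example.

```shell
# Succeeds if at least one running process has a command line matching the pattern.
process_matches() {
    pgrep -f "$1" > /dev/null
}

# Succeeds if an HTTP request to / on host:port returns a status below 400
# within 5 seconds (curl -f turns HTTP errors into a non-zero exit code).
http_ok() {
    curl -fsS -o /dev/null --max-time 5 "http://$1:$2/"
}

# Example with the values from the check above:
# process_matches "/srv/myapp/bin/myapp" && http_ok localhost 8088 && echo "myapp looks healthy"
```

If one of the two commands fails while monit reports the process as running (or the other way around), the pattern or the port in the check is the first thing to look at.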

Activate the new config:

/etc/init.d/monit restart
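Before restarting, the configuration can be checked with "monit -t", which runs a syntax check of the control file. A small sketch; the wrapper name safe_restart and the MONIT_BIN variable are my own invention for this example:

```shell
# Hypothetical wrapper: restart monit only if the control file parses cleanly.
# MONIT_BIN can point to a different monit binary; it defaults to "monit".
safe_restart() {
    if "${MONIT_BIN:-monit}" -t; then
        /etc/init.d/monit restart
    else
        echo "monit config check failed, not restarting" >&2
        return 1
    fi
}

# Example (run as root): safe_restart
```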

And now the status can be checked:

monit status
The Monit daemon 5.6 uptime: 0m

Process 'myapp'
  status                            Running
  monitoring status                 Monitored
  pid                               19751
  parent pid                        1
  uptime                            1h 45m
  children                          0
  memory kilobytes                  419324
  memory kilobytes total            419324
  memory percent                    6.8%
  memory percent total              6.8%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                0.001s to 192.168.253.111:8088 [HTTP via TCP]
  data collected                    Fri, 26 Feb 2016 13:55:27

System 'myapp-app01-test'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.19] [0.15] [0.14]
  cpu                               2.4%us 0.9%sy 0.1%wa
  memory usage                      2748432 kB [44.9%]
  swap usage                        171584 kB [4.3%]
  data collected                    Fri, 26 Feb 2016 13:55:27

Now what happens if the process myapp dies? Let's try:

kill -9 `pgrep myapp`; tail -f /var/log/monit.log

A few seconds later the following entries appeared:

[CET Feb 26 13:58:27] error    : 'myapp' process is not running
[CET Feb 26 13:58:27] info     : 'myapp' trying to restart
[CET Feb 26 13:58:27] info     : 'myapp' start: /etc/init.d/myapp
[CET Feb 26 13:58:57] info     : 'myapp' process is running with pid 4405

And the process is up again:

pgrep myapp
4405
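Instead of watching the log, the test can also be scripted by polling until the process shows up again. A minimal sketch, assuming pgrep is available; the function name wait_for_process and the 60 second default are arbitrary choices for this example:

```shell
# Hypothetical helper: wait until a process matching the pattern exists.
# Usage: wait_for_process <pattern> [timeout-in-seconds]
wait_for_process() {
    pattern=$1
    timeout=${2:-60}
    waited=0
    until pgrep -f "$pattern" > /dev/null; do
        # Give up once the timeout is reached.
        if [ "$waited" -ge "$timeout" ]; then
            echo "no process matching '$pattern' after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "process matching '$pattern' found after about ${waited}s"
}

# Example: kill the daemon, then wait for monit to bring it back.
# kill -9 `pgrep myapp`; wait_for_process "/srv/myapp/bin/myapp" 90
```

With a 30 second check interval the restart can take up to one cycle plus the startup time of the application, so the timeout should be well above 30 seconds.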

Success! I love to go into the weekend like this! :-)

