I had to investigate a problem where monit (a small but powerful daemon for monitoring processes) had restarted the process of an application server.
The application server serves a web application and listens on port 8088. The monit check looks like this:
check process application matching "/srv/application/deploy/bin/application"
  start program = "/etc/init.d/application start"
  stop program = "/etc/init.d/application stop"
  if failed host 127.0.0.1 port 8088 protocol http then restart
  if 5 restarts within 5 cycles then timeout
In the monit logs (/var/log/monit.log) the restart can clearly be seen:
[CEST Oct 5 20:26:49] error : 'application' failed protocol test [HTTP] at INET[127.0.0.1:8088] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST Oct 5 20:26:49] info : 'application' trying to restart
[CEST Oct 5 20:26:49] info : 'application' stop: /etc/init.d/application
I also checked whether our Icinga monitoring had seen the same problem. But no, Icinga's HTTP check had kept working. Then I compared this event with previous monit restarts. In the past, when port 8088 was not reachable, monit logged the following line:
error : 'application' failed, cannot open a connection to INET[127.0.0.1:8088] via TCP
But in this case the logged event clearly says:
'application' failed protocol test [HTTP] at INET[127.0.0.1:8088] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
The difference is therefore clearly on layer 7: the port was up, but something was wrong at the HTTP protocol level. The message "Resource temporarily unavailable" sounded somewhat familiar to me. I manage a lot of reverse proxies, and whenever the upstream/backend server is gone, a 50x error is shown with a similar message. I asked the developer of this application whether the application had served a 5xx. It turns out: almost. A backend resource of the application was unavailable at that very moment, and the application, by design, started to respond with HTTP status 423 (Locked). I checked the monit documentation and indeed found an important piece of information:
STATUS option can be used to explicitly test the HTTP status code returned by the HTTP server. If not used, the HTTP protocol test will fail if the status code returned is greater than or equal to 400. You can override this behaviour by using the status qualifier.
There we go. By default, monit expects an HTTP status below 400. As soon as the application returned a 423, monit considered the application failed and restarted it.
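For illustration, the status qualifier from the documentation excerpt could be used roughly as follows. This is only a sketch of that alternative, not the fix applied here. The threshold of 500 is an assumption chosen for the example, so that a 423 would no longer trigger a restart while 5xx responses still would:

```
check process application matching "/srv/application/deploy/bin/application"
  start program = "/etc/init.d/application start"
  stop program = "/etc/init.d/application stop"
  # Assumption: any status below 500 (including 423 Locked) counts as healthy
  if failed host 127.0.0.1 port 8088 protocol http
    status < 500
  then restart
  if 5 restarts within 5 cycles then timeout
```

The drawback of this variant is that other 4xx responses would be tolerated as well, which may hide real problems.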
To solve this problem, monit will from now on request a specially crafted application URL (using the "request" option in the monit check) that always returns HTTP status 200 as long as the application itself is running.
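A minimal sketch of what the adjusted check could look like. The path "/monit-check" is a hypothetical placeholder, not the real URL of the application:

```
check process application matching "/srv/application/deploy/bin/application"
  start program = "/etc/init.d/application start"
  stop program = "/etc/init.d/application stop"
  # Hypothetical path: an endpoint that returns 200 while the application runs
  if failed host 127.0.0.1 port 8088 protocol http
    request "/monit-check"
  then restart
  if 5 restarts within 5 cycles then timeout
```

With a dedicated URL like this, the check tests only whether the application itself answers, independent of the state of its backend resources.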