Had to investigate a problem where monit (a small but powerful daemon to check processes) had restarted the process of an application server.
The application server itself serves a web application and listens on port 8088. The monit check looks like this:
check process application matching "/srv/application/deploy/bin/application"
start program = "/etc/init.d/application start"
stop program = "/etc/init.d/application stop"
if failed host 127.0.0.1 port 8088 protocol http then restart
if 5 restarts within 5 cycles then timeout
In the monit logs (/var/log/monit.log) the restart can clearly be seen:
[CEST Oct 5 20:26:49] error : 'application' failed protocol test [HTTP] at INET[127.0.0.1:8088] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[CEST Oct 5 20:26:49] info : 'application' trying to restart
[CEST Oct 5 20:26:49] info : 'application' stop: /etc/init.d/application
I also checked our Icinga monitoring if it did see the same problem. But no, Icinga's HTTP check worked. Then I double-checked with previous monit restarting actions. In the past, when the http request on port 8088 was not reachable, monit logged the following line:
error : 'application' failed, cannot open a connection to INET[127.0.0.1:8088] via TCP
But in this case the logged event clearly says:
'application' failed protocol test [HTTP] at INET[127.0.0.1:8088] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
The difference is therefore clearly on layer 7. The port was up - but inside the http protocol something was wrong. The message "resource temporarily unavailable" sounded somewhat familiar to me. I manage a lot of reverse proxies and whenever the upstream/backend server is gone, a 50x error is shown with a similar message. I asked the developer of this application if the application serves a 5xx. Turns out - almost; A backend resource of the application was not available in that very moment and the application, by its design, started to respond with http status 423 (Locked). I checked the monit documentation and indeed found a very important information:
STATUS option can be used to explicitly test the HTTP status code returned by the HTTP server. If not used, the HTTP protocol test will fail if the status code returned is greater than or equal to 400. You can override this behaviour by using the status qualifier.
Here we go. monit expects by default a http status of =< 400. As soon as the application returned a 423, monit considered the application failed and restarted it.
To solve this problem, a specially crafted application URL will from now on be requested (by using the "request" option in the monit check) which always returns a http status 200 as long as the application itself is running.
No comments yet.
AWS Android Ansible Apache Apple Atlassian BSD Backup Bash Bluecoat CMS Chef Cloud Coding Consul Container Containers CouchDB DB DNS Database Databases Docker ELK Elasticsearch Filebeat FreeBSD Galera GlusterFS Grafana Graphics HAProxy HTML Hacks Hardware Icinga Icingaweb2 InfluxDB Internet Java KVM Kibana Kodi Kubernetes LTS LXC Linux Logstash Mac Macintosh Mail MariaDB Minio MongoDB Monitoring Multimedia MySQL NFS Nagios Network Nginx OSSEC OTRS PGSQL PHP Perl Personal PostgreSQL Postgres PowerDNS Proxmox Proxy Python Rancher SSL Seafile Security Shell SmartOS Solaris Surveillance SystemD TLS Tomcat Ubuntu Unix VMWare VMware Varnish Virtualization Windows Wireless Wordpress Wyse ZFS Zoneminder