HAProxy backend server behind AWS LB remains down with HTTP 503



A few days ago I came across an issue on an internal HAProxy (1.6.3) which uses a backend server in the AWS cloud. The backend server in this case is addressed through a DNS record which is a CNAME to an AWS load balancer.

Over the last couple of weeks this particular backend reported being down several times and only a manual reload of HAProxy would resolve the issue.

After a detailed analysis, I came to the conclusion that this is related to HAProxy's internal DNS caching and to the fact that AWS changes the DNS records of its load balancers (sometimes more, sometimes less often).

I posted the analysis and solution as an answer on Stack Overflow, but I'll also share it here.

The HAProxy running in our internal networks would suddenly mark this backend server as DOWN with a L7STS/503 check result, while our monitoring was accessing the backend server (directly) just fine. As we run a HAProxy pair (LB01 and LB02), I reloaded LB01 only: the reload worked immediately and the backend server was UP again. On LB02 (not reloaded on purpose) the backend server remained down.

All this seems to be related to a DNS change of the AWS LB and to how HAProxy does DNS caching. By default, HAProxy resolves all DNS records (e.g. for backends) at startup/reload. These resolved DNS records then stay in HAProxy's own DNS cache, so you would have to reload HAProxy to renew the DNS cache.
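To confirm that a stale DNS cache is really the cause, you can compare the address HAProxy resolved at startup with what DNS returns right now. The following is only a minimal sketch: the function names are mine, and it uses the system resolver through Python's standard library, which is not necessarily the same DNS path HAProxy takes.

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the system resolver currently reports."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return {info[4][0] for info in infos}


def stale_addresses(cached: set[str], current: set[str]) -> set[str]:
    """Addresses still held from startup that DNS no longer returns."""
    return cached - current


# Usage idea (hypothetical values, needs working DNS):
#   cached = {"203.0.113.10"}   # address shown for the server in HAProxy's stats
#   stale_addresses(cached, resolve_ipv4("myawslb.example.com"))
```

A non-empty result means HAProxy is still sending health checks to an address the load balancer no longer uses.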

Another, and without doubt better, solution is to define DNS servers and a TTL for HAProxy's internal DNS cache. This has been possible since HAProxy version 1.6 with a config snippet like this:



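resolvers mydns
  # The addresses below are placeholders; replace them with your own DNS servers
  nameserver dnsmasq 127.0.0.1:53
  nameserver dns1 192.0.2.53:53
  nameserver dns2 192.0.2.54:53
  # Keep a valid DNS response in the internal cache for 60 seconds
  hold valid 60s

frontend app-in
  bind *:8080
  default_backend app-out

backend app-out
  server appincloud myawslb.example.com:443 check inter 2s ssl verify none resolvers mydns resolve-prefer ipv4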

This defines a DNS nameserver set called "mydns", which uses the DNS servers listed in the entries starting with "nameserver". The internal DNS cache is kept for 60 seconds, as defined by "hold valid 60s". In the backend server's definition you then refer to this nameserver set by adding "resolvers mydns". In this example IPv4 addresses are preferred by adding "resolve-prefer ipv4" (the default is to prefer IPv6).

Note that in order to use "resolvers" on a backend server, "check" must be defined, too, because the DNS lookup is triggered by the backend server's health check. In this example "check inter 2s" is defined, which would mean a DNS lookup every 2 seconds. That would be quite a lot of lookups. Setting the internal "hold" cache to 60 seconds limits the number of DNS lookups: the cached result is reused until it expires, so a new DNS lookup should happen after 62 seconds at the latest.
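The interplay between the check interval and the hold time can be worked through with a small simulation. This is a simplified model of the behaviour described above, not HAProxy's actual implementation: it assumes every health check either reuses the cached answer or, once the cache has expired, triggers a new lookup. The function name and parameters are made up for this illustration.

```python
def lookup_times(check_interval: int, hold_valid: int, duration: int) -> list[int]:
    """Return the times (in seconds) at which a health check triggers a DNS lookup."""
    lookups = []
    last_lookup = None
    # One health check every check_interval seconds, starting at t=0.
    for t in range(0, duration + 1, check_interval):
        # Look up only if nothing is cached yet or the cached answer has expired.
        if last_lookup is None or t - last_lookup >= hold_valid:
            lookups.append(t)
            last_lookup = t
    return lookups


print(lookup_times(check_interval=2, hold_valid=60, duration=130))  # [0, 60, 120]
print(lookup_times(check_interval=7, hold_valid=60, duration=130))  # [0, 63, 126]
```

In this model two lookups are always less than "hold valid" plus one check interval apart, which is consistent with the 62 second upper bound for the config above, instead of one lookup every 2 seconds.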

Starting with HAProxy version 1.8 there is an even more advanced option called "Service Discovery over DNS", which uses DNS SRV records. These records contain multiple response fields such as priorities, weights, etc., which HAProxy can parse in order to update the backends accordingly.
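To give an idea of what such a record carries: the data part of an SRV record consists of four fields (priority, weight, port and target). The sketch below parses that textual form and orders the results. It only illustrates the record format; the names are hypothetical and this is not how HAProxy processes SRV answers internally.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_rdata(rdata: str) -> SrvRecord:
    """Parse SRV record data in its textual form: 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    # Drop the trailing dot of the fully qualified target name.
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def order_records(records: list[SrvRecord]) -> list[SrvRecord]:
    """Lowest priority value first; within a priority, highest weight first.

    Simplification: RFC 2782 describes a weighted random selection among
    records of equal priority, not a strict ordering by weight.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))


# Made-up example answers
answers = ["20 10 8443 app2.example.com.", "10 60 8443 app1.example.com."]
for record in order_records([parse_srv_rdata(a) for a in answers]):
    print(record.target, record.port, record.priority, record.weight)
```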

