A few days ago I came across an issue on an internal HAProxy (1.6.3) which uses a backend server in the AWS cloud. The backend server in this case was using a DNS record which was a CNAME to an AWS load balancer.
Over the last couple of weeks this particular backend reported being down several times and only a manual reload of HAProxy would resolve the issue.
After a detailed analysis, I came to the conclusion that this is related to HAProxy's internal DNS caching and to the fact that AWS changes the DNS records of its load balancers (sometimes more often, sometimes less).
I posted the analysis and solution as an answer on Stack Overflow, but I'll also share it here.
The HAProxy running in our internal network would suddenly mark this backend server as DOWN with an L7STS/503 check result, while our monitoring was accessing the backend server (directly) just fine. As we run a HAProxy pair (LB01 and LB02), I reloaded LB01, which immediately worked: the backend server was UP again. On LB02 (not reloaded on purpose) this backend server remained DOWN.
All this seems to be related to a DNS change of the AWS LB and to how HAProxy does DNS caching. By default, HAProxy resolves all DNS records (e.g. for backends) at startup/reload. These resolved DNS records then stay in HAProxy's own DNS cache, so you have to reload HAProxy to renew the DNS cache.
Another and without doubt the better solution is to define DNS servers and the HAProxy internal DNS cache TTL. This is possible since HAProxy version 1.6 with a config snippet like this:
resolvers mydns
  nameserver dnsmasq 127.0.0.1:53
  nameserver dns1 192.168.1.1:53
  nameserver dns2 192.168.1.253:53
  hold valid 60s

# the backend name below is a placeholder
backend app-in-the-cloud
  server appincloud myawslb.example.com:443 check inter 2s ssl verify none resolvers mydns resolve-prefer ipv4
This defines a DNS nameserver set called "mydns", which uses the DNS servers listed in the entries starting with "nameserver". "hold valid 60s" tells HAProxy to keep a resolved record in its internal DNS cache for 60 seconds. In the backend server's definition you then refer to this nameserver set by adding "resolvers mydns". In this example, "resolve-prefer ipv4" makes HAProxy prefer IPv4 addresses when resolving (the default is to prefer IPv6).
Note that in order to use "resolvers" in the backend server, "check" must be defined, too. The DNS lookup happens whenever the backend server check is triggered. In this example "check inter 2s" is defined, which means a DNS lookup would happen every 2 seconds. That would be quite a lot of lookups. Setting the internal "hold" cache to 60 seconds limits the number of DNS lookups, because no new lookup is sent until the cache expires; a new DNS lookup should therefore happen after 62 seconds at the latest (60 seconds of cache plus one check interval of 2 seconds).
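The timing described above can be annotated directly in the configuration. The following is only a sketch: the "resolve_retries" and "timeout retry" settings are part of the resolvers section in HAProxy 1.6, but they were not part of my setup, and the values shown as well as the backend name are assumptions chosen for illustration.

```haproxy
resolvers mydns
  nameserver dns1 192.168.1.1:53
  nameserver dns2 192.168.1.253:53
  # assumed values: give up on a resolution after 3 unanswered queries,
  # waiting 1s between attempts
  resolve_retries 3
  timeout retry   1s
  # a valid answer is reused for 60s before a new lookup is sent
  hold valid      60s

# placeholder backend name
backend app-in-the-cloud
  # health check every 2s; a lookup is only sent once the 60s cache
  # has expired, so at the latest about 62s after the previous one
  server appincloud myawslb.example.com:443 check inter 2s ssl verify none resolvers mydns resolve-prefer ipv4
```

A longer check interval would push the worst case out accordingly, for example "check inter 10s" with "hold valid 60s" gives a new lookup after roughly 70 seconds at the latest.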
Starting with HAProxy version 1.8 there is an even more advanced option called "Service Discovery over DNS", which uses DNS SRV records. These records contain multiple response fields such as priority, weight and port, which HAProxy can parse in order to update the backend servers accordingly.
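As a rough sketch of what this could look like in HAProxy 1.8 or newer, the "server-template" keyword creates a number of server slots that are filled from the DNS response. I have not used this in the setup described here; the SRV record name, the backend name, the server prefix "app" and the slot count of 3 are all hypothetical.

```haproxy
resolvers mydns
  nameserver dns1 192.168.1.1:53
  hold valid 60s

# placeholder backend name
backend app-in-the-cloud
  # create 3 server slots (app1..app3) and fill them from the SRV
  # record; the target port is taken from the SRV response
  server-template app 3 _myservice._tcp.example.com resolvers mydns check
```

The number of slots should be at least as large as the number of targets the SRV record is expected to return, as slots without a matching target simply stay unused.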