Get Unbound DNS lookups working in Ubuntu 16.04 Xenial, understand SERVFAIL

Written by - 3 comments

Published on - Listed in Linux DNS


For an internal DNS resolver/cacher, I decided to use Unbound. However after the installation of unbound 1.5.8-1ubuntu1 and an initial configuration, DNS lookups didn't work.

The config:

server:
    interface: 0.0.0.0
        access-control: 10.0.0.0/16 allow
        access-control: 127.0.0.0/8 allow
        access-control: 172.16.0.0/12 allow
        access-control: 192.168.0.0/16 allow
        verbosity: 1
        root-hints: /var/lib/unbound/root.hints

forward-zone:
      name: "."
      forward-addr: domaincontroller1.example.com
      forward-addr: domaincontroller2.example.com
      forward-addr: domaincontroller3.example.com
      #forward-addr: 8.8.4.4        # Google
      #forward-addr: 8.8.8.8        # Google

DNS resolving didn't work. Unbound returned a SERVFAIL error.

# dig claudiokuenzler.com A @localhost

; <<>> DiG 9.10.3-P4-Ubuntu <<>> claudiokuenzler.com A @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8470
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;claudiokuenzler.com.            IN    A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Feb 16 07:04:55 CET 2017
;; MSG SIZE  rcvd: 37

When I restarted Unbound, I was puzzled how slow the restart was and eventually some errors showed up in syslog:

Feb 16 06:46:52 dnscacher systemd[1]: Stopping unbound.service...
Feb 16 06:46:52 dnscacher unbound[20218]:  * Stopping DNS server unbound
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: service stopped (unbound 1.5.8).
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: server stats for thread 0: 8 queries, 4 answers from cache, 4 recursions, 0 prefetch
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: server stats for thread 0: requestlist max 1 avg 0.5 exceeded 0 jostled 0
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: average recursion processing time 0.183547 sec
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: histogram of recursion processing times
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: [25%]=0.000384 median[50%]=0.000512 [75%]=0.393216
Feb 16 06:46:52 dnscacher unbound: [19904:0] info: lower(secs) upper(secs) recursions
Feb 16 06:46:52 dnscacher unbound: [19904:0] info:    0.000256    0.000512 2
Feb 16 06:46:52 dnscacher unbound: [19904:0] info:    0.262144    0.524288 2
Feb 16 06:46:52 dnscacher unbound[20218]:    ...done.
Feb 16 06:46:52 dnscacher systemd[1]: Stopped unbound.service.
Feb 16 06:46:52 dnscacher systemd[1]: Starting unbound.service...
Feb 16 06:46:52 dnscacher unbound[20247]:  * Starting DNS server unbound
Feb 16 06:47:42 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 16 06:47:42 dnscacher unbound-anchor: fail: the anchor is NOT ok and could not be fixed
Feb 16 06:47:42 dnscacher unbound: [20339:0] notice: init module 0: validator
Feb 16 06:47:42 dnscacher unbound: [20339:0] notice: init module 1: iterator
Feb 16 06:47:42 dnscacher unbound: [20339:0] info: start of service (unbound 1.5.8).
Feb 16 06:47:42 dnscacher unbound: [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 16 06:47:42 dnscacher unbound: message repeated 5 times: [ [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN]
Feb 16 06:47:42 dnscacher unbound[20247]:    ...done.
Feb 16 06:47:42 dnscacher systemd[1]: Started unbound.service.
Feb 16 06:47:42 dnscacher systemd[1]: Reached target Host and Network Name Lookups.

It turned out that the slow start is caused by the unbound-achor verification, which can be manually triggered using the following command:

# time unbound-anchor -a /var/lib/unbound/root.key -v
/var/lib/unbound/root.key has content
fail: the anchor is NOT ok and could not be fixed

real    0m50.453s
user    0m0.004s
sys    0m0.012s

Here we have the 50 seconds which slowed down the start (see times in the syslog log above). By using tcpdump, I was able to determine what happens during the start (and what causes this slow startup):

# service unbound restart & 2>&1; tcpdump port 53
[1] 11942
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
10:58:07.788300 IP dnscacher.23444 > 192.5.5.241.53: 1053% [1au] NS? . (28)
10:58:08.164795 IP dnscacher.58715 > 192.112.36.4.53: 41519% [1au] NS? . (28)
10:58:08.541290 IP dnscacher.26205 > 192.112.36.4.53: 52241% [1au] NS? . (28)
10:58:08.917835 IP dnscacher.18709 > 199.7.91.13.53: 42065% [1au] NS? . (28)
10:58:09.294371 IP dnscacher.42646 > 199.7.91.13.53: 21034% [1au] NS? . (28)
10:58:09.670891 IP dnscacher.56481 > 198.97.190.53.53: 9754% [1au] NS? . (28)
10:58:10.047466 IP dnscacher.61621 > 198.97.190.53.53: 51384% [1au] NS? . (28)
10:58:10.424028 IP dnscacher.37874 > 192.5.5.241.53: 14983% [1au] NS? . (28)
10:58:11.176957 IP dnscacher.43299 > 192.5.5.241.53: 31294% [1au] NS? . (28)
[...] # shortened output

Unbound tried to connect to these and some more IP addresses. They look kinda familiar, don't they? Ah, of course, the DNS root servers, the spine of the Internet!

# grep 192.5.5.241 /var/lib/unbound/root.hints
F.ROOT-SERVERS.NET.      3600000      A     192.5.5.241

As this internal DNS resolver/cacher doesn't have Internet access (and therefore no DNS access to the root servers), this explains the slow start. But why wouldn't Unbound show any results at all? The connectivity to the internal Active Directory controllers (domaincontroller1-3) works.
Here I stumbled across Debian bug report #699641:

"you can set the "domain-insecure" option on your local domains in order to prevent DNSSEC failures from impacting the resolution of those domains"

So this is the actual workaround (disable DNSSEC) but doesn't explain the reason. There are two reasons for Unbound unable to show results for the resolved domains:

1.Missing Internet connection

The error in syslog gives a good hint:

Feb 16 06:47:42 dnscacher unbound: [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 16 06:47:42 dnscacher unbound: message repeated 5 times: [ [20339:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN]

As tcpdump revealed, Unbound tried to connect to all DNS Root servers. This happened not just for fun, Unbound tried to verify the DNSKEY in order to enable DNSSEC. As the Internet connection is non-existant on this host, the verification didn't work.

2. Unbound runs in DNSSEC mode

Unbound runs by default in DNSSEC mode, at least with the base installation on Ubuntu. The failed DNSKEY validation causes Unbound to stop working for DNSSEC requests - which is enabled. All non-DNSSEC requests return a SERVFAIL status.

Solution: Either make sure the cacher host has Internet access (at least to the DNS Root servers) or disable DNSSEC completely.

As this is a DNS resolver and cacher for internal networks only, I didn't see the necessity for DNSSEC and disabled it. In the Ubuntu setup two changes are necessary in order to disable DNSSEC:

Added domain-insecure option in unbound.conf:

server:
    interface: 0.0.0.0
        access-control: 10.0.0.0/16 allow
        access-control: 127.0.0.0/8 allow
        access-control: 172.16.0.0/12 allow
        access-control: 192.168.0.0/16 allow
        verbosity: 1
        root-hints: /var/lib/unbound/root.hints
        domain-insecure: *

But that's not enough. There's also a special config file active (/etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf). In this file I commented/disabled the auto-trust-anchor-file option:

# cat /etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf
server:
    # The following line will configure unbound to perform cryptographic
    # DNSSEC validation using the root trust anchor.
    #auto-trust-anchor-file: "/var/lib/unbound/root.key"

Then another restart of Unbound:

# service unbound restart & 2>&1; tail -f /var/log/syslog
Feb 16 07:11:11 dnscacher unbound[23185]:  * Starting DNS server unbound
Feb 16 07:12:01 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 16 07:12:01 dnscacher unbound-anchor: fail: the anchor is NOT ok and could not be fixed
Feb 16 07:12:01 dnscacher unbound: [23275:0] warning: did not exit gracefully last time (22816)
Feb 16 07:12:01 dnscacher unbound: [23276:0] notice: init module 0: validator
Feb 16 07:12:01 dnscacher unbound: [23276:0] notice: init module 1: iterator
Feb 16 07:12:01 dnscacher unbound: [23276:0] info: start of service (unbound 1.5.8).
Feb 16 07:12:01 dnscacher unbound[23185]:    ...done.
Feb 16 07:12:01 dnscacher systemd[1]: Started unbound.service.
Feb 16 07:12:01 dnscacher systemd[1]: Reached target Host and Network Name Lookups.

Yes, it still takes 50 seconds to start due to the missing Internet connection, but DNS resolution now works:

# dig -t A www.claudiokuenzler.com @localhost

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -t A www.claudiokuenzler.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15474
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.claudiokuenzler.com.    IN    A

;; ANSWER SECTION:
www.claudiokuenzler.com. 38400    IN    A    144.76.83.23

;; Query time: 47 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Feb 16 07:12:33 CET 2017
;; MSG SIZE  rcvd: 68

And the already queried DNS record can now be found in Unbound's cache:

# unbound-control dump_cache | grep claudiokuenzler
www.claudiokuenzler.com.    23458    IN    A    144.76.83.23
msg www.claudiokuenzler.com. IN A 33152 1 23458 2 1 0 0
www.claudiokuenzler.com. IN A 0

Note: Even if I disabled DNSSEC in this case, it's still wise to open the firewall for this host to access the DNS Root servers. Not only to avoid the slow start or to keep DNSSEC as a possibility, but rather because this is the design of DNS. The Root servers should be queried if forwarders return no results.

Update February 17th 2017: As I mentioned at the end, a DNS connection to the Root servers should work if possible. So the missing firewall rules are now created and a restart is now as quick as expected:

# time service unbound restart & 2>&1; tail -f /var/log/syslog
Feb 17 11:37:15 dnscacher systemd[1]: Stopping unbound.service...
Feb 17 11:37:15 dnscacher unbound[12066]:  * Stopping DNS server unbound
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: service stopped (unbound 1.5.8).
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: server stats for thread 0: 766 queries, 763 answers from cache, 3 recursions, 0 prefetch
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: average recursion processing time 0.000464 sec
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: histogram of recursion processing times
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: [25%]=0 median[50%]=0 [75%]=0
Feb 17 11:37:15 dnscacher unbound: [4569:0] info: lower(secs) upper(secs) recursions
Feb 17 11:37:15 dnscacher unbound: [4569:0] info:    0.000256    0.000512 3
Feb 17 11:37:15 dnscacher unbound[12066]:    ...done.
Feb 17 11:37:15 dnscacher systemd[1]: Stopped unbound.service.
Feb 17 11:37:15 dnscacher systemd[1]: Starting unbound.service...
Feb 17 11:37:15 dnscacher unbound[12096]:  * Starting DNS server unbound
Feb 17 11:37:16 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 17 11:37:16 dnscacher unbound-anchor: success: the anchor is ok
Feb 17 11:37:16 dnscacher unbound: [12112:0] notice: init module 0: validator
Feb 17 11:37:16 dnscacher unbound: [12112:0] notice: init module 1: iterator
Feb 17 11:37:16 dnscacher unbound: [12112:0] info: start of service (unbound 1.5.8).
Feb 17 11:37:16 dnscacher unbound[12096]:    ...done.
Feb 17 11:37:16 dnscacher systemd[1]: Started unbound.service.

real    0m0.152s
user    0m0.000s
sys     0m0.000s
Feb 17 11:37:16 dnscacher systemd[1]: Reached target Host and Network Name Lookups.

And as expected, the DNSKEY validation is now working, too:

Feb 17 11:37:16 dnscacher unbound-anchor: /var/lib/unbound/root.key has content
Feb 17 11:37:16 dnscacher unbound-anchor: success: the anchor is ok


Add a comment

Show form to leave a comment

Comments (newest first)

ck from Switzerland wrote on Apr 10th, 2017:

The firewall rules allowed the outgoing connections to the DNS Root Servers (see https://www.iana.org/domains/root/servers) using tcp and udp 53.


brianx from usa wrote on Apr 8th, 2017:

The February 17th 2017 says that firewall rules fixed the fail: the anchor is NOT ok and could not be fixed error, but it's unclear what rules were added.
Does this require forwarding in specific udp or tcp ports or will the existing NAT forwards of my generic firewall be enough?


Dominik from wrote on Feb 16th, 2017:

I like it :)


RSS feed

Blog Tags:

  AWS   Android   Ansible   Apache   Apple   Atlassian   BSD   Backup   Bash   Bluecoat   CMS   Chef   Cloud   Coding   Consul   Containers   CouchDB   DB   DNS   Database   Databases   Docker   ELK   Elasticsearch   Filebeat   FreeBSD   Galera   Git   GlusterFS   Grafana   Graphics   HAProxy   HTML   Hacks   Hardware   Icinga   Influx   Internet   Java   KVM   Kibana   Kodi   Kubernetes   LVM   LXC   Linux   Logstash   Mac   Macintosh   Mail   MariaDB   Minio   MongoDB   Monitoring   Multimedia   MySQL   NFS   Nagios   Network   Nginx   OSSEC   OTRS   Office   PGSQL   PHP   Perl   Personal   PostgreSQL   Postgres   PowerDNS   Proxmox   Proxy   Python   Rancher   Rant   Redis   Roundcube   SSL   Samba   Seafile   Security   Shell   SmartOS   Solaris   Surveillance   Systemd   TLS   Tomcat   Ubuntu   Unix   VMWare   VMware   Varnish   Virtualization   Windows   Wireless   Wordpress   Wyse   ZFS   Zoneminder