Forwarders not being used if they were momentarily offline
Hello Everyone,
Our DNS infrastructure is built around three primary SLES Linux BIND servers (named 9.3.4), which are based in our central data centers in North America and Europe. These "big daddies" have a configuration that contains replicas of our corporate root zones, plus forwarding zones that point to legacy DNS zones from acquisitions.
A problem surfaces when a zone forwarder is temporarily unreachable, as happens when the remote site reboots a router or firewall. If a forwarded query fails while the forwarder is down, all subsequent queries for that same record continue to fail, even after the forwarder is reachable again. It is as if named is caching connection state information. I verified this with a packet trace on the DNS server, which showed no forwarding activity even though the forwarder was up and available. After a long wait (15 to 30 minutes), the queries begin to work again. Of course, restarting named also works.
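For anyone who wants to reproduce the comparison, here is a minimal Python sketch (standard library only) that sends the same query to the forwarder directly and to the local named, so the two answers can be compared. It is not the tool used for the packet trace above, and the function names (build_query, parse_rcode, probe) are hypothetical helpers made up for this illustration.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: recursion desired, one question, class IN."""
    # Header: ID, flags (0x0100 = RD bit), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def parse_rcode(response):
    """Return (query_id, rcode) taken from the DNS header of a response."""
    if len(response) < 12:
        raise ValueError("short DNS response")
    query_id, flags = struct.unpack("!HH", response[:4])
    return query_id, flags & 0x000F  # rcode is the low 4 bits of the flags


def probe(server, name, timeout=3.0):
    """Send one UDP query to server:53.

    Returns the rcode (0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN), or None
    if no reply arrived before the timeout or the socket reported an error.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # socket.timeout is a subclass of OSError
            return None
    return parse_rcode(data)[1]
```

Calling probe() once with the forwarder's address and once with the local server's address, for the same record name, should show the pattern described above: the forwarder answers while the local named still returns a failure. The addresses and record name are site-specific, so none are filled in here.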
I read that new code was introduced in 9.3 to pick the most efficient forwarder. Perhaps this is the cause of the issue.
Has anyone else seen this, and does anyone know of a workaround? I certainly don't want to import the legacy zones, since most of them are Windows domains.
Thanks in advance!