I have a "cloud" service I'm working with now using the SOAP driver. It
works, but this particular service deals with their scheduled maintenance
windows by having two data centers and switching the service between them
by changing the DNS information for the target web server.

So the SOAP driver is up and running, talking to www.svchost.net (made
up example), which resolves to an IP address that physically lives in
data center Foo.

When they want to do work on the host in data center Foo, they change
www.svchost.net to point to a different IP address, which physically
lives in data center Bar. They then shut down the host in Foo or do
whatever needs to be done.

The next time they schedule an outage, they change www.svchost.net back
to the original address, and the host in Bar goes down for maintenance.

This has caused no end of entertainment here on my end of things.
Generally, when the host goes down, I'll see an error reported in the
driver. Usually it's a "500 Internal Server" error, as the web server I'm
talking to may still be up, but the back end database server(s) are down.
But I've seen other errors returned as well. The SOAP driver doesn't
handle this itself, but I can react to these errors in policy, and am
doing so for the 500 error. Right now, a 500 error is converted to a
RETRY, so the driver will keep banging away on the last event until
somebody intervenes. I haven't yet, but I'm considering turning any error
code returned into a FATAL error to force the driver to shut down.
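
The retry-versus-fatal decision described above can be sketched as a
small decision function. This is only an illustration of the logic in
Python; in the driver itself it would be written as policy. The names
(classify, Decision) are hypothetical, and the cap on retry attempts is
an added assumption, not something the current policy does.

```python
from dataclasses import dataclass

SUCCESS = "success"
RETRY = "retry"
FATAL = "fatal"


@dataclass
class Decision:
    action: str
    reason: str


def classify(status_code: int, attempts: int, max_attempts: int = 20) -> Decision:
    """Decide what to do with the HTTP status of one SOAP response.

    attempts is how many times this event has been sent so far,
    including the attempt that produced status_code.
    """
    if 200 <= status_code < 300:
        return Decision(SUCCESS, "request accepted")
    # A 500 usually means the web server is up but the back end is not,
    # so it is worth retrying, but (assumption) not forever.
    if status_code == 500 and attempts < max_attempts:
        return Decision(RETRY, f"HTTP 500 on attempt {attempts}, will retry")
    # Anything else, or too many 500s in a row, stops the driver.
    return Decision(FATAL, f"HTTP {status_code} after {attempts} attempt(s)")
```

With a cap like this, a long outage eventually turns into a FATAL and a
driver shutdown instead of an endless RETRY loop on the same event.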

The real fun is that several layers of "stuff" between the driver and the
outside world are all trying to be helpful by caching DNS / IP address
resolutions. So, while the service changes their own DNS entry, that
change is not getting all the way through.

The first level of caching is Java (eDir embedded JRE). In
/opt/novell/eDirectory/lib64/nds-modules/jre1.6.0_31/lib/security/java.security
it says that:

# The Java-level namelookup cache policy for successful lookups:
# any negative value: caching forever
# any positive value: the number of seconds to cache an address for
# zero: do not cache
# default value is forever (FOREVER). For security reasons, this
# caching is made forever when a security manager is set. When a security
# manager is not set, the default behavior in this implementation
# is to cache for 30 seconds.
# NOTE: setting this to anything other than the default value can have
#       serious security implications. Do not set it unless
#       you are sure you are not exposed to DNS spoofing attack.

I've not been able to find out exactly what Java considers a "successful"
lookup, and therefore how long it holds on to a cached entry. It appears
that "successful", in this case, means a cache hit was successful. So
once Java sees that www.svchost.net resolves to a given address, it will
continue returning that value forever, as long as something requests it
at least every 30 seconds (the default timeout value). It does not appear
that Java ever goes back to DNS to see if that's still the correct
answer. I don't think eDirectory has a "security manager" implemented, so
I've turned the setting here down to 0 (networkaddress.cache.ttl=0) in an
attempt to defeat Java's caching of DNS / IP address resolutions.
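
To make the two possible readings of the TTL concrete, here is a toy
model in Python. It is not Java's implementation, and it does not settle
which behaviour the embedded JRE has. TtlCache and its parameters are
hypothetical names. With refresh_on_hit=False an entry expires a fixed
time after it was stored; with refresh_on_hit=True every hit resets the
timer, which is the behaviour suspected above. For what it's worth,
java.security also has a separate networkaddress.cache.negative.ttl
setting for failed lookups, which suggests that "successful" may simply
mean a lookup that returned an address.

```python
class TtlCache:
    """Toy name cache with a selectable expiry rule (illustration only)."""

    def __init__(self, ttl_seconds, refresh_on_hit, lookup, clock):
        self.ttl = ttl_seconds
        self.refresh_on_hit = refresh_on_hit
        self.lookup = lookup      # function: name -> address (the "real" DNS)
        self.clock = clock        # function: () -> current time in seconds
        self.entries = {}         # name -> (address, expires_at)

    def resolve(self, name):
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None:
            address, expires_at = entry
            if now < expires_at:
                if self.refresh_on_hit:
                    # Sliding expiry: each hit pushes the deadline out again.
                    self.entries[name] = (address, now + self.ttl)
                return address
        # Missing or expired: go back to the real lookup and store the result.
        address = self.lookup(name)
        self.entries[name] = (address, now + self.ttl)
        return address
```

If the name is requested every 20 seconds with a 30 second TTL, the
fixed-expiry cache picks up a DNS change on the first request after the
entry expires, while the sliding-expiry cache never does. A TTL of 0
makes every request go to the real lookup in both modes.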

The next level of caching is nscd (Linux's name service cache daemon).
This has a hosts cache, which again caches DNS / IP resolutions. It is
configured in /etc/nscd.conf. Statistics are available from "nscd -g",
which shows something like:

hosts cache:

yes  cache is enabled
no  cache is persistent
yes  cache is shared
211  suggested size
216064  total data pool size
8832  used data pool size
600  seconds time to live for positive entries
0  seconds time to live for negative entries
55  cache hits on positive entries
0  cache hits on negative entries
1850  cache misses on positive entries
0  cache misses on negative entries
2% cache hit rate
64  current number of cached values
64  maximum number of cached values
2  maximum chain length searched
1  number of delays on rdlock
1  number of delays on wrlock
0  memory allocations failed
yes  check /etc/{hosts,resolv.conf} for changes

Again, it's not totally clear exactly what nscd considers to be a
"positive entry", and therefore how long it will keep something in cache.
As with Java, it appears that as long as nscd is able to reply to a DNS
query, that's a cache hit, and it resets the timer on that particular
entry in the cache.
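
For keeping an eye on these numbers over time, the "nscd -g" output can
be parsed into a dictionary. This is a minimal sketch that assumes the
layout shown above: a "hosts cache:" header followed by "value  label"
lines. The function names are hypothetical.

```python
import subprocess


def parse_nscd_section(text, section="hosts"):
    """Return {label: value} for one '<section> cache:' block of nscd -g output."""
    stats = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("cache:"):
            # A header such as 'hosts cache:' starts a new block.
            in_section = (line == f"{section} cache:")
            continue
        if in_section:
            parts = line.split(None, 1)   # first token is the value
            if len(parts) == 2:
                stats[parts[1]] = parts[0]
    return stats


def read_live_hosts_stats():
    """Run 'nscd -g' and parse the hosts block. Requires nscd to be running."""
    result = subprocess.run(["nscd", "-g"], capture_output=True,
                            text=True, check=True)
    return parse_nscd_section(result.stdout, "hosts")
```

For the output above, the value stored under "seconds time to live for
positive entries" is the string "600", which is the figure to watch when
experimenting with /etc/nscd.conf.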

So, when the service switches www.svchost.net from one address to the
other, first Java doesn't follow the change. Restarting the SOAP driver
doesn't help. Maybe stopping it, waiting a few minutes, then restarting
it would work, giving Java the time to time out its cached DNS entry. But
then nscd steps in and also caches it, which re-poisons Java's cache with
the old value.

The last time this particular service provider had scheduled maintenance,
I had to use "nscd -i hosts" to purge the cache, then restart eDirectory
(ID Vault tree) to convince Java to purge its cache as well. Having now,
I think, disabled the Java cache, I'm only working on convincing nscd to
follow the changes to DNS.
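
One way to automate the manual "nscd -i hosts" step is a small watcher
that compares what the system resolver returns (possibly from nscd's
cache) with what DNS itself says, and invalidates the hosts cache when
they differ. This is a sketch under assumptions: it assumes dig is
installed, that the script runs with enough privilege to invalidate the
cache (normally root), and all function names are hypothetical. It does
nothing about Java's own cache.

```python
import ipaddress
import socket
import subprocess


def only_ips(lines):
    """Keep the lines that parse as IP addresses (drops CNAMEs and blanks)."""
    ips = set()
    for line in lines:
        try:
            ips.add(str(ipaddress.ip_address(line.strip())))
        except ValueError:
            continue
    return ips


def addresses_from_system_resolver(hostname):
    """Resolve through libc, which may be answered from nscd's cache."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET,
                               socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def addresses_from_dig(hostname):
    """Ask DNS directly with dig, which does not go through nscd."""
    result = subprocess.run(["dig", "+short", hostname, "A"],
                            capture_output=True, text=True, check=True)
    return only_ips(result.stdout.splitlines())


def invalidate_nscd_hosts():
    """Purge nscd's hosts cache, same as running 'nscd -i hosts' by hand."""
    subprocess.run(["nscd", "-i", "hosts"], check=True)


def refresh_if_stale(hostname,
                     cached=addresses_from_system_resolver,
                     fresh=addresses_from_dig,
                     invalidate=invalidate_nscd_hosts):
    """Invalidate the cache if it disagrees with DNS. Returns True if it did."""
    cached_ips = cached(hostname)
    fresh_ips = fresh(hostname)
    # An empty answer from DNS is treated as "don't know", not as a change.
    if fresh_ips and cached_ips != fresh_ips:
        invalidate()
        return True
    return False
```

Run from cron every minute or so, refresh_if_stale("www.svchost.net")
would purge the hosts cache shortly after the provider flips the DNS
entry.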

Is anybody else using the SOAP (or the new REST) driver to talk to a host
that works this way and running into something similar? Or am I just
spectacularly lucky here? Is there a better way to handle this?

David Gersic dgersic_@_niu.edu
Knowledge Partner http://forums.microfocus.com
