A story... a long one. Problems, but also solutions (they worked for me, at least!)

1/ The setup: 8 BM38 servers running S2S with IKE/SKIP, and 4 BM37 servers
with SKIP, of course. Some of the BM38 servers are in the same tree,
notably the old and new masters; others are in their own trees because
of the stupid design problem that makes access to Root mandatory to start
the SCM... process. There is also a third-party firewall.

Problem: a slave becomes the master,
the master becomes a slave,
both get a new public IP address,
both have to be moved into a new OU in the tree,
both are physically moved to another building...
... with nobody at the remote sites to help...
... and everything must be up again by yesterday...

I have the new MasterTRO, exported from the server certificate of the new master.

The private interfaces of both the new and the old master are on the same
LAN, so it is possible to keep communicating with the rest of the tree by
making the modifications on one server after the other and playing with
the routes.

2/ First I remove the server that is to become the new master from the
member list, and clean up everything that is not really necessary or is
easy to recreate (like NBMRuleContainer), to make moving the server object
easier. Then I move it to its new place in the tree, checking that time
is in sync, obits are cleared, etc. Then I create a new S2S configuration
with this server as the master. I reconfigure the public interface and
physically disconnect it, so that no call will be made until I decide so.
I add every member, plus each trusted root object in the TRC of the new
master when the member server is in another tree. Since its public
interface is not physically connected, the new master is the only one to
know about this config for the moment: there is no way for it to push
its config outside. I also build the new minfo.vpn for the SKIP servers.

3/ I change the configuration of the slaves... all but one, which
unfortunately is not available at this time (no luck, and no way to wait!)

* First, I changed the config reload delay to 600 s in each config, to
have time to do what I wanted and check what I did.
* Removed the old MasterTRO from the TRC of each server and imported the
new one.
* Changed the Trusted master server certificate subject name.

It is easier for the SKIP slaves: I can run a VPN client connection later
and generate the new SKIP file using FTP and Telnet.

For the third-party firewall, I just changed the IP address referencing
the master; no difficulty.

That's all for the slaves. I'm betting that they will lose connectivity,
but that once the new master is up, it will come back using the new
config.

4/ I delete the config of the old master, clean everything up, move the
server object and declare it as a new slave. At the same time, I give it
its new public IP address and disconnect it. There are some backlinks and
obits that will have to be cleared, because this time part of the tree
is unreachable, but I expect the process to run normally once
connectivity is back.

5/ I switch everything off and move all the hardware to the new building.

6/ I power up the new master, and the tunnel comes up immediately for the
remote IKE slaves. I use a VPN client connection to the remote SKIP
servers and generate the new SKIP files, which I load into the new master
with NWAdmin... bingo, the tunnel comes up for them too.

7/ Now I power up the old master (now a slave), and it's OK too.

At this stage everything is (or seems to be) NICE! I still have one
remote slave to reconfigure, in its own tree, but that should not be a
problem...

8/ So I do it: I run a VPN client connection, open iManager, set the new
Trusted master server certificate subject name, etc., and wait for the
tunnel to come up... NOTHING!! I "help" the master and the slave:
"synchronize", "stop/startvpn", and finally restart both... NOTHING. And
of course this slave is a kind of hub for a whole region... and the boss
is crying!

There are only old entries in callmgr on the slave, including the call to
the old IP address of the old master. That means it isn't "eating" the
configuration pushed by the new master (or so I think at this point)...
so probably a corrupted file.

-> I delete the contents of sys:etc/ike/rootcert, plus csl.cfg, csl.dat,
ipwan.cfg and nlspstat.cfg, and restart... the old config comes back, same
call list!
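If you try a deletion like this, keep a copy of the files first. Below is a minimal Python sketch, assuming the server's SYS volume is mapped as a drive on a workstation. The sys:etc/ike/rootcert location comes from this post; placing csl.cfg, csl.dat, ipwan.cfg and nlspstat.cfg under SYS:ETC is an assumption, so check where your server actually keeps them. `backup_and_remove` is a hypothetical helper, not a Novell tool.

```python
import shutil
from pathlib import Path

# ASSUMPTION: paths relative to the root of the mapped SYS volume.
# etc/ike/rootcert comes from the post; the other four locations are guesses.
GENERATED_FILES = ["etc/csl.cfg", "etc/csl.dat", "etc/ipwan.cfg", "etc/nlspstat.cfg"]
ROOTCERT_DIR = "etc/ike/rootcert"


def backup_and_remove(volume_root, backup_root):
    """Copy each generated file (and the rootcert contents) to backup_root,
    keeping relative paths, then delete the original.
    Returns the relative paths that were removed."""
    volume_root = Path(volume_root)
    backup_root = Path(backup_root)
    targets = [Path(p) for p in GENERATED_FILES]
    rootcert = volume_root / ROOTCERT_DIR
    if rootcert.is_dir():
        targets += sorted(p.relative_to(volume_root)
                          for p in rootcert.iterdir() if p.is_file())
    removed = []
    for rel in targets:
        src = volume_root / rel
        if not src.is_file():
            continue  # file absent: nothing to back up or delete
        dst = backup_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # back up first...
        src.unlink()            # ...then delete the original
        removed.append(rel.as_posix())
    return removed
```

Usage would be something like `backup_and_remove("S:/", "C:/vpn-backup")`, with both drive paths hypothetical. Restoring is then just copying the backup tree back.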

-> I bang my head against the wall, and finally take an nlspstat.cfg file
from another server, make a few modifications with a hex editor, push
the file onto this slave and restart... the old config comes back, same
call list... aaarrrgh! During the night the call list finally gets
updated, but the server complains that it doesn't match the csl.dat
database.

9/ Finally I decided to send a new server built from scratch. 12 hours
and 10,000 km later, someone plugged in the new server: no more old
calls... the SAs are built correctly... but there are no calls in the
list! So the master was definitely not pushing the config at all.

10/ I remembered something about the system/vpn directory containing
information about the whole VPN structure, so I checked on the master
with a hex editor and discovered that the member.dat file had never been
updated: it still contained the references to the S2SMember attributes
linked to the old master in its old place in the tree. Then I checked
the member.dat file on a slave, and on each slave preconfigured in step
3/ I found that the file had been correctly updated. So only the slave
that became the master was not up to date. At this stage I don't know
whether that's a bug or a mistake on my part, and I don't think I'll
have the time to reproduce everything in a lab.
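Instead of scrolling through member.dat in a hex editor, you can search a copy of it for the old context name. A minimal Python sketch: since I don't know how names are encoded in these files, it looks for a literal string both as single-byte text and as UTF-16LE and reports the byte offsets. The search is case-sensitive, and the helper names are hypothetical.

```python
from pathlib import Path


def find_string(data: bytes, text: str):
    """Return (encoding, offset) pairs for every occurrence of text in data,
    trying a single-byte encoding and UTF-16LE (the real encoding is unknown)."""
    hits = []
    for enc in ("latin-1", "utf-16-le"):
        needle = text.encode(enc)
        pos = data.find(needle)
        while pos != -1:
            hits.append((enc, pos))
            pos = data.find(needle, pos + 1)
    return hits


def scan_file(path, text):
    """Read a (copied) .dat file and search it for text."""
    return find_string(Path(path).read_bytes(), text)
```

For example, `scan_file("member.dat", "OLD_OU")`, where `OLD_OU` stands for whatever the old container was called. Any hit on the master, but not on an updated slave, points at a stale file.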

So the whole process of building and populating the VPN configuration
seems to be as follows:

A/ iManager gives you a graphical interface to update fields and
attributes, all stored in the tree. But all this information is then
used to build config files in system/vpn, in Btrieve format I think.

B/ The master uses this information to build the csl.dat, ipwan.cfg and
nlspstat.cfg files, from which the WAN calls are made. Then it
establishes SAs with the remote servers (assuming the whole IKE process
is correctly configured) and pushes the system/vpn directory and its
files onto the remote servers through the newly established IPsec
connection (I suppose there is a dedicated process that does this once
the SAs are built).

C/ The slave reads the system/vpn files and learns that it really is a
slave, along with all the information about the master, the other
slaves, etc. It builds its own csl.dat, ipwan.cfg and nlspstat.cfg files
and starts calling in turn (assuming a fully meshed network with calls
in both directions)... and everything comes up!

-> It seems that status.dat is the only file specific to the master; all
the other files are replicated exactly to every member server of the S2S
network. So if you have even one set of files that is up to date and
working, it is easy to copy it onto a broken server (corrupted files,
or, in my case, files that were never updated) and rebuild everything
from scratch, after deleting the csl.*, nlspstat.cfg and ipwan.cfg files
of course. This also assumes that the values and attributes in the tree
are correct (you can always try playing with them if you're confident
about what you're doing).
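If every file except status.dat really is identical on all members, comparing digests is a quick way to spot the server whose system/vpn is out of date. A minimal Python sketch, assuming you have copied or mapped the two system/vpn directories to a workstation; file names are compared case-insensitively because the same file may show up in upper or lower case depending on the name space. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Per the observation above, status.dat is the only master-specific file.
MASTER_ONLY = {"status.dat"}


def digests(vpn_dir):
    """Map lower-cased file name -> SHA-256 of its contents, skipping status.dat."""
    result = {}
    for p in Path(vpn_dir).iterdir():
        name = p.name.lower()
        if p.is_file() and name not in MASTER_ONLY:
            result[name] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result


def differing_files(dir_a, dir_b):
    """Names that differ in content, or exist in only one of the two directories."""
    a, b = digests(dir_a), digests(dir_b)
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

Running `differing_files(master_copy, slave_copy)` on two copies gives the list of files to look at; an empty list means the replicated sets match.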

What I did -> I took the set of files from a slave with up-to-date
member.dat and 3ptypol.dat files (3ptypol.dat was also outdated on the
master), copied them into system/vpn on the master (after backing up the
bad files, just in case), restarted the master... and everything worked
fine. I did the same on the slave that had never been updated, and it
immediately started to call and be called.
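The copy in this step can be scripted so the backup is never forgotten. A minimal Python sketch under the same assumptions (directories reachable from a workstation, paths hypothetical): it backs up every file in the broken server's system/vpn, then copies the good set over it, leaving the target's own status.dat in place. It does not delete the csl.*, nlspstat.cfg and ipwan.cfg files; do that separately, as described above.

```python
import shutil
from pathlib import Path

MASTER_ONLY = {"status.dat"}  # keep the target's own copy of this file


def restore_vpn_files(good_dir, broken_dir, backup_dir):
    """Back up every file in broken_dir, then copy the replicated files
    from good_dir over it. Returns the names of the files copied."""
    good, broken, backup = Path(good_dir), Path(broken_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    # 1. Keep a copy of the current (bad) files, just in case.
    for p in broken.iterdir():
        if p.is_file():
            shutil.copy2(p, backup / p.name)
    # 2. Copy the good set, skipping the master-specific status.dat.
    copied = []
    for p in sorted(good.iterdir()):
        if p.is_file() and p.name.lower() not in MASTER_ONLY:
            shutil.copy2(p, broken / p.name)
            copied.append(p.name)
    return copied
```

Keeping the target's status.dat follows the observation that it is the only master-specific file; if that turns out to be wrong for your version, the backup directory still has everything.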

Now, what would be nice is to have the structure of the .dat files
documented, so they could be repaired with a hex editor when a problem
has been propagated everywhere and no working copy is available...

This is the kind of information I would like Novell to make available...
still no ATT in Europe, by the way, as confirmed by Novell US!

Hope this helps in some situations... but be careful about what you're
doing: it's not exactly the normal way.