In my ongoing quest to have a reliable OES Linux cluster, I've set up a test environment of VMs similar to my production environment.

I have two OES SP2 nodes that operate normally with a shared iPrint resource between them. To rehearse the rolling upgrade to OES2, I followed the basic advice of Novell engineers at BrainShare: I took one node down, reinstalled SLES10SP1+OES2 from scratch, and reintroduced the new system to the cluster. It worked great. I then downed the second server and did the same (erase and start from scratch with SLES10SP1+OES2). Again, it seemed good. This morning, I was trying to figure out why the nodes behave strangely when migrating the iPrint resource from one node to the other. The resource usually goes comatose during the migration; after offlining it manually and bringing it back online on the node I had originally tried to migrate it to, it works. OK, that's strange, but maybe OES2 is just running slower in the VMs than the OES SP2 nodes were?
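For reference, the recovery sequence I end up using looks roughly like this (standard NCS command-line tools; the resource and node names below are placeholders for my actual ones):

```shell
# Attempt the normal migration (resource and target node names are examples)
cluster migrate IPRINT_SERVER node2

# Resource goes comatose; check its state
cluster status

# Manual recovery: offline the resource, then online it on the intended node
cluster offline IPRINT_SERVER
cluster online IPRINT_SERVER node2
```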

The real strangeness comes in when I try to restart a node. I have tried turning cluster maintenance mode on, setting RUN_PARALLEL="no" in /etc/sysconfig/boot, running cluster down before restarting or shutting down, and even stopping cluster services entirely on the node. Ultimately, soon after I initiate shutdown -r now or halt, a line flashes by stating something along the lines of "Kernel Panic" and "Abend". Within a second or less, the VM BIOS flashes by and GRUB is already starting up.
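Concretely, the variations I've tried before rebooting look something like this (all run on the node being restarted; the service script name is what I believe OES2 uses):

```shell
# Put the cluster in maintenance mode first
cluster maintenance on

# Disable parallel init scripts by editing /etc/sysconfig/boot:
#   RUN_PARALLEL="no"

# Bring cluster services down before the reboot
cluster down

# Or stop NCS entirely on this node
rcnovell-ncs stop

# Then reboot -- this is where the panic/abend flashes by
shutdown -r now
```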

So far, one node appears to be totally corrupted: I see segmentation faults when it boots up, and eventually errors just scroll up the screen repeatedly without stopping. Even with only one node powered on, I still see the abend/kernel panic related to the cluster when shutting down or restarting that remaining node.

Does OES2 not allow a graceful shutdown when it is in a clustered environment?

Install steps:
-Install SLES10SP1 with the OES2 add-on media, selecting iPrint, eDirectory, and NCS
-Configure eDirectory during the install but leave NCS unconfigured
-After the install, set up the iSCSI initiator (no authentication) and enable it to start automatically at boot
-Update the server, restarting after each round of updates
-When no more updates are available, join the node to the cluster through the YaST OES configuration
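Assuming the standard SLES10SP1 tools (the service and YaST module names here are my best recollection; treat this as a sketch, not an exact transcript), the post-install steps amount to roughly:

```shell
# Configure the iSCSI initiator (no CHAP) and make it start at boot
yast2 iscsi-client
chkconfig open-iscsi on

# Apply updates, rebooting between rounds, until none remain
rug update
shutdown -r now

# Finally, join the node to the cluster via the YaST NCS module
yast2 ncs
```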

This is all in a VMware Workstation and Server environment, which appeared to work normally with OES SP2. My iSCSI target is a SLES10SP1 server on another box.