Hi folks,

(This post was originally meant to be a rant and a request for help, but
while writing the final paragraph i found the solution. It's still a
rant, but i figured i'd post my solution here in case someone else runs
into the same issue.)

I've just spent several hours banging my head against a broken cluster
node. My system is a 32-bit SLES 10 VM running on VMware ESX 3.5.x.

I upgraded from SLES10 SP3 and OES2 SP2 to the next service packs for
each (using the move-to-oes-sp3 script in yast2 online_update).
Everything went well for the first few update/reboot sequences, then
after the final reboot on SLES10 SP4 & OES2 SP3, cluster services would
not load or join the cluster on restart.

I checked dmesg and found "Loading module compiled for kernel version"
errors that named a previous kernel version, so
i tried downgrading to that kernel version, only to find that it was
older than the one i had just upgraded from (it's the original SLES10
SP3 kernel). So i tried upgrading back to the same kernel which is
running on the other cluster node, but that did not work any better.

I have to say that i'm not impressed that OES2 SP3 isn't even compiled
against the appropriate kernel, and because of SUSE's kernel RPM
overwrite policy there's no way i can select to boot from a previous
kernel to see if that fixes things. Note to SUSE and other distro
builders: if you're not doing kernel package upgrades like Red Hat or
Ubuntu (so that we can select to boot from the previous kernel from the
boot menu), you're doing it *WRONG*.

I then upgraded again to the latest recommended kernel for SLES10 SP4,
and still no joy. Dmesg shows this error before the rot starts:

allocation failed: out of vmalloc space - use vmalloc=<size> to increase

Searching for this error pointed me to the fix.

Adding vmalloc=192M to the kernel line in /boot/grub/menu.lst and
rebooting solved the problem for me.
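
In case it helps anyone who has to do this on more than one node, here
is a rough sketch of the edit in Python. It appends the option to each
GRUB legacy "kernel" line that doesn't already set it. The function name
and the sample boot entry (kernel path, root device) are made up for
illustration and are not copied from my system, so treat the output as
something to review by hand before it goes anywhere near the real
menu.lst.

```python
import re


def add_kernel_param(menu_lst: str, param: str) -> str:
    """Append param to every GRUB legacy 'kernel' line that lacks it.

    Hypothetical helper: it only looks at lines whose first word is
    'kernel' and leaves everything else untouched.
    """
    key = param.split("=", 1)[0]  # e.g. "vmalloc" from "vmalloc=192M"
    out = []
    for line in menu_lst.splitlines():
        stripped = line.strip()
        if re.match(r"kernel\s", stripped):
            # Skip lines that already carry this option in any form.
            has_key = any(
                tok == key or tok.startswith(key + "=")
                for tok in stripped.split()
            )
            if not has_key:
                line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample entry; a real menu.lst will have different paths/devices.
SAMPLE = """\
title SLES 10 (hypothetical entry)
    root (hd0,0)
    kernel /vmlinuz root=/dev/sda2 showopts
    initrd /initrd
"""

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE, "vmalloc=192M"), end="")
```

After the reboot, /proc/cmdline should show vmalloc=192M, and the
VmallocTotal line in /proc/meminfo should reflect the larger area.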