We have recently patched all of our cluster nodes (OES2 SP2, x86_64).
Fully patched to 15 Jun 10 (i.e. including the much-awaited NCS and NSS fixes).

Since then we have had 3 incidents of nodes freezing.
In one instance we managed to get something out of /var/log/messages:
Jun 22 21:29:18 sblx-1 kernel: cma invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Jun 22 21:29:18 sblx-1 kernel:
Jun 22 21:29:18 sblx-1 kernel: Call Trace: <ffffffff8016538a>{oom_kill_process+9
Jun 22 21:29:18 sblx-1 kernel: <ffffffff8016596b>{out_of_memory+410} <fff
Jun 22 21:29:18 sblx-1 kernel: <ffffffff80169070>{__do_page_cache_readahead+166} <ffffffff802f0cf5>{__wait_on_bit_lock+92}


So it looks like a memory leak issue, as the oom-killer is stepping in and killing off processes...
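
In case it helps anyone chasing the same thing, here is a rough sketch (my own, untested on these boxes) of a Python script we may run from cron to leave a trail of the top memory consumers before the oom-killer fires. The log path is just a name I picked:

#!/usr/bin/env python
# Rough sketch: run from cron every minute or so to log the top RSS
# consumers, so there is a record of what was growing before the
# oom-killer fires. LOGFILE is a hypothetical path, adjust to taste.
import os, time

LOGFILE = "/var/log/rss-watch.log"

def proc_info(pid):
    # Pull the process name and resident set size from /proc/<pid>/status.
    name, rss = "?", 0
    try:
        for line in open("/proc/%s/status" % pid):
            if line.startswith("Name:"):
                name = line.split()[1]
            elif line.startswith("VmRSS:"):
                rss = int(line.split()[1])  # reported in kB
    except IOError:
        pass  # process may have exited between listdir and open
    return name, rss

pids = [p for p in os.listdir("/proc") if p.isdigit()]
top = sorted([proc_info(p) + (p,) for p in pids], key=lambda t: -t[1])[:10]

out = open(LOGFILE, "a")
out.write("---- %s ----\n" % time.ctime())
for name, rss, pid in top:
    out.write("pid %-6s %-20s %8d kB\n" % (pid, name, rss))
out.close()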

Unfortunately this has varying degrees of failure. In one instance the node's resources were inaccessible, ssh was not possible, and the console was frozen. However, you could still ping the node, the other nodes thought it was OK (cluster view), and it appeared to be successfully running its resources (cluster status). It took a power cycle for it to fail properly.

We have now logged an SR and have been advised of a sysctl setting which should dump the task list if the oom-killer kicks in again. We are waiting to see what might be causing the issue.
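
For reference, my guess is that the setting in question is vm.oom_dump_tasks (that is an assumption on my part, not confirmed by support, and I'm not certain it is available on a kernel this old):

echo "vm.oom_dump_tasks = 1" >> /etc/sysctl.conf    # persist across reboots
sysctl -w vm.oom_dump_tasks=1                       # apply immediately

With that set, the kernel should log the system-wide task list (pid, memory usage, process name) whenever the oom-killer triggers, which should show what had ballooned.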

Has anyone else experienced any issues like this?

Regards and thanks