I applied all the OES patches to my Linux test servers and they crashed over the weekend.

The logs claim the issue is that the cluster lost communication with a node, but I don't see any link down messages. I had, at one time, increased the timeouts, but that turned out to make things less stable. Does anyone know what the "re-mirror" messages mean?




Aug 24 07:43:17 gwtest1-h361-l102 kernel: CLUSTER-<WARNING>-<6077>: The cluster has lost communication with node [gwtest1-h361-l101].
Aug 24 07:43:17 gwtest1-h361-l102 kernel: Node [gwtest1-h361-l101] may have failed or experiencing other problems.
Aug 24 07:43:17 gwtest1-h361-l102 kernel: To ensure cluster stability, this node has sent a poison pill to node [gwtest1-h361-l101].
Aug 24 07:43:17 gwtest1-h361-l102 kernel: Epoch for this node is higher than for some other node.
Aug 24 07:43:17 gwtest1-h361-l102 kernel: Other node is slow to update epoch and bitmask (slow or dead).
Aug 24 07:43:47 gwtest1-h361-l102 kernel: MM_RemirrorPartition: GWTest-Cluster1.sbd, 1
Aug 24 07:43:47 gwtest1-h361-l102 kernel: device-mapper: Target type does not support messages
Aug 24 07:43:47 gwtest1-h361-l102 kernel: device-mapper: Unrecognised multipath message received.
Aug 24 07:48:53 gwtest1-h361-l102 kernel: MM_RemirrorPartition: GWTest-Cluster1.sbd, 1
Aug 24 07:48:53 gwtest1-h361-l102 kernel: device-mapper: Target type does not support messages
Aug 24 07:48:53 gwtest1-h361-l102 kernel: device-mapper: Unrecognised multipath message received.