Help,

I have a new HP DL360G4 server with OES (NW6.5SP3) installed, connected by
fibre (FCA2210 - QL2300.HAM v6.80.07) to an HP MSA1000 disk array cabinet
(with a total of 42 drives).

One physical disk in one of the 12-disk RAID5 arrays failed overnight.
Instead of continuing seamlessly (as a RAID5 array should), every Pool was
deactivated and every Volume dismounted on the server with device failure
errors (as shown below), *including* the Pool and Volume residing on the
other defined RAID5 array! The server had to be rebooted before access to
the cabinet could be regained.

As would be expected, the RAID controller in the cabinet, on detecting the
failed drive, took one of the on-line hot spares and started automatically
rebuilding the array onto that disk; the rebuild duly completed in the
background within a few hours. (Once the server had been rebooted, all
volumes mounted fine while the rebuild was still running.)

I do not understand why the server detected the disk failure at all, or why
it led to a complete loss of access to the whole cabinet.

Is this a problem with NSS? Is it a problem in other NetWare drivers, or in
the FCA2210 firmware? Is there some sort of problem with the MSA cabinet? It
doesn't appear to be the latter, as the cabinet seems to have behaved as one
would expect.

The MSA cabinet has the latest firmware in both the embedded SAN switch
(v3.32.0a) and the MSA controller (V4.48).

NSS reports version V3.22 (build 994)


*** Begin syslog.err extract ***

5-09-2005 5:29:16 am: NWPA-3.20-0
Severity = 0
NWPA-004: The CDM driver deactivated device [V597-A3-D0:1] COMPAQ
MSA1000 VOLUME 500805F300160850 (4.48) due to a device failure.

5-09-2005 5:29:16 am: NWPA-3.20-0
Severity = 0
NWPA-004: The CDM driver deactivated device [V597-A3-D0:2] COMPAQ
MSA1000 VOLUME 500805F300160850 (4.48) due to a device failure.

5-09-2005 5:29:16 am: NWPA-3.20-0
Severity = 0
NWPA-004: The CDM driver deactivated device [V597-A3-D0:6] COMPAQ
MSA1000 VOLUME 500805F300160850 (4.48) due to a device failure.

5-09-2005 5:29:16 am: NWPA-3.20-0
Severity = 0
NWPA-004: The CDM driver deactivated device [V597-A3-D0:5] COMPAQ
MSA1000 VOLUME 500805F300160850 (4.48) due to a device failure.

5-09-2005 5:29:16 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:8] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:16 am: COMN-3.22-1092
Severity = 4 Locus = 3 Class = 0
NSS-3.00-5001: Pool ATLAS/PINK_RO is being deactivated.
An I/O error (18(zlssMSAP.c[1796])) at block 0(file block 0)(ZID 0) has
compromised pool integrity.

5-09-2005 5:29:16 am: COMN-3.22-33
Severity = 4 Locus = 3 Class = 0
NSS-2.70-5004: Volume ATLAS/APPS is being deactivated.
An I/O error (20204(zio.c[2179])) at block 19208898(file block
-19208898)(ZID 2191) has compromised volume integrity.

5-09-2005 5:29:16 am: COMN-3.22-1092
Severity = 4 Locus = 3 Class = 0
NSS-3.00-5001: Pool ATLAS/BLUE is being deactivated.
An I/O error (18(zlssMSAP.c[1796])) at block 0(file block 0)(ZID 0) has
compromised pool integrity.

5-09-2005 5:29:16 am: COMN-3.22-1092
Severity = 4 Locus = 3 Class = 0
NSS-3.00-5001: Pool ATLAS/PINK_USER is being deactivated.
An I/O error (20204(zio.c[2179])) at block 9604354(file block
-9604354)(ZID 2187) has compromised pool integrity.

5-09-2005 5:29:16 am: COMN-3.22-1092
Severity = 4 Locus = 3 Class = 0
NSS-3.00-5001: Pool ATLAS/PINK_APPS is being deactivated.
An I/O error (20204(zio.c[2179])) at block 19208898(file block
-19208898)(ZID 2191) has compromised pool integrity.

5-09-2005 5:29:16 am: COMN-3.22-33
Severity = 4 Locus = 3 Class = 0
NSS-2.70-5004: Volume ATLAS/USERS is being deactivated.
An I/O error (20204(zio.c[2179])) at block 9604354(file block
-9604354)(ZID 2187) has compromised volume integrity.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:1] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:2] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:6] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:5] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:3] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:4] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

5-09-2005 5:29:17 am: SERVER-5.70-1534
Severity = 4 Locus = 3 Class = 6
Device "[V597-A3-D0:7] COMPAQ MSA1000 VOLUME 500805F300160850 (4.48)"
deactivated by driver due to device failure.

*** End syslog.err extract ***
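
For anyone who wants to tally the extract rather than read it line by line,
here is a minimal Python sketch that lists which devices, pools and volumes
were reported as deactivated. The function name summarize and both regular
expressions are my own assumptions, inferred only from the message formats
shown above; this is not a NetWare or HP tool, and other message formats are
not handled.

```python
import re

# Assumed patterns, inferred from the extract above only.
# Device IDs look like [V597-A3-D0:1]; NSS lines name a Pool or a Volume.
DEVICE_RE = re.compile(r"\[(V[0-9A-F]+-A\d+-D\d+:\d+)\]")
NSS_RE = re.compile(r"(Pool|Volume) (\S+) is being deactivated")


def summarize(log_text):
    """Return sorted lists of unique device IDs, pool names and volume names."""
    # A set removes duplicates: each device is logged by NWPA and by SERVER.
    devices = set(DEVICE_RE.findall(log_text))
    pools, volumes = set(), set()
    for kind, name in NSS_RE.findall(log_text):
        if kind == "Pool":
            pools.add(name)
        else:
            volumes.add(name)
    return sorted(devices), sorted(pools), sorted(volumes)
```

Calling summarize() on the text of the extract above should report all eight
MSA1000 devices (D0:1 to D0:8), four pools and two volumes, which is what
suggests to me that the whole cabinet was dropped rather than just the array
with the failed disk.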