I had a drive fail in a Dell PowerEdge 2950. After it failed a second time (reseating it had brought it back up for 3 days), I called warranty support and they shipped me a replacement, with instructions to swap it out while the server was running so that the hardware would be aware the drive had been replaced.

Things did not go well. I replaced the drive, as instructed, with the server up. I then went into the OS (SLES 10 SP2, OES2 SP1) and looked at the system log. It showed that the new drive had been identified (before this it had logged the removal of the failed drive). I then tried to follow the instructions to rebuild the NSS software RAID 1 array. The first step, deleting the partitions for the failed drive, did not work: the failed drive does not show a partition in NSSMU while it is failed (or when it is removed). I tried to add the new device, and it did not show up as a partition with free space (I did not expect it to, but the documentation seemed to be making assumptions and skipping steps). I used iManager to go to Devices and initialize the new drive; it put a small partition on it, and the rest of the space then showed up as free space. I then went to RAID Devices in NSSMU and expanded the array using the free space on the new device. NSSMU said "Please Wait". Nothing happened with the drive lights.

Here is a link to the documentation I was following:
13.14 Replacing a Failed Segment in a Software RAID

Novell Documentation

At that point things went bad. The volumes were still there, but no one could read or write. I waited quite a while and still nothing changed, so eventually I tried to shut down the server. Running shutdown -h now did not succeed, and after 5 minutes I powered the system off.

On restart it came up, but the NSS volumes on the RAID 1 array in question were in the same state: the directory showed, but there was no reading and no writing. I powered off again (still unable to shut down gracefully). Then I removed the new drive and everything came back OK. However, there is now only 1 drive in the array, and although NSSMU shows only the one RAID segment that is functioning, the other two show up in iManager as missing_storage. I can't see a way to delete them from the array in iManager.

When I go into the SAS card's BIOS, the new drive is there (it is #3 out of 4 in the list).

If you know how to make OES2 SP1 running on SLES 10 SP2 rebuild an NSS software RAID array, I would love to talk with you!

Craig Lyndes
Franklin Central SU