Hi,
I have done some research on a bug affecting the 3ware 9550SXU RAID
controller when running the current SLES10 SP2 and the Xen 3.2 hypervisor,
and I want to give the community the benefit of my experience, such as it
is.
The bug causes the system to remount the drives read-only. Errors like the
following will show up in /var/log/messages:
kernel: PCI-DMA: Out of SW-IOMMU space for 16384 bytes at device 0000:06:01.0
kernel: 3w-9xxx: scsi1: ERROR: (0x06:0x001C):Failed to map scatter gather list.
The error occurs under heavy I/O (both reads and writes) on the local
drives, for example when copying large file-backed disk images.
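If you are not sure whether a system is hitting this bug, scanning the log for the two messages above will tell you. The sketch below is a minimal Python helper; `scan_log` and `scan_file` are hypothetical names made up for illustration, not part of any existing tool. It assumes each message appears on a single line in /var/log/messages, exactly as quoted above, possibly preceded by the usual syslog timestamp and hostname.

```python
import re

# Assumption: the log text matches the two messages quoted above, one per line.
# search() is used instead of match() so a syslog prefix such as
# "Jul 17 10:00:00 host " in front of "kernel:" does not matter.
SWIOTLB_RE = re.compile(
    r"PCI-DMA: Out of SW-IOMMU space for (\d+) bytes at device (\S+)"
)
SGL_RE = re.compile(
    r"3w-9xxx: (scsi\d+): ERROR: \(0x06:0x001C\):Failed to map scatter gather"
)


def scan_log(lines):
    """Hypothetical helper: collect swiotlb exhaustion and 3w-9xxx mapping failures.

    Returns (swiotlb_hits, sgl_hits), where swiotlb_hits is a list of
    (requested_bytes, pci_device) tuples and sgl_hits is a list of SCSI host names.
    """
    swiotlb_hits = []
    sgl_hits = []
    for line in lines:
        m = SWIOTLB_RE.search(line)
        if m:
            swiotlb_hits.append((int(m.group(1)), m.group(2)))
            continue
        m = SGL_RE.search(line)
        if m:
            sgl_hits.append(m.group(1))
    return swiotlb_hits, sgl_hits


def scan_file(path="/var/log/messages"):
    """Hypothetical helper: run scan_log over a log file (usually needs root)."""
    with open(path, errors="replace") as f:
        return scan_log(f)
```

On an affected host, `scan_file()` (run as root, since /var/log/messages is usually not world-readable) returns non-empty lists when the controller has run out of SW-IOMMU space and failed to map a scatter/gather list.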
Novell, IBM, and others on this forum (see
http://forums.novell.com/novell-prod...ml#post1458810)
have suggested fixing this by increasing the size of the swiotlb (the
software I/O TLB bounce buffer), but that only delays the inevitable.
Both the Xen and Red Hat developers have Bugzilla entries for this issue,
and Novell has one for openSUSE 10.3. The Red Hat developers have also
posted a patch and are distributing it in an update to RHEL5.
I filed a bug report in Novell's Bugzilla on July 17, but it has not shown
up in the system yet.
Here are some references for your reading pleasure.
Novell Knowledgebase TID 7000060, which points to the solution in
TID 3692489.
Novell Bugzilla bug 333658: lots of "Out of SW-IOMMU space" log messages
from SATA disks with XEN (opened against openSUSE 10.3)
IBM Support, IBM RAID card with SLES10 SP2 and Xen:
http://www-304.ibm.com/systems/suppo...andind=5000008
XenSource Bug 1227:
http://bugzilla.xensource.com/bugzil...ug.cgi?id=1227
Red Hat Bug 433554, with patch code:
https://bugzilla.redhat.com/show_bug.cgi?id=433554
Red Hat patch for RHEL5 from March 14, 2008:
http://rhn.redhat.com/errata/RHBA-2008-0314.html
I hope this helps those of you who are running into this and wondering
what is going on. Now we just have to wait for Novell's engineers to
integrate the patch into a SLES10 SP2 update.
Cheers,
Ron