[I'm sorry for the length of my post]

== Scenario:
- three HP ProLiant DL380 servers [with Smart Array 6i HBA]
- *only* Linux OES SP1 boxes [no NetWare boxes]

- NCS cluster [2 nodes] based on 1.8.1-70 [node names are ncs-node1
and ncs-node2]
- The third server is an NCP server [server name is opt] that also
acts as DNS/DHCP and email server

- opt is the first server installed in the tree
- opt holds the master replica of the [root] partition [there are no
other partitions]
- ncs-node1 and ncs-node2 each hold a R/W replica

== Troubles with OES SP1
With OES SP1 we have seen a *plethora* of namcd issues: when a user
logs in via sshd with the username typed in mixed case [or in
uppercase], namcd faults on a signal and after that nobody can
authenticate. I have read TID10099224 [which describes a similar
scenario], but I have double-checked and it does not match my case.
With OES SP1 I had no particular issues with NCS, which worked nicely
[before the upgrade to SP2].

See the namcd issue below (from /var/log/messages):
[...]
Dec 1 06:40:42 ncs-node1 -- MARK --
Dec 1 07:20:43 ncs-node1 -- MARK --
Dec 1 07:40:43 ncs-node1 -- MARK --
Dec 1 08:20:43 ncs-node1 -- MARK --
Dec 1 08:40:43 ncs-node1 -- MARK --
Dec 1 08:47:17 ncs-node1 sshd[31354]: Accepted
keyboard-interactive/pam for acanclini from ::ffff:172.16.42.173 port
1050 ssh2
Dec 1 08:47:55 ncs-node1 sshd[31570]: Accepted
keyboard-interactive/pam for root from ::ffff:172.17.3.160 port 1652
ssh2
Dec 1 08:48:48 ncs-node1 sshd[31990]: nds_nss_GetPwdbyName: init sock
returned 0
Dec 1 08:48:48 ncs-node1 sshd[31990]: Illegal user alsmersi from
::ffff: 172.16.42.157
Dec 1 08:48:49 ncs-node1 /usr/sbin/namcd[25626]: Deleted hash tables
and flushed data into local files
Dec 1 08:48:49 ncs-node1 /usr/sbin/namcd[25626]: Deinitialized
threads
Dec 1 08:48:51 ncs-node1 namcd: SIGTTOU caught
Dec 1 08:48:51 ncs-node1 namcd: SIGTTIN caught
Dec 1 08:48:51 ncs-node1 namcd: SIGTSTP caught
Dec 1 08:48:51 ncs-node1 /usr/sbin/namcd[32016]: Starting namcd..
Dec 1 08:48:51 ncs-node1 /usr/sbin/namcd[32016]: namcd populating the
user hash tables
Dec 1 08:48:51 ncs-node1 /usr/sbin/namcd[32016]: namcd populating
group hash tables
Dec 1 08:48:51 ncs-node1 /usr/sbin/namcd[32016]: namcd Populated hash
tables
Dec 1 08:48:51 ncs-node1 /usr/sbin/namcd[32016]: Created all the
threads
Dec 1 08:49:04 ncs-node1 sshd[32077]: _nds_loginUser(): ldap compare
of user password failed
Dec 1 08:49:10 ncs-node1 sshd[32077]: nds_authenticate():ldap_compare
failed with crypted password
Dec 1 08:49:10 ncs-node1 sshd[32077]: PAM_NAM : NDS Login failed
Dec 1 08:49:10 ncs-node1 sshd[31990]: error: PAM: Authentication
failure
Dec 1 08:49:10 ncs-node1 sshd[31990]: Failed keyboard-interactive/pam
for illegal user mitrabucchi from ::ffff:172.16.42.157 port 1056 ssh2
Dec 1 08:49:26 ncs-node1 sshd[32214]: _nds_loginUser(): ldap compare
of user password failed
[...]
The same messages appear for every user who tries to authenticate
over SSH2.

After manually stopping and restarting the namcd daemon, it sometimes
works nicely for 14 to 16 hours [nam.conf is left at its defaults] and
users can authenticate without any problems.
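
To put a number on the "14 to 16 hours", the namcd start events can be pulled out of /var/log/messages and the gaps between them computed. This is only a rough Python sketch: it assumes the usual one-line syslog format (the lines in this post are wrapped by my newsreader), the function names are mine, and since syslog carries no year, `ASSUMED_YEAR` is a value you have to supply.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2005  # assumption: syslog lines do not include the year

# Matches e.g. "Dec 1 08:48:51 ncs-node1 /usr/sbin/namcd[32016]: Starting namcd.."
LINE_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d\d:\d\d:\d\d)\s+\S+\s+(.*)$")

def namcd_start_times(lines):
    """Return the timestamp of every 'Starting namcd' message."""
    starts = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or "Starting namcd" not in m.group(4):
            continue
        month, day, clock = m.group(1), m.group(2), m.group(3)
        starts.append(datetime.strptime(
            f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"))
    return starts

def hours_between_starts(lines):
    """Hours elapsed between each pair of consecutive namcd starts."""
    starts = namcd_start_times(lines)
    return [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
```

Passing an open file object for /var/log/messages to `hours_between_starts` would give one figure per restart. A log that spans New Year would need extra handling, which this sketch does not attempt.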

== Groups with long names (and underscore characters)
In the eDirectory tree I have created several groups whose names
contain uppercase letters and underscores [like Linux_Enabled or
FaxWare_Valid_Users]; I will now try renaming these groups to
lowercase names of different lengths. I suspect (and it is very
reproducible) that namcd reports a caught signal *when* a user types
the username wrongly (in mixed case) and *when* this user is a member
of a group that has a long name.
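
To check this suspicion in an orderly way, one could try every upper/lower-case spelling of a short test username over SSH, once for a user in a long-named group and once for a user who is not. The Python sketch below only generates the spellings to try; the function name is mine, and nothing in it talks to namcd or sshd.

```python
from itertools import product

def case_variants(username):
    """Yield every upper/lower-case spelling of username, without duplicates."""
    choices = [
        (ch.lower(), ch.upper()) if ch.lower() != ch.upper() else (ch,)
        for ch in username
    ]
    for combo in product(*choices):
        yield "".join(combo)
```

The number of spellings doubles with each letter, so a nine-letter name like acanclini already gives 512 variants; a short test account keeps the run manageable.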

== Authentication with Novell Client32
No problem at all when authenticating to eDirectory with Novell
Client32, even when the username is typed in mixed case.

== Troubles only with SSH
The trouble occurs *only* with SSH logins [via SecureNetTerm].
Obviously all users are LUM-enabled.


==== OES SP2 History =====

== After that I applied OES SP2 [D11]
A few days ago [1-Dec-2005] I found and downloaded the latest version
of OES for Linux [SP2, D11 version] from My Download Area: three CDs
[OES SP2 CD1, OES SP2 CD2, OES SP2 CD4 (SLES9)].

I have installed OES SP2 on four test machines without any apparent
problem.

The same upgrade on the production HP ProLiant DL380 servers has been
a nightmare for me.

One box was updated without any apparent issues, but after the reboot
I'm unable to launch iManager; the error message is:

== iManager doesn't work on opt server
HTTP Status 503 - Servlet portal is currently unavailable
--------------------------------------------------------------------------------
type Status report
message Servlet portal is currently unavailable
description The requested service (Servlet portal is currently
unavailable) is not currently available.
--------------------------------------------------------------------------------
Apache Tomcat/4.1.31

I believe that SP2 has overwritten some nps files; I will investigate
this issue.

== After the update, a plethora of XSrvCChannel connection errors

Please see my /var/log/messages below:
[...]
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::connectSocket-
Connection creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::init- Connection
creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -getCChannel- Channel
Initialization failed for socket
/var/opt/novell/xtier/xsrvd/srv-socket-0
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -SubmitReq- Channel
unavailable
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -SubmitReq- Will attempt
to retry RPC, count = 1
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::connectSocket-
Connection creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::init- Connection
creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -getCChannel- Channel
Initialization failed for socket
/var/opt/novell/xtier/xsrvd/srv-socket-0
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -SubmitReq- Channel
unavailable
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -SubmitReq- Will attempt
to retry RPC, count = 2
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::connectSocket-
Connection creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::init- Connection
creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -getCChannel- Channel
Initialization failed for socket
/var/opt/novell/xtier/xsrvd/srv-socket-0
[...]
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::connectSocket-
Connection creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: XSrvCChannel::init- Connection
creation failed, error = 111
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -getCChannel- Channel
Initialization failed for socket
/var/opt/novell/xtier/xsrvd/srv-socket-9
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -SubmitReq- Channel
unavailable
Dec 7 13:56:54 opt httpd2-prefork: IPCCLNT -SubmitReq- Will attempt
to retry RPC, count = 3
Dec 7 13:56:55 opt su: (to novlwww) root on /dev/pts/0
Dec 7 13:56:55 opt su: pam_unix2: session started for user novlwww,
service su
Dec 7 13:56:56 opt su: pam_unix2: session finished for user novlwww,
service su
Dec 7 13:56:56 opt kernel:
COMN::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/comn/comnLKM.c[201]
Dec 7 13:56:56 opt kernel: MaxBuffer_s = 1413120
Dec 7 13:56:56 opt kernel: NumPagesToWait = 77788
Dec 7 13:56:56 opt kernel:
COMN::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/comn/comnLKM.c[204]
Dec 7 13:56:56 opt kernel:
ZLSS::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/zlss/zlssLKM.c[222]
Dec 7 13:56:56 opt kernel:
ZLSS::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/zlss/zlssLKM.c[224]
Dec 7 13:56:56 opt kernel:
MANAGE::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/manage/manageLKM.c[226]
Dec 7 13:56:56 opt kernel:
MANAGE::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/manage/manageLKM.c[228]
Dec 7 13:56:56 opt kernel: Opening trustee file:
/opt/novell/nss/conf/trustees.xml
Dec 7 13:56:56 opt kernel:
LSA::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/lsa/lsaLKM.c[456]
Dec 7 13:56:56 opt kernel:
LSA::/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/lsa/lsaLKM.c[459]
Dec 7 13:56:57 opt kernel: NSSLOG ==> [MSAP]
/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/comn/comnLog.c[201]
Dec 7 13:56:57 opt kernel: Pool "POOL_opt" - MSAP activate.
Dec 7 13:56:57 opt kernel:
Server(8e8becdc-5360-11da-a5-b3-0013216b51e3) Cluster(00000000-0000-0000-00-00-000000000000)
Dec 7 13:56:57 opt kernel: NSSLOG ==> [MSAP]
/usr/src/packages/BUILD/kernel-bigsmp-2.6.5/modules-2.6.5/nss/comn/comnLog.c[201]

There is also a new error, apparently about NSS [?]:
[...]
Dec 7 13:56:58 opt adminfs daemon: adminusd: Starting
[...]
Dec 7 13:57:37 opt adminfs daemon: adminusd: Error reading from the
admin file service device
Dec 7 13:57:37 opt kernel: NSS error out of range for translation.
Error=987123
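
The XSrvCChannel lines further up all report error = 111. If these numbers are ordinary Linux errno values (my assumption; the log does not say so), 111 is ECONNREFUSED, which would mean that nothing was listening on the /var/opt/novell/xtier/xsrvd/srv-socket-N sockets when Apache tried to connect. The same reading would make the "Unable to bind socket, error = 13" in the ncs-node1 log further down an EACCES (permission denied). A small Python lookup, with the Linux constants hard-coded so it answers the same on any platform:

```python
# Linux errno values from <errno.h>. Treating the xtier "error = N"
# numbers as errno values is an assumption, not something the log states.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    13: ("EACCES", "Permission denied"),
    111: ("ECONNREFUSED", "Connection refused"),
}

def explain(code):
    """Return 'N = NAME: description' for a known code, else a placeholder."""
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} = {name}: {text}"

for code in (111, 13):
    print(explain(code))
```

The larger numbers in these logs (81052101, 987123, 21702) are clearly not errno values and this table makes no attempt to interpret them.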


After the reboot I can log in and authenticate [with Client32] on the
opt box just updated to SP2; the NSS volume is mounted and I can
access the files stored on it.

== Updating to SP2 on first node of cluster
After the opt server I also updated the ncs-node1 server [ncs-node2
was offline]. The update itself showed no apparent trouble, but after
the reboot, when the cluster starts [eDirectory and the LDAP server
come up nicely], the server freezes completely [ouch!] and I have to
power it off. The keyboard accepts no input; the server is totally
locked and unusable [ouch!]

After a while I decided to stop the SP2 upgrade. I first powered off
ncs-node1 and then reactivated ncs-node2 [OES SP1], which came up
without any apparent problem.

== Extract from ncs-node1 /var/log/messages after reboot
[...]
Dec 7 12:30:15 ncs-node1 kernel: adminfsdrv: module not supported by
Novell, setting U taint flag.
Dec 7 12:30:15 ncs-node1 xinetd[9526]: Reading included configuration
file: /etc/xinetd.d/rsh [file=/etc/xinetd.d/rsh] [line=24]
Dec 7 12:30:15 ncs-node1 xinetd[9526]: Reading included configuration
file: /etc/xinetd.d/rstatd [file=/etc/xinetd.d/rstatd] [line=22]
Dec 7 12:30:15 ncs-node1 xinetd[9526]: Reading included configuration
file:
/etc/xinetd.d/rsync [file=/etc/xinetd.d/rsync] [line=16]
Dec 7 12:30:15 ncs-node1 novell-xregd[9532]: XTRegEng -RegInitialize-
Open database failure, error = 81052101
Dec 7 12:30:15 ncs-node1 novell-xregd[9532]: XRegD -InitDbObjHolders-
Exception caught instantiating DbObjHolder
Dec 7 12:30:15 ncs-node1 novell-xregd[9532]: XTRegEng -RegInitialize-
Opendatabase failure, error = 00000000
Dec 7 12:30:15 ncs-node1 xinetd[9526]: Reading included configuration
file: /etc/xinetd.d/servers [file=/etc/xinetd.d/servers] [line=12]
Dec 7 12:30:15 ncs-node1 xinetd[9526]: Reading included configuration
file: /etc/xinetd.d/services [file=/etc/xinetd.d/services] [line=13]
Dec 7 12:30:15 ncs-node1 xinetd[9526]: Reading included configuration
file: /etc/xinetd.d/swat [file=/etc/xinetd.d/swat] [line=13]
Dec 7 12:30:15 ncs-node1 kernel: adminfs: module not supported by
Novell, setting U taint flag.
Dec 7 12:30:15 ncs-node1 kernel: adminfs: no version for
"adrv_request" found: kernel tainted.
Dec 7 12:30:15 ncs-node1 kernel: adminfs init
Dec 7 12:30:15 ncs-node1 adminfs daemon: adminfsd: Starting
Dec 7 12:30:16 ncs-node1 kernel: gipc: module license 'Proprietary'
taints kernel.
Dec 7 12:30:16 ncs-node1 kernel: sbd: module not supported by Novell,
setting U taint flag.
Dec 7 12:30:16 ncs-node1 kernel: sbd: module license 'Proprietary'
taints kernel.
Dec 7 12:30:16 ncs-node1 kernel: vipx: module not supported by
Novell, setting U taint flag.
Dec 7 12:30:16 ncs-node1 kernel: vipx: module license 'Proprietary'
taints kernel.
Dec 7 12:30:16 ncs-node1 kernel: css: module not supported by Novell,
setting U taint flag.
Dec 7 12:30:16 ncs-node1 kernel: css: module license 'Proprietary'
taints kernel.
Dec 7 12:30:17 ncs-node1 kernel: cvb: module not supported by Novell,
setting U taint flag.
Dec 7 12:30:17 ncs-node1 kernel: cvb: module license 'Proprietary'
taints kernel.
Dec 7 12:30:17 ncs-node1 kernel: crm: module not supported by Novell,
setting U taint flag.
Dec 7 12:30:17 ncs-node1 kernel: crm: module license 'Proprietary'
taints kernel.
Dec 7 12:30:17 ncs-node1 kernel: cmsg: module not supported by
Novell, setting U taint flag.
Dec 7 12:30:17 ncs-node1 kernel: Starting Novell Cluster Services
Dec 7 12:30:17 ncs-node1 kernel: Start(clstrlib)
Dec 7 12:30:17 ncs-node1 kernel: nCSClusterName = ncs1
Dec 7 12:30:17 ncs-node1 kernel: nCSRevision = 282
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeNames[0] = ncs-node1
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeIPAddresses[0] = 0b2a10ac
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeNames[1] = ncs-node2
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeIPAddresses[1] = 0c2a10ac
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeNumber = 0
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeBitMask = 00000003
Dec 7 12:30:17 ncs-node1 kernel: nCSNumberOfNodes = 2
Dec 7 12:30:17 ncs-node1 kernel: nCSMyIpAddress = 0b2a10ac
Dec 7 12:30:17 ncs-node1 kernel: nCSGuid =
e019b753-6253-da01-8024-000000000000
Dec 7 12:30:17 ncs-node1 kernel: nCSNodeIsolationScript = "panic"
Dec 7 12:30:17 ncs-node1 kernel: nCSPortNumber = 7023
Dec 7 12:30:17 ncs-node1 kernel: nCSGIPCHeartbeat = 1
Dec 7 12:30:17 ncs-node1 kernel: nCSGIPCTolerance = 8
Dec 7 12:30:17 ncs-node1 kernel: nCSCRMQuorum = 2
Dec 7 12:30:17 ncs-node1 kernel: nCSCRMQuorumTimeout = 60
Dec 7 12:30:17 ncs-node1 kernel: nCSGIPCMasterWatchdog = 1
Dec 7 12:30:17 ncs-node1 kernel: nCSGIPCSlaveWatchdog = 8
Dec 7 12:30:17 ncs-node1 kernel: nCSGIPCMaxRetransmits = 30
Dec 7 12:30:17 ncs-node1 kernel: Start(vll)
Dec 7 12:30:17 ncs-node1 kernel: Start(gipc)
Dec 7 12:30:17 ncs-node1 kernel: Start(sbdlib)
Dec 7 12:30:17 ncs-node1 kernel: Start(sbd)
Dec 7 12:30:17 ncs-node1 kernel: Start(vipx)
Dec 7 12:30:17 ncs-node1 kernel: Start(css)
Dec 7 12:30:17 ncs-node1 kernel: Start(crm)
Dec 7 12:30:17 ncs-node1 kernel: CLUSTER RESOURCE SCREEN
Dec 7 12:30:17 ncs-node1 kernel: Start(cvb)
Dec 7 12:30:17 ncs-node1 kernel: Start(cmsg)
Dec 7 12:30:17 ncs-node1 kernel: Start(cma)
Dec 7 12:30:17 ncs-node1 kernel: gipc using eth0: addr=0b2a10ac,
mask=00ffffff
Dec 7 12:30:17 ncs-node1 kernel: Novell Cluster Services Started
Dec 7 12:30:17 ncs-node1 ncs-configd: PID=9707
Dec 7 12:30:17 ncs-node1 ncs-resourced: PID=9710
Dec 7 12:30:17 ncs-node1 ncs-emaild[9713]: Starting...
Dec 7 12:30:17 ncs-node1 ncs-emaild[9713]: Started
Dec 7 12:30:17 ncs-node1 kernel: Hangcheck: starting hangcheck timer
0.9.0 (tick is 1 seconds, margin is 8 seconds).
Dec 7 12:30:17 ncs-node1 kernel: Hangcheck: Using monotonic_clock().
Dec 7 12:30:20 ncs-node1 kernel: adminfs: Error 21702 from the write
function
Dec 7 12:30:20 ncs-node1 kernel: smszapi: module not supported by
Novell, setting U taint flag.
Dec 7 12:30:20 ncs-node1 kernel: smszapi major number is 251
Dec 7 12:30:21 ncs-node1 novell-xsrvd-0[9958]: XSrvD
-ServiceConnections- Unable to bind socket, error = 13
Dec 7 12:30:21 ncs-node1 novell-xsrvd-3[9962]: XSrvD
-ServiceConnections- Unable to bind socket, error = 13
Dec 7 12:30:21 ncs-node1 novell-xsrvd-2[9961]: XSrvD
-ServiceConnections- Unable to bind socket, error = 13
Dec 7 12:30:21 ncs-node1 novell-xsrvd-4[9964]: XSrvD
-ServiceConnections- Unable to bind socket, error = 13

After that, the server freezes totally.
[...]
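
A side note on the cluster settings in this log: the nCSNodeIPAddresses values look like IPv4 addresses printed as a 32-bit hex word with the bytes in reversed (little-endian) order. That is my reading, not something the log states. Under that assumption 0b2a10ac decodes to 172.16.42.11, which fits the 172.16.42.x addresses in the sshd log earlier, so NCS at least seems to have read the node addresses correctly. A short Python sketch (the function name is mine):

```python
import socket

def decode_ncs_ip(hex_word):
    """Decode a hex word such as '0b2a10ac', assuming reversed byte order."""
    raw = bytes.fromhex(hex_word)        # 0x0b, 0x2a, 0x10, 0xac
    return socket.inet_ntoa(raw[::-1])   # reverse into network byte order

for label, word in [("ncs-node1", "0b2a10ac"),
                    ("ncs-node2", "0c2a10ac"),
                    ("gipc mask", "00ffffff")]:
    print(label, decode_ncs_ip(word))
```

With the same assumption the second node comes out as 172.16.42.12 and the gipc mask as 255.255.255.0.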

== Situation
At the moment I have:
- the opt box updated to OES SP2, with NSS and iManager troubles
- the ncs-node1 box updated to OES SP2, which freezes when NCS starts;
I have just disabled NCS loading at boot with chkconfig novell-ncs off
- the ncs-node2 box up and running [with cluster resources loaded]; it
works nicely


== iManager
- Please note that I don't have any issue running iManager from the
ncs-node1 box [updated to SP2] or from ncs-node2 [SP1]
- I'm able to run Cluster Manager

== Kernel versions are:
- with OES SP1 [kernel 2.6.5-7.195-bigsmp]
- with OES SP2 [kernel 2.6.5-7.234-bigsmp]

== Final question
- In the splash screen message after the update to SP2 I read that
OES SP2 is based on SLES SP3; is this message correct? I'm a bit
confused by it.


Any suggestion is much appreciated.
Alex/