I have a basic question - resulting from a rather torturous troubleshooting
path - about the nature of the CASA authentication system running on the
server. This system seems to provide for both password and Kerberos
authentication. Is the Kerberos method actually USED AT THIS TIME? I
especially wonder if it is used at all with an eDirectory user source. I see
errors on the server relating to this method when debug logging is turned
on, but are the errors I am seeing what essentially amounts to a cosmetic
error on a server that is doing user authentication against eDirectory?

I have posted two other times on this issue to this and the agent-deployment
forums. Although they aren't particularly common, other posts on the issues
I will outline here crop up often enough, and I have noticed the issue so
consistently in my management zone that I wonder if they aren't a bit more
common than folks are noticing.

THE PROBLEM - A little History of what I have done so far.

I started moving from Zen7 to ZCM 10 back in June. For the first several
months using the new system I spent most of my time "ironing out" policies,
bundles, imaging etc., and kind of figured there would be some things I
might have to put on a back burner until the time I was ready to move the
bulk of my machines to the new system. One of these issues had to do with a
10 to 90 second intermittent lag on login from my managed workstations
(mostly XP SP3 with Novell Client 4.91 SP5). This lag has had no real
pattern and repeated logins from the same machine may or may not hit the 90
second lag. However, the two Windows 7 machines I have always seemed to hit
the 90 second (and according to users - sometimes longer) lag. I started
looking into this problem in late August or early September and found TID
327380 (Troubleshooting ZCM Agent login problems). Although I worked
through all the problems outlined there, none of the solutions fit my case.

When I first started looking at this I had a single primary server in my
zone, running ZCM 10.2.0 on SLES 10 managing about 100 workstations out of
the 600 or so that will eventually be managed. My user source was and still
is our eDirectory tree.

As part of the troubleshooting outlined in TID 327380 I turned on debug
logging on several workstations and my primary server.

Workstation results.
The logs produced on the workstations showed a few errors dealing with what
looked like certificate issues - sorry, but in all the messing around I have
done on this issue these original logs have been lost - and the
casaauthtoken.log showed 404 errors making an HTTP connection to the
server. However, using a browser to connect to
"https://yourServerDN:2645/CasaAuthTokenSvc/" worked. I also remember some
other errors that had to do with what looked like certificate issues.
However, as mentioned above, following the solutions provided in the TID(s)
did not help with the problem.
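
One way to repeat the browser check from a script, so it can be run from several workstations, is sketched below. It is only an illustration: the function names are made up for this example, and the URL pattern and port 2645 are simply the ones quoted above. Certificate verification can be switched off to tell "the service answers" apart from "the certificate is trusted".

```python
import ssl
import urllib.error
import urllib.request


def casa_service_url(server_dns, port=2645):
    """Build the CasaAuthTokenSvc URL quoted in the post for a given server."""
    return f"https://{server_dns}:{port}/CasaAuthTokenSvc/"


def probe(url, timeout=10.0, verify_cert=True):
    """Return (status, detail): an HTTP status code, or None plus the error text."""
    context = ssl.create_default_context()
    if not verify_cert:
        # Skip certificate checks so trust problems can be separated
        # from the service simply not answering.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status, "OK"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, str(err.reason)
    except OSError as err:
        # Covers URLError and SSL errors: no usable HTTP response at all.
        return None, str(err)
```

Calling `probe(casa_service_url("zen1.chem.ku.edu"))` and then the same call with `verify_cert=False` gives two results to compare. If the first returns `None` with a certificate error and the second returns a status code, that would point at certificate trust rather than a missing service.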

Server Results:
Debug logging on the server produced repeated errors in the
"/srv/www/casaats/logs/ats.log" indicating the absence of an
/etc/CASA/authtoken/svc/jaas.conf file, and in the
"/srv/www/casaats/logs/catalina.out" log indicating a certificate problem -
again the exact wording of the error has been lost....

At that time I made a post to the forums and got no response. I had also
not found any solutions to what I was seeing from a support search.
However, I was feeling pressure (self inflicted) to get moving to the new
system, but knew that an authentication problem like I was seeing would
never fly with my users (the workstations I had been testing were types of
machines where a problem like this could be better tolerated). Given the
situation as I perceived it I "got creative" - with admittedly limited
knowledge of what I was doing - and started messing with the
"novell-zenworks-configure" tool on the server.

Long story short, I quickly got things working MUCH worse. In fact I managed
to get to the point where Tomcat and CASA wouldn't even load and I had
pretty much borked my system. This brought me to a manic period of backing
up the zone and database then trying a re-installation/disaster recovery of
the primary server - multiple times in an attempt to repair my zone. In
total I installed nearly 10 iterations of my primary server and in the
process converted from a single primary server configuration running on SLES
10 to dual primary servers (one on SLES 10 and the other on SLES 11 with an
external database server - again SLES 11). In the process I had nearly lost
my database and at one point was ready to "just start over" but didn't want
to lose all the hard work I had done on bundles, policies and imaging.

Along about that time I made another post to this and the agent forum,
entitled "OK-I AM SICK AND TIRED OF THIS" (11/6/09), venting my frustration
at my largely self-inflicted pain. One good outcome of this was that it
attracted the attention of Jared Jennings who offered to give me some help
on the issue. While working with him I found that I had managed to miss
filling the required files list under the basic requirements for installing
ZCM, proving that no matter how clearly something is stated in the
documentation someone will manage to misinterpret it. Having found that
mistake, I went through another round of server installations ending in the
configuration I now have - Two primary servers running ZCM 10.3 on SLES 11
with the database hosted on a third SLES 11 box.

After getting to this configuration I thought I had things solved - "Uh -
OK-I AM SIC (sic) AND TIRED OF THIS" (11/24/09). However, I had just not
waited long enough when I posted that one. The same problem came back - 10
to 90 second waits etc. and Kerberos related errors on the server. This time
round, though, I knew the servers had been correctly installed, and turning
on debug logging produced much better log files.

As a result of the better log files I was able to determine that the longer
lag times, when they occurred, DID have to do with certificates. In the
process of mangling my zone I had also renamed the primary servers - I did
this thinking that it might help in tracing certificate issues, figuring
that if a server associated with a certain certificate no longer existed
then problems associated with these stale certificates could be better
discovered and dealt with. As it turned out I did find an issue with
certain workstations showing attempted use of these "stale" certificates and
taking much longer to authenticate. That issue is now corrected, and as far
as I have found in the past few days, machines are logging in relatively
quickly - even the Windows 7 machines.

However, the server(s) still show Kerberos related errors. Turning on debug
logging results in the following errors in the
/srv/www/casaats/logs/ats.log file:
"2009-12-30 10:45:05,051 WARN authtoksvc.Authenticate init()-
SecurityException accessing
Exception=java.lang.SecurityException: /etc/CASA/authtoken/svc/jaas.conf (No
such file or directory)
2009-12-29 11:51:17,645 DEBUG authtoksvc.AuthTokenConfig Constructor()- File
not found
2009-12-29 11:51:17,645 DEBUG authtoksvc.EnabledSvcsConfig Constructor()-
Exception accessing
Exception=java.lang.Exception: AuthTokenConfig()- File not found"
At this point no errors show up in the catalina.out log.
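
To see how often each of these messages turns up, the log can be tallied by level and logging class. This is a minimal sketch, assuming every entry in ats.log starts with a timestamp in the format shown above, followed by the level and the class name. The function name and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches the start of an entry such as
# "2009-12-30 10:45:05,051 WARN authtoksvc.Authenticate init()- ..."
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)"
)


def tally_entries(lines):
    """Count log entries per (level, logger); continuation lines are skipped."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Entries copied from the excerpt above, one log record per list item.
sample = [
    "2009-12-30 10:45:05,051 WARN authtoksvc.Authenticate init()- SecurityException accessing",
    "Exception=java.lang.SecurityException: /etc/CASA/authtoken/svc/jaas.conf (No such file or directory)",
    "2009-12-29 11:51:17,645 DEBUG authtoksvc.AuthTokenConfig Constructor()- File not found",
    "2009-12-29 11:51:17,645 DEBUG authtoksvc.EnabledSvcsConfig Constructor()- Exception accessing",
    "Exception=java.lang.Exception: AuthTokenConfig()- File not found",
]

for (level, logger), count in sorted(tally_entries(sample).items()):
    print(level, logger, count)
```

Against the real file, the same function can be fed an open file handle, for example `tally_entries(open("/srv/www/casaats/logs/ats.log"))`, since it only needs an iterable of lines.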
However, creating a jaas.conf file from the templates found in the
"/etc/CASA/authtoken/svc/templates" directory and placing it in the
"/etc/CASA/authtoken/svc/" directory AND copying the authtoken.settings
file from "/etc/CASA/authtoken/svc/" to
results in the following errors in the ats.log:
"2009-12-30 11:05:11,333 DEBUG authtoksvc.Krb5Authenticate Constructor()-
GSS Exception caught: No valid credentials provided (Mechanism level:
Attempt to obtain new ACCEPT credentials failed!)
2009-12-30 11:05:11,334 WARN authtoksvc.Authenticate init()- Exception
instantiating mechConfig or mechanism
Exception=java.lang.Exception: Failed to instantiate needed GSS objects"
AND the following from the catalina.out log:
"Debug is true storeKey true useTicketCache true useKeyTab true doNotPrompt
true ticketCache is /var/lib/CASA/authtoken/svc/ticket.cache isInitiator
true KeyTab is /etc/krb5.keytab refreshKrb5Config is false principal is
host/zen1.chem.ku.edu tryFirstPass is false useFirstPass is false storePass
is false clearPass is false
Acquire TGT from Cache
Principal is host/zen1.chem.ku.edu
null credentials from Ticket Cache
Key for the principal host/zen1.chem.ku.edu not available in
[Krb5LoginModule] authentication failed
Unable to obtain password from user"
There is no /etc/krb5.keytab file.
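
The Krb5LoginModule debug output above already names the keytab and principal it is trying to use, so both can be pulled out and the keytab checked directly. This is a rough sketch, assuming the options appear on a single line in the "name is value" form shown above. The helper names are made up for illustration.

```python
import os
import re


def krb5_options(debug_line):
    """Return the options printed as 'name is value' in the debug line.

    Options printed without 'is' (for example 'storeKey true') are not captured.
    """
    return dict(re.findall(r"(\w+) is (\S+)", debug_line))


def keytab_status(options):
    """Describe whether the keytab named in the options exists and is non-empty."""
    path = options.get("KeyTab")
    if path is None:
        return "no KeyTab option found"
    if not os.path.isfile(path):
        return "missing: " + path
    if os.path.getsize(path) == 0:
        return "empty: " + path
    return "present: " + path


# The debug output quoted above, joined into one line.
debug_line = (
    "Debug is true storeKey true useTicketCache true useKeyTab true "
    "doNotPrompt true ticketCache is /var/lib/CASA/authtoken/svc/ticket.cache "
    "isInitiator true KeyTab is /etc/krb5.keytab refreshKrb5Config is false "
    "principal is host/zen1.chem.ku.edu tryFirstPass is false "
    "useFirstPass is false storePass is false clearPass is false"
)

options = krb5_options(debug_line)
print(options["principal"], "->", keytab_status(options))
```

On a server with no /etc/krb5.keytab, as described here, the status comes back as "missing", which is consistent with the login module being unable to find a key for the host principal.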
However, login times on the main workstation I am testing with have
consistently fallen to a pause of about 5 seconds longer than with no
ZENworks installed on the machine. It looks to me like things are working.

So, again: is the Kerberos method actually USED AT THIS TIME? I especially
wonder if it is used at all with eDirectory. Have I simply been seeing what
essentially amounts to a cosmetic error on a server that is doing user
authentication against eDirectory?