I'm seeing LDAPConnection methods hang sometimes when run against Fedora
Core DS 1.0.4. I'm running the latest jldap from CVS. Sending the JVM a
kill -3 gives the following stack trace for the hung thread:

com.novell.ldap.Connection.acquireWriteSemaphore(Connection.java:285)
- locked <0x26ad1c18> (a java.lang.Object)
at com.novell.ldap.Connection.shutdown(Connection.java:948)
at com.novell.ldap.Connection.destroyClone(Connection.java:590)
- locked <0x26ad1ab8> (a com.novell.ldap.Connection)
at com.novell.ldap.LDAPConnection.disconnect(LDAPConnection.java:2344)
at com.novell.ldap.LDAPConnection.disconnect(LDAPConnection.java:2304)

Sometimes the hung thread is in a bind instead of a disconnect.
Either way, the hang happens right after the following null pointer
exception is printed to standard error.

Exception in thread "Thread-7845" java.lang.NullPointerException
at com.novell.ldap.Message.putReply(Message.java:328)
at com.novell.ldap.Connection$ReaderThread.run(Connection.java:1295)
at java.lang.Thread.run(Thread.java:613)

Basically, the null pointer in the Connection's reader thread results in
the semaphore for the Connection class not getting freed. As a result, the
next call to acquireWriteSemaphore just hangs.
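
To make the failure mode concrete, here is a minimal sketch of it outside jldap. It assumes java.util.concurrent.Semaphore as a stand-in for the Connection's write semaphore and a plain worker thread as a stand-in for the reader thread; the class and method names are made up for illustration and are not jldap code. It shows that a release placed after the failing call is skipped, while a release in a finally block still runs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative only: java.util.concurrent.Semaphore stands in for the
// Connection's write semaphore, and the worker for the reader thread.
class SemaphoreLeakDemo {

    // Stand-in for the failing call in the reader thread.
    static void failLikeReaderThread() {
        throw new NullPointerException("conn is null");
    }

    // Starts a worker that takes the only permit and then fails. Returns
    // true if the next caller can still acquire the permit afterwards.
    static boolean nextAcquireSucceeds(boolean releaseInFinally) {
        Semaphore writeSemaphore = new Semaphore(1);
        Thread worker = new Thread(() -> {
            writeSemaphore.acquireUninterruptibly();
            if (releaseInFinally) {
                try {
                    failLikeReaderThread();
                } finally {
                    writeSemaphore.release(); // runs even when the call throws
                }
            } else {
                failLikeReaderThread();
                writeSemaphore.release(); // skipped when the call throws
            }
        });
        // Swallow the worker's exception so the demo output stays readable.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        try {
            worker.join();
            // A bounded wait replaces the indefinite hang seen in the trace.
            return writeSemaphore.tryAcquire(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("release after call: " + nextAcquireSucceeds(false));
        System.out.println("release in finally: " + nextAcquireSucceeds(true));
    }
}
```

The first call reports false (the permit is never returned) and the second reports true.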

If you follow the null pointer, you'll see it is because the conn member in
the Message class is null. The only way this can happen is if a null
Connection is passed in during construction of the Message object (highly
unlikely), or if Message::cleanup had already been called on the object.

There does appear to be a timing problem in the code. In lines 504-506 of
Message::abandon we have:

LDAPMessage msg = new LDAPAbandonRequest( msgId, cont);
// Send abandon message to server
conn.writeMessage( msg);

Later in the method in line 537 we have:

cleanup is the only method in the Message class which sets the conn
member to null. It is also only called in the finalizer and from

So it would seem that the following scenario would cause the null pointer:

1) Message::abandon is called
2) LDAPAbandonRequest is sent to server
3) cleanup is called, setting conn to null
4) Reply to LDAPAbandonRequest is received from server, calling
Message::putReply
5) Message::putReply hits the null pointer since the conn object is null,
and doesn't free the semaphore

It seems that the code assumes that 4 happens before 3. However, if the
LDAP server is under enough load, 3 could happen before 4.

Indeed, in my environment, if I simply comment out setting conn to null in
Message::cleanup() everything works. I am not sure that this is an
acceptable solution though. Is there a better way to fix this issue?
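
One possible alternative, sketched under assumptions: have putReply read conn into a local variable once and treat null as "this message was already cleaned up", discarding the late reply instead of throwing. FakeConnection, FakeMessage and releaseSemaphore below are hypothetical stand-ins, not the real jldap classes or method names; only the null handling is the point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Connection and Message: only the null
// handling is the point, not the real class layout.
class LateReplyDemo {

    static class FakeConnection {
        int semaphoreReleases = 0;

        // Hypothetical name for whatever frees the write semaphore.
        synchronized void releaseSemaphore() {
            semaphoreReleases++;
        }
    }

    static class FakeMessage {
        private volatile FakeConnection conn;
        private final List<String> replies = new ArrayList<>();

        FakeMessage(FakeConnection conn) {
            this.conn = conn;
        }

        // Mirrors the step that sets conn to null.
        void cleanup() {
            conn = null;
        }

        // Returns true if the reply was queued, false if it arrived after
        // cleanup and was discarded.
        boolean putReply(String reply) {
            FakeConnection c = conn; // read the field once; it may change
            if (c == null) {
                return false; // late reply: drop it rather than throw
            }
            synchronized (replies) {
                replies.add(reply);
            }
            c.releaseSemaphore();
            return true;
        }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection();
        FakeMessage msg = new FakeMessage(conn);
        System.out.println("before cleanup: " + msg.putReply("bind response"));
        msg.cleanup();
        System.out.println("after cleanup:  " + msg.putReply("late response"));
    }
}
```

Dropping the reply keeps the reader thread alive, but if the semaphore is still held when the late reply arrives, something else would have to release it, so this alone may not be a complete fix.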

I'd appreciate any help or suggestions. If it is useful, I can post a test
program which causes the problem. Basically it just fires up 30 threads,
each of which does the following in a continuous loop: new LDAPConnection,
connect, bind, disconnect. One thread inevitably hangs after a few
minutes. Thanks.
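
For reference, the loop described above could be structured roughly as follows. This is a sketch, not the actual test program: the Session interface is a hypothetical wrapper around the connect, bind and disconnect calls so that the harness runs without a directory server, and a real run would implement it on top of LDAPConnection. The harness reports whether every worker finished within a grace period, which is one way to detect the hang without reading a thread dump.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class ConnectLoopHarness {

    // Hypothetical wrapper around the three calls in the loop; a real run
    // would implement it on top of LDAPConnection.
    interface Session {
        void connect() throws Exception;
        void bind() throws Exception;
        void disconnect() throws Exception;
    }

    // Runs `threads` workers that loop connect/bind/disconnect for about
    // runMillis, then waits up to graceMillis more for them to finish.
    // Returns false if any worker is still stuck after the grace period.
    static boolean allWorkersFinish(int threads, long runMillis, long graceMillis,
                                    Supplier<Session> newSession) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runMillis);
        // Daemon threads so a hung worker cannot keep the JVM alive.
        ExecutorService pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                while (System.nanoTime() - deadline < 0) {
                    try {
                        Session s = newSession.get();
                        s.connect();
                        s.bind();
                        s.disconnect();
                    } catch (Exception e) {
                        e.printStackTrace(); // keep looping on ordinary failures
                    }
                }
            });
        }
        pool.shutdown(); // no new tasks; running workers continue
        try {
            return pool.awaitTermination(runMillis + graceMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // No-op session so the sketch runs without a directory server.
        Supplier<Session> noOp = () -> new Session() {
            public void connect() { }
            public void bind() { }
            public void disconnect() { }
        };
        System.out.println("all finished: " + allWorkersFinish(30, 500, 2000, noOp));
    }
}
```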