Gurus, we're looking at a condition on a couple of our DMZ servers where
they are getting excessive numbers of orphaned CLOSE_WAIT sessions that
never close. Over time, that chokes up the TCP stack and the servers become
non-responsive. We're trying to figure out why the application sometimes
doesn't properly close a session. We think this all started sometime last
week, though we're not sure why: no Microsoft patches or reboots of the
servers had occurred before we had to reboot them Monday. Orphaned sessions
are accumulating at a rate of 60 to 80 per day. At around the 1500 to 1800
level the server/service becomes choked. The same thing happened to two
other application servers of ours in the Sensitive Zone several months ago,
when there was some odd problem with Coxnet routing or something. Those
miraculously cured themselves after a couple of weeks. In the meantime, we have
discovered that a simple disable/enable on the NIC, from the Device
Manager GUI, restarts the NIC and the TCP stack and clears the orphaned
connections. That whole process takes about 20-25 seconds. Is it
possible to monitor this with NetIQ AppManager? Once it is
monitored, we could invoke the Action_DosCommand option to clear the
connections as the job action.
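
To make the question more concrete, here is a rough sketch of the kind of check we have in mind. It is plain Python, not an AppManager Knowledge Script, and it only covers the counting side: it parses Windows-style `netstat -an` output and compares the number of CLOSE_WAIT rows against a limit. The names `count_close_wait`, `needs_reset`, `read_netstat` and `CLOSE_WAIT_LIMIT` are made up for illustration, and the limit of 1200 is just an assumed value chosen to sit below the 1500 to 1800 range where the servers choke.

```python
import subprocess

# Assumed alert threshold (hypothetical): below the 1500-1800 range
# at which the servers were observed to become unresponsive.
CLOSE_WAIT_LIMIT = 1200


def count_close_wait(netstat_output):
    """Count TCP rows whose state column is CLOSE_WAIT.

    Expects Windows-style `netstat -an` rows of the form:
        TCP  <local address>  <foreign address>  <STATE>
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 4
                and fields[0].upper() == "TCP"
                and fields[3] == "CLOSE_WAIT"):
            count += 1
    return count


def needs_reset(count, limit=CLOSE_WAIT_LIMIT):
    """Return True once the CLOSE_WAIT count reaches the limit."""
    return count >= limit


def read_netstat():
    """Run `netstat -an` and return its text output."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the server, something like `needs_reset(count_close_wait(read_netstat()))` would give a yes/no answer that a monitoring job could act on. The reset itself is left out of the sketch, since so far we have only done it by hand from the Device Manager GUI.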