

Self healing / alerting for DHCP services on OES

If you're like me, you hate getting a call at 2am that the "whole network is down" because DHCP is broken.

Linux to the rescue... You can easily test whether your OES DHCP server is responding and take some action when it is not. Any old Linux box you have lying around will do for monitoring... this can even be the OES server hosting DHCP services, which also makes restarting DHCP services a lot easier.

1. Install dhcptools package. This is available for stock SLES, and hence OES. It provides dhcping which can interactively "ping" the DHCP server via, surprisingly enough, DHCP.

Once installed on the system you will use for monitoring, pick a local interface you will use to send / receive the test packets. That interface should be on a network that can obtain leases.

Try with the interfaces you have. If dhcping returns a "Got... " message, it worked. If you cannot get dhcping to successfully test the DHCP server from the interfaces you already have, you may need to add an interface on a network with DHCP service for this purpose. This use of dhcping uses unicast, like a DHCP helper in a router would.
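For trying this by hand, here is a minimal sketch of a wrapper that runs dhcping and reports success only when the reply contains "Got", the same match the alerting script in this post uses. The names check_dhcp and DHCPING are my own (hypothetical), and the MAC and IP values in the usage comment are placeholders.

```shell
#!/bin/sh
# DHCPING is a hypothetical override so the function can be tried
# without a real DHCP server; by default it runs the real dhcping.
DHCPING=${DHCPING:-dhcping}

# check_dhcp MAC SERVER_IP CLIENT_IP
# Succeeds if the dhcping reply contains "Got", fails otherwise.
check_dhcp() {
    # "|| true" keeps a non-zero dhcping exit from aborting under "set -e"
    reply=$("$DHCPING" -i -h "$1" -s "$2" -c "$3" 2>&1) || true
    case $reply in
        *Got*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (placeholder values, substitute your own):
#   check_dhcp 00:11:22:33:44:55 IP_OF_DHCP_SERVER IP_OF_THIS_INTERFACE && echo "pass"
```

Run it once per candidate interface, swapping in that interface's MAC and IP, until one of them passes.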


2. If you want to get e-mail when DHCP fails, you'll need some sort of MTA on the box you use for monitoring. OES servers will not, by default, have one. The alternative is to use a script / tool that sends mail through a remote MTA. Again, Google to the rescue. Either way, you need some command line way to send mail that works.

3. If you want to stop / start DHCP, or take some other action, you may want to run the script ON the OES server. This way you can just restart the services. This is useful if your dhcpsrvr croaks / hangs in a way that is fixed by restarting it.

Here is my alerting script.

#!/bin/bash
# -h and -c are the MAC and IP address of the local interface used for the test
if [[ $(dhcping -i -h 00:11:22:33:44:55 -s IP_OF_DHCP_SERVER -c IP_OF_THIS_INTERFACE ) == *Got* ]]; then
    echo "pass"
    # mail -s "DHCP Server Responding" you@your.domain <<< "DHCP Server Responding"
else
    echo "fail"
    mail -s "DHCP SERVER FAILED" you@your.domain <<< "Could Not Lease Address!  DHCP All Explodey?"
    mail -s "DHCP SERVER FAILED" your_sms_addr@sms.gateway <<< "Could Not Lease Address!  DHCP All Explodey?"
fi

The -h and -c options should be taken from a local interface on the box where you will run this script. Use ifconfig to determine the proper values for whatever interface you want to use for testing.
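If you would rather not copy the values out of ifconfig by eye, here is a rough sketch that pulls them from sysfs and from the ip command instead (my substitution, not what the post uses). The function names iface_mac and first_ipv4 are hypothetical, eth0 is a placeholder interface name, and the sysfs path assumes a Linux box.

```shell
#!/bin/sh
# iface_mac IFACE: print the interface's MAC address from sysfs (Linux only).
iface_mac() {
    cat "/sys/class/net/$1/address"
}

# first_ipv4: read `ip -o -4 addr show dev IFACE` output on stdin and
# print the first IPv4 address without its /prefix length.
# The address is the fourth whitespace-separated field of each line.
first_ipv4() {
    addr=""
    read -r _idx _name _family addr _rest || [ -n "$addr" ] || return 1
    printf '%s\n' "${addr%%/*}"
}

# Usage (eth0 is a placeholder):
#   mac=$(iface_mac eth0)
#   ip4=$(ip -o -4 addr show dev eth0 | first_ipv4)
#   dhcping -i -h "$mac" -s IP_OF_DHCP_SERVER -c "$ip4"
```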

You could also add a line to just restart DHCP, and hope for the best, after sending your alert. Something like:

rcnovell-dhcpd stop
killproc -p /var/lib/dhcp/var/run/dhcpd.pid -TERM /usr/sbin/dhcpd
rm -f /var/lib/dhcp/var/run/dhcpd.pid
rcnovell-dhcpd start

Or whatever you do manually to revive it.

Then after testing your frankenscript, put it in your cron to run automatically. I would just rig it to alert via e-mail initially until you become confident it is not activating spuriously.
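One way to cut down on spurious activations is to probe a few times before declaring the server dead. This is only a sketch: retry is a hypothetical helper name, and the three tries and ten second pause in the usage comment are arbitrary choices of mine.

```shell
#!/bin/sh
# retry TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Succeeds as soon as one attempt succeeds; fails only if every attempt fails.
retry() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # No point sleeping after the final attempt
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage: wrap your own probe in a function and only alert if all tries fail.
#   probe() { dhcping -i -h 00:11:22:33:44:55 -s IP_OF_DHCP_SERVER -c IP_OF_THIS_INTERFACE | grep -q Got; }
#   retry 3 10 probe || echo "fail"
```

Keep tries times delay well under the cron interval so runs do not pile up on each other.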

crontab -e

Then add something like:

*/5 * * * * /usr/local/sbin/dhcp_alert.sh >/dev/null 2>&1

Now every 5 minutes, the script runs and hopefully fixes a sick DHCP server.
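Running every 5 minutes also means a mail (and an SMS) every 5 minutes for as long as DHCP stays down. If that bothers you, one possible approach is to remember the last result in a file and only alert when it changes. A sketch under assumptions: state_changed and STATE_FILE are names I made up, and /var/tmp/dhcp_alert.state is an arbitrary location.

```shell
#!/bin/sh
# Hypothetical location for the last recorded result ("pass" or "fail").
STATE_FILE=${STATE_FILE:-/var/tmp/dhcp_alert.state}

# state_changed NEW_STATE
# Records NEW_STATE and succeeds only if it differs from the previous one.
# A missing or empty state file counts as a change.
state_changed() {
    old=""
    if [ -r "$STATE_FILE" ]; then
        old=$(cat "$STATE_FILE")
    fi
    printf '%s\n' "$1" > "$STATE_FILE"
    [ "$1" != "$old" ]
}

# Usage inside the alerting script's "fail" branch:
#   if state_changed fail; then
#       mail -s "DHCP SERVER FAILED" you@your.domain <<< "Could Not Lease Address!"
#   fi
# and call "state_changed pass" in the "pass" branch so the next failure alerts again.
```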
