Good day all,

I am interested in hearing what techniques you are using (if any) to
help your Operators keep track of systems that have been placed into
Ad-hoc Maintenance Mode (MM).

A typical scenario: an SA calls Operations asking if Server XXX can be
placed into Maintenance Mode, but doesn't know how long the remediation
work is going to take. That is not a problem for Operations when they
know an end date and can leverage Scheduled Maintenance. Without good
ticketing and/or processes in place to track this sort of thing, though,
night shift forgets to let anyone on days know that the server was placed
into MM, and the SA is just glad to be done, so they never tell
Operations the work is finished. Two days after the system is brought
back online following a week of remediation work, a critical Incident is
missed because the system is still in MM.

Possible Solutions/Thoughts:
I have experimented a bit in the past with letting Operations put
free-form text into a Custom Property called "MM_Notes"... things like
"Sam requested MM 5/3 1600", or "eMail request from SA 4/29... leave off
until 5/29", "Awaiting D'Comm on 6/2 - see Mary". While these notes have
proven very helpful to Operators, they do tend to clutter things up and
don't always get cleaned up well.
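
One thing that might make these notes easier to clean up is a light
convention on top of the free-form text. The Python below is only a rough
sketch of that idea: the "who | placed | review | reason" layout, the
MMNote class, the OPEN marker, and both function names are hypothetical,
and none of it is something the product provides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MMNote:
    """Hypothetical structured form of an MM_Notes entry."""
    requester: str
    placed_on: date
    review_on: Optional[date]  # None means no end date was given
    reason: str


def format_note(note: MMNote) -> str:
    """Render a note as one pipe-delimited line for the MM_Notes property."""
    review = note.review_on.isoformat() if note.review_on else "OPEN"
    return f"{note.requester} | {note.placed_on.isoformat()} | {review} | {note.reason}"


def parse_note(text: str) -> MMNote:
    """Parse a line produced by format_note; raises ValueError on other text."""
    # Split at most 3 times so a "|" inside the reason is preserved.
    parts = [p.strip() for p in text.split("|", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {text!r}")
    requester, placed, review, reason = parts
    review_on = None if review == "OPEN" else date.fromisoformat(review)
    return MMNote(requester, date.fromisoformat(placed), review_on, reason)
```

A note that parses cleanly and has a review date in the past is an easy
candidate for cleanup; one marked OPEN is a reminder that nobody ever
gave an end date.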

I have often thought it would be really nice to have a simple dialog box
pop up asking for some quick text, to be stored in a field in Nqccdb that
Ops could reference when in doubt. I have also tried to think of unique
ways to use the Membership Rules of MM Status to drive some other
monitoring methodology. Perhaps I am just missing a more obvious,
effective method beyond simply instructing my Operations staff to
communicate better and periodically check a Filtered Server View for MM
Status, removing any servers they know have been fixed. That last one is
what we are all probably doing already.
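
For what it's worth, the periodic check could be partly automated if the
filtered view can be exported somehow. A minimal sketch in Python,
assuming each row is just a server name plus the date it went into MM;
the MMEntry type, the stale_entries function, and the 7-day default are
hypothetical, and nothing here talks to the product itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class MMEntry:
    """Hypothetical row exported from a filtered MM Status view."""
    server: str
    in_mm_since: date


def stale_entries(entries: List[MMEntry], today: date, max_days: int = 7) -> List[MMEntry]:
    """Return entries that have been in MM longer than max_days, oldest first."""
    cutoff = today - timedelta(days=max_days)
    stale = [e for e in entries if e.in_mm_since < cutoff]
    return sorted(stale, key=lambda e: e.in_mm_since)
```

Each shift could run something like this at handover and chase up
whatever it lists, rather than relying on someone remembering to look.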

I am anxious to hear your suggestions.



armstrongge's Profile: