Near catastrophe



Stevo
25-Jun-2007, 10:33 PM
So, I'm out for the evening last Sat nite. Fairly early on (9:30) & I
have no vehicle (it's at home), I get a call from the S.O. dispatchers
here, cannot connect to inquire on license plates.

Like I said, I have no vehicle, so I call my telecommuter co-worker,
have him check what's going on. He calls back, he can vpn in, but
cannot connect to our 400. Hrmm.......... He calls a local co-worker
to go check it out.

I get a call around 30 minutes later. HOLY @#$%ING #$%@!!! It was
almost 120F in our server room. Dispatch could not connect because our
400 shut down. Both chillers had quit earlier in the evening, around
8-8:30.

By the time I got there, the A/C guy was there getting the chillers
running, and we seem to have had no hardware failure (knocking on wood).

Apparently, the phone line that our monitoring device uses got removed
w/o us being made aware of it.

So, first thing this morning was used to get the monitoring box
connected to a line & able to call when necessary.

WHEW!

Danita
25-Jun-2007, 10:59 PM
Stevo wrote:

> I get a call around 30 minutes later. HOLY @#$%ING #$%@!!! It was
> almost 120F in our server room.

It's almost that hot in our house <g> - we got home to an AC that isn't
working. I'm hoping that it just has a vapor lock or something and needs to
"rest" - but by tomorrow a.m. if I can't get it to work I'll have to phone
someone. Honestly - we're only about 89 here though.

--
Danita

Stevo
25-Jun-2007, 11:02 PM
Danita scribbled something like:

> It's almost that hot in our house <g> - we got home to an AC that
> isn't working. I'm hoping that it just has a vapor lock or something
> and needs to "rest" - but by tomorrow a.m. if I can't get it to work
> I'll have to phone someone. Honestly - we're only about 89 here
> though.

Bad thing is, it wasn't even *really* that hot here over the weekend.
90+, but that's hot enough for me. <G>

Elrey
25-Jun-2007, 11:04 PM
Makes ya wonder though... how much of your overall server life was
shortened due to the heat?

I think you got another 2 weeks before hard drive failures and memory I/O
errors pop up... hehe

Patrick Farrell
25-Jun-2007, 11:09 PM
Elrey wrote:
> Makes ya wonder though... how much of your overall server life was
> shortened due to the heat?
>
> I think you got another 2 weeks before hard drive failures and memory I/O
> errors pop up... hehe

Well, we ran our server room at around 100F for a few weeks (not by my
choice), and over the next few months we had many drives fail, plus 4 power
supplies and 2 motherboards... Gee, imagine that.

Lindsey Johnstone
25-Jun-2007, 11:11 PM
Elrey said,

> Makes ya wonder though... how much of your overall server life was shortened
> due to the heat?
>
> I think you got another 2 weeks before hard drive failures and memory I/O
> errors pop up... hehe

I was thinking the same thing. The cooling system in our consolidated server room
(600+ servers in there) conked out about a year ago. Within six months a lot of
the legacy (and off-warranty) stuff started dying. We were fine, as most of our
servers were newer and the one old piece of equipment was due to be replaced
anyway.

--
Lindsey

Stevo
25-Jun-2007, 11:16 PM
Elrey scribbled something like:

> Makes ya wonder though... how much of your overall server life was
> shortened due to the heat?

I've been wondering about that as well.

> I think you got another 2 weeks before hard drive failures and memory
> I/O errors pop up... hehe

Great, thanks for giving me a vote of confidence. <G>

Beth Cole
26-Jun-2007, 02:04 PM
Stevo wrote:
> Elrey scribbled something like:
>> I think you got another 2 weeks before hard drive failures and memory
>> I/O errors pop up... hehe
>
> Great, thanks for giving me a vote of confidence. <G>

According to one of the best hardware guys I know, every hour of
temperatures over 85F decreases the lifespan of the equipment by
approximately 1 month for every 5 degrees. So, if you can figure out
how long it was that high, you might have an idea of how long you have. :)

Beth

--
Don't go around saying the world owes you a living. The world owes you
nothing. It was here first. ~Mark Twain

Blinky Bill
26-Jun-2007, 02:11 PM
Stevo wrote:

> Apparently, the phone line that our monitoring device uses got removed
> w/o us being made aware of it.

WOW, surely it would be worthwhile hooking it to a mobile device
rather than a fixed line, if fixed lines get cancelled.

D'oh, mobile lines can be cancelled just as easily.

--

Blinky Bill
26-Jun-2007, 02:14 PM
We are lucky in that our UPS has an environmental monitor on it. If it
gets too hot it shuts down anything connected that has an operating
system (except the switches).
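
A rough Python sketch of that kind of policy, for illustration only. It is not what any particular UPS firmware does: the 95F threshold, the device names, and the `has_os` / `is_switch` flags are all made-up assumptions, since the post does not give the actual trigger temperature or inventory.

```python
from dataclasses import dataclass

# Hypothetical threshold; the post does not say what temperature triggers it.
SHUTDOWN_THRESHOLD_F = 95.0


@dataclass(frozen=True)
class Device:
    name: str
    has_os: bool             # True if the device runs an OS that can be shut down cleanly
    is_switch: bool = False  # switches are left running, as described in the post


def devices_to_shut_down(room_temp_f, devices, threshold_f=SHUTDOWN_THRESHOLD_F):
    """Return the names of devices to shut down at the given room temperature.

    Below the threshold nothing is shut down. At or above it, everything
    with an operating system is shut down, except the switches.
    """
    if room_temp_f < threshold_f:
        return []
    return [d.name for d in devices if d.has_os and not d.is_switch]


# Made-up inventory, purely for illustration.
inventory = [
    Device("file-server", has_os=True),
    Device("mail-server", has_os=True),
    Device("core-switch", has_os=True, is_switch=True),
    Device("kvm", has_os=False),
]

print(devices_to_shut_down(120.0, inventory))
print(devices_to_shut_down(80.0, inventory))
```

Leaving the switches up means the network path for alerts and remote access stays available while the servers are powered down.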




--

Stevo
26-Jun-2007, 03:09 PM
Beth Cole scribbled something like:

> According to one of the best hardware guys I know, every hour of
> temperatures over 85F decreases the lifespan of the equipment by
> approximately 1 month for every 5 degrees. So, if you can figure out
> how long it was that high, you might have an idea of how long you
> have. :)

Will have to try and figure that out. Doesn't count if they're powered
off, right? Cuz we got things shut down after about 1.5 hours or so.

Stevo
26-Jun-2007, 03:12 PM
Blinky Bill scribbled something like:

> WOW, surely it would be worthwhile hooking it to a mobile device
> rather than a fixed line, if fixed lines get cancelled.

Don't think it could use a mobile line, it's pretty old (circa 1989).
Putting it in the 'wish list' for next fiscal year to replace it.

The debate here has been whether or not to connect it to our phone
system. The big thing there is, if the phone system's down due to some
major catastrophe, the thing won't be able to dial out. That's why we
had it on its own line.

Blinky Bill
26-Jun-2007, 03:22 PM
Certainly if it is staying fixed, then a separate PSTN service would be
the best idea IMO.




--

Beth Cole
26-Jun-2007, 04:49 PM
Stevo wrote:
> Beth Cole scribbled something like:
>
>> According to one of the best hardware guys I know, every hour of
>> temperatures over 85F decreases the lifespan of the equipment by
>> approximately 1 month for every 5 degrees. So, if you can figure out
>> how long it was that high, you might have an idea of how long you
>> have. :)
>
> Will have to try and figure that out. Doesn't count if they're powered
> off, right? Cuz we got things shut down after about 1.5 hours or so.

If I can get him on the phone (he's been laid off from his job, so he's
a bit more difficult to track down), I'll ask. I believe that it then
goes to 1 month per 10 degrees, because you're still baking the equipment.
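
For the math, here is a minimal Python sketch that takes the rule of thumb above at face value: 1 month lost per hour for every 5 degrees over 85F while powered on, and 1 month per 10 degrees while powered off. The rule is anecdotal and no reference for it is given in this thread, so treat the output as a back-of-the-envelope figure only. The function name and parameters are invented for illustration.

```python
def months_lost(temp_f, hours, powered_on=True, baseline_f=85.0):
    """Estimate lifespan lost, in months, under the thread's rule of thumb.

    Assumes the room sat at temp_f for the whole period, which overstates
    the loss if the temperature was still climbing toward that peak.
    """
    degrees_over = max(0.0, temp_f - baseline_f)
    # 1 month per 5 degrees when running, 1 month per 10 degrees when off.
    degrees_per_month = 5.0 if powered_on else 10.0
    return hours * degrees_over / degrees_per_month


# Stevo's figures: almost 120F, about 1.5 hours before things were shut down.
print(months_lost(120.0, 1.5))                    # 10.5
# One further hour at the same temperature with the equipment powered off.
print(months_lost(120.0, 1.0, powered_on=False))  # 3.5
```

Since the room was "almost" 120F and did not start out that hot, 10.5 months is an upper bound under this rule rather than an estimate.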

--
Don't go around saying the world owes you a living. The world owes you
nothing. It was here first. ~Mark Twain

Stevo
26-Jun-2007, 04:50 PM
Beth Cole scribbled something like:

> If I can get him on the phone (he's been laid off from his job, so
> he's a bit more difficult to track down), I'll ask. I believe that
> it then goes to 1 month per 10 degrees, because you're still baking
> the equipment.

Ah, that makes sense

Danny Fabrizius
26-Jun-2007, 04:54 PM
We had a similar issue just over a week ago.
No environmental alerting. We got a rash of "server shut down due to heat"
msgs. The thermometer on the AC showed 95. But the temp of the core
switches was 146.

Stevo
26-Jun-2007, 04:54 PM
Danny Fabrizius scribbled something like:

> We had a similar issue just over a week ago.
> No environmental alerting. We got a rash of "server shut down due to
> heat" msgs. The thermometer on the AC showed 95. But the temp of the
> core switches was 146.

Yikes!

KeN Etter
26-Jun-2007, 05:42 PM
On Tue, 26 Jun 2007 13:14:23 GMT, "Blinky Bill" <koala@gumtree.com.au>
wrote:

>We are lucky in that our UPS has an environmental monitor on it. If it
>gets too hot it shuts down anything connected that has an operating
>system (except the switches).

After I came in one weekend to find the server room AHU shut down and
the temp crossing the 100 degree mark... I started investigating
options. Bought smart cards with temp monitors for the UPS.

Novell....it does a server good!

Beth Cole
26-Jun-2007, 05:42 PM
Stevo wrote:
> Beth Cole scribbled something like:
>
>> If I can get him on the phone (he's been laid off from his job, so
>> he's a bit more difficult to track down), I'll ask. I believe that
>> it then goes to 1 month per 10 degrees, because you're still baking
>> the equipment.
>
> Ah, that makes sense

Yep, I was remembering correctly.

Dean was out fishing when I called. He said he'd trade me jobs for a
week if I wanted.

--
Don't go around saying the world owes you a living. The world owes you
nothing. It was here first. ~Mark Twain

Stevo
26-Jun-2007, 05:50 PM
Beth Cole scribbled something like:

> Yep, I was remembering correctly.

Guess I got some math to do. <G>

> Dean was out fishing when I called. He said he'd trade me jobs for a
> week if I wanted.

Sounds like a good trade. Fishing for a week vs coming in to work?

Beth Cole
26-Jun-2007, 05:56 PM
Stevo wrote:
>> Dean was out fishing when I called. He said he'd trade me jobs for a
>> week if I wanted.
>
> Sounds like a good trade. Fishing for a week vs coming in to work?

As much as I'm looking forward to my vacation in late July, I'd still
rather be at work than not have any many coming in!

--
Don't go around saying the world owes you a living. The world owes you
nothing. It was here first. ~Mark Twain

Stevo
26-Jun-2007, 06:23 PM
Beth Cole scribbled something like:

> I'd still rather be at work than not have any many coming in!

Any many? ;-)

Blinky Bill
27-Jun-2007, 03:27 AM
Danny Fabrizius wrote:

> We had a similar issue just over a week ago.
> No environmental alerting. We got a rash of "server shut down due to
> heat" msgs. The thermometer on the AC showed 95. But the temp of the
> core switches was 146.

Ouch

--

Blinky Bill
19-Nov-2007, 02:16 AM
> I believe that
> it then goes to 1 month per 10 degrees, because you're still baking
> the equipment.

Hey Beth,

I have done some searching on the web to try to get a reference for this
but cannot find anything useful.

Is there any chance of getting some reference from your friend (I hope
he has a job by now) that I can quote to management?

--