Server keeps crashing for no apparent reason
eaudet
Hi,
I have a server that keeps crashing ... it simply turns itself off completely. It's not a proper shutdown ... it's as if the power had been removed.

There is nothing in the logs that I can find that points to the cause of the crash. When reading the logs, all is working well ... nothing that looks wrong goes into the log before it crashes.

LONG STORY

Every time, I have to ask The Planet to manually start my server. There is no other way!

I've asked them to help me find the problem ... but they are not really helpful on this, saying that the machine looks OK and that it's because of the load. One time they said it was because there were too many mysql connections. Another time they said it was because there were too many httpd connections.

Today, it crashed because I was upgrading perl from 5.8.1 to 5.8.8 ... upgrading perl moved the server load to around: load average: 4.40, 5.38, 6.32

Every time the load average goes over 5 for more than a few minutes, the server just shuts down/crashes! OUCH!
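
Since the machine loses power without a clean shutdown, the last few log lines may never reach the disk. Here is a minimal sketch, in Python, of a logger that records the load average and forces each line to disk, so the final readings before a crash have a better chance of surviving. The log path and the 10-second interval are assumptions chosen for illustration. Note also that fsync only asks the OS to flush its buffers, so a drive's own write cache could still lose the last line on power loss.

```python
import os
import time


def format_entry(timestamp, loads):
    """Format one log line from a Unix timestamp and a (1, 5, 15 min) load tuple."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    return "%s load=%.2f,%.2f,%.2f" % (stamp, loads[0], loads[1], loads[2])


def append_durably(path, line):
    """Append a line and ask the OS to push it to disk before returning."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()              # flush Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush its buffer to the device


def run(path="/var/log/loadwatch.log", interval=10):
    """Log the load average forever (hypothetical path and interval)."""
    while True:
        append_durably(path, format_entry(time.time(), os.getloadavg()))
        time.sleep(interval)
```

Calling run() from a small script started at boot would leave a trail of timestamps, and the last line written before each power-off shows what the load was at that moment.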

I don't dare do anything with the server anymore!

The other thing is that if it happens during the night, I only notice it the next day ... so my clients are down for many hours until I can open a ticket for The Planet to fix it.

As for the "Monitoring" they provide in the "Server Command", I had it set up so that if the server went down, it would send a ticket for "reboot" ... but that isn't working. When I asked why it wasn't working, they said:
ANSWER: The "Service Monitoring" that you are referring to, is no longer working. We are in the process of replacing this service with a more robust monitoring service.

But it hasn't worked since last November, when they merged with Orbit and then came back to Server Command.
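
Until the provider's monitoring works again, a check run from a second machine could at least shorten the time before an outage is noticed. The sketch below is only an illustration under assumptions: the host name, port and threshold are placeholders, and how to raise the alert (email, SMS, opening a ticket) is left out. It tries a TCP connection and reports an outage only after several consecutive failures, so a single dropped packet does not raise a false alarm.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


class DownDetector:
    """Confirm an outage only after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        """Record one check; return True exactly once, when the outage is confirmed."""
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold
```

Run from cron every minute on another machine, something like detector.record(is_reachable("server.example.com", 80)) would return True on the third failed check in a row, which is the point at which to send the alert.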

WHAT AM I TO DO? I am desperate.

Eric
Jeff
Can you give us the details of your hardware (is it a really really old rac or something extremely low powered?)

On a Xeon for example I would not expect the server to hard lock with a load of under 80.

Can you ask them to tell you what's going on at the console when it's locked?

If it's actually powering off I would suspect a hardware issue.

On a live server with paying customers, the downtime while diagnosing such a problem is frustrating -- maybe this would be a good time for you to upgrade to a new server? Depending on how attached you are to the existing IPs (secure certs?) it might be a lot easier to transfer accounts than go through a lengthy hardware troubleshooting process with a live server.
AaronC
If you can afford to schedule the downtime, I would ask for a full chassis swap (RAM too). This will give you the same hard drives but the rest of the hardware would be different. If the problem is in the server hardware, this could resolve it.

If that doesn't solve the problem, I would back up all of my data and ask for an OS reload and a new set of hard drives. Once that was complete and you restored your data to the server (our new Dedicated Backup Server would be great for this, btw), the only unifying factor between your old server and this would be the software. If the problem persists, I would look closely at your installed software for exploits (not likely), memory leaks (possibly hard to identify) and configuration errors.
eaudet
Yes, it's an old server.
model name : Intel® Pentium® 4 CPU 2.60GHz
cpu MHz : 2600.070
Mem: 1022480k

The maximum number of processes I've seen is 130.
The load average rarely goes over 5. When it crashed today: 4.40, 5.38, 6.32

It also crashed a second time. When it came back up, I tried reinstalling perl 5.8.8 and it crashed again.


Here's the answer I got after they rebooted the machine:
QUOTE:
It seems that your server was shut down, as I went to investigate the power was completely off

I have turned on the power and your server has booted up fine and is responding to ping and ssh
:END QUOTE
AaronC
Yeah, I really would ask for a chassis swap.
Tomy Durden
How does the iowait look? If it's high, then we might want to suspect the HDDs.
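
If you don't have a tool handy that reports it, here is a rough sketch in Python of how iowait could be measured on Linux from two samples of the aggregate "cpu" line in /proc/stat. The function names are made up for illustration, and it assumes a kernel that exposes the iowait field (the fifth number on that line).

```python
def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so sum only the first 8.
    iowait = ticks[4] if len(ticks) > 4 else 0
    return iowait, sum(ticks[:8])


def iowait_percent(before, after):
    """Percentage of CPU time spent waiting on I/O between two samples."""
    d_iowait = after[0] - before[0]
    d_total = after[1] - before[1]
    return 100.0 * d_iowait / d_total if d_total > 0 else 0.0


def sample(path="/proc/stat"):
    """Read one sample; the aggregate 'cpu' line is the first line of the file."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Taking one sample(), waiting a few seconds, taking another and passing both to iowait_percent() gives the figure for that interval. A value that stays high while the load climbs would point toward the disks.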
eaudet
QUOTE (TP-TDurden @ Sep 14 2007, 01:55 AM) *
How does the iowait look? If it's high, then we might want to suspect the HDDs.


iowait seems normal. But yes, I've seen it go up once in a while. At the last crash, however, iowait was 9.9%.

Thanks a lot for your ideas and comments. I finally got an acknowledgement from support that it could be hardware. Here's the last answer I got:

QUOTE:
I have investigated your server and found no cause for a system crash. I feel we may need to perform hardware tests on your server's memory and maybe more to check for bad hardware.
:END QUOTE

I'll let you know the result.

In the meantime, I am looking into another server to replace this one. I have a few questions on these. I will open a new topic.

Eric