Hello all. Old member, new id.
Four weeks ago, I got my fourth Xeon server with The Planet, but unfortunately this new server is not stable. It's now locking up about twice a day.
The first time it locked up, it stopped serving http and didn't respond to ftp or ssh. I created a ticket so that the techs could check the console, etc. to see if they could see what was happening. The response was that it was unresponsive at the console, so they rebooted it.
Yesterday it locked up twice. I used the Orbit reboot feature to power-cycle it. I checked all the logs I could find for any sign of the problem, especially /var/log/messages. There was nothing interesting in the logs, and nothing unusual shown before the reboot.
I submitted another ticket to have the techs run full diagnostics on it. The server passed all the tests.
The server locked up again today.
This server has only 3 or 4 websites on it, and it's practically asleep, so I don't think the issue is related to the load.
The only interesting thing I've seen is that the temperature sensor for VRD 0_TEMP looks a bit on the high side whenever I go to reboot the server. It was almost in the red zone.
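Since I only see that reading when I'm about to reboot, it would help to have a record of it leading up to each lockup. Here is a minimal sketch of how that could be logged, assuming the server exposes its sensors through `ipmitool sensor` in the usual pipe-delimited layout and that the sensor is labelled "VRD 0_TEMP" there, as it is in Orbit. The log path, the polling interval, and the function names are my own placeholders, not anything provided by The Planet.

```python
import os
import subprocess
import time
from datetime import datetime

SENSOR_NAME = "VRD 0_TEMP"  # assumption: label as shown in Orbit; the BMC may name it differently
LOG_PATH = "vrd_temp.log"   # hypothetical log location
INTERVAL_SECONDS = 60       # hypothetical polling interval


def parse_sensor_line(line):
    """Split one pipe-delimited sensor row into (name, value, units, status)."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4:
        return None
    name, raw_value, units, status = fields[:4]
    try:
        value = float(raw_value)
    except ValueError:
        value = None  # e.g. "na" when the sensor reports no reading
    return name, value, units, status


def find_reading(output, sensor_name=SENSOR_NAME):
    """Return the parsed row for sensor_name, or None if it is not listed."""
    for line in output.splitlines():
        parsed = parse_sensor_line(line)
        if parsed and parsed[0] == sensor_name:
            return parsed
    return None


def append_reading(path, reading, now=None):
    """Append a timestamped reading and force it to disk before returning."""
    now = now or datetime.now()
    name, value, units, status = reading
    with open(path, "a") as log:
        log.write(f"{now.isoformat()} {name} {value} {units} {status}\n")
        log.flush()
        os.fsync(log.fileno())  # so the last line is more likely to survive a hard lockup


def poll_forever():
    """Poll the sensor until the process is stopped (or the box locks up)."""
    while True:
        result = subprocess.run(
            ["ipmitool", "sensor"], capture_output=True, text=True, check=False
        )
        reading = find_reading(result.stdout)
        if reading:
            append_reading(LOG_PATH, reading)
        time.sleep(INTERVAL_SECONDS)
```

Running `poll_forever()` in the background would leave a timestamped trail in the log file, so after the next power-cycle the last few lines should show whether the temperature was climbing before the lockup. That would not prove the hardware is at fault, but it would be something concrete to attach to a ticket.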
At this point, I'm of the opinion that I got a junk piece of hardware. (It was a special listed in Orbit when I got it.) Since I spent dozens of hours uploading content to this server, I'm not real excited about starting over. What I would like to do is have the disks taken out, and put into a new machine, and the new machine put into the existing spot on the network, so I can keep all my uploading and configuration work, as well as keeping my IPs. I don't know if this is possible or how much it will cost, and am waiting on a ticket to find out the official answer.
Here are some things I'd like to know if anybody has any advice or answers...
- How bad is a high VRD 0_TEMP reading? Could it be related to my problem? If so, what is the next step?
- Can I get the hardware swapped while keeping my disks and IPs? Does anyone know how much that will cost?
- Am I liable for the costs of dealing with bonked-up hardware? If not, what does it take to prove the hardware is at fault?