Suicidal Tendency ..
digitek
HeadSurfer/Rackshack,
I'm in quite a huge hurry this morning, so this rant will be relatively short I think, but I have a lot on my mind. I got kicked out of my house yesterday and I'm pretty much without internet access till I buy myself an apartment .. :P

Anyhow, I've had quite a few problems lately. I need some resolutions to my current problems ... and there is no one that can help me, and I don't know what to do. I actually have 5000 problems with all of my servers, but they can wait. I have 1 big problem ..

216.40.250.34, a Duron 1 GHz server, has cost me so much money & time. The server crashes at least once daily, at random times. When it isn't down, it is so slow that it is nearly inaccessible. I have gotten about 30% uptime with the server, and many headaches.

I've visited the IRC channel and submitted many tickets asking someone to look at it and figure out what's going on, but no one has. My last trouble ticket was right after a crash, asking techs to check it out. You should check the ticket ... it's pretty funny ... it took the techs 4 days to close the ticket. Did they even look at the server? No. They had a 4 day long password reset marathon. That ended up being the entire subject of my ticket: whether or not it was ok for them to reset the password.

I posted the correct password in the ticket .. actually in 2 tickets to make sure they saw it ... but they said it was wrong ... heh ... so they wanted to reset it ... which was fine .. who cares .. as long as they looked at it .. it didn't matter.

Well they did the reset at 4:00AM this morning ... rebooted the server .. and that was it. Due to the extreme screwed upness of the server, I removed httpd from my start-up scripts ... so it doesn't start automatically ... well after the reset .. it just sat there .. no one looked at it .. and httpd has been down for 7 hours ... which is just 7 more hours of wonderful downtime for me =/ ...

I logged in to check the ticket when the server was down again, and figured it was just another crash. Well ... the ticket was CLOSED .. and the resolution was 'password reset per customer request' or something ...

Now what in the world gives? The site on that server loads slower than a 56k webserver trying to push 500 GB/month ... and crashes constantly ... I've begged for a new one ... I've begged for you guys to give me a replacement and a day or so to move everything ... I've begged for diagnostics .. I've begged for help ... and no one has done anything ... it's just 'submit a request for restore' or some simple little way to fix everything ..

I can't do that ... I've gotta keep this server up ... it's important to my customer ... and with its current track record .. I just can't bring myself to request a restore .. cause only god knows how long that would take .. and I wouldn't be in control of whether or not it was quick & painless ... I'd rather migrate it to a new server myself etc ...

Anyhow ... Another wonderful point ... when the drunken & disgruntled electrician tore up the UPS or whatever it was ... Both of my Ensim Servers were corrupted to the uttermost ... I had to hire a consultant to fix everything .. and it cost me $600 ... my RaQ had a corrupted database ... etc.

I've worked my butt off trying to keep my servers up .. and I've done my part to ensure that. I've spent more time trying to make up for RackShack's mistakes than I have expanding my business.

I don't know what to do other than just beg you guys to have mercy upon me ... Give me another Ensim/Plesk box to replace the Plesk machine I now have .. or help me figure out what is wrong with my current box ...

Do something to make me happy ... I'm about ready to fall off of a bridge and cry or something weird.

As of right now I have 3 servers down ... it'd be nice to have those rebooted as well.

Have a good day and sorry for taking your time.

Digitek
PatrickS
So does your server stay online when the webserver is down? Wouldn't this suggest that the web site traffic is causing it to crash? Is there actually something wrong with the web server itself? At present I see that your access_log and error_log are over 2.1 GB apiece, so it would appear to be getting a massive amount of traffic. This could also potentially be your issue: the webserver won't run if the access_log can't be written to, and once a file reaches that size it hits a filesystem limit. I've taken the liberty of removing the current one. My suggestion is to write a script to rotate this log to prevent this from happening. Because of your server traffic it is growing at an insane rate; it's already at 100k and I just reset httpd.
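
A rotation script along those lines could look roughly like the sketch below. It is only an illustration, not what RackShack runs: the function name `rotate` and the `keep` count are made up for the example, and the step of telling httpd to reopen its logs is left out on purpose (a running web server keeps writing to the renamed file until it is restarted or otherwise told to reopen its log files).

```python
import os


def rotate(path, keep=5):
    """Shift path.1 -> path.2 ... and move path -> path.1, then recreate path.

    `keep` is a hypothetical setting: the number of old copies to retain.
    """
    # Drop the oldest copy if it exists.
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift the remaining numbered copies up by one, highest number first.
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{n + 1}")
    # Move the live log aside and start a fresh, empty one.
    if os.path.exists(path):
        os.rename(path, f"{path}.1")
    open(path, "a").close()
```

Run from cron, something like `rotate("/var/log/httpd/access_log")` followed by a web server restart would keep any single file from growing without bound; the path here is just the one mentioned later in this thread.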
digitek
That could quite well have been a problem ... maybe I should just turn off httpd logging, no?
digitek
You don't have to work on it anymore ... those 3 big sites that I ran for that customer will no longer be my responsibility ... I was just informed that he had made other arrangements for web hosting ...

Oh well =/

Since it really doesn't matter now, how possible would it be for you guys to do a full restore on that server to Ensim instead of Plesk?

Thanks guys ...

Michael
EOC_Jason
Sounds like RackShack should put the latest Linux kernel on the servers and use a transaction-based (journalled) file system. Since they crash so much, at least it can give some protection against data corruption by being able to do a rollback...

*sigh* when will the day come when RS will offer co-location? I would so love to be running my own (dependable) hardware tailored to my needs and redundancy rather than be at the will of a bargain box which so many people have seemed to be disliking...
mmoncur
I think there's just some bad apples in the bunch. My Duron server has performed wonderfully, is running a busy site with no trouble, and has only gone down unexpectedly once in the last 30 days. (And you can guess when that was.)

It does seem like support needs to take a good hard look at some people's hardware, though.
pmak0
> I think there's just some bad apples in the bunch.

Yeah, that has happened to me. My Duron machine would crash a lot when I tried to compile programs; I managed to get them to swap my hard drive into a different machine and it ran fine after that.

One tip if your machine keeps crashing by itself: Leave a telnet/SSH window open and type "top", and let it run there. When it crashes, look at that screen (save it) and you'll know what was happening at the moment the machine crashed.

I used the above technique to diagnose that my RaQ was crashing every day when the log analysis software ran, exhausting all the swap space since my website got too much traffic.
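
If keeping a terminal open isn't practical, roughly the same idea can be scripted so the last reading before a crash is left on disk. This is a sketch under assumptions, not a replacement for `top`: it only reads Linux's `/proc/loadavg` and `/proc/meminfo`, and the function names, the log file argument and the 10 second interval are all invented for the example.

```python
import os
import time


def parse_meminfo(text):
    """Turn /proc/meminfo text into a dict of {field: number}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def snapshot_line(loadavg_text, meminfo_text, now=None):
    """Format one log line: timestamp, 1-minute load, free memory, free swap."""
    mem = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    load1 = loadavg_text.split()[0]
    return (f"{stamp} load1={load1} "
            f"memfree_kb={mem.get('MemFree', -1)} "
            f"swapfree_kb={mem.get('SwapFree', -1)}")


def record(logfile, interval=10):
    """Append a snapshot to `logfile` every `interval` seconds until killed."""
    while True:
        with open("/proc/loadavg") as f:
            load = f.read()
        with open("/proc/meminfo") as f:
            mem = f.read()
        with open(logfile, "a") as out:
            out.write(snapshot_line(load, mem) + "\n")
            out.flush()
            # Ask the OS to push the line to disk so it has a chance
            # of surviving the crash.
            os.fsync(out.fileno())
        time.sleep(interval)
```

After a crash, the tail of the log file shows whether load was climbing or swap was running out just beforehand, which is the same clue the frozen `top` screen gives.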
winston
QUOTE
Originally posted by EOC_Jason

*sigh* when will the day come when RS will offer co-location? I would so love to be running my own (dependable) hardware tailored to my needs and redundancy rather than be at the will of a bargain box which so many people have seemed to be disliking...


I've had no problems at all with my actual hardware (only the screwdriver incident so far last Friday, but that got everyone).
However, to protect from future 'screwdriver incidents' etc. I installed the latest (2.4.17) kernel and upgraded my filesystems to ext3, a journalled filesystem. It won't get corrupted by a dirty reboot, and it won't get stuck in disk maintenance mode when e2fsck finds something it doesn't like. A JFS (journalled filesystem) is nice.
linenoyz
I was just a victim of that log suicide problem... How can I prevent it, or should I just disable logging?

Look at the size of this beast...

-rw-r--r--    1 root     root     2147483647 Jan  2 19:09 error_log
pmak0
Originally posted by winston:
> However, to protect from future 'screwdriver incidents'
> etc. I installed the latest (2.4.17) kernel and upgraded
> my filesystems to ext3 (a journalled filesystem - it won't
> get corrupted by a dirty reboot.

Is it possible to perform this upgrade to a server that you don't have physical access to? How hard is it? Is it dangerous?

I've got a RaQ4i that's not been put into production yet, so I'm thinking of upgrading its kernel since the current one is pretty old.
pmak0
QUOTE
Originally posted by linenoyz
I was just a victim of that log suicide problem... How can I prevent it, or should I just disable logging?

Look at the size of this beast...

-rw-r--r--    1 root     root     2147483647 Jan  2 19:09 error_log


Your error_log has grown to 2 GB, the maximum size that a single file can be. That's pretty big!

It's unusual for an error log to become so big, though. You might want to see what sorts of errors are happening; there might be something that you should fix.

As for how to prevent the logs from killing your machine when they are analyzed, you have to either analyze and rotate them quickly before they get too large (e.g. have webalizer run daily and rotate the log daily), or just turn off logging completely.
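
For what it's worth, the 2147483647 in that listing is 2**31 - 1, the largest value a signed 32-bit file offset can hold, which is where the 2 GB ceiling comes from on a system without large-file support. A small check like the following could warn before a log gets there. It is only a sketch: the 90% threshold and the function names are arbitrary choices for the example.

```python
import os

# Largest size representable in a signed 32-bit file offset.
MAX_FILE_BYTES = 2**31 - 1


def near_limit(size_bytes, fraction=0.9):
    """True once a file has used at least `fraction` of the 2 GB ceiling."""
    return size_bytes >= MAX_FILE_BYTES * fraction


def oversized_logs(directory, fraction=0.9):
    """List (name, size) for regular files in `directory` near the limit."""
    hits = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full):
            size = os.path.getsize(full)
            if near_limit(size, fraction):
                hits.append((name, size))
    return hits
```

Calling `oversized_logs("/var/log/httpd")` from a daily cron job and mailing yourself any hits would give some notice before httpd runs into the wall.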
mango
Hey,

You can edit /etc/logrotate.conf and set it to daily/weekly rotation or a max. filesize. If you have an analysis proggie running on it, be sure to adjust the interval.
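
To make the "daily/weekly or max. filesize" policy concrete, here is the same decision sketched in Python. This is not logrotate's configuration syntax or its code, just an illustration of the rule; the 100 MB and one-day defaults and both function names are assumptions made for the example. The second function gzips a rotated file, which is what keeps old logs from eating the disk.

```python
import gzip
import os
import shutil

DAY = 24 * 60 * 60  # seconds


def needs_rotation(size_bytes, last_rotated, now,
                   max_bytes=100 * 1024 * 1024, max_age=DAY):
    """Rotate when the log is too big or the last rotation is too old."""
    return size_bytes >= max_bytes or (now - last_rotated) >= max_age


def gzip_file(path):
    """Compress `path` to `path`.gz, remove the original, return the new name."""
    target = path + ".gz"
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return target
```

Whatever interval you pick, the analysis program has to run before the log is rotated away and compressed, which is the point about adjusting its interval.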
winston
QUOTE
Originally posted by pmak0

Is it possible to perform this upgrade to a server that you don't have physical access to? How hard is it? Is it dangerous?  
 
I've got a RaQ4i that's not been put into production yet, so I'm thinking of upgrading its kernel since the current one is pretty old.


Yes it is. I did this on my WBL Duron system (which I don't have physical access to). Since it's a bog-standard whitebox Linux system with none of that stuff that makes a RaQ so 'special', it was a very straightforward process (well, for me at least - I've built a few kernels, and I've been using Linux since kernel 0.12).

You can read about what I did here: http://forum.rackshack.net/showthread.php?...=&threadid=2149

As for the RaQ - it may have some unique Cobalt devices. You may have to talk to Cobalt to get the source for these to build a new kernel (the GPL legally compels them to give you the source to any kernel mods they made - they certainly have some drivers for the LCD etc). Doing a dmesg|more will tell you about what your kernel did when it booted. Another useful command is lspci, which shows what hardware you have (ethernet adapter etc.) so you know how to configure a new kernel.

On a whitebox system, it's not really dangerous at all since you can configure Lilo so that the old kernel is still bootable if it all goes horribly wrong (the RS techs just need to plug in a keyboard/display etc to see the lilo menu on boot). I don't think it's even possible to plug a keyboard into a CobaltRaQ (but I could be wrong) - and if that's the case, if you hose it up...well, you'll be needing to do a full restore.
linenoyz
Does the JFS enable itself automatically, or do you have to convert everything somehow (a la FAT16->FAT32 conversion in Windows)? I'm interested, but stuff like that frightens me. :) I'm still learning Linux...I know enough to get around, but yeesh... I don't wanna break the thing, ya know? Mine's an Ensim box, btw.
linenoyz
I'm bringing this one back...

I solved the problem with logs in /var/log/httpd, but now the problems lie with the individual site logs. How can I set it up so they rotate daily and get gzipped, too?
DnW_Dave_B
QUOTE
Originally posted by pmak0
> I think there's just some bad apples in the bunch.

Yeah, that has happened to me. My Duron machine would crash a lot when I tried to compile programs; I managed to get them to swap my hard drive into a different machine and it ran fine after that.

I was talking to a friend the other day about those possibly damaged servers, mostly recounting the screwdriver incident. :D Told her RS should do something, like swap them for a new one. Then it dawned on us, if they use the old for spare parts, I wouldn't want any of the possibly damaged parts. ... I wonder who got the swapped drive? :D Maybe RS can add them to the bill for the contractor. :D