Full Version: Major Server Load Problem.
The Planet Forums > Control Panels > cPanel/WHM
cphan
It started yesterday and has now happened again today. Oddly enough, it started at the same time, and it will probably end at the same time too. I don't know what the cause of the high server load can be, but I do know it starts at about the same time, right before 1 PM PST. The server load goes as high as 30-40 CPU. I did a graceful server reboot several times but that didn't help. I ran the top command and I don't see anything unusual. Any suggestions on where to look?

16:43:47 up 43 min, 1 user, load average: 22.21, 16.81, 16.29
319 processes: 308 sleeping, 11 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 15.3% 0.0% 3.1% 0.7% 1.8% 79.0% 0.0%
Mem: 505400k av, 499780k used, 5620k free, 0k shrd, 3964k buff
378252k actv, 89388k in_d, 2452k in_c
Swap: 2097136k av, 436316k used, 1660820k free 90628k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
15061 nobody 15 0 5264 4708 1400 S 1.4 0.9 0:00 0 httpd
17710 nobody 16 0 4308 3864 1904 S 1.2 0.7 0:00 0 httpd
12273 nobody 15 0 5772 4240 1420 S 1.0 0.8 0:02 0 httpd
11913 nobody 15 0 5140 4584 1372 S 0.9 0.9 0:00 0 httpd
17739 nobody 15 0 4552 4096 1892 S 0.9 0.8 0:00 0 httpd
11629 root 15 0 1020 980 392 R 0.7 0.1 0:09 0 top
12128 nobody 15 0 5872 5108 1676 S 0.7 1.0 0:02 0 httpd
14382 nobody 15 0 6188 5608 1388 S 0.5 1.1 0:01 0 httpd
17738 nobody 15 0 3356 2888 1852 R 0.5 0.5 0:00 0 httpd
17750 nobody 15 0 3940 3500 1852 S 0.5 0.6 0:00 0 httpd
17904 nobody 19 0 3732 3236 1668 S 0.5 0.6 0:00 0 httpd
11909 nobody 15 0 5464 4572 1404 S 0.3 0.9 0:02 0 httpd
11918 nobody 15 0 5512 2904 1432 S 0.3 0.5 0:01 0 httpd
11921 nobody 15 0 5312 4712 1460 S 0.3 0.9 0:02 0 httpd
11943 nobody 15 0 7116 6320 988 D 0.3 1.2 0:01 0 httpd
12108 nobody 15 0 4808 3096 1400 S 0.3 0.6 0:01 0 httpd
12114 nobody 15 0 5136 2580 1364 D 0.3 0.5 0:01 0 httpd
14111 nobody 15 0 5384 4852 1668 S 0.3 0.9 0:01 0 httpd
16332 nobody 15 0 2192 1484 1036 S 0.3 0.2 0:00 0 httpd
17332 nobody 15 0 4836 4392 1824 S 0.3 0.8 0:00 0 httpd
5 root 15 0 0 0 0 RW 0.1 0.0 0:01 0 kswapd
3528 mysql 15 0 36328 16M 1176 S 0.1 3.3 0:00 0 mysqld
10661 root 15 0 10716 2404 1040 D 0.1 0.4 0:20 0 rpmv
11910 nobody 15 0 2692 1756 1100 S 0.1 0.3 0:00 0 httpd
11911 nobody 15 0 5912 5272 1724 S 0.1 1.0 0:02 0 httpd
11919 nobody 15 0 6540 3548 1448 S 0.1 0.7 0:01 0 httpd
11920 nobody 15 0 4516 3232 1376 S 0.1 0.6 0:01 0 httpd
11933 nobody 15 0 4536 3928 1488 S 0.1 0.7 0:01 0 httpd
11936 nobody 15 0 5112 3392 1328 S 0.1 0.6 0:01 0 httpd
11949 nobody 15 0 5412 4516 1440 S 0.1 0.8 0:01 0 httpd
12283 nobody 15 0 6388 4544 1372 S 0.1 0.8 0:01 0 httpd
12327 nobody 15 0 5356 4632 1276 S 0.1 0.9 0:01 0 httpd
14058 nobody 15 0 3628 2480 1372 S 0.1 0.4 0:00 0 httpd
14354 nobody 15 0 5024 3400 768 S 0.1 0.6 0:02 0 httpd
14770 nobody 15 0 5132 4640 1488 S 0.1 0.9 0:01 0 httpd
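
A note on reading the header of the snapshot above: the per-process list shows no single httpd using much CPU, so the summary row is the more useful part. Below is a minimal Python sketch, assuming the `CPU states` layout shown in this post, that pairs each column name with its percentage and picks the largest one. The function name `parse_cpu_states` is made up for illustration.

```python
def parse_cpu_states(header_line, total_line):
    """Pair each column of top's 'CPU states' header with the 'total' row."""
    # Header looks like: "CPU states: cpu user nice system irq softirq iowait idle"
    names = header_line.split(":", 1)[1].split()[1:]  # drop the leading "cpu" label
    # Total row looks like: "total 15.3% 0.0% 3.1% 0.7% 1.8% 79.0% 0.0%"
    values = [float(v.rstrip("%")) for v in total_line.split()[1:]]
    return dict(zip(names, values))


# Both lines are copied from the snapshot in this post.
header = "CPU states: cpu user nice system irq softirq iowait idle"
total = "total 15.3% 0.0% 3.1% 0.7% 1.8% 79.0% 0.0%"

states = parse_cpu_states(header, total)
busiest = max(states, key=states.get)
print(busiest, states[busiest])  # prints: iowait 79.0
```

In this snapshot the CPU spends most of its time waiting on I/O rather than running code, and the same header shows 436316k of swap in use. That would fit a box that is swapping or waiting on a slow disk, but it is a reading of one snapshot, not a diagnosis.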
Matt Brown
319 processes is a lot. What are your server specs?
cphan
The basic one. I only have one site on the server, which is owned by me. But I'm not alone anymore. I'm just glad it's not only me.

http://www.webhostingtalk.com/showthread.p...099#post1821099

There's something funny going on here. There has to be a reason why the server load climbs to about 40 around noon and then all of a sudden dies down late at night.
Matt Brown
Do you have a firewall, and have you tried running ClamAV or chkrootkit?
eddy2099
Is it possible that it went up because WHM was doing all those log rotations, stats generation, backups and updates which it usually does and goes down after it is done ?

One site can be as active as 1000 sites; it all depends on the number of visitors and the content that you have. The CPU can be loaded by MySQL and scripts.
TP
It is a WHM glitch/bug.
I bet you're running 8.6 or 8.7, huh?
cphan
QUOTE (eddy2099)
Is it possible that it went up because WHM was doing all those log rotations, stats generation, backups and updates which it usually does and goes down after it is done ?  


That was what I originally thought, but that's not it. Again, it's been like that for hours. It started at 12 noon and it's still doing it now. Plus support verified there wasn't any cron job running.
cphan
QUOTE (TP)
It is a WHM glitch/bug.
I bet you're running 8.6 or 8.7, huh?


I'm running 8.8 now. That's some glitch. Where did you hear about a glitch that causes high server load up into the 40s? If there was such a glitch, more people would have it.
mta
Hi
I have the same problem.
The server load is usually high at night (GMT) and Apache is always down.

What is the solution for that?

Thx in advance
cphan
Ahh... 8 PM PST and things are going back down to normal CPU usage again. Under 1 CPU. Just have to wait till tomorrow around 1 PM PST for the whole fun to start again with 30-40 CPU usage for no apparent reason.
The high CPU lasts until 8 PM or so. LOL. If I'm going to suffer then I hope others out there will suffer with me.
eddy2099
Sounds like it is possibly a WHM glitch. It is probably better to upgrade to WHM 8.80 as described if you want to be on a Release branch. If not, downgrade to WHM 8.5.3 in the stable branch.
cphan
QUOTE (eddy2099)
Sounds like it is possibly a WHM glitch. It is probably better to upgrade to WHM 8.80 as described if you want to be on a Release branch. If not, downgrade to WHM 8.5.3 in the stable branch.


Again, I'd like to ask where you heard about this glitch. A link or something? Sorry, but I'm pretty sure there are many others out there using 8.6, 8.7, or 8.8 bleeding and I don't see them saying anything. I'm not saying it's not a glitch; just provide a link or something that confirms it's a glitch.
Matt Brown
I believe it is a WHM glitch which is fixed in the newer releases.
cphan
Again, can I ask for some evidence for your hunch? I'll contact cPanel and see if they can confirm this glitch.
eddy2099
It did appear in WHM for a while before the current notice, but they seem to have removed it already.

Check out http://forums.cpanel.net/showthread.php?s=...light=high+load . You are not alone.
cphan
Interesting. But a CPU of 4 or 5 is nothing. I'm talking about 20-40 here for hours. However, I'll wait and see if cPanel has anything to say. If they back it up then they would just tell me to downgrade, which I can do. That still doesn't explain the high CPU at the odd hours. Why start at around 1 PM PST and last till about 8 PM PST? Outside of those hours, CPU is at 1 or below.
eddy2099
You could of course downgrade to the Stable version since that is supposed to be stable and if it doesn't change anything, you can upgrade to the release version again.
cphan
QUOTE
At this time there are no glitches that I'm aware of within cPanel that would cause a high load... loads can be caused by many things and it will vary from server to server.  
--
Clifford P
Technical Support Representative  
cPanel, Inc.  


There you go.

Anyway, I could downgrade, but I just had them install Urchin. If I downgrade to 8.5, then what happens to Urchin? I just paid those guys to install Urchin.
eddy2099
I guess you could keep what you have and hope they come out with a patch or find out what causes the problem. Did the message logs tell you what happened while the problem was occurring?
cphan
Most of the hits were coming from:

/usr/local/apache/bin/httpd -DSSL

The ServerMatrix support guy said it looks like it's caused by an extremely large number of people visiting the site, as he couldn't find any cron job running at the time. My response was that I doubt it was caused by extremely high traffic. It wouldn't last for hours at 20-40 CPU. People go away after a while when they can't get a site to load. They are not going to sit there and keep trying.

QUOTE
There seems to be just a ton of http processes running when we run TOP. TCPDUMP shows a lot of activity as well. I attached a file with the TOP output during the high load.


I'm getting the same thing as the support guy. I'm hoping the daytime support people can figure out what's going on. Maybe they can put the bits and pieces together if others are getting it as well but haven't figured out that the cases are connected. The irony is that it started on the afternoon of the day I had Urchin installed in the morning.
eddy2099
Since you mention that it started on the day you had Urchin installed, isn't it possible that it is caused by Urchin? This should be the first place you look. Disable Urchin for a day or two and see if the issue resurfaces. If it does not, then we know Urchin causes it, probably by updating the live stats or something.
cphan
Doubt it. Urchin was installed that morning but it hadn't actually run yet, because in order for Urchin to take effect and be integrated with cPanel, cPanel had to update. cPanel didn't do its update until early this morning. When it did, Urchin was integrated for the first time, and there were no stats because it was only activated this morning. Urchin stats won't take effect until tomorrow morning during the cPanel update. Urchin processes its stats when cPanel processes its stats, I believe.
dball
It looks like you only have 512MB of ram on the server. I wonder if you're crossing a memory threshold and getting the high load from swapping.

Is your server handling a lot of email? Spamassassin seems to use a lot of CPU.

Does urchin have some process during the day that takes up memory?

How many visits a day does your site get? I'm on a Celeron 1.7 with a gig of RAM and serving about 15000 pages (50000 hits) a day, and my load rarely reaches 1.00. I'm on WHM stable with no Urchin. I also have a program running in the background that takes about 170MB of RAM and does about 1 gig of I/O a day. Typically I'm only running about 110 processes according to TOP.

What do you get if you run "cat /proc/meminfo" during the high load ?
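
For anyone who wants to turn that `cat /proc/meminfo` check into a number, here is a minimal Python sketch that reads the `Name: value kB` lines and reports how much swap is in use. The helper names `parse_meminfo` and `swap_used_kb` are made up for illustration, and the sample text is not real output from cphan's server: it is built from the memory and swap figures in the top header of the first post.

```python
def parse_meminfo(text):
    """Return {field: kilobytes} for 'Name:  12345 kB' style lines."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines whose first value is a plain integer.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap in use is total swap minus free swap."""
    return info["SwapTotal"] - info["SwapFree"]


# Assumed sample, composed from the top header in the first post.
sample = """MemTotal: 505400 kB
MemFree: 5620 kB
SwapTotal: 2097136 kB
SwapFree: 1660820 kB"""

info = parse_meminfo(sample)
print(swap_used_kb(info))  # prints: 436316
```

On a live box the same function can be fed `open("/proc/meminfo").read()`. A swap figure that keeps growing while the load is high would support the swapping theory; a figure near zero would point elsewhere.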
TP
http://forums.cpanel.net/showthread.php?s=...light=high+load

All you have to do is search their forums.
It is a bug that shows incorrect CPU usage.
TOP does not match it, so it's a WHM bug.

I too have 8.x running and have upgraded to the new version. This is not the stable release.

I am sure Nick will fix it.

By the way the new release has tons of new updated stuff !!
cphan
You don't read, do you? I just sent a message to cPanel asking whether it's a bug and they said no. There's no such bug.

Second, I just added another 512 of RAM, making it a total of 1024, and it's still happening. Oh well, at this point support will look into this.

The rest of you are just giving plain speculation on the matter. There's no real evidence that it's a bug in WHM other than that you think it's a bug. If it were a bug then cPanel would tell me to downgrade.

QUOTE
All you have to do is search their forums.
It is a bug that shows incorrect CPU usage.
TOP does not match it, so it's a WHM bug.


Where do you come up with this stuff? Sorry, I'm a little bit frustrated and I feel like taking it out on someone. cPanel says that CPU is at 25. Top says CPU is at 25 as well. They both match. So where did you get the idea that top doesn't match?
cphan
Okay, at this point desperation calls for desperate measures. I'm going to downgrade to stable to see if that will solve anything, even if it means losing Urchin. I don't think it will work, but hey, it's worth a try at this moment.
cphan
Well, downgrading to 8.5 stable didn't work. So I've now upgraded back to 8.8. Adding more RAM didn't work either. Maybe it's kernel related with Enterprise.

It's Celeron 1.7
with RedHat 3 Enterprise on 2.4.21-9.EL
And 1024 RAM
Using cPanel/WHM 8.8. It still happens on cPanel/WHM 8.5 stable as well.


What's the latest kernel? Maybe I can request an upgrade on kernel to see if that might fix it.
Matt Brown
Running 300+ processes on a 1.7 Celeron is your problem. That is WAY too many processes in my book to be running on that lil machine.
cphan
QUOTE (Matt Brown)
Running 300+ processes on a 1.7 Celeron is your problem. That is WAY too many processes in my book to be running on that lil machine.


That's pretty obvious. The question is why it is running 300 processes. It never did that before. It started doing it on Tuesday afternoon. I'm not the only one. Others that have similar problems are also running 300+ processes. Where did the sudden surge in processes come from? I only have one site, so it makes no sense to be running 300 processes.
cphan
16:11:03 up 29 min, 1 user, load average: 1.05, 1.26, 0.99
259 processes: 255 sleeping, 2 running, 2 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 37.3% 0.0% 7.1% 0.5% 1.7% 2.7% 50.2%
Mem: 1022480k av, 986576k used, 35904k free, 0k shrd, 22552k buff
747288k actv, 190160k in_d, 5044k in_c
Swap: 2097136k av, 0k used, 2097136k free 388460k cached

That gets me 1.05 CPU on a busy day. You mean to tell me a difference of 40 more processes will get me 40+ CPU? Plus there have been times when there were only 100 processes and it still got me 20+ CPU.

This is normal for my site.

16:13:54 up 32 min, 1 user, load average: 0.80, 1.10, 0.97
226 processes: 221 sleeping, 4 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 36.2% 0.0% 7.9% 1.9% 1.5% 3.9% 48.2%
Mem: 1022480k av, 988912k used, 33568k free, 0k shrd, 23248k buff
752484k actv, 191064k in_d, 5460k in_c
Swap: 2097136k av, 1260k used, 2095876k free 393920k cached
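
The comparison being made here can be laid out side by side. The sketch below is only arithmetic on figures already posted in this thread (the high-load header from the first post and the two headers in this post); nothing in it was measured separately. It assumes the first number after `load average:` is the 1-minute figure, which is how top and uptime report it.

```python
# (processes, 1-minute load, iowait %, swap used in kB), copied from the
# top headers posted in this thread.
snapshots = {
    "16:43:47": (319, 22.21, 79.0, 436316),  # high-load snapshot, first post
    "16:11:03": (259, 1.05, 2.7, 0),         # normal, this post
    "16:13:54": (226, 0.80, 3.9, 1260),      # normal, this post
}

for when, (procs, load, iowait, swap_kb) in snapshots.items():
    print(f"{when}  procs={procs:3d}  load={load:5.2f}  "
          f"iowait={iowait:4.1f}%  swap={swap_kb}k")

worst = max(snapshots, key=lambda k: snapshots[k][1])
ratio_procs = snapshots["16:43:47"][0] / snapshots["16:11:03"][0]
ratio_load = snapshots["16:43:47"][1] / snapshots["16:11:03"][1]
print(f"worst={worst}  processes x{ratio_procs:.2f}  load x{ratio_load:.1f}")
```

Between the 16:11:03 and 16:43:47 headers the process count grows by roughly a quarter while the load grows about twentyfold, so process count alone does not account for it. What does move with the load in these three snapshots is iowait and swap use. On Linux the load average counts processes in uninterruptible (D) sleep, usually waiting on disk, as well as runnable ones, so a modest number of blocked httpd processes can produce a large figure.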
Matt Brown
Well, I don't know why this is. Of course, I'm not using RHE, so that could be your problem also. I dunno, it's odd though, very odd. Maybe tech support will come through for you soon.
philb
When your loads are high have you tried getting an output from apachectl's fullstatus command?

(go to Server Status -> Apache status in WHM)

This'll show you what all those apache threads are currently serving and help you pin down the problem, as it seems to be caused by lots of httpd threads.
cphan
I'll try that but when server load is high, I can't even get into WHM.
eddy2099
You might want to take advantage of the current offer to downgrade to RH 9 and have a working machine rather than have problems with this machine day in, day out. At least you'd be much happier.

http://forums.servermatrix.com/viewtopic.h...t=3542&start=49
philb
ok, well, ssh in instead. I know for a fact that it is possible to get into machines with loads of 200, so it's got to be possible for you to get in at 40 if you persevere.

I recommend you get lynx installed (if it isn't already) and find where apachectl is located on your machine (it may not be in your path). When the issues start, do apachectl fullstatus - and you'll get exactly the same thing - be prepared to wait a very long time.
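
Since the problem arrives at a predictable hour, one option is to let a small script do the waiting. The following is a rough Python sketch, not a tested tool: it polls `/proc/loadavg` and, once the 1-minute load crosses a threshold, saves the output of `apachectl fullstatus` to a timestamped file. The threshold, the polling interval, the output directory, and the apachectl path are all assumptions (the path is guessed from the `/usr/local/apache/bin/httpd` line quoted earlier in the thread), so adjust them for your own box.

```python
import os
import subprocess
import time


def one_minute_load(loadavg_text):
    """The first field of /proc/loadavg is the 1-minute load average."""
    return float(loadavg_text.split()[0])


def should_capture(load, threshold):
    return load >= threshold


def watch(threshold=10.0, interval=60, out_dir="/root",
          command=("/usr/local/apache/bin/apachectl", "fullstatus")):
    # All defaults here are assumptions; apachectl fullstatus needs lynx.
    while True:
        with open("/proc/loadavg") as f:
            load = one_minute_load(f.read())
        if should_capture(load, threshold):
            stamp = time.strftime("%Y%m%d-%H%M%S")
            path = os.path.join(out_dir, f"fullstatus-{stamp}.txt")
            with open(path, "w") as out:
                subprocess.run(command, stdout=out, stderr=subprocess.STDOUT)
        time.sleep(interval)


# Parsing check only. The sample line is composed from the first post's
# figures in /proc/loadavg layout; the last field is a placeholder PID.
print(one_minute_load("22.21 16.81 16.29 11/319 17904"))  # prints: 22.21
```

Calling `watch()` from a root shell before the usual start time would leave a series of status dumps to read afterwards, instead of fighting for a shell at a load of 40.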
cphan
This is more like it:

11:25:39 up 27 min, 1 user, load average: 0.47, 0.47, 0.39
320 processes: 315 sleeping, 5 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 23.4% 0.0% 4.5% 0.1% 0.5% 0.9% 70.2%
Mem: 1022480k av, 836240k used, 186240k free, 0k shrd, 19372k buff
575256k actv, 212276k in_d, 5584k in_c
Swap: 2097136k av, 0k used, 2097136k free 369668k cached


I'm wondering if it's a kernel problem:

WARNING: Kernel Errors Present
hda: drive_cmd: error=0x04 { DriveStat...: 3Time(s)
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }...: 3Time(s)

QUOTE
I have another server with RHEL+Cpanel and the only problem I've been having with it is a soaring memory usage once a day (seemingly unrelated to any of my cron jobs) caused by MySQL processes building up and filling up memory, which causes swapping, which slows down the machine, which causes more MySQL processes to build up ... etc. My record load was 77 ... I could barely make it reboot but then it was OK.


I think this is what's happening to me, and it's resulting in the high CPU. The question is why it is doing that all of a sudden. The only thing I can think of is that I did a kernel upgrade a week ago, on the 28th.
The thing I'm noticing is that if I catch it early, at the point where the CPU starts to build up, doing a reboot fixes it. However, if I'm not sitting watching the thing and it runs at high CPU for hours, a reboot doesn't seem to fix it. Of course I could be talking out of my ass here. I'm no system admin.
philb
QUOTE (cphan)
WARNING: Kernel Errors Present
hda: drive_cmd: error=0x04 { DriveStat...: 3Time(s)
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }...: 3Time(s)


Er.. The last time I saw something like this was when I was trying to read an extremely damaged CDR under *nix

My money's on your disk failing with errors like that. If you're not already, start a rigorous backup procedure.
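
For readers wondering what the hex values in those kernel lines mean, here is a small Python sketch that decodes the status byte. The bit layout is the standard ATA status register, and the labels are the ones the Linux IDE driver prints; the function name `decode_status` is made up for illustration.

```python
# ATA status register bits, labelled the way the Linux IDE driver prints them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}


def decode_status(byte):
    """List the names of the bits set in an ATA status byte, high bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items(), reverse=True)
            if byte & bit]


print(" ".join(decode_status(0x51)))  # prints: DriveReady SeekComplete Error
```

The decoded `0x51` matches the text in the log line. The accompanying `error=0x04` has only bit 2 set, which in the ATA error register is ABRT (command aborted). That says the drive rejected a command but not why, so on its own it neither confirms nor rules out a failing disk. Backing up first and then checking the drive is still the safe order.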