HOW-TO: PRM (Process Resource Monitor)
rfxn
Introduction:
PRM monitors the process table on a given system and matches process IDs against resource limits set in the config file or in per-process rules. Processes that match or exceed the set limits are logged and killed. PRM also includes e-mail alerts, a kernel logging routine and more.

How it works:
Once PRM finds a process that matches or exceeds its resource limits, a corresponding trigger and wait value apply. The trigger counter starts at zero (0); PRM pauses for the number of seconds defined as the wait value and then checks the flagged pid again. If the process is still at or above its resource limits, the counter is incremented and the trigger/wait cycle repeats until the max trigger value is reached. When the trigger value is reached, the process is logged and killed.

The overall effect is that applications with short resource spikes (e.g. apache, mysql) are not killed, while applications with prolonged resource consumption are. Using the rule system, you can define different wait/trigger/resource values for any application.
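
To make the cycle concrete, here is a minimal shell sketch of the trigger/wait loop described above. It is not PRM's own code: watch_pid and is_over_limit are made-up names, and is_over_limit is only a placeholder for whatever check decides that a pid is still at or above its limits.

```shell
#!/bin/sh
# Hypothetical sketch of the trigger/wait cycle; not taken from PRM itself.

WAIT="${WAIT:-10}"           # seconds to pause between rechecks
KILL_TRIG="${KILL_TRIG:-3}"  # rechecks a pid must fail before it is killed

# Placeholder check: replace with a real CPU/memory test for the pid.
is_over_limit() {
    return 1   # pretend every pid is back under its limits
}

# watch_pid PID
# Returns 0 if the pid stayed over its limits for the whole cycle
# (a kill candidate), 1 if it dropped below them at any recheck.
watch_pid() {
    pid="$1"
    trig=0
    while [ "$trig" -lt "$KILL_TRIG" ]; do
        sleep "$WAIT"
        if ! is_over_limit "$pid"; then
            return 1    # short burst: leave the process alone
        fi
        trig=$((trig + 1))
    done
    return 0            # prolonged usage: logging and the kill would go here
}
```

With the values shown, a process has to stay over its limits for three rechecks spaced 10 seconds apart before it becomes a kill candidate.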

Installation:
First we must fetch the package:
# wget http://www.rfxnetworks.com/downloads/prm-c...-current.tar.gz

And extract it:
# tar xvfz prm-current.tar.gz

The current version of prm as of this writing is 0.3, so let's cd to the extracted 0.3 path:
# cd prm-0.3/

And finally run the enclosed install.sh script:
# ./install.sh

Configuration:
The prm installation is located at '/usr/local/prm', and the configuration file is labeled 'conf.prm'.

Open the '/usr/local/prm/conf.prm' file with your preferred editor. There is an array of options in this file but we will only be focusing on the main variables.

Let's skip down to the user e-mail alerts section and set the USR_ALERT value to '1', enabling alerts.
# enable user e-mail alerts [0=disabled,1=enabled]
USR_ALERT="1"


And configure our e-mail addresses for alerts:
# e-mail address for alerts
USR_ADDR="root, you@domain.com"


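Choose which load average PRM checks against the minimum load level set further below. Linux reports load averages over 1, 5 and 15 minutes, so the values 1, 2 and 3 most likely select those three readings, even though the comment shipped in the file labels them 5, 10 and 15.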
# check 5,10,15 minute load average. [1,2,3 respective of 5,10,15]
LC="1"


PRM can require a minimum load average before it runs. If the load is below this value, PRM will not run. Setting this value to zero forces the script to always run, but this should not be needed.
# min load level required to run (decimal values unsupported)
MIN_LOAD="1"
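
As a rough illustration of how LC and MIN_LOAD could work together, the sketch below picks one field from a line in Linux /proc/loadavg format, drops the fraction (the config notes that decimal values are unsupported) and compares it with the minimum. The function name load_ok and its arguments are hypothetical; PRM's real check may differ.

```shell
#!/bin/sh
# Hypothetical sketch of the minimum-load check; not taken from PRM itself.

# load_ok LOADAVG_LINE LC MIN_LOAD
# LOADAVG_LINE is text in /proc/loadavg format, e.g. "1.42 0.97 0.80 2/180 4242".
# LC selects field 1, 2 or 3. True when the integer part of that field
# is equal to or greater than MIN_LOAD.
load_ok() {
    line="$1"; lc="$2"; min="$3"
    load=$(printf '%s\n' "$line" | cut -d' ' -f"$lc")
    load="${load%%.*}"      # "0.42" becomes "0"
    [ "$load" -ge "$min" ]
}

# Live usage on Linux:
# load_ok "$(cat /proc/loadavg)" 1 1 || echo "load below MIN_LOAD; not running"
```

Under this reading, a load of 0.42 truncates to 0, so with MIN_LOAD="1" the run is skipped.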


This is the wait value described in the introduction, used for pauses between trigger increments. The value of WAIT multiplied by the value of KILL_TRIG equals the time before a process is killed (10 x 3 = 30 seconds).
# seconds to wait before rechecking a flagged pid (pid's noted resource
# intensive but not yet killed).
WAIT="10"


The trigger limit before processes are killed, described in detail in the 'wait' description above and in the introduction.
# counter limit that a process must reach prior to kill. The counter value
# increases for a process flagged resource intensive on rechecks.
KILL_TRIG="3"


The max percentage of CPU a process should be allowed to use before PRM flags it for killing.
# Max CPU usage readout for a process - % of all cpu resources (decimal values unsupported)
MAXCPU="35"


The max percentage of MEM a process should be allowed to use before PRM flags it for killing.
# Max MEM usage readout for a process - % of system total memory (decimal values unsupported)
MAXMEM="15"
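
Since both limits are whole numbers while ps reports readings such as 35.7, a comparison along the following lines is one way to apply them. This is only a sketch: over_limit and flagged are hypothetical names, and truncating the reading is an assumption based on the "decimal values unsupported" note.

```shell
#!/bin/sh
# Hypothetical sketch of the MAXCPU/MAXMEM comparison; not taken from PRM itself.

MAXCPU="35"
MAXMEM="15"

# over_limit VALUE LIMIT
# True when the integer part of VALUE matches or exceeds LIMIT.
over_limit() {
    value="${1%%.*}"            # "35.7" becomes "35"
    [ "${value:-0}" -ge "$2" ]
}

# flagged PCPU PMEM
# True when either reading reaches its limit.
flagged() {
    over_limit "$1" "$MAXCPU" || over_limit "$2" "$MAXMEM"
}

# Usage: flagged 35.7 1.0 && echo "process would be flagged"
```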


That is it; you should tweak the MAXCPU/MAXMEM limits to your needs, but the defaults should be fine for most.

Usage:
The executable program resides in '/usr/local/prm/prm' and '/usr/local/sbin/prm'. The prm executable can receive one of two arguments:

-s Standard run
-q Quiet run

The log path for prm is '/usr/local/prm/prm_log'; pid-specific logs are stored in '/usr/local/prm/killed/'.

A default cronjob for PRM is installed to '/etc/cron.d/prm', and is configured to run once every 5 minutes.
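
For reference, an /etc/cron.d entry for a job like this usually looks like the line printed below. This is a guess at the general shape, not a copy of the file the installer writes; the helper prm_cron_line, the root user field and the output redirect are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch; the installed /etc/cron.d/prm may look different.

# prm_cron_line MINUTES
# Print a cron.d style entry (note the extra user field) that runs the
# quiet mode every MINUTES minutes.
prm_cron_line() {
    printf '*/%s * * * * root /usr/local/sbin/prm -q >/dev/null 2>&1\n' "$1"
}

prm_cron_line 5
```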

An ignore file is provided to exclude processes based on string rules. It is located at '/usr/local/prm/ignore' and holds one ignore string per line. By default the strings 'root', 'named' and 'postgre' are ignored by PRM; this script was not intended to monitor root processes but rather userland tasks. It could easily watch root processes if the 'root' line is removed from the ignore file, but this is strongly discouraged.
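
The sketch below shows one way line-separated ignore strings can be matched against a line of ps output. It assumes plain substring matching, which may not be exactly what PRM does, and is_ignored is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sketch of ignore-file matching; not taken from PRM itself.

# is_ignored IGNORE_FILE PS_LINE
# True when PS_LINE contains any non-empty string from IGNORE_FILE.
is_ignored() {
    # Drop blank lines first: an empty pattern would match every process.
    patterns=$(grep -v '^[[:space:]]*$' "$1") || return 1
    # -F treats each line as a fixed string rather than a regular expression.
    printf '%s\n' "$2" | grep -qF -e "$patterns"
}

# Usage:
# is_ignored /usr/local/prm/ignore "root 1 0.0 0.1 init" && echo "skipped"
```

With substring matching, a string such as 'postgre' also covers 'postgres' and 'postgresql', and 'root' would match any line that merely contains that word, so short strings deserve some care.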
Nathan
Awesome.

Installing now...
Mike2522
Installing now; I suggest adding 'mysql', 'vhbackup' and 'tomcat4' to the ignore list. Thanks rfxn for the script, I've been looking for an alternative to the one I'm using.
Mike2522
BTW, are there any side effects if the script is run every 1 min instead of every 5 min?
rfxn
I don't recommend running it more often than every 2 minutes.
Netwerk
Very nice one! BTW, I would love to see this integrated into that other invaluable tool of yours, SIM.
UH-Matt
Amazing idea.
svj
Thank you! What a sweet and useful application!
Marty
For a cPanel server, make sure you add mysql and cpanel to the ignore list. You may also have to add the various stats programs to the ignore file so that it does not stop your stats runs.

Very nice script.
rfxn
PRM 0.3 has been released. This version introduces the ability to make discipline rules based on a specific process name/command, allowing granular memory/cpu limits for different processes (e.g. mysql, webalizer).

Homepage: http://www.r-fx.net/prm.php
Download: http://www.r-fx.net/downloads/prm-current.tar.gz

0.3 Changes:
[Fix] ignore file exempting cmd values; fixed
[Change] default conf.prm values modified
[Change] default ignore values modified
[Change] exported e-mail alert contents to usr.msg
[New] process rule disciplines feature

*note* previous versions of prm often killed analog and mysql processes on cPanel servers; this version ships with discipline rules preset for these (no need to add them to the ignore file)
Technics
Has this improved the performance of servers for anyone? I'm very interested in using this, but there's not much feedback on what it's actually like.
dynamicnet
Greetings:

Please consider the following:

The installer could check whether a previous version is installed and preserve the configuration files (conf.prm, ignore, etc.).

Thank you.
rfxn
PRM 0.4 has been released, details below:

Homepage: http://www.r-fx.net/prm.php
Download: http://www.r-fx.net/downloads/prm-current.tar.gz

0.4:
[Change] log format changes; syslog style
[New] kernel logging feature
[New] get_state(), prevent multiple instances
[New] -j runtime option to toggle prm on or off
[Fix] various fixes to treatment of 'cmd' ps column
[Change] alert output changes
[Change] load abort routine ignored from klog

The howto will be updated tomorrow for the new setup options and features.
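
On the multiple-instance guard mentioned in the changelog: a common shell idiom for this is a lock directory, sketched below. This is not how get_state() is known to work; run_once and the lock path are hypothetical and only illustrate the general technique.

```shell
#!/bin/sh
# Hypothetical single-instance guard; not PRM's get_state().

# run_once LOCKDIR COMMAND [ARGS...]
# mkdir is atomic, so only one caller at a time can create LOCKDIR.
# Returns 1 without running COMMAND if the lock is already held,
# otherwise returns COMMAND's exit status.
run_once() {
    lock="$1"; shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "another instance holds $lock; aborting." >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lock"
    return "$status"
}

# Usage: run_once /tmp/prm-example.lock /usr/local/sbin/prm -s
```

One caveat of this idiom is that a run killed halfway leaves the lock directory behind, and it then has to be removed by hand before the next run can start.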
foggy
Gonna give this a try.

Thanks rfxn.
Mike2522
I suggest adding a time= value to catch processes that run longer than a specified time, if that's OK.
REBIS
Thanks, rfxn. Another brilliant idea! Can't wait to try it. Btw, I'm sure I speak for many here when I say I don't know what I'd do w/o SIM. Much appreciated. Keep up the great work!
TheVoice
Has anyone tried this yet?
huck
I am a little curious about the "trigger" mechanism. Could you please explain this a bit more?

For example, if a process is cyclical in its CPU usage, does the script take that into account? What happens if the script looks at the process only during the high load periods? The average CPU/RAM utilization could be low but peak times could be high; how is this situation handled?

Is there any state? Is resource utilization averaged over time or just a static snapshot of the process table at a given moment?

Caution to E-com sites.
Killing httpd threads on e-commerce sites can be dangerous. Orders can get lost. If you install any script that kills off processes, make sure that it is not killing legit items. I assume by adjusting the thresholds you can keep your card/payment scripts happy. Glad to see logging is included so you can track this.
GetWired
bump
tacoeater
I have this in my log:

Sep 18 14:52:13 alpha prm(28593): get_pinfo() value asignment error; aborting.

Does that mean the program is not working?

Thanks
rfxn
CODE
get_pinfo() {
#
# colums
# user pid pcpu pmem cmd
# 1    2   3    4    5
....


Yes, this is an internal error meaning that process information was not assigned to the expected variables. What Linux distro are you running PRM on, and have you made any modifications to the ps binary or proc utils?
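
To illustrate the kind of assignment the comment block describes, here is a small sketch that splits one line of ps output into the five columns and reports failure when any of them is missing. It is not the real get_pinfo(); parse_ps_line and the p_* variable names are made up, and what actually triggers the error in PRM is not known from this thread.

```shell
#!/bin/sh
# Hypothetical sketch of column assignment; not PRM's get_pinfo().

# parse_ps_line LINE
# Split "user pid pcpu pmem cmd..." into p_user, p_pid, p_pcpu, p_pmem, p_cmd.
# Returns 1 if a column is missing or the pid is not numeric.
parse_ps_line() {
    # read splits on whitespace; the last variable takes the rest of the
    # line, so a command with arguments stays in one piece.
    read -r p_user p_pid p_pcpu p_pmem p_cmd <<EOF
$1
EOF
    case "$p_pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ -n "$p_user" ] && [ -n "$p_pcpu" ] && [ -n "$p_pmem" ] && [ -n "$p_cmd" ]
}

# Live usage on Linux (procps), same column order as above:
# ps -eo user,pid,pcpu,pmem,cmd --no-headers | while IFS= read -r line; do
#     parse_ps_line "$line" || echo "value assignment error: $line" >&2
# done
```

A ps that prints different columns or a different order would make a check like this fail, which is why the distro and any changes to ps matter.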
rfxn
Huck:
PRM supports individual rule files where you can configure wait periods and recheck counts for the given process rule. So for example you could make PRM check httpd processes; if a process is found over limits, it can wait (sleep) for XX seconds and recheck the resource usage of the process; likewise this routine can be repeated XX times via another variable. If at any time during this routine the process is found to be below the cpu/mem threshold limits, PRM moves on to the next applicable rule set.

This allows you to make PRM follow a process for as long as you desire before finally determining if it is over resource limits or not.
tacoeater
Currently using Red Hat 8.0 on that server.

No mods.
Netino
Nice program. But I'm afraid the program is not working; I'm getting the following log entry many, many times:
Sep 20 00:20:14 ensim prm(885): get_pinfo() value asignment error; aborting.

Sometimes:
Sep 19 14:40:06 ensim prm(19160): detected multiple prm processes; aborting.

No dead prm in the 'ps' process list. No prm in /etc/crontab. Only one instance is invoked, in /etc/cron.d/prm.

I installed it exactly as instructed.

I'm using Ensim PRO 3.5.11, Red Hat 7.3, kernel 2.4.21 built from source.

Regards,

Netino
Sebastian
I have been trying to access www.r-fx.net for some days. Is it down?
tobi76
Hello rfxn,

Great work, prm seems to work fine and it kicks some huge SQL queries before they overload the server, nice!

But sometimes I get something like the following message from cron, and I don't know how to track down the error or what I should do about it. Could you look into it?

/usr/local/sbin/prm: line 173: 11746 Segmentation fault $INSPATH/$APPN -s >>/dev/null 2>&1

Any suggestions? Thanks!

Using version 0.4.
rfxn
Download memtest86 and/or request that RackShack test the memory in your system. Chances are the segfaults stem from bad memory blocks; that's my opinion anyway.
acer2k
Site is down. Can't download PRM.
mtijssen
QUOTE
Originally posted by acer2k
Site is down. Can't download PRM.


Try this one:

http://www.rfxnetworks.com/prm.php

http://www.rfxnetworks.com/downloads/prm-c...-current.tar.gz
mulaton
Great utility!
saver0
I have a question.

Will this kill the firewall if the load goes high?
blaze64
No, because it will ignore processes owned by root, or whoever you specify. Or you can ignore 'cpanel', for example.
saver0
What can I do so that it will ignore these?
blaze64
Just add it to the ignore list.
ricoche
Hi there,

I have installed this and it seems to be working pretty well. I wondered if anyone might be willing to share their ignore list so that I can determine whether I am on the right track or not.

I am running cPanel with many additions such as tripwire, mailscanner, etc. and am not sure what should be included in the ignore list.

Thanks very much.

Edit - I suppose an alert would tell me if something important were killed.
xilox
I have a problem when I launch the process; I get this error:

system load (0) below check requirment; aborting

Do you have any idea?

Thanks

Vincent
rfxn
This is not a problem; PRM just doesn't run unless the load is at least 1.0.
xilox
I don't understand, sorry.

How do I launch the command for the process?

Vincent

Sorry for my bad English.
nunizgb
Hi, great script :-)

I have just one question:

When I get mail from prm, it says:

This is an automated status warning from secure.postnuke-france.net. The process (26778) has exceeded defined resource limits, as such a kill signal was invoked from the process resource monitor.

- Event Summary:
USER: apache
PID : 26778
CMD : /usr/sbin/httpd
CPU%: 0 (limit: 65)
MEM%: 1 (limit: 45)
PROCS: 53 (limit: 40)

However, in the conf file or rp I set these values:

# Max CPU usage readout for a process - % of all cpu resources (decimal values unsupported)
MAXCPU="60"

# Max MEM usage readout for a process - % of system total memory (decimal values unsupported)
MAXMEM="20"

# Max processes for a given command - this is not max processes for user but rather the executable
MAXPS="25"


So where can I change the limits shown in these values:

CPU%: 0 (limit: 65)
MEM%: 1 (limit: 45)
PROCS: 53 (limit: 40)

Thanks for any response, sorry for my English, and HAPPY NEW YEAR (not yet in the USA, but here in Europe it is soon, less than 4 hours away).

PS: I found how to do that, but what limits are best to set?
WebandNet
USER: nobody
PID : 29118
CMD : /usr/local/apache/bin/httpd
CPU%: 0 (limit: 65)
MEM%: 0 (limit: 45)
PROCS: 61 (limit: 40)

Killing this process causes apache & ssh to fail.

Is this one user, or multiple connections to many customer sites?


What should I put in the ignore list:

httpd or /usr/local/apache/bin/httpd?

Best wishes
WebandNet
jnarvaez
Thank you rfxn for this cool program; I also use your SIM and APF, nice job!

But I think PRM is not working 100% OK for me. I had to set MIN_LOAD to 0; if I set it to 1 it never starts. I think PRM can't get the load average from my system. I received a pair of e-mails and they don't show CPU and MEM usage:

- Event Summary:
USER: #59926
PID : 9833
CMD : [ipop3d]
CPU%: 0 (limit: 40)
MEM%: 0 (limit: 20)
PROCS: 28 (limit: 25)

- Event Summary:
USER: #22050
PID : 27922
CMD : [ipop3d]
CPU%: 0 (limit: 40)
MEM%: 0 (limit: 20)
PROCS: 27 (limit: 25)

I'm using prm 0.5.
BTW, what do you think about these messages? I'm going to set MAXPS to 40 to prevent this.

Thanks!
blaze64
To all 3 posts above:

Your processes were killed because of the PROCS limits you have set. For example, httpd is limited to "40" on WebandNet's setup. Do you think there are more than 40?

Yes, add httpd to the ignore list. Add all services that you DO NOT want killed: cpanel, exim, mysql, etc. Those are strictly what YOU need! Nobody can say "put this" or "put that". There are some guidelines, but each server is unique.
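
To see why a process showing CPU 0 and MEM 0 can still be flagged, here is a rough sketch of a per-command process count compared against MAXPS. The helper names count_cmd and over_maxps are made up, and treating a count equal to the limit as over the limit is an assumption borrowed from the "match or exceed" wording of the HOWTO.

```shell
#!/bin/sh
# Hypothetical sketch of a per-command process count; not taken from PRM itself.

MAXPS="40"

# count_cmd CMD
# Read one command name per line on stdin and print how many equal CMD exactly.
count_cmd() {
    grep -cxF -e "$1"
}

# over_maxps CMD
# True when CMD appears MAXPS times or more on stdin.
over_maxps() {
    n=$(count_cmd "$1" || true)   # grep exits 1 on zero matches but still prints 0
    [ "$n" -ge "$MAXPS" ]
}

# Live usage on Linux:
# ps -eo comm= | over_maxps httpd && echo "httpd is over MAXPS"
```

A busy web server with 53 httpd children would trip a limit of 40 here regardless of how little CPU or memory each child uses.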
solokron
Excellent!

I am surprised httpd and mysql are not defaults.
koolnyze
I want to monitor sendmail with PRM so that it gets killed when it exceeds the defined threshold, but it's owned by root, and root is in the ignore list. Can you please suggest a workaround?
Ronny
MAXPS="25" is set in the conf file; however, in the e-mail I'm getting:

PROCS: 62 (limit: 40)


It doesn't seem to change when I raise the PS limit either.

This is for apache BTW; I always have 40+ apache processes and prm is constantly killing it.

By the way, if MAXPS is set to 25, does PRM kill processes until the count is back down to 25, or does it kill all the processes?
anand
prm -s &

PRM version 0.5
Copyright © 1999-2003, R-fx Networks
Copyright © 2003, Ryan MacDonald
This program may be freely redistributed under the terms of the GNU GPL

Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.

This is what I get when I run prm from the command line.

Any help would be appreciated.
anand
After removing it and doing a fresh install:


./prm -s &
[1] 15210

PRM version 0.5
Copyright © 1999-2003, R-fx Networks
Copyright © 2003, Ryan MacDonald
This program may be freely redistributed under the terms of the GNU GPL

grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression

grep: Invalid regular expression
grep: Invalid regular expression
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression
grep: Invalid regular expression


Erwin
It's about time I installed this beauty... I'll report on the outcome...