Full Version: HowTo: Watchdog - Auto-Reboot your server in case of failures
Netino
Watchdog HowTo (edited: 15-July-2008)
==============
Keywords: software autoreboot, autorebooting, auto-reboot, auto-rebooting, auto rebooting

Watchdog is a program that you can use to reboot your server automatically in many failure cases.
It has been used successfully to reboot servers hit by the "Unexplained Crash" problem, which can be caused by disk queue starvation or by a quota/ext3 filesystem deadlock, and which crashes the server at random. If downtime from crashes is a problem on your system, watchdog can probably give you your peace of mind back.

This works on any Linux system, whatever control panel you run: Ensim, Plesk, CPanel, etc.

As documented in /usr/src/[your-linux-kernel]/Documentation/watchdog.txt, the kernel provides watchdog timer interfaces through a device named /dev/watchdog, "which when open must be written to within a timeout or the machine will reboot. Each write delays the reboot time another timeout. In the case of the software watchdog the ability to reboot will depend on the state of the machines and interrupts. The hardware boards physically pull the machine down off their own onboard timers and will reboot from almost anything." The default timeout is 60 seconds.

The watchdog program simply uses the /dev/watchdog device. It loads the softdog module (if your kernel supports it) and writes to /dev/watchdog every 10 seconds, after making several other (configurable) checks of your system. If your system crashes, or watchdog stops working, or for any other reason watchdog fails to write to that device for 60 seconds while the kernel is still alive, the kernel reboots the machine.
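
The write-then-sleep cycle described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the watchdog daemon's real code: the function name pet_watchdog and its three arguments are made up for this example, and it is meant to be tried on a scratch file, not on the real device.

```shell
#!/bin/sh
# Keepalive sketch. pet_watchdog and its arguments are hypothetical names.
# WARNING: pointed at the real /dev/watchdog this arms the timer, and the
# machine may reboot once the writes stop. Try it on a scratch file first.
pet_watchdog() {
    device=$1      # path to write to
    count=$2       # how many keepalive writes to make
    interval=$3    # seconds between writes (the daemon uses 10)
    exec 3>"$device"            # open once and keep the descriptor open
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3          # each write pushes the deadline back
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    exec 3>&-                   # closing ends the keepalives
}
```

For example, pet_watchdog /tmp/fake-watchdog 3 10 makes three writes ten seconds apart to a scratch file. The real daemon does the same thing forever, and only after its system checks pass.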

I have confirmed that the following RedHat kernels already come with support for the softdog module:
2.4.18-27.7.x
2.4.20-19.7
2.4.20-24.7
2.4.20-27.7
2.4.20-28.7
2.4.21-27.ELsmp (RHEL3)

The major distros already come with softdog module support. If you don't use any of the above kernels, check whether your version/distro comes with softdog module support by running "modprobe softdog" and then "lsmod|grep softdog". If the module loads, quickly run "rmmod softdog" so that your server does not reboot automatically. If it is not supported, you must compile a kernel with watchdog support, setting these parameters:
CONFIG_WATCHDOG=y
CONFIG_SOFT_WATCHDOG=m

Refer to "Kernel compile HowTo" to compile a new kernel for your system.
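
Before touching any module, the two parameters above can also be looked up in the saved kernel configuration. The sketch below assumes your distro installs that file as /boot/config-<version> (RedHat kernels do, others may not). The function name softdog_support is invented for this example.

```shell
#!/bin/sh
# Report how the software watchdog was configured in a kernel config file.
# softdog_support is a hypothetical helper; it prints one of:
#   module   (CONFIG_SOFT_WATCHDOG=m)
#   builtin  (CONFIG_SOFT_WATCHDOG=y)
#   none     (watchdog support missing)
softdog_support() {
    config=$1
    # Without general watchdog timer support nothing else matters
    grep -q '^CONFIG_WATCHDOG=y' "$config" || { echo none; return 1; }
    case $(grep '^CONFIG_SOFT_WATCHDOG=' "$config") in
        CONFIG_SOFT_WATCHDOG=m) echo module ;;
        CONFIG_SOFT_WATCHDOG=y) echo builtin ;;
        *) echo none; return 1 ;;
    esac
}
```

Usage would be something like softdog_support "/boot/config-$(uname -r)". The answer "module" is the case this howto expects.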

Installation
============
In general terms, to install watchdog it suffices to download it, install it, and change a few parameters in /etc/watchdog.conf. It's very simple. But in *NO* way experiment with watchdog !!! You can have a bad experience and need to restore your server. Only do it if you know what you are doing! Be advised. I'm an experienced network administrator (20 years in IT, 11 years with hosting), and despite my experience, this cost me 2 (two) restores with EV1 to learn.

Always check your backups *before* installing watchdog.

Download:
=========
If you are using Ensim, download from: http://rpm.pbone.net/
# wget ftp://ftp.pbone.net/mirror/dag.wieers.com...g.rh73.i386.rpm (Several other different versions in dag repository)

Install the rpm:
# rpm -ivh watchdog-5.2-5.dag.rh73.i386.rpm

FIRST IMPORTANT THING TO DO: Disable auto-start of watchdog (the reason is explained below):
(This is for RedHat-like distros. Check how to do it on other distros.)
# chkconfig watchdog off

Configuration:
==============
Softdog is loaded automatically by watchdog, so you don't need to do anything for it.
At a minimum you need to edit /etc/watchdog.conf and uncomment the following lines.
Uncomment:
CODE
=================================
#file = /var/log/messages
#watchdog-device = /dev/watchdog
=================================


So that they read:
CODE
=================================
file = /var/log/messages
watchdog-device = /dev/watchdog
=================================
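
If you prefer not to edit the file by hand, the same two lines can be uncommented with sed. This is a sketch only: the function name enable_watchdog_conf is invented here, and it assumes the lines appear in your watchdog.conf exactly as shown above, starting with "#".

```shell
#!/bin/sh
# Uncomment the "file" and "watchdog-device" lines of a watchdog.conf.
# enable_watchdog_conf is a hypothetical helper name.
enable_watchdog_conf() {
    conf=$1
    tmp="$conf.new.$$"
    cp -p "$conf" "$conf.bak" || return 1    # keep a backup of the original
    sed -e 's|^#[[:space:]]*\(file[[:space:]]*=\)|\1|' \
        -e 's|^#[[:space:]]*\(watchdog-device[[:space:]]*=\)|\1|' \
        "$conf" > "$tmp" || return 1
    # Write back through the existing file so owner and mode stay the same
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Run it as enable_watchdog_conf /etc/watchdog.conf and compare the result with watchdog.conf.bak before starting the service.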


You can adjust any other settings to your taste. Also check the file '/etc/sysconfig/watchdog' (on RedHat-like distros) for the startup / command-line options of watchdog. (For example, mine is OPTIONS="-v -b", for verbose logging and soft reboot.)

Create the watchdog device:
# mknod /dev/watchdog c 10 130

Check that it really exists:
# ls -alF /dev/watchdog
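
The ls output should show a character device with numbers "10, 130". The same check can be scripted. The sketch below assumes GNU stat (the -c option), which prints device numbers in hexadecimal; the function name is_chardev is made up for this example.

```shell
#!/bin/sh
# Succeed only if PATH is a character device with the given major/minor.
# is_chardev is a hypothetical helper; it relies on GNU stat (-c).
is_chardev() {
    path=$1
    want_major=$2
    want_minor=$3
    [ -c "$path" ] || return 1
    # %t and %T are the major and minor device numbers, in hex
    set -- $(stat -c '%t %T' "$path")
    [ "$((0x$1))" -eq "$want_major" ] && [ "$((0x$2))" -eq "$want_minor" ]
}
```

For the watchdog device the call is is_chardev /dev/watchdog 10 130. It only reads the file's metadata and never opens the device, so it does not start the timer.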

If it is OK, run the following:
# service watchdog start

You now have watchdog working.

Check your /var/log/messages for lines like the following:
CODE
Jan 13 15:06:13 ensim kernel: Software Watchdog Timer: 0.05, timer margin: 60 sec
Jan 13 15:06:13 ensim kernel: pcwd: v1.13 (03/06/2002) Ken Hollis (kenji@bitgate.com)
Jan 13 15:06:13 ensim kernel: pcwd: No card detected, or port not available
Jan 13 15:06:13 ensim kernel: WDT driver for Acquire single board computer initialising.
Jan 13 15:06:13 ensim watchdog: watchdog startup succeeded
Jan 13 15:06:13 ensim watchdog[3130]: starting daemon (5.1): (... long line with options...)


If so, all is well.
Next, test watchdog by making it reboot your server:
# service watchdog stop
(NOTICE 1: This is not a true shutdown/reboot procedure! The kernel makes a hard reboot here, so consider the consequences if any program is writing to your disk. If you are worried, shut down all daemons and close all processes first. It is the kernel that reboots your machine, not the watchdog daemon.)

In 60 seconds your system will reboot. If not, check whether the module is loaded with the "lsmod" command. On some newer RedHat-like distros, "service watchdog stop" or "restart" does not trigger a reboot. That makes watchdog much safer to work with. In that case, reboot your server manually. It should come back in the normal time (about two minutes. Pray!)

If your system is OK, start watchdog again (service watchdog restart). You can then add a line at the end of the file /etc/rc.d/rc.local:
# echo "/sbin/service watchdog restart" >> /etc/rc.d/rc.local
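
Running that echo twice leaves two copies of the line in rc.local. A small guard avoids this. This is a sketch, with add_rc_line as an invented name; the file path is passed in so it can be tried on a copy first.

```shell
#!/bin/sh
# Append the watchdog start line to an rc.local-style file, but only once.
# add_rc_line is a hypothetical helper name.
add_rc_line() {
    rc=$1
    line='/sbin/service watchdog restart'
    # -x matches the whole line, -F takes the text literally
    grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}
```

Call it as add_rc_line /etc/rc.d/rc.local; a second call changes nothing.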

If you want to, test watchdog again:
# service watchdog stop

If the reboot works, you are protected.
If the server does not come back, ask EV1 for a reboot into single-user mode or with a different kernel, and undo the changes yourself. The main step is to remove the device with the "rm -f /dev/watchdog" command. When the computer starts and finds no /dev/watchdog, softdog does nothing, and your server stops rebooting from the next boot on.

!!! CAUTION !!! CAUTION !!! CAUTION !!!
1) Never, never, never use chkconfig to make watchdog start automatically at boot. RedHat kills processes when changing runlevels, and when it kills watchdog your system will reboot endlessly and need a restore from EV1. Don't experiment with watchdog on a real production machine.
2) Following the steps above rigorously worked for me, and I think it can work for you. But I cannot guarantee anything, so you are ultimately responsible for following them.
!!! CAUTION !!! CAUTION !!! CAUTION !!!

The steps above are safe. But before you install a new kernel, NEVER forget to comment out the watchdog line in your /etc/rc.d/rc.local file. Then test watchdog again on the new system as shown before, still without starting it automatically at boot. If there is any problem, simply ask EV1 for a reboot and all is OK again, leaving you free to find out what failed: a kernel without watchdog support, an installation problem, etc.
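
Commenting out the rc.local line before a kernel change can be done with sed as well. This sketch assumes the line was added exactly as in the howto; disable_rc_line is an invented name.

```shell
#!/bin/sh
# Comment out the watchdog start line in an rc.local-style file.
# disable_rc_line is a hypothetical helper name.
disable_rc_line() {
    rc=$1
    tmp="$rc.new.$$"
    sed 's|^\(/sbin/service watchdog restart\)$|#\1|' "$rc" > "$tmp" || return 1
    # Write back through the existing file so rc.local stays executable
    cat "$tmp" > "$rc" && rm -f "$tmp"
}
```

After the new kernel has passed the manual test, remove the leading "#" again to restore the automatic start.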

Some successful and unsuccessful (mine) installations of watchdog are described in:
<http://forum.ev1servers.net/showthread.php...?threadid=33908>

I don't understand why EV1/ThePlanet does not offer servers with watchdog/softdog installed by default. It would save their staff an enormous amount of time.

Enjoy it.

Regards,
Netino
daveman692
What is the difference between watchdog and softdog?
Netino
QUOTE
Originally posted by daveman692
What is the difference between watchdog and softdog?


'watchdog' is a program.
'softdog' is a kernel module.
'/dev/watchdog' is the timer device used by kernel.

The watchdog program uses the softdog kernel module, writing to the /dev/watchdog device after several checks of your system. If something goes wrong, including with watchdog itself, your server is rebooted by the kernel.

Regards,
Netino
daveman692
Ahh ok, so installing softdog isn't really what I want to do. Rather uninstall it and install watchdog instead. Thanks.
Dilder
Hi,

Thanks for the howto.
Can I use this howto without having to recompile my kernel?

My kernel version is 2.4.18-18.7.x

Im not sure on how to check the current configuration, but
grep WATCHDOG /boot/config-2.4.18-18.7.x

Gives:

# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_SOFT_WATCHDOG=m
CONFIG_PCWATCHDOG=m
CONFIG_WATCHDOG=y
Doobla
QUOTE
Originally posted by daveman692
Ahh ok, so installing softdog isn't really what I want to do.  Rather uninstall it and install watchdog instead.  Thanks.


Actually, it really depends on where you found "softdog". The watchdog daemon I found and used on my old EV1 box was labeled softdog. I've actually never used Netino's source so I think I'll give it a try.

Jon
Netino
QUOTE
Originally posted by Dilder
Hi,

Thanks for the howto.
Can I use this howto without having to recompile my kernel?

My kernel version is 2.4.18-18.7.x

Im not sure on how to check the current configuration, but
grep WATCHDOG /boot/config-2.4.18-18.7.x

Gives:

# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_SOFT_WATCHDOG=m
CONFIG_PCWATCHDOG=m
CONFIG_WATCHDOG=y


Yes, it seems you have everything you need.
Install watchdog via rpm, change your configuration in /etc/watchdog.conf as in the howto, and issue the command:
# service watchdog start

Then, check your modules:
# lsmod

If you see 'softdog' among them, you are OK and ready to test it, following the howto.
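
The lsmod check can be narrowed to the one line that matters. The sketch below reads lsmod output on standard input and prints the "Used by" count for softdog, assuming the three-column layout "name size count" that lsmod prints; softdog_use_count is an invented name.

```shell
#!/bin/sh
# Print the use count of the softdog module from lsmod output on stdin.
# Exits non-zero when softdog is not listed at all.
# softdog_use_count is a hypothetical helper name.
softdog_use_count() {
    awk '$1 == "softdog" { print $3; found = 1 }
         END { exit found ? 0 : 1 }'
}
```

Use it as lsmod | softdog_use_count. A count of 1 while the daemon runs means watchdog is holding the device open.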

Regards,
Netino
Doobla
Hmm, I don't understand this. I installed watchdog per the howto. I created the watchdog timer device. I started the service. I did an lsmod and saw softdog with a used count of 1. I stopped watchdog. lsmod again shows softdog used by zero. System never reboots.

Any ideas?

Jon


I found the following on the net which leads me to believe that what I am experiencing is by design. So how can I reliably test watchdog to see if it will reboot the system?
QUOTE
At start-up, the daemon opens the watchdog device (a character device with major number 10 and minor number 130) starting the watchdog process--an infinite loop in which the daemon alternately writes to the watchdog device to refresh the timer, then sleeps for 10 seconds. If the daemon is killed, the device file is closed, and the timer is disabled again.  
http://www.linuxjournal.com/article.php?sid=0217

Netino
What does your /var/log/messages file show?

Netino
Doobla
QUOTE
Originally posted by Netino
What are showing your /var/log/messages file?

Netino


Thanks for the reply. I'm baffled.
QUOTE
Feb 28 13:29:17 secure watchdog[27541]: starting daemon (5.1): int=10s realtime=yes sync=no soft=no mla=0 mem=0 ping=none file=/var/log/messages:0 pidfile=none iface=none test=none repair=/usr/sbin/repair alive=/dev/watchdog temp=none to=root no_act=no
Feb 28 13:29:17 secure watchdog: watchdog startup succeeded


Jon
Netino
QUOTE
Originally posted by Doobla


hmm, I don't understand this.  I installed watchdog per the howto.  I created the watchdog timer device.  I started the service.  I did an lsmod and saw softog used count as 1.  I stopped watchdog.  lsmod again shows softdog used by zero.  System never reboots.

Any ideas?

Jon


I found the following on the net which leads me to believe that what I am experiencing is by design.  So how can I reliably test watchdog to see if it will reboot the system?



I think that article is a bit outdated (1997), so we cannot be sure this assumption still holds. When I stop watchdog, my server reboots normally.

Regards,
Netino
Netino
What does your /var/log/messages file show when you stop the watchdog daemon?

Regards,
Netino
Doobla
QUOTE
Originally posted by Netino
What are showing your /var/log/messages file when you stops watchdog daemon?

Regards,
Netino


Normal stuff
QUOTE
Feb 28 17:17:28 secure watchdog[27541]: stopping daemon (5.1)
Feb 28 17:17:32 secure watchdog: watchdog shutdown succeeded
alex.davies
Installed this using the above (ensimized) RPM on a Red Hat Enterprise box (Dual Xeon 2GHz) no problems. Restarted fine.

Thanks for the great HOWTO!

Alex Davies
Doobla
QUOTE
Originally posted by alex.davies
Installed this using the above (ensimized) RPM on a Red Hat Enterprise box (Dual Xeon 2GHz) no problems. Restarted fine.

Thanks for the great HOWTO!

Alex Davies


Why do you say the rpm is ensimized? Ensim does nothing to or with the kernel.

BTW: I'm on Fedora Core 1 with the latest kernel an so when I downloaded and installed the rpm I grabbed the Fedora rpm.

Jon
Netino
QUOTE
Originally posted by Doobla

Normal stuff

Feb 28 17:17:28 secure watchdog[27541]: stopping daemon (5.1)
Feb 28 17:17:32 secure watchdog: watchdog shutdown succeeded  


Doobla,

Your system should show the following in the logs:
CODE
Feb 28 17:17:28 ensim watchdog[3357]: stopping daemon (5.1)
Feb 28 17:17:28 ensim kernel: SOFTDOG: WDT device closed unexpectedly.  WDT will not stop!
Feb 28 17:17:32 ensim watchdog: watchdog shutdown succeeded

It is possible that you have some problem loading the softdog module with watchdog.
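
The kernel line is the one that tells the two cases apart: in my log it appears, in yours it does not. A quick way to look for it is sketched below. The function name timer_still_armed is invented, and the message text is taken from my log above, so other softdog versions may word it differently.

```shell
#!/bin/sh
# Succeed if the log shows softdog refusing to stop its timer on close.
# timer_still_armed is a hypothetical helper name; the pattern is the
# kernel message quoted in this thread.
timer_still_armed() {
    grep -q 'kernel: SOFTDOG: WDT device closed unexpectedly' "$1"
}
```

Run timer_still_armed /var/log/messages right after "service watchdog stop". If it fails, the kernel did not report a running timer and no reboot should be expected.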

Do you have the device /dev/watchdog created?

Regards,
Netino
Doobla
QUOTE
Originally posted by Netino
Doobla,

Your system should the following in logs:
CODE
Feb 28 17:17:28 ensim watchdog[3357]: stopping daemon (5.1)

Feb 28 17:17:28 ensim kernel: SOFTDOG: WDT device closed unexpectedly.  WDT will not stop!

Feb 28 17:17:32 ensim watchdog: watchdog shutdown succeeded

Is possible you have some problem on loading the softdog module with watchdog.  

You have the device /dev/watchdog created?

Regards,
Netino


Yes I do have /dev/watchdog created. Can you post an output of lsmod before starting watchdog and afterwards? I just basically need to see how mine is different. Before I start watchdog lsmod doesn't show softdog loaded but after I start it it shows it loaded and used by 1. When I stop watchdog it is still loaded and used by zero.

If I can't get this to work then I guess I'll fall back to the daemon I used to use called softdog....but I'd rather use this if I can get it to work.

Jon
Netino
QUOTE
Originally posted by Doobla
Yes I do have /dev/watchdog created.  Can you post an output of lsmod before starting watchdog and afterwards?  I just basically need to see hwo mine is different.  Before I start watchdog lsmod doesn't show softdog loaded but after I start it it shows it loaded and used by 1.  When I stop watchdog it is still loaded and used by zero.

If I can't get this to work then I gues I'll fall back to the daemon I used to use called softdog....but I'd rather use this if I can get it to work.

Jon


I changed my kernel yesterday and rebooted my server. Before watchdog is started, 'lsmod' does not show softdog at all, not even with a use count of 0, which makes sense because watchdog has not started yet.

After watchdog starts, it shows "softdog" used by 1, as expected.

I would suggest a "clean" reboot, because there could be a conflict with your current softdog module: do not load it at the next boot, and test it with the watchdog program. Try "modprobe softdog" to check for any error messages.

Note that the reboot is not done by any daemon, but by the kernel itself.

Regards,
Netino
Doobla
QUOTE
Originally posted by Netino
I changed my kernel yesterday, and reboot my server. The 'lsmod' command before loads watchdog not shows softdog, nor used by 0, that makes sense, because watchdog still not started.

After start watchdog, is showing "softdog" used by 1, accordingly.

I would suggest you make a "clean" reboot, because could be there a conflict with your actual softdog module, not load it in next boot, and test it with watchdog program. Try to use "modprobe softdog" to check any error messages.

Note, the reboot work is not made by anyone daemon, but kernel itself.

Regards,
Netino


ok, I edited the watchdog service file and found that it was using modprobe already but the output was being redirected to /dev/null so I commented that out and here is the output from the commands when starting watchdog....
QUOTE
[root@secure root]# service watchdog start
Starting Software Watchdog daemon (watchdog): /lib/modules/2.4.22-1.2174.nptl/kernel/drivers/char/pcwd.o: init_module: Input/output error
Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.
     You may find more information in syslog or the output from dmesg
/lib/modules/2.4.22-1.2174.nptl/kernel/drivers/char/pcwd.o: insmod /lib/modules/2.4.22-1.2174.nptl/kernel/drivers/char/pcwd.o failed
/lib/modules/2.4.22-1.2174.nptl/kernel/drivers/char/pcwd.o: insmod pcwd failed
/lib/modules/2.4.22-1.2174.nptl/unsupported/drivers/char/acquirewdt.o: init_module: No such device
Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.
     You may find more information in syslog or the output from dmesg
/lib/modules/2.4.22-1.2174.nptl/unsupported/drivers/char/acquirewdt.o: insmod /lib/modules/2.4.22-1.2174.nptl/unsupported/drivers/char/acquirewdt.o failed
/lib/modules/2.4.22-1.2174.nptl/unsupported/drivers/char/acquirewdt.o: insmod acquirewdt failed


the commands that generated those errors are listed below with the /dev/null redirects commented out.
QUOTE
       modprobe softdog #&>/dev/null
       modprobe pcwd #&>/dev/null
       modprobe acquirewdt #&>/dev/null


Any ideas? Thanks very much Netino,

Jon
Netino
QUOTE
Originally posted by Doobla
ok, I edited the watchdog service file and found that it was using modprobe already but the output was being redirected to /dev/null so I commented that out and here is the output from the commands when starting watchdog....


the commands that generated those errors are listed below with the /dev/null redirects commented out.


Any ideas?  Thanks very much Netino,

Jon


It seems there is no problem. These modules are poorly documented, but to be sure you could comment out 'pcwd' and 'acquirewdt' in the script that loads watchdog. Before that, though, try "modprobe softdog" on the shell command line and see if there are any error messages.

Regards,
Netino
ccmheoa
I followed the instructions, and when I did a service watchdog stop my server seemed to reboot, but it doesn't come back up. I need to press the reset button to make it reset properly. Has anybody experienced this?
Netino
QUOTE
Originally posted by ccmheoa
i folllowed the instructions, and when i did a service watchdog stop..my server seems to reboot, however it doesn't come backup....i need to press the reset button to make it reset properly...anybody experienced this?


I have, already. But I had set watchdog to start automatically at boot.
Did you change any configuration file to auto-start watchdog?

Regards,
Netino
cmafia
Sorry but SIM does the same thing......and it also restarts the individual service that goes down. MUCH BETTER THAN THIS PROGRAM!! TRUST ME ON THIS ONE! All of us old dawgs are using it. Do a search for SIM and check it out!
Netino
QUOTE
Originally posted by cmafia
Sorry but SIM does the same thing......and it also restarts the individual service that goes down.  MUCH BETTER THAN THIS PROGRAM!!  TRUST ME ON THIS ONE!  All of us old dawgs are using it.  Do a search for SIM and check it out!


Guy, you don't know what you are talking about, not one word, believe me.

All of us are using SIM. Yes, it is wonderful, a great job from rfxn, but unfortunately it simply does not solve the crash problem.
It restarts services, and reboots the server only when that is possible. But when processes enter the "D" state (uninterruptible sleep), everything on your server freezes, and SIM can no longer run the 'shutdown' binary to reboot the server. Read the references just given.

We are not talking simply about a program, but about a kernel module. It was developed by Alan Cox, a pope of Linux, and has no substitute.

Regards,
Netino
ccmheoa
QUOTE
Originally posted by Netino
I already. But I included watchdog to auto-restart in next boot.
You change some configuration file to auto-start watchdog?

Regards,
Netino



i have not made any configuration to auto start watchdog.
basically i just installed the rpm ,followed your instructions in the making the /dev/watchdog file and did a service watchdog start.

to test it out i did a service watchdog stop. after which the server stopped responding......resulting in a hard reset required!
Netino
QUOTE
Originally posted by ccmheoa
i have not made any configuration to auto start watchdog.
basically i just installed the rpm ,followed your instructions in the making the /dev/watchdog file and did a service watchdog start.

to test it out i did a service watchdog stop. after which the server stopped responding......resulting in a hard reset required!


As watchdog is not being auto-started, this is not a watchdog-related problem. Probably a BIOS problem keeps your server from rebooting.

This is common with Compaq servers: you try a soft reboot, and the machine does not come back up.

Regards,
Netino
Bucanero
Hi

I successfully installed your howto on 4 of my 6 Linux servers, but have some questions about the other 2.

1. I have a Duron server, and it seems the rpm does not work there: after installing it, when I do an lsmod, softdog does not appear. Any clues?

2. I have a Celeron with Fedora. I downloaded the Fedora RPM and installed it, and everything went smoothly, but when I test (stop watchdog) nothing happens.

When Watchdog is stopped this is what lsmod says:

Module Size Used by Not tainted
softdog 2748 0
autofs 12084 0 (autoclean) (unused)
8139too 16200 1
mii 3992 0 [8139too]
floppy 57308 0 (autoclean)
sg 35436 0 (autoclean) (unused)
scsi_mod 110280 1 (autoclean) [sg]
microcode 4188 0 (autoclean)
ext3 71716 2
jbd 51276 2 [ext3]

When watchdog is on, this is what lsmod says
softdog 2748 1
autofs 12084 0 (autoclean) (unused)
8139too 16200 1
mii 3992 0 [8139too]
floppy 57308 0 (autoclean)
sg 35436 0 (autoclean) (unused)
scsi_mod 110280 1 (autoclean) [sg]
microcode 4188 0 (autoclean)
ext3 71716 2
jbd 51276 2 [ext3]

Also, the var/log/messages says

Mar 31 10:40:21 arauca kernel: pcwd: v1.13 (03/06/2002) Ken Hollis (kenji@bitgate.com)
Mar 31 10:40:21 arauca kernel: pcwd: No card detected, or port not available
Mar 31 10:40:21 arauca kernel: WDT driver for Acquire single board computer initialising.
Mar 31 10:40:21 arauca watchdog[1367]: starting daemon (5.1): int=10s realtime=yes sync=no soft=no mla=0 mem=0 ping=none file=/var/log/messages:0 pidfile=none iface=none test=none repair=/usr/sbin/repair alive=/dev/watchdog temp=none to=root no_act=no
Mar 31 10:40:21 arauca watchdog: watchdog startup succeeded

Any clues on what to check ?

Thanks

Bucanero
Netino
QUOTE
Originally posted by Bucanero
Hi

I Install sucesfully your how to in 4 of my 6 linux servers, but have some questions for the other 2.

1. I have a Duron Server, seems like the rpm does not work here, after installing it and do a lsmod softdog does not appear there. Any clues ?


This is strange, because I have a Duron too, I'm using exactly that rpm, and here it works normally.

QUOTE
 
2. I have a Celeron with Fedora, I download the Fedora RPM and install it, everything went smoth, but when I test (stop watchdog) nothing happends


Possibly your kernel(s) do not have watchdog support. Have you already checked for this? If watchdog support is not compiled in, you must either use a ready-made kernel that supports it or compile a kernel yourself.

Regards,
Netino
Doobla
QUOTE
Originally posted by Netino

Possibly your kernel(s) not have support to watchdog. You already check for this? If not compiled in support to watchdog, you must or use a supported kernel, ready to use, or compile yourself a kernel to use it.

Regards,
Netino


I'm having the same problem he is on Fedora. So are you saying that if watchdog isn't compiled in it won't work as a module? On my RedHat 7.3 I was able to get watchdog working using a kernel module. Please clarify.

thanks,

Jon
Netino
QUOTE
Originally posted by Doobla
I'm having the same problem he is on Fedora.  So are you saying that if watchdog isn't compiled in it won't work as a module?  On my RedHat 7.3 I was able to get watchdog working using a kernel module.  Please clarify.

thanks,

Jon


Watchdog, in any case, *must* be supported by the kernel.
You have two options:
1) Build "Software Watchdog" as a module;
2) Build "Software Watchdog" into the kernel.

Your watchdog was working because you were using a RedHat rpm kernel already built with watchdog support (as a module).

Now, if you are using a kernel compiled without watchdog support (neither as a module nor built in), the 'watchdog' program will not work.

You must compile a kernel with support, using the following sequence: (DO THIS TO FOLLOW OUR HOWTO)
Character devices --->
Watchdog Cards --->
[*] Watchdog Timer Support
<M> Software Watchdog

This builds your kernel with "Software Watchdog" as a module, which then needs to be loaded by a program, as our "watchdog" rpm does.

The other possibility is to build "Software Watchdog" into the kernel, with: (DON'T DO THIS IF YOU ARE FOLLOWING THE HOWTO)
Character devices --->
Watchdog Cards --->
[*] Watchdog Timer Support
<*> Software Watchdog

This builds "Software Watchdog" into the kernel, so it does not need to be loaded at run time as our "watchdog" rpm does. It still works with watchdog, but you need to disable the module loading in the watchdog startup script. But don't do it: I recommend that you don't experiment with watchdog on a production machine, the consequences can be severe.

Regards,
Netino
Doobla
Yes, I understand that watchdog support needs to be in the kernel, I was saying that I installed it as a module and couldn't get it to work. I would stop the watchdog daemon and no reboot would occur.

Jon
Netino
QUOTE
Originally posted by Doobla
Yes, I understand that watchdog support needs to be in the kernel, I was saying that I installed it as a module and couldn't get it to work.  I would stop the watchdog daemon and no reboot would occur.

Jon


You could have the module file (softdog.o) and still not have support in the kernel.
You must compile a kernel that supports that module.

Regards,
Netino
Doobla
QUOTE
Originally posted by Netino
You could have the module (softdog.o), but possibly not have the support on kernel.
You must compile a kernel to support that module.

Regards,
Netino


I didn't realize that. I was under the impression that if you installed the module that the kernel then supported it. Good to know!

Thanks,

Jon
Dave
QUOTE
Originally posted by Netino
If so, it's all right.
After, test watchdog, rebooting your server:
# service watchdog stop


are you sure that's already time to "test" (reboot)?

AFAIK, after the /dev/watchdog file is created, it must be written to once a minute... if you turn the service off and restart without putting it in init.d, it won't start after the reboot... and the server will be rebooting forever

-- Dave
awww
Hello,

I think watchdog would solve part of my problems, however I have some questions about it:

1) The kernel expects a write to /dev/watchdog at least every 60 seconds and otherwise reboots, while normal writes happen every 10 seconds. Is there a way to raise both these values? I mean, configure watchdog to write to the device every 60 seconds and reboot if a write cannot be done within 2 minutes?

2) Let's say you usually start several processes at boot which take time to load. If you put the watchdog start line at the end of the rc.local file, don't you risk that the processes started before watchdog take more than 60 seconds to load, so that your system reboots forever?
When does watchdog start "monitoring" your system?

3) I read that installing watchdog is dangerous because you risk an endless reboot.
This is OK; as far as I have understood, if the watchdog writer process gets killed at boot (for some reason) you will have to request a restore.
However, if you configure watchdog to reboot only after, let's say, 5 minutes of unsuccessful writes to /dev/watchdog, wouldn't you have the chance to log in and "DO SOMETHING" to break the endless loop of reboots?

4) Uninstallation: since no modifications are made to the kernel, if you want to uninstall watchdog do you simply need to remove the rpm?

5) System resources: how many system resources does watchdog use?
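
For question 1, the relation between the two values is simple arithmetic and can be sketched as below. It assumes the daemon writes every "interval" seconds (the int=10s shown in the startup log) and that the kernel timer fires "timeout" seconds after the last write (the "timer margin: 60 sec" in the kernel log). The function name missed_writes_allowed is invented for this example, and it says nothing about how to change either value on a given system.

```shell
#!/bin/sh
# How many consecutive keepalive writes can be missed before the timer
# expires? missed_writes_allowed is a hypothetical helper name.
#   $1 = seconds between writes, $2 = kernel timeout in seconds
missed_writes_allowed() {
    interval=$1
    timeout=$2
    # Writes are due at interval, 2*interval, ... after the last good one;
    # only those falling strictly before the timeout can be skipped.
    echo $(( (timeout + interval - 1) / interval - 1 ))
}
```

With the defaults, missed_writes_allowed 10 60 gives 5. With the values proposed in question 1, missed_writes_allowed 60 120 gives 1, so a single late write is all the slack that would remain.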



Thanks to whoever wants to answer to my questions and thanks Netino for your howto.
Poof
Anybody able to help here? Using Fedora... When I do lsmod after I start watchdog I see softdog used by 1, with this in my /var/log/messages:

-
CODE
Jul 25 22:36:23 goingmerry kernel: pcwd: v1.13 (03/06/2002) Ken Hollis (kenji@bitgate.com)
Jul 25 22:36:23 goingmerry kernel: pcwd: No card detected, or port not available
Jul 25 22:36:23 goingmerry kernel: WDT driver for Acquire single board computer initialising.
Jul 25 22:36:23 goingmerry watchdog[3835]: starting daemon (5.1): int=10s realtime=yes sync=no soft=no mla=0 mem=0 ping=none file=/var/log/messages:0 pidfile=none iface=none test=none repair=/usr/sbin/repair alive=/dev/watchdog temp=none to=root no_act=no
Jul 25 22:36:23 goingmerry watchdog: watchdog startup succeeded

Then when I shut it down it just does "Jul 25 22:33:18 goingmerry watchdog: watchdog shutdown succeeded"

The system never restarts. (BTW, lsmod shows a use count of 0 for softdog when I look after I shut it down.)

Further.
CODE
grep WATCHDOG /boot/config-2.4.22-1.2197.nptl
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_SOFT_WATCHDOG=m
CONFIG_PCWATCHDOG=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_WATCHDOG=y

Anyhow... Either way, it's just not rebooting at all. I have created /dev/watchdog and modified those two lines in watchdog.conf (file and watchdog-device).

Any ideas? =/
Netino
Fedora is not ready for use with watchdog.
You will need to recompile your kernel with support for the /dev/watchdog device.

Regards,
Netino
kiwirob
Did anybody have any luck getting watchdog to work on Fedora?

I have followed all the instructions here and compiled a custom kernel with Watchdog Timer Support and softdog module, created /dev/watchdog etc but no luck.

Netino, you said in your last post that the kernel needs to be recompiled to support /dev/watchdog. Is this something else in addition to the kernel configurations you have already listed?

If anybody can help here I'd be very appreciative.
Netino
QUOTE
Originally posted by kiwirob
Did anybody have any luck getting watchdog to work on Fedora?

I have followed all the instructions here and compiled a custom kernel with Watchdog Timer Support and softdog module, created /dev/watchdog etc but no luck.

Netino, you said in your last post that the kernel needs to be recompiled to support /dev/watchdog. Is this something else in addition to the kernel configurations you have already listed?

If anybody can help here I'd be very appreciative.


No, this precondition was already listed: the kernel *must* have support for /dev/watchdog (as a module or built in).

If you can, download a kernel from kernel.org, compile and use it.

Regards,
Netino
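
The precondition can be checked before installing anything. A minimal sketch, assuming your distribution ships the kernel config under /boot as in Poof's grep earlier in the thread; the function name `watchdog_support` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reports how a kernel config file sets
# CONFIG_SOFT_WATCHDOG (the softdog driver).
# $1 = path to a kernel config file, e.g. /boot/config-$(uname -r)
watchdog_support() {
    case "$(grep '^CONFIG_SOFT_WATCHDOG=' "$1" 2>/dev/null)" in
        CONFIG_SOFT_WATCHDOG=y) echo "builtin" ;;
        CONFIG_SOFT_WATCHDOG=m) echo "module" ;;
        *)                      echo "missing" ;;
    esac
}

# Demonstration on a throwaway file that mimics the grep output
sample=$(mktemp)
printf '%s\n' '# CONFIG_WATCHDOG_NOWAYOUT is not set' \
    'CONFIG_SOFT_WATCHDOG=m' 'CONFIG_WATCHDOG=y' > "$sample"
watchdog_support "$sample"
rm -f "$sample"
```

On a real system the call would be `watchdog_support /boot/config-$(uname -r)`. A result of `module` only means the driver was built; it still has to be loaded, for example with `modprobe softdog`, before /dev/watchdog will respond.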
ab70
QUOTE
# echo "/sbin/service watchdog restart" >> /etc/rc.d/rc.local


To avoid problems with reboot loops or with new kernels, could we simply not add this line to rc.local?

Naturally, after every reboot (requested by hand or by watchdog) we would have to restart watchdog manually.

Is it possible ?

Best regards
Netino
QUOTE
Originally posted by ab70
To avoid problems with reboot loops or with new kernels, could we simply not add this line to rc.local?

Naturally, after every reboot (requested by hand or by watchdog) we would have to restart watchdog manually.

Is it possible ?

Best regards


Yes, surely no problem.

Regards,
Netino
webwizard
Before I try my hand at installing, I want to make sure I know how to disable if desired.

If this is how you test it:

QUOTE
Originally posted by Netino
Watchdog HowTo
==============
If you want to, test watchdog again:
# service watchdog stop

Then how would you temporarily disable the service?
Netino
QUOTE
Originally posted by webwizard
Before I try my hand at installing, I want to make sure I know how to disable if desired.

If this is how you test it:
(...)
 
Then how would you temporarily disable the service?


Once enabled, watchdog will not stop until the next reboot.
If you follow the howto, watchdog will start at the next boot *only* if you added the line to your rc.local that restarts it automatically.

Regards,
Netino
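
One way to keep the automatic start but still have an off switch is to guard the rc.local line with a flag file. This is only a sketch: the flag path `/etc/watchdog.enabled` and the function name `start_if_enabled` are hypothetical, not part of the watchdog package or the howto.

```shell
#!/bin/sh
# Hypothetical guard for rc.local: run the start command only when a
# flag file exists, so removing the flag disables watchdog at next boot.
# $1 = flag file path; remaining arguments = command to run
start_if_enabled() {
    flag=$1
    shift
    if [ -e "$flag" ]; then
        "$@"
    else
        echo "watchdog disabled: $flag not found"
    fi
}

# Dry run: the flag is missing here, so the command is not executed
start_if_enabled /nonexistent/watchdog.enabled echo /sbin/service watchdog restart
```

In rc.local the call would look like `start_if_enabled /etc/watchdog.enabled /sbin/service watchdog restart`. Removing the flag file then has the same effect as deleting the line, without editing rc.local each time.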
webwizard
QUOTE
Originally posted by Netino
Once enabled, watchdog will not stop until the next reboot.
If you follow the howto, watchdog will start at the next boot *only* if you added the line to your rc.local that restarts it automatically.


Thank you for your response, that clears things up a bit. That also explains how loops are avoided if the server takes longer than 60 seconds to load everything up during a reboot.

Thanks again :)
webwizard
I finally got around to installing this on my Fedora, however looks like I'm having the same problem as Doobla and others on Fedora.

I have kernel version 2.4.22. Can someone confirm that it's not compiled for Watchdog? Assuming that is the case, any chance of someone offering a step-by-step on how to compile in support for Watchdog?
Doobla
QUOTE (webwizard)
I finally got around to installing this on my Fedora, however looks like I'm having the same problem as Doobla and others on Fedora.

I have kernel version 2.4.22. Can someone confirm that it's not compiled for Watchdog? Assuming that is the case, any chance of someone offering a step-by-step on how to compile in support for Watchdog?

I just got back from vacation. This is on my TODO to get worked out and I'll post my results here. Should be very easy really.
webwizard
QUOTE (Doobla)
I just got back from vacation.  This is on my TODO to get worked out and I'll post my results here.  Should be very easy really.


Doobla,

Any success yet? These lockups are such a pain in the butt!