Hi,

I have a load-balanced cluster that is itself part of a larger cluster managed using ZoneEdit's failover solution.

Today ColdFusion crashed on one of my EV1 servers. The first I knew about it was an alert from an Alertra monitor (which checks each server individually). However, I also got the following email from ZoneEdit:
CODE
Fri Dec 17 15:18:37 2004        www.information-britain.co.uk - Partial primary site failure, disabling unresponsive ips, site returned 500

HTTP server failure

TESTED: 217.112.90.236,67.15.128.70

FAILED: 67.15.128.70

I know for a fact that the other server in the load-balanced cluster did NOT fail at any stage.

I therefore think it is appropriate to post here so that anyone considering purchasing one of these realises that they do not remove all dead servers. So far I have had two of these "ColdFusion crashes" but no other problems.

I was also wondering whether anyone has already written a script to actually kill Apache (properly) if it fails to respond. If not, I will write a quick script to hit a certain page, look for a certain piece of text and, if it does not find it, issue a lot of "killall httpd" commands and email the server admin.
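
For what it's worth, here is a rough sketch of what such a script might look like in Python. It is untested against a real ColdFusion crash, and the check URL, the marker text, the admin address and the local mail server on localhost are all placeholders I made up for illustration, so treat it as a starting point rather than a finished tool.

```python
import smtplib
import subprocess
import urllib.request
from email.message import EmailMessage

# Placeholder values (assumptions, not real settings): adjust for your server.
CHECK_URL = "http://127.0.0.1/health.cfm"
EXPECTED_TEXT = "SERVER-OK"
ADMIN_EMAIL = "admin@example.com"


def page_is_healthy(url, expected, timeout=10, fetch=urllib.request.urlopen):
    """Return True only if the page loads and contains the expected text."""
    try:
        with fetch(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except Exception:
        # An HTTP 500, a refused connection or a timeout all count as failure.
        return False
    return expected in body


def kill_apache(attempts=3):
    """Run "killall httpd" several times in a row."""
    for _ in range(attempts):
        # check=False: a non-zero exit (e.g. nothing left to kill) is not an error.
        subprocess.run(["killall", "httpd"], check=False)


def email_admin(subject, text, to_addr=ADMIN_EMAIL):
    """Send a plain-text alert via a mail server assumed to listen on localhost."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = to_addr
    msg["To"] = to_addr
    msg.set_content(text)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def check_and_recover(healthy=page_is_healthy, kill=kill_apache, notify=email_admin):
    """Run one check; on failure kill httpd and alert the admin.

    Returns True if the page was healthy, False if recovery was triggered.
    """
    if healthy(CHECK_URL, EXPECTED_TEXT):
        return True
    kill()
    notify(
        "httpd killed after failed health check",
        f"{CHECK_URL} did not return the text {EXPECTED_TEXT!r}.",
    )
    return False
```

The idea would be to call check_and_recover() from cron every minute or so. Note that this only kills httpd and sends the email; restarting Apache afterwards is left out.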

Alex