Full Version: 20-40 second delay in switching over from failed server
terry99
I bring up a page, then stop Apache on that server. When I hit refresh, there is a 20-40 second delay before the load balancer switches over from the failed server, and I have to hit refresh again to bring up the page on a new server.

Why is there this long delay?

Does anyone else get a similar delay?
alex.davies
Each check occurs at 10-second intervals.

It takes 3 failed checks to declare a node failed, so the minimum will be 30 seconds and the maximum 39.

I personally wish it was less, but we can't change the config.

Alex
terry99
Actually it looks like 4 checks:
The load balancer performs layer 4 checks (opens a TCP connection to the service port) to determine if each service on both servers is "up". These checks are done every 10 seconds, and the service will be marked "down" if it fails 4 consecutive tests. The next time the server responds on the service port, it will again be marked as available for requests.
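
The behaviour described above could be sketched roughly as follows. This is only an illustration, not the load balancer's actual code: the names tcp_check and ServiceState, the 2-second connect timeout, and the default threshold argument are assumptions made for the example.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Layer 4 check: True if a TCP connection to host:port can be opened.

    The 2-second timeout is an assumed value, not one from the load balancer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ServiceState:
    """Tracks whether a service is "up" from a stream of check results."""

    def __init__(self, fail_threshold: int = 4):
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.up = True

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting up/down status."""
        if check_passed:
            # One successful check makes the service available again.
            self.consecutive_failures = 0
            self.up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.fail_threshold:
                self.up = False
        return self.up
```

A monitoring loop would call state.record(tcp_check(host, port)) once every 10 seconds; the service only goes "down" on the fourth failure in a row.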

So it's really 40-49.999999 seconds. About a minute of downtime. Why can't they do 1 or 2 checks, every 4 seconds?
alex.davies
I think the max is 39 (.999) seconds.

The maximum time will be produced if
a node falls over 0.01 seconds after a test
9.99 seconds later it fails its first check
10 seconds later it fails its second
10 seconds later it fails its third
10 seconds later it fails its fourth and is removed from the pool

10+10+10+10-0.01 = 39.99

Similarly, the min is produced if
a node falls over 0.01 seconds before a test and fails its first check
10 seconds later it fails its second
10 seconds later it fails its third
10 seconds later it fails its fourth and is removed from the pool

(10+10+10+0.01 = 30.01)
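
The two cases above can be checked with a small calculation. This is a simplified model, assuming checks fire exactly on schedule and a failed check is registered instantly (no connect timeout); the function name detection_delay and its parameters are made up for the example.

```python
def detection_delay(offset: float, interval: float = 10.0,
                    fail_threshold: int = 4) -> float:
    """Seconds between a node failing and it being removed from the pool.

    offset is how long after the previous check the node failed
    (0 < offset <= interval). Assumes checks run exactly on schedule.
    """
    if not 0 < offset <= interval:
        raise ValueError("offset must be in (0, interval]")
    # Wait for the next scheduled check, which is the first one to fail...
    time_to_first_failed_check = interval - offset
    # ...then the remaining failures arrive one interval apart.
    return time_to_first_failed_check + (fail_threshold - 1) * interval


# Node fails 0.01 s after a check: worst case.
print(f"{detection_delay(0.01):.2f}")  # 39.99
# Node fails 0.01 s before a check: best case.
print(f"{detection_delay(9.99):.2f}")  # 30.01
```

With 2 checks every 4 seconds, as suggested earlier in the thread, the same model gives a range of roughly 4 to 8 seconds.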

Yes, I agree with you that it should be faster, but the threshold is there to stop "false positives" caused by high load, when a server might reject a few connections even though it is still up.

Alex
terry99
What is the delay on a site like microsoft.com or newegg.com or google.com or pricepoint.com ?
alex.davies
QUOTE
Originally posted by terry99
What is the delay on a site like microsoft.com or newegg.com or google.com or pricepoint.com ?

I have no idea, but I would imagine they remove servers that fail the health check within 10 seconds or so.

Alex