Full Version: Load balancer for failover/redundancy? No.
nacs
I purchased a load balancing solution from Servermatrix 2 days ago for a high traffic site that I wanted to have some redundancy and failover protection for.

I figured I would have the load balancer distributing the incoming webpage requests over two dedicated Apache servers (each of which can handle the full traffic by itself but I wanted to have the failover protection and redundancy that SM's load balancing service claimed to offer).

After setting it up, all was good for one night. In the morning, however, the load balancer suddenly stopped serving requests. It came back a couple of minutes later after I quickly logged into Orbit and changed the load balancer config around (I found that the interface for the load balancer had been completely redesigned since I'd seen it just the night before, and I think the interface change made by SM broke something). I wasn't too worried at that time since it was only down for a couple of minutes, and I figured it was just some freak occurrence.

Well today, 50 minutes ago, the same thing happened. Site was down. Load balancer suddenly stopped serving requests and was unpingable. I am also unable to access the load balancer config page through Orbit. I discovered the outage on the server 5 minutes after the load balancer stopped responding and reported it to SM 45 minutes ago. And it's still down.
nacs
Load balancer dead for the last 50 minutes. Last update to the ticket from SM was 20 minutes ago saying that it's "being worked on".

I called SM support just now and while asking about the lack of updates I found out that the load balancer outage apparently does not just affect me but other customers also (not sure how many others are affected).
nacs
I usually don't find a service outage worth reporting like this but everything about this load balancing service, although featured prominently on the front page, screams of a far-from-production-ready service.

SM claims "Super Balance not only provides redundancy within your server topology, but ensures full redundancy upstream as well. With two dedicated load balancers forwarding traffic to your server cluster you'll never have to worry about a single point of failure within the Super Balance system." Apparently, this statement is not true.

It has now been over an hour since the load balancer went down and it continues to be down. This is simply unacceptable for a system whose purpose is to be redundant and provide failover service.

A service like this is clearly targeted at, and will be used by, high traffic sites for which downtime such as this is not acceptable. I would expect it to perform without failure considering all the hype about the redundancy and the fact that it costs $49 per month just for 2 nodes.

I never imagined the failover system itself would be the one that fails and causes my site to be down for over an hour.


This service is clearly not production ready and should not have been launched at this time. I would highly recommend that anyone considering "Super Balance" not buy it for now. It's simply not stable yet.
Guspaz
Demand a refund.
nacs
Looks like the load balancer is finally up.

Ticket created to report outage: 05:37 PM (load balancer failed about 5-10 minutes before ticket was created).
Load balancer working again: 07:47 PM

The times posted above are copied and pasted from Orbit's ticketing system. In other words, 2 hours and 10 minutes of load balancer failure/downtime.
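As a quick check of that figure, here is a minimal Python sketch that subtracts the two ticket timestamps. It assumes both times fall on the same day and are in the same time zone; the function name `outage_duration` is made up for illustration. Since the balancer failed some minutes before the ticket was opened, the real outage was slightly longer than what this computes.

```python
from datetime import datetime

def outage_duration(start: str, end: str) -> tuple[int, int]:
    """Return (hours, minutes) between two 12-hour clock times on the same day."""
    fmt = "%I:%M %p"  # e.g. "05:37 PM"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    total_minutes = int(delta.total_seconds() // 60)
    return divmod(total_minutes, 60)

# Timestamps copied from the ticket history quoted above.
hours, minutes = outage_duration("05:37 PM", "07:47 PM")
print(f"{hours} h {minutes} min")  # prints "2 h 10 min"
```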

I have no problems with the technicians themselves, as they have been relatively responsive to my support-center calls. However, I'm very disappointed in SM's entire handling of this load balancing system. The Orbit frontend for the load balancer is clumsy to use, is still being developed, is lacking in technical documentation, and doesn't offer the kind of flexibility that I would expect from a dedicated load balancer. The load balancer hardware itself is apparently not as stable or failsafe as SM claims, as proven by this lengthy outage. I had expected better from Servermatrix's load balancing solution (especially as they have plenty of past experience in the area via ThePlanet's high end solutions) and am sorely disappointed by the performance.
nacs
Downtime again. 1 hour and counting now. And once again, the 2 Apache servers that are behind the load balancer are operating fine but the load balancer provided by Servermatrix is unpingable and not responding.

Again: if you're looking into the Servermatrix load balancer to increase reliability or redundancy for your site, do NOT buy it yet. The Servermatrix load balancing service is unacceptably unstable.
dynamo
This is a concern if true.

I'd use the load balancer for a mission critical service, so downtime on a load balancer is particularly poor, especially as it wouldn't be something I could go in and fix.

It's odd, because I was thinking it would be worth paying the extra to SM to have a known good way to load balance, but you may be better off picking up another two servers and sticking them in as redundant load balancers.

Let us know what SM comes back with on this.
nacs
The last downtime I posted about above lasted 2 hours.

I also had a brief downtime less than an hour ago.

The techs who work on this load balancing service seem diligent in working out the problems but this is simply a product that is not ready for production level use yet and should not have been launched at this point by Servermatrix.

I have no doubt that in a couple of months, this load balancing system will be stable enough for use but right now it isn't and it was poor planning on Servermatrix's part to launch an unfinished, unstable product at this time.
nacs
Down yet again. 30 minutes so far with no updates.

QUOTE
Is the load balancer itself highly available?
The load balancing solution is made up of an HA cluster for high availability. If one node fails, another node will pick up the services from the failed node and continue to route traffic appropriately. All nodes in the LB cluster are monitored very carefully by the NOC and engineering teams


This is what Servermatrix calls 'high availability'? Over a dozen outages in the last 30 days, with literally hours of downtime in total.
adamuk
I think you must be unlucky.

These forums are on 2 load balanced boxes, I believe.
nacs
If these forums are on load balancers, they certainly aren't on the balancers I am on. It's highly possible that they're using their Alteon solutions or a dedicated Zeus load balancing installation separate from the shared Zeus load balancing they offer to clients. The load balancer is also shared, as I found out, so when there's an outage, it's not just me that gets downtime (a tech has confirmed in the past that I'm not the only one who was affected when the load balancer downtime(s) occurred).
nacs
And here we go again. Downtime yet another day. A call to SM revealed that absolutely no one who is in charge of the load balancer or anyone in the "managed services" team is there.

This is absolutely infuriating.
adamuk
QUOTE (nacs)
And here we go again. Downtime yet another day. A call to SM revealed that absolutely no one who is in charge of the load balancer or anyone in the "managed services" team is there.

This is absolutely infuriating.


Keep phoning them up, explain your problem to anyone you speak to, and get their names too... eventually you will speak to someone who knows what they're talking about.
tubededentifrice
It's less expensive to order 1 server that you dedicate to load balancing, and you can put as many services and nodes on it as you want...
A lot less expensive, and a lot more control over it...

$49 for two nodes and $49 per additional node! That's incredible...
trkhost
Just started the load balancing adventure and it's now November, way past your August troubles. The problem hasn't changed much. I call The Planet in the morning and the same technician answers the phone, who has absolutely no clue about anything. Everything gets 'referred'.
I have had load balancing tickets in now for 5 days and nothing gets done.
I am already sorry I bought this 'solution'. Seems more like yet another 'problem' to me.

T
dynamo
Was this ever resolved?

Would be good to get the followup on this...
binary
QUOTE
Was this ever resolved?
Would be good to get the followup on this...


"Me too!"

I would also be interested to hear from someone who's using the balancer feature. How is it working for you? I'm guessing it works fine if you're actually using it, otherwise you'd be off to some other solution....
binary
Well, I just started using load balancing, and I'm already having problems. I've opened a ticket in Orbit, but the balancer is flaking out.

I have three servers behind it, and they're all up. Everything was working. I made no changes. But right now, when I try to go to the site I get this message in my browser:

CODE
No suitable nodes are available to serve your request.


It's not really OK for site visitors to see this - not when all three of my nodes are up and working.

Also, I can't get to the Balancer admin page in Orbit - it never finishes loading, so I can't change configuration or anything. This has been happening for 15 mins or so. Yup, I'll call TP soon if it's not resolved via ticket.

Here's a ping test (10 sec intervals) from *within* the planet's network, showing intermittent response from the balancer:

CODE
ping -i 10 67.18.200.x
PING 67.18.200.x (67.18.200.x) 56(84) bytes of data.
64 bytes from 67.18.200.x: icmp_seq=0 ttl=62 time=0.392 ms
64 bytes from 67.18.200.x: icmp_seq=1 ttl=62 time=0.559 ms
64 bytes from 67.18.200.x: icmp_seq=8 ttl=62 time=0.171 ms
64 bytes from 67.18.200.x: icmp_seq=9 ttl=62 time=0.298 ms
64 bytes from 67.18.200.x: icmp_seq=10 ttl=62 time=0.641 ms
64 bytes from 67.18.200.x: icmp_seq=17 ttl=62 time=0.301 ms
64 bytes from 67.18.200.x: icmp_seq=18 ttl=62 time=0.683 ms
64 bytes from 67.18.200.x: icmp_seq=19 ttl=62 time=0.567 ms
64 bytes from 67.18.200.x: icmp_seq=20 ttl=62 time=0.703 ms
64 bytes from 67.18.200.x: icmp_seq=21 ttl=62 time=0.333 ms
64 bytes from 67.18.200.x: icmp_seq=22 ttl=62 time=0.890 ms
64 bytes from 67.18.200.x: icmp_seq=23 ttl=62 time=0.423 ms
64 bytes from 67.18.200.x: icmp_seq=24 ttl=62 time=0.893 ms
64 bytes from 67.18.200.x: icmp_seq=25 ttl=62 time=0.589 ms
64 bytes from 67.18.200.x: icmp_seq=26 ttl=62 time=0.602 ms
64 bytes from 67.18.200.x: icmp_seq=27 ttl=62 time=0.980 ms
64 bytes from 67.18.200.x: icmp_seq=28 ttl=62 time=0.363 ms
64 bytes from 67.18.200.x: icmp_seq=29 ttl=62 time=0.370 ms
64 bytes from 67.18.200.x: icmp_seq=30 ttl=62 time=0.299 ms
64 bytes from 67.18.200.x: icmp_seq=31 ttl=62 time=0.809 ms

--- 67.18.200.x ping statistics ---
32 packets transmitted, 20 received, 37% packet loss, time 310021ms
rtt min/avg/max/mdev = 0.171/0.543/0.980/0.227 ms, pipe 2

Uhhhhh.... not cool. Times are great - except for LOTS of missing responses.
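To make the pattern in that output easier to see, here is a small Python sketch that groups the missing sequence numbers into runs. The received sequence numbers, the 32 transmitted probes, and the 10 second interval are taken from the ping output above; the helper name `missing_runs` is made up for illustration, and the "about N s" figure is only a rough estimate (lost probes times interval).

```python
def missing_runs(received, transmitted):
    """Return (first, last) sequence-number pairs for each run of lost probes."""
    got = set(received)
    runs = []
    start = None
    for seq in range(transmitted):
        if seq not in got:
            if start is None:
                start = seq          # a new run of losses begins here
        elif start is not None:
            runs.append((start, seq - 1))
            start = None
    if start is not None:            # losses continued to the last probe
        runs.append((start, transmitted - 1))
    return runs

# icmp_seq values that got a reply, copied from the ping output above.
received = [0, 1, 8, 9, 10] + list(range(17, 32))
transmitted = 32
interval_s = 10  # from "ping -i 10"

runs = missing_runs(received, transmitted)
loss_pct = 100 * (transmitted - len(received)) / transmitted

for first, last in runs:
    lost = last - first + 1
    print(f"seq {first}-{last}: {lost} probes lost, about {lost * interval_s} s")
print(f"loss: {loss_pct:.1f}%")
```

The losses are not scattered: they fall into two runs of six consecutive probes each (seq 2-7 and seq 11-16), and 12 lost out of 32 works out to 37.5%, consistent with the 37% in the summary line.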

I'll try to follow up this post, but if I don't, it means I'm not using load balancing from TP anymore.

Damon85
Ditto.

Total Balance has been flaking out all day, most notably tonight. I'm getting such delightful messages as "No node available to serve request" to "Document contains no data" (from Firefox). I just totally balanced our main website by putting in multiple A records for the webservers. Some service for $98...
binary
Damon85, good evening to you!

QUOTE
I just totally balanced our main website by putting in multiple A records for the webservers. Some service for $98...


Sorry to hear you're having the same problem(s). Hehe, I just made a DNS change for multiple A recs in the time it took you to post your reply. We'll see how it goes.
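For anyone trying the same multiple-A-record workaround, keep in mind that plain DNS round robin does no health checking: a dead web server stays in rotation until the record is changed, and it is up to each client whether it retries another address. Here is a rough Python sketch of that client-side behavior. The function names and the 192.0.2.x addresses (a documentation-only range) are made up for illustration; only the standard `socket` calls are real.

```python
import socket

def resolve_all(hostname, port=80):
    """Return the distinct IPv4 addresses behind a hostname's A records."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

def tcp_probe(address, port=80, timeout=3.0):
    """True if a TCP connection to address:port succeeds within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False

def first_reachable(addresses, probe=tcp_probe):
    """Try each address in order and return the first one the probe accepts."""
    for address in addresses:
        if probe(address):
            return address
    return None

# Offline demonstration with a fake probe: pretend the first server is down.
fake_probe = lambda address: address != "192.0.2.10"
chosen = first_reachable(["192.0.2.10", "192.0.2.11"], probe=fake_probe)
print(chosen)  # prints "192.0.2.11"
```

In real use you would pass the result of `resolve_all("your-site.example")` to `first_reachable`. Many browsers do retry the next address on a failed connection, but that is client behavior you cannot rely on, so this is a stopgap and not a replacement for a working balancer.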
Damon85
Just got off the phone to technical support to find out that no one was assigned to the issue and that someone should be soon (we shall see).
gordonrp
QUOTE (Damon85)
Ditto.

Total Balance has been flaking out all day, most notably tonight.  I'm getting such delightful messages as "No node available to serve request" to "Document contains no data" (from Firefox).  I just totally balanced our main website by putting in multiple A records for the webservers.  Some service for $98...


Ah, very interesting! We had the same issue last night and again tonight. We find that our nodes load the site quickly when hit directly, but the load balancer hangs and doesn't forward to the nodes, and its health check reports them as offline.

Also, tonight we were not able to load the load balancer sub page (where it shows the nodes and their health). Nothing but problems.
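One way to collect evidence for a ticket in this situation is to check each node directly and the balanced address separately, then compare. The following Python sketch does that under stated assumptions: all URLs are hypothetical placeholders, the function names are made up, and "healthy" here just means an HTTP status below 400, which is not necessarily how the balancer's own health check decides.

```python
import http.client
from urllib.request import urlopen

def http_ok(url, timeout=5.0):
    """True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # and malformed or empty replies.
        return False

def diagnose(balancer_ok, node_results):
    """Summarize where the fault most likely sits from the check results."""
    nodes_up = sorted(name for name, ok in node_results.items() if ok)
    if balancer_ok:
        return "balancer responding"
    if nodes_up:
        return "balancer fault: nodes up directly: " + ", ".join(nodes_up)
    return "all checks failed: nodes or network may be down"

def run_checks(balancer_url, node_urls, check=http_ok):
    """Check the balanced address and every node, then diagnose."""
    node_results = {name: check(url) for name, url in node_urls.items()}
    return diagnose(check(balancer_url), node_results)

# Offline demonstration with a fake check: nodes answer, balancer does not.
fake_check = lambda url: "node" in url
print(run_checks("http://www.example.test/",
                 {"node1": "http://node1.example.test/",
                  "node2": "http://node2.example.test/"},
                 check=fake_check))
```

Run from cron on a machine outside the cluster with the real `http_ok`, the "balancer fault" line with a timestamp gives support something concrete to look at.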
Damon85
The load balancer status page was loading earlier, but then again it wasn't other times. Seems sort of hit and miss... a lot like the balanced sites.
Damon85
If anyone is still checking this...

Three phone calls and several ticket updates later, a load balancing engineer/technician has modified a buffer setting of some sort on the load balancer to correct the issue. I haven't encountered it again as of yet, but it took three weeks to show up this time, so who knows.

I'm glad they've fixed it, but the 17.5 hours and 3 phone calls it took to happen leave a bit to be desired. Apparently the L.B. techs/engineers are on-call, even on Tuesday at 1:15 in the afternoon (Planet's time).
gordonrp
QUOTE (Damon85)
If anyone is still checking this...

Three phone calls and several ticket updates later, a load balancing engineer/technician has modified a buffer setting of some sort on the load balancer to correct the issue. I haven't encountered it again as of yet, but it took three weeks to show up this time, so who knows.

I'm glad they've fixed it, but the 17.5 hours and 3 phone calls it took to happen leave a bit to be desired. Apparently the L.B. techs/engineers are on-call, even on Tuesday at 1:15 in the afternoon (Planet's time).


Have noticed improvement, will be keeping a close eye.