> Your Disaster recovery plan, and what you learned from H1
hkaye
post Jun 4 2008, 12:01 AM
Post #1


Celery
*

Group: Members
Posts: 25
Joined: 3-March 06
Member No.: 19,845



First let me say one thing. I think The Planet (formerly known as Rackshack/Ev1 to me) is doing a great job. The explosion at H1 is a disaster, and what they are doing to get the data center up and running is incredible. I thank them for it.

But, let's talk about disaster recovery. I know this thread probably belongs in another forum but since this forum is active about the H1 disaster I hope it stays here awhile.

My server is on floor 1 of H1. IP 207.44.180.108. I was tinkering with some files on Saturday so I was aware the server went down right away. I waited 10 minutes. Pinged the server. Then went to just-ping.com to ping it from other locations. Yup it was down. Trace routed. Did some other stuff, realized within 30 minutes it was a wide outage in the data center - not my server.

I went to the forums. Started following the outage. When I read this -

"On Saturday, May 31st at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost."

I went into emergency mode. I ordered a virtual dedicated server from GoDaddy. I ordered it from GoDaddy not because I didn't trust The Planet, but because I assumed everyone else would be ordering replacement servers from The Planet too and I didn't want to wait in that queue.

Next morning I had a virtual server. Had the IP address of the virtual server. Went to the DNS control panel of every domain I host and changed it to the virtual server. I figured if H1 is up, visitors will see the site at H1, otherwise if it was not they would see it at the virtual server. Spent the next 4 hours uploading the offsite backups to the new virtual server.

Okay, some of the sites didn't work right because of the lack of a proper database, but they were live and on the net. On some I put a makeshift frameset explaining that the site is in emergency backup server mode, please stand by. All in all, within 27 hours my store and another customer's store were up and running. Every site was functional except for one, which remained in static page mode.

On Monday when my server came online, first thing I did was download a backup of everything. Then I put the databases on the virtual server and everybody was up and running. It was dog slow for some sites, but they were live. Crisis averted. I ordered a new server from The Planet, and asked for it to be put in a Dallas data center. Went to bed.

Woke up Tuesday morning. H1 is down again. Boy did I feel good about leaving my site and my customers' sites on the dog slow virtual backup server. Spent the day configuring and setting up the new server in Dallas. It's done. I haven't switched the IPs in the DNS to the new fast server yet. My eyes are tired, I'll wait until tomorrow morning and double check before switching everyone over.

Whew!

------------------------------------------------------------------------

So let's talk about what we learned. How we can make a fail safe backup plan - just in case.

Right now I have two servers. A virtual one with GoDaddy and a main one with The Planet in the Dallas data center. I think I'm going to cancel the H1 data center server, just because I think I would be doing The Planet a service - one less server to worry about in that data center.
Bob de Bilder
post Jun 4 2008, 06:19 AM
Post #2


Newbie


Group: Members
Posts: 11
Joined: 3-December 04
Member No.: 15,068



my future plan is to set up a 3rd server (I already have a 2nd in the UK) and to mirror all my important sites on it

so that if ever this happens again I can simply forward the domain names to this backup

I have to give it further thought to be effective however this is probably the best solution I can think of currently

I had been planning a 3rd server to host my Adservers, so there is no reason why I can't also use it as an emergency backup


--------------------
Learning is a curve
santosh
post Jun 4 2008, 11:49 AM
Post #3


Enlightened
*

Group: Members
Posts: 97
Joined: 20-August 01
Member No.: 111



QUOTE (hkaye @ Jun 4 2008, 06:01 AM) *
Next morning I had a virtual server. Had the IP address of the virtual server. Went to the DNS control panel of every domain I host and changed it to the virtual server. I figured if H1 is up, visitors will see the site at H1, otherwise if it was not they would see it at the virtual server. Spent the next 4 hours uploading the offsite backups to the new virtual server.


How many sites did you have on the server that went down? One of the problems that I've faced doing such moves is that control panels like Plesk and cPanel don't do a good job of easing the migration (sometimes even migrating sites from one version of Plesk to another that is just a minor upgrade causes problems, even if we use their built-in migration programs). The alternative is to do the migration manually - configuring ftp, email, databases etc - but that takes a lot of time if you have a lot of sites.

Another alternative is to have a permanent standby server (maybe a cheap Celeron or maybe even 2 cheap Celerons located in different data centers) where you add sites and their email accounts whenever you add them on the main server. You can inform customers about this standby server and give them the logins for their email accounts on the standby server too.

That way if anything goes wrong you just change the IP to the standby server and it would show the placeholder page and allow customers to access their email using the default password you provided at signup time. I think I will actually go this route since most of my clients were more angry they couldn't access their email than the fact that their sites were down.

Also this incident has shown that putting domains into TP's DNS is not a good idea. Many domain registrars like Network Solutions and GoDaddy now offer DNS services free of charge when you register a domain with them. So it makes sense to use that instead of the hosting company's DNS server. I'm guessing Netsol and GoDaddy have a lot more at stake, because if their DNS servers go down then most of the Internet would grind to a halt, so it's a pretty safe bet to use their DNS servers. In any case there are also other secondary DNS providers like DynDNS that one could use.

And the final backup plan must be to pray to God every day. ;-)
Martyn Dale
post Jun 4 2008, 12:43 PM
Post #4


Master
***

Group: Members
Posts: 346
Joined: 28-March 06
From: England
Member No.: 20,467



We have 100 servers with TP (well, 124 at this point) and 25 of them went down due to the outage.

Our DR plan involved getting new boxes up ASAP in another DC (Dallas) and moving backups over to those until such time as H1 came back online. At that point we had scripts ready to merge the crossover data, i.e. the backups obviously won't have been fully up to date.
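The poster's merge scripts aren't shown, so the following is only a rough sketch of what "merging the crossover data" could look like. It assumes each record carries a unique key and an `updated_at` timestamp (both hypothetical names), and that the newest copy of a record should win.

```python
def merge_newest(restored: dict, recovered: dict) -> dict:
    """Merge two record sets keyed by id, keeping the most recently updated row.

    restored  -- rows on the temporary box (loaded from backup, plus new writes)
    recovered -- rows pulled from the original server once it came back online
    Each row is a dict with a hypothetical "updated_at" field that sorts
    chronologically (for example an ISO 8601 string or a Unix timestamp).
    """
    merged = dict(restored)
    for key, row in recovered.items():
        # Take the recovered row if it is missing here or strictly newer.
        if key not in merged or row["updated_at"] > merged[key]["updated_at"]:
            merged[key] = row
    return merged
```

Ties keep the row from the temporary box, which is an arbitrary choice; real data with conflicting edits on both sides would need a more careful rule than "newest wins".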

In addition our DR plan seems to involve the operations manager (me) having to be awake for over 50 hours, then getting only a broken 8 hours of sleep before another 24 hour stint. Unfortunately at the time of the kaboom I'd already been up for 12 hours, so that's what the real kicker was. Had it happened in my morning, I'd have been in much better shape, hah. But I think that's the only factor of the DR that wasn't ideal.

It's of course pretty unrealistic to have all 100 mirrored in 2 locations, and the same is true for most customers, but I think, considering what happened, we made the best of it. We were fully running from Dallas a couple of hours after each machine came back up, and as a result, when phase 1 died again we had no more downtime, as we had already grabbed the data.

A lot of work, and an extra month of server costs to have the extra boxes set up to at least get a "we haven't left, there's just an issue" page up, along with links to details of the incident.

Redundancy doesn't start and stop at the DC, it's also down to the customer to get setups working elsewhere. We did that, and our customers are happy with how it was handled. To those that wasted your time on these forums stating how unhappy you are, you had plenty of time to work on getting data up elsewhere. Seems I'm not the only one in this thread who managed to do it. IMO sitting back, blaming the DC, and not getting on with recovery is your issue.
The one exception to this of course is the user whose backup server was placed in the wrong location, so also couldn't access that. If you are reading this, you aren't included in the above :)


--------------------
Creator of the first underground, overground, flying, stationary, powerless, powered Data Center
markcausa
post Jun 4 2008, 01:36 PM
Post #5


SuperGeek
****

Group: Members
Posts: 3,019
Joined: 8-July 06
From: Los Angeles, CA
Member No.: 22,425



This week PhireFast is deploying two new DNS servers, one for our ns1, one for ns2. The ns1 DNS server will be housed in Dallas and ns2 in Houston.

We have already begun the process of splitting 50% of our customers to Dallas servers for extra redundancy.

The server which stores our hourly backups of all accounts will be in a separate Dallas DC as well, so we're never without data again.

Simple, right? :)


--------------------
Mark A. Mutti - PhireFast: Our Support & Prices are HOT!
W: www.phirefast.com
P: (866) 350-4456 Ext. 100
E: Mark.mutti@phirefast.com

(I still lurk around here every now and then)
hkaye
post Jun 4 2008, 01:39 PM
Post #6


Celery
*

Group: Members
Posts: 25
Joined: 3-March 06
Member No.: 19,845



I only have about 20 sites. So doing it by hand (which is the way I did it) does take a bit of time.

About two years ago I moved all email to GoDaddy. Frankly I was tired of battling spam and trying to get our server to send email to some domains such as hotmail.com without it going in the spam filter there. Also, I was tired of doing desktop support for my clients on how to connect to the email server for the nth time. GoDaddy has 24/7 phone support, they can handle the email headache - it's a loss leader for me.

So I only host the website stuff. My wife maintains and designs the sites. It was tag team on Sunday: every time I set up a new domain she clicked upload on her computer. Then I went fixing the sites that needed a database to work.

Over the last two years we transferred all the domains to GoDaddy. So it's under one control panel and easy to access.

Also, two years ago I had issues with EV1 name servers so I moved all the name servers to GoDaddy at that time too. Only two old sites remained at EV1 - I could switch the name servers to GoDaddy.

Maybe putting email, nameservers and domain control all at one vendor is a bad idea, but I personally think that GoDaddy is really good at that stuff, just like I believe ThePlanet is really good at providing servers and internet connectivity. So hopefully I picked the best of the best for the job that needs to be done.

-----------------------------------------------

Now I've got two servers set up. I'm putting together cron jobs to archive and scp backups of both the HTML content and database content from the primary to the backup. For every domain I host, I can reach each server through the subdomains a and b -

a.insertdomainnamehere.com - primary server
b.insertdomainnamehere.com - secondary server

I can do my fire drill test easily by typing in the b subdomain to check that the backup server content is up and valid.
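A minimal sketch of the archive-and-scp job described above, written as something cron could call. The function names, the archive naming scheme, and the host and directory arguments are all assumptions made for illustration; the database dump step is left out because the post doesn't say which database is in use.

```python
import subprocess
import tarfile
from datetime import datetime
from pathlib import Path


def make_archive(src_dir: Path, out_dir: Path, stamp: str) -> Path:
    """Pack src_dir into out_dir/<dirname>-<stamp>.tar.gz and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{src_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the site directory name, not the full path.
        tar.add(src_dir, arcname=src_dir.name)
    return archive


def scp_command(archive: Path, remote_host: str, remote_dir: str) -> list[str]:
    """Build the scp argument list that copies the archive to the backup server."""
    return ["scp", "-q", str(archive), f"{remote_host}:{remote_dir}/"]


def backup_site(src_dir: str, out_dir: str, remote_host: str, remote_dir: str) -> Path:
    """Archive one site and push it to the backup box (assumes key-based SSH login)."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = make_archive(Path(src_dir), Path(out_dir), stamp)
    subprocess.run(scp_command(archive, remote_host, remote_dir), check=True)
    return archive
```

Run unattended from cron, this only works if the primary can log in to the backup server without a password prompt, for example with an SSH key. Keep `out_dir` outside the site directory so an archive never ends up packed inside the next one.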

------------------------------------------------

Also, I'm going to make cron jobs on the backup server to test the actual sites on the primary to make sure they are live, and to email or text my phone when they detect that a site is down.
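One way such a check could look, as a sketch only: the backup server fetches each site on the primary and builds an alert email listing the ones that did not answer. The function names, addresses, and the "status below 400 means up" rule are assumptions, and actually sending the message (for example with `smtplib`, or through a carrier's email-to-SMS gateway) is left to the caller.

```python
import http.client
import urllib.request
from email.message import EmailMessage


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def find_down(urls, checker=is_up):
    """Return the URLs for which the checker reports a failure."""
    return [url for url in urls if not checker(url)]


def build_alert(down, sender: str, recipient: str):
    """Build an alert email listing the failed URLs, or None if all are up."""
    if not down:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{len(down)} site(s) down on primary"
    msg.set_content("\n".join(down))
    return msg
```

A single failed request can be a blip, so it may be worth alerting only after two or three consecutive failed runs.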
Andrew Pollack
post Jun 10 2008, 05:46 AM
Post #7


Celery
*

Group: Members
Posts: 42
Joined: 30-November 07
Member No.: 49,813



A lot more has to be considered than just the server side when you're talking failover. I'd waited too long to get my hot failover machine up and ready, but I did get it done last week.

For Second Signal, what I have is a service offering that is built using Asterisk and a lot of other program code. My environment is a bit more complex than most. There are two kinds of data storage. Much of the workflow data and content is stored in IBM Lotus Domino databases, so that's easy to handle. Domino has built in clustering so data updates are in sync to within seconds of each other.

The audio data, along with the system configurations and scripts, is kept in sync using Unison for the time being. That keeps it up to date within no more than 60 minutes -- which for failover is acceptable. DNS failover is also no problem in this way.

For DNS, I run my own name servers. At the TLD, three name servers are listed. Two are machines I manage in different data centers at The Planet. The third is a backup DNS provider. In reality, only one of the two I manage is configured as a master, while the other two are configured as slaves. Updating one immediately updates the others. The configuration files for the domains are kept in sync on both my servers so that if the primary goes down I can very rapidly make the other a master. The off-site slave is configured to consider both of the others masters. The off-site provider isn't perfect, but it's free.

That takes care of the servers, but then there's the "client side" software. For most of you, that just means browsers. Round robin DNS or a low TTL is enough to handle the browser side. For me, there is custom client side software as well. For now, the low TTL DNS is sufficient, but I'm working on an updated release of the software which will be aware of the standby server at the configuration level, so that if the primary doesn't respond for "n" minutes, the secondary will be tried.
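The "try the secondary after n minutes without a response" rule could be sketched roughly like this. It is not the poster's client code; the class name, field names, and the choice to fall back to the primary as soon as it answers again are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverSelector:
    """Pick the primary unless it has been failing for at least grace_seconds."""
    primary: str
    secondary: str
    grace_seconds: float
    first_failure_at: Optional[float] = None  # start of the current failure streak

    def record_primary_result(self, ok: bool, now: float) -> None:
        """Record the outcome of one attempt against the primary at time `now`."""
        if ok:
            self.first_failure_at = None      # any success ends the streak
        elif self.first_failure_at is None:
            self.first_failure_at = now       # first failure starts the clock

    def target(self, now: float) -> str:
        """Return the host the client should talk to at time `now`."""
        failing_since = self.first_failure_at
        if failing_since is not None and now - failing_since >= self.grace_seconds:
            return self.secondary
        return self.primary
```

For example, with `grace_seconds=300` a client keeps trying the primary through a short blip and only moves to the secondary once failures have continued for five minutes. Times are passed in by the caller (for instance from `time.monotonic()`), which keeps the logic easy to test.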

PBX inbound telephone number routing is handled by my DID provider. I have now configured it to hit the primary server in all cases unless the primary doesn't answer. In that case it goes to the secondary. I do have another PBX DID provider that can't do that kind of routing. I have only one telephone number from that provider and I keep them around in case the main provider I use fails for some reason. If that happens, having a relationship established with the second provider will allow me to very quickly add replacement numbers wherever I need them until the issue is resolved.
Bib
post Jun 10 2008, 04:26 PM
Post #8


Fellow
**

Group: Members
Posts: 106
Joined: 15-November 02
Member No.: 4,889



My plan is:
Two mirrored servers: the main one is in the UK, where most of my traffic goes, and one at TP for backup. Two DNS nameserver sites (mydomain and TP). DNS for all domains easily manageable via mydomain.com or Ukreg.co.uk. IP-based failover email accounts for the most demanding clients.

mmh I think I covered it all..