Full Version: How to Keep Data Sync’d Between Two Load Balanced Servers
The Planet Forums > System Administration > Load Balancing
samf
How to Keep Data Sync’d Between Two Load Balanced Servers

Load balancing helps provide redundancy for your website. However, a frequently asked question is how to keep your content synchronized across the servers. If you put a new web page on one server, how does it get copied over to the second server? You can simply make sure that whenever you upload or change a document on one server, you do exactly the same thing on the other server(s). Or, you can use tools like rsync or robocopy to help keep the files on your servers in sync with one another.

The rsync utility is available for most *nix Operating Systems (RedHat, FreeBSD, Debian, etc.) and is relatively simple to configure and run. The general concept is that you would need to install rsync on your "master" server. Then, when you update files on your master, you can run an rsync command to copy all "new" files over to the secondary servers. You can even do more advanced things such as setting it up to run automatically via cron or setting it up as a daemon.

If your servers run Microsoft Windows instead of *nix, Microsoft provides a utility in their resource kit. This utility - robocopy - can be used in a way similar to rsync. Once you put new files on a server, you can use robocopy to copy those files over to the secondary server(s).

Either of these tools can be set up to run automatically via cron or AT jobs. Keep in mind, though, that if the process is automated and you make a mistake on your "primary" server, that mistake will get copied over to your secondary server(s). For that reason, if you have an automated process, you are advised to use a development environment to test all of your changes before putting them into production.
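
One way to reduce the risk of pushing a mistake is to preview a sync before running it for real. The following is a minimal sketch of rsync's --dry-run option, assuming rsync is installed; it works on two throwaway local directories rather than on real servers, and the file name is only an illustration.

```shell
#!/bin/sh
# Local demonstration only: both directories are throwaway temp directories.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "new page" > "$src/index.html"

if command -v rsync >/dev/null 2>&1; then
    # --dry-run lists what would be transferred without changing anything;
    # --itemize-changes prints one line per file that would be updated.
    rsync --dry-run --recursive --itemize-changes "$src/" "$dst/"
else
    echo "rsync is not installed on this machine" >&2
fi
```

Because nothing is actually copied, the destination directory is still empty afterwards. Removing --dry-run from the same command performs the real transfer.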

This document shows very minimalist ways of using each tool. There are many variations on the suggested configurations and you are encouraged to learn more about each step. Links to more information are provided so that you may explore other, more advanced settings and options. If you would like assistance setting any of this up on your server(s), please contact our technical support team.



Microsoft Windows Operating System:

1) Get the Resource Kit. To obtain robocopy.exe, you will need to download the Windows Server 2003 Resource Kit Tools from http://www.microsoft.com/downloads/details...&displaylang=en.

2) Install the Resource Kit. Once you have downloaded the installer to your server, run it to install all of the tools. By default, robocopy will be installed as "c:\Program Files\Windows Resource Kits\Tools\robocopy.exe".

3) Determine which directories/files on your servers need to be kept in sync. For instance, if all of your web files are in c:\Inetpub\wwwroot\mydirectory, those are the only files you should worry about copying from server to server.

4) On each of the secondary servers, set up Windows to share the directory back to your primary server. You can do this by right-clicking on the directory name and selecting the "Properties" option. In the dialog box that appears, select the "Sharing" tab. On that tab, select "Share this folder" and enter the share name. (One way to make a shared folder less visible is to place a trailing dollar sign (“$”) on the share name. This “hides” the shared folder so it’s not browsable.)

5) You should now be able to run robocopy.exe. You can do this from a command prompt. Click on Start -> All Programs -> Accessories -> Command Prompt. From the command prompt, you can run robocopy with the following syntax:

robocopy source destination

For instance:

CODE
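"c:\program files\windows resource kits\tools\robocopy.exe" c:\inetpub\wwwroot\mydirectory \\123.123.123.123\inetpub\wwwroot\mydirectory /E


In this case, "123.123.123.123" is the IP address of the secondary server. The quotes are needed because the path to robocopy.exe contains spaces, and the /E switch tells robocopy to copy subdirectories as well (without it, only the files in the top-level directory are copied). If you have multiple servers, you would run that command once for each server.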

To make this process a bit simpler, you can create a batch file that contains the commands necessary to copy all of your files. To do so, copy and paste the following content into a text file named "myrobocopy.bat" on your "master server". Then, be sure to edit the relevant parts of that file to include your IP address(es) and directory name(s).

CODE
"c:\program files\windows resource kits\tools\robocopy.exe" c:\inetpub\wwwroot\YOUR_DIRECTORY_NAME_HERE \\YOUR_IP_ADDRESS_HERE\inetpub\wwwroot\YOUR_DIRECTORY_NAME_HERE /E


(If you have multiple "secondary servers", make a copy of the above line for each one. Then be sure to change the IP address as necessary. And don't forget to set up the Windows share on each server.)

Once you have this batch file created, you can run it from the command line without always having to remember the proper syntax, IP address(es) and directories.

For more detailed information, refer to robocopy.doc, which is included with the resource kit.



RedHat Operating System:

1) Get and install the rsync utility. The easiest way to do this is by using the up2date utility. This will download and install it for you:

CODE
up2date rsync



2) Determine which directories/files on your servers need to be kept in sync. For instance, if all of your web files are in /var/html/mydirectory, those are the only files you should worry about copying from server to server.


3) You should now be able to run rsync from a command line prompt using the following syntax:

rsync [options] source destination

For instance:

CODE
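rsync --recursive --links --relative --rsh=/usr/bin/ssh /var/html/mydirectory 123.123.123.123:/


In this case, "123.123.123.123" is the IP address of the secondary server. The --recursive option is required to copy a directory and its contents; without it, rsync reports "skipping directory" and copies nothing. Because --relative sends the full source path, the destination is given as "/" so that the files end up in /var/html/mydirectory on the secondary server. If you have multiple servers, you would run that command once for each server.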

To make this process a bit simpler, you can create a shell script that contains the commands necessary to copy all of your files. To do so, copy and paste the following content into a text file named "myrsync.sh" on your "master server". Then, be sure to edit the relevant parts of that file to include your IP address(es) and directory name(s).


CODE
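#!/bin/sh

# --recursive copies the directory contents; --relative recreates the full
# source path on the remote side, so the destination is "/".
rsync --recursive --links --relative --rsh=/usr/bin/ssh /var/html/YOUR_DIRECTORY_NAME_HERE YOUR_IP_ADDRESS_HERE:/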


(If you have multiple "secondary servers", make a copy of the rsync line for each one. Then be sure to change the IP address as necessary.)

Once you have this shell script created, you can run it from the command line without always having to remember the proper syntax, IP address(es) and directories.
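
Instead of copying the rsync line once per secondary server, the same idea can be written as a loop. This is only a sketch under stated assumptions: the addresses 192.0.2.10 and 192.0.2.11 and the directory are placeholders, the function name sync_all and the RSYNC override are hypothetical additions for illustration, and the destination "/" relies on --relative recreating the full source path on the remote side.

```shell
#!/bin/sh
# Placeholder values: replace with your own secondary servers and directory.
SERVERS="192.0.2.10 192.0.2.11"
SRC_DIR="/var/html/mydirectory"
# RSYNC can be overridden (for example RSYNC=echo) to print the commands
# instead of running them.
RSYNC="${RSYNC:-rsync}"

# Run one rsync per secondary server; return non-zero if any of them failed.
sync_all() {
    failed=0
    for host in $SERVERS; do
        if ! $RSYNC --recursive --links --relative --rsh=/usr/bin/ssh "$SRC_DIR" "$host:/"; then
            echo "sync to $host failed" >&2
            failed=1
        fi
    done
    return $failed
}
```

Calling sync_all at the end of the script performs the copies. Setting RSYNC=echo first prints each command without contacting any server, which is a cheap way to check the host list.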

Refer to http://samba.anu.edu.au/rsync/ for more detailed information on using rsync.

Each time you run the rsync command, you will be prompted for your password. If you intend on automating the procedure, you’ll need to generate and distribute ssh keys amongst your servers. One example of how to do this can be found at:

http://www.brandonhutchinson.com/Passwordl...ssh_logins.html
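
As a rough sketch of the key setup described above, assuming OpenSSH's ssh-keygen and ssh-copy-id are available: the function name make_sync_key and the key file name id_rsa_sync are hypothetical, and root@123.123.123.123 stands in for your own account and secondary server.

```shell
#!/bin/sh
# make_sync_key DIR
# Creates DIR/id_rsa_sync (a hypothetical file name) if it does not exist yet
# and prints the path of the private key.
make_sync_key() {
    key="$1/id_rsa_sync"
    mkdir -p "$1" && chmod 700 "$1" || return 1
    # -N "" creates the key without a passphrase so cron can use it unattended.
    [ -f "$key" ] || ssh-keygen -q -t rsa -N "" -f "$key" || return 1
    echo "$key"
}

# Typical use on the master server:
#   key=$(make_sync_key "$HOME/.ssh")
# Install the public half on each secondary server (the password is asked
# for one last time):
#   ssh-copy-id -i "$key.pub" root@123.123.123.123
# Then tell rsync to use that key:
#   rsync --recursive --links --relative --rsh="/usr/bin/ssh -i $key" \
#       /var/html/mydirectory 123.123.123.123:/
```

A key without a passphrase lets anyone who can read the file log in to the secondary servers, so keep it readable only by the account that runs the sync.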
Guspaz
One of the big benefits of rsync is that it doesn't simply send over new files; it updates existing files by sending only the parts that have changed, similar to a diff.

However, rsync is extremely slow for large numbers of small files; if you're trying to sync several thousand small files in a web site, it will take too long to be worth it. In that case, setting up something to tar (or tar and gzip) the changed and new files and send them over might be more efficient.

You could probably write a shell script to do this on the sending end quite easily; it'd just have to get a list of all files modified (created would be included in that) since the last run, tar them up (Optionally do a -z in tar to gzip them), then copy it over to the other server, perhaps via NFS or SMB.

Then the receiving server could have a simple shell script that checks for a new received file and untars it into the destination directory. This receiving script could be run by cron once a minute; it would use few resources simply to check if a new file has been copied over.
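
The two scripts described above could look roughly like this. It is a sketch, not a tested production setup: the function names make_bundle and apply_bundle are hypothetical, it assumes a tar that supports -z and -T (GNU tar and bsdtar do), it does not handle file names containing newlines, and copying the archive between servers (NFS, SMB, scp) is left out.

```shell
#!/bin/sh
# Sender side.
# make_bundle SRC_DIR STAMP_FILE OUT_TARBALL
# Packs every regular file in SRC_DIR modified since the previous run.
make_bundle() {
    src=$1; stamp=$2; out=$3
    list=$(mktemp) || return 1
    new_stamp=$(mktemp) || return 1   # its mtime marks the start of this run

    if [ -f "$stamp" ]; then
        # Only files modified after the previous run (new files included).
        (cd "$src" && find . -type f -newer "$stamp") > "$list"
    else
        # First run: take everything.
        (cd "$src" && find . -type f) > "$list"
    fi

    # Skip creating an archive when nothing has changed.
    if [ -s "$list" ]; then
        tar -czf "$out" -C "$src" -T "$list"
    fi
    mv "$new_stamp" "$stamp"
    rm -f "$list"
}

# Receiver side, meant to be run from cron.
# apply_bundle TARBALL DEST_DIR
# Unpacks the archive into DEST_DIR and deletes it; does nothing if absent.
apply_bundle() {
    [ -f "$1" ] || return 0
    tar -xzf "$1" -C "$2" && rm -f "$1"
}
```

Note that this approach only carries new and changed files; files deleted on the sender stay on the receiver.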
Syan
I tried this code, but it doesn't sync the folders:


skipping directory /home/virtual/site1/fst/var/www/html/


BlueFusion
I use:
CODE
rsync -ave ssh /home/user root@hostname:/home


Or something like that. Just set up the RSA keys for SSH and you're golden for cronjobs.

By the way, I found that using Cygwin, you can use rsync on Windows, too. I just set this up in the last few weeks for Windows -> Linux rsync backup.
rabbit994
For Windows sync, I would look into SyncBack (SyncBackSE). It has worked for me for replicating local copies of Visual Studio projects on two computers up to my server.
JustinK101
I have questions about a few snags.

We are running Windows 2003 with IIS and are thinking of setting up two servers with load balancing.

How in the world do you manage sessions, and application pools across multiple servers? How do you sync up the sessions?

Also, how do you deal with email servers, which store messages and log files? And finally, databases? I know that most databases have mirroring built in, but can this be done at the file level?

Headache..
opensourcedevelopment
Hi,
The best solution is DNS round robin, with the database on a SAN and a Windows cluster.
You can create a catalog in Active Directory and divide the users between the two servers.
Regards
Opensourcedevelopment.net
Software Development, Support, Server Maintenance
Aaron Moon
I have done this on several servers located all over: Europe, US, Asia, etc.

I basically did the above and then installed cron jobs to tell it when to sync and how often. Keep in mind rsync can be used for more than just that. For true clustering I actually used it for syncing the user and password files, so that one cPanel server can make new accounts and the other cPanel server will install the users and groups.

I also installed keys so that I will not be prompted for passwords. My crons are set several minutes apart; once the data is MANUALLY (and I mean MANUALLY) synced, rsync doesn't really add to the load of my boxes at all. It checks often enough that any changes that are made are small.

I must stress this: if you upload massive files on a regular basis, then you will have to stop the system and manually sync the file. Here is an example: I have a 2 GB file that I upload. The remote server sees it and tries to copy it. If my cron is set too close, it will try to run again even though the previous run is not yet finished. I only see this as a problem with large files that take a long time to upload. It can cause some load issues until the files are properly transferred, as the file is constantly changing and rsync is trying to compare it.

In this setup I also enabled MySQL clustering, which suits this setup almost perfectly.

Linux cpanel servers clustered.. took a while to debug but it works well...
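
The overlapping cron runs described above can be avoided with a simple lock. This is a minimal sketch, not what the poster actually used: the function name run_exclusive, the lock path, the exit code 75, and the crontab line are all assumptions for illustration. It relies on mkdir being atomic, so only one run at a time can create the lock directory.

```shell
#!/bin/sh
# Hypothetical lock location; any path writable by the cron user works.
LOCK_DIR="${LOCK_DIR:-/tmp/myrsync.lock}"

# run_exclusive COMMAND [ARGS...]
# Runs COMMAND only if no other run currently holds the lock.
run_exclusive() {
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous sync still running, skipping this run" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return $status
}

# Example use inside the sync script:
#   run_exclusive rsync -ave ssh /home/user root@hostname:/home
# Example crontab entry (hypothetical path), every 10 minutes:
#   */10 * * * * /root/myrsync.sh
```

If the script is killed while a sync is running, the lock directory is left behind and has to be removed by hand before the next run can start.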
S. Ural
I had created accounts (domains) on one server but not on the other. Will rsync sync all files (accounts, their folders, etc.) or is it restricted to web files (HTML, PHP, etc.) only?
QUOTE (samf @ Sep 29 2004, 09:42 PM) *
How to Keep Data Sync’d Between Two Load Balanced Servers
Load balancing helps provide redundancy to your website. However, one often asked question is how to keep your content synchronized on each server. If you put a new web page on one server, how does it get copied over to the second server? You can simply make sure that when you upload or change a document on one server, you do the exact same thing on the other server...
ChuFuong
QUOTE (opensourcedevelopment @ Jun 12 2007, 10:54 PM) *
Hi,
The best solution is DNS round robin, with the database on a SAN and a Windows cluster.
You can create a catalog in Active Directory and divide the users between the two servers.
Regards
Opensourcedevelopment.net
Software Development, Support, Server Maintenance


I think I like this idea... what about for Linux though?