Full Version: Setting Up File Replication With Unison
Axel_2004
Ok..

Great that we have a forum for this now..

What I am trying to do is set up a complete file replication system for 2 boxes that are in a HAC (high availability cluster). I've looked around and think that Unison is probably the best way to do the bi-directional file replication.

Does anyone have any scripts and/or setup info on using this program for 2-way synchronization?

I'll be playing around with it as soon as I am done updating both machines..

Advice and tips would be helpful as I think we're all kinda new to this type of cluster..


I've got the firewalls up on both boxes now and have cPanel accounts sync'd on both systems. Unison is really quite simple to set up, but I am having a problem getting it to work via ssh through the pnet.. (cuz someone forgot to plug in a NIC!)


Axel..
Klaas
This may not be a 100% foolproof HOWTO, but I guess
anyone who has got themselves a cluster isn't a newbie.

Setting up SSH is a single-line command.

Run the matching command below on each node, cluster1 and cluster2.
Hit enter twice at the passphrase prompts, then
enter the root password of the other node when asked (the last time you will have to).

on cluster1
ssh-keygen -t dsa -f ~/.ssh/identity && cat ~/.ssh/identity.pub | ssh cluster2 'sh -c "cat - >>~/.ssh/authorized_keys2 && chmod 600 ~/.ssh/authorized_keys2"'

on cluster2
ssh-keygen -t dsa -f ~/.ssh/identity && cat ~/.ssh/identity.pub | ssh cluster1 'sh -c "cat - >>~/.ssh/authorized_keys2 && chmod 600 ~/.ssh/authorized_keys2"'
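
On newer OpenSSH releases the commands above may be refused, since DSA keys are disabled by default there, and authorized_keys2 is a legacy file name. A rough modern equivalent, assuming ssh-keygen and ssh-copy-id are installed and root logins over SSH are still allowed; make_sync_key is just a helper name made up for this sketch:

```shell
#!/bin/sh
# make_sync_key DIR: create a passphrase-less ed25519 key pair in DIR.
# (Hypothetical helper; ssh-keygen asks before overwriting an existing key.)
make_sync_key() {
    mkdir -p "$1" && chmod 700 "$1" &&
    ssh-keygen -q -t ed25519 -N '' -f "$1/id_ed25519"
}

# On cluster1 (run the mirror-image commands on cluster2):
#   make_sync_key ~/.ssh
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub root@cluster2
```

ssh-copy-id appends the public key to ~/.ssh/authorized_keys on the other node and asks for the password once, same as the one-liner above.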


Done so far, on to Unison:
cd /bin
wget http://www.cis.upenn.edu/~bcpierce/unison/....linux-gtkui.gz
gunzip unison.linux-gtkui.gz
chmod 755 unison.linux-gtkui
mv unison.linux-gtkui unison

Install on both servers.

This example command is run on host cluster1
and syncs both /var/www/html directories:
unison /var/www/html ssh://cluster2//var/www/html -batch

The first run takes a very long time because it has to build its databases etc.
On my cluster, for example, it took over a day for 300,000 files.


To set directories to skip etc., edit:
/root/.unison/default.prf
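
As a rough idea of what could go in that file, here is a sketch that writes a small profile with a few ignore rules. The patterns are placeholders, write_unison_profile is a made-up helper name, and the roots are left out on purpose so the command line above keeps working as is:

```shell
#!/bin/sh
# write_unison_profile DIR: write a sample default.prf into DIR.
# Hypothetical helper; it refuses to overwrite an existing profile.
write_unison_profile() {
    prf="$1/default.prf"
    if [ -e "$prf" ]; then
        echo "$prf already exists, not touching it" >&2
        return 1
    fi
    mkdir -p "$1" || return 1
    cat > "$prf" <<'EOF'
# Run without prompting, same as -batch on the command line
batch = true

# Files and directories that should not be replicated
# (placeholder patterns, adjust to your own layout)
ignore = Name *.tmp
ignore = Name *.swp
ignore = Path cache
EOF
}

# Example: write_unison_profile /root/.unison
```

"Name" patterns match a file name anywhere in the tree, while "Path" patterns match a path relative to the synced root.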

If you are running APF,
edit /etc/apf/conf.apf and change
TIF="eth1" (or whatever your PNET NIC is)


there's much more but this basic thing should get you started

klaas
Axel_2004
Thanks for the tips on setting up ssh so it doesn't ask for a password... I was wondering how to do that..

I already have unison set up and working on both boxes, really basic program to use. I think I'll get a bit more detailed as soon as EV1 plugs in the NIC that they forgot..

I am however a bit of a newbie when it comes to Linux. I spent many years in DOS and in a win-X environment, only 2 years in Linux so far..

Axel..
alex.davies
If anyone is confused why they are getting errors like this:
CODE
unison: error while loading shared libraries: libgtk-1.2.so.0: cannot open share
Then replace this
CODE
cd /bin
wget http://www.cis.upenn.edu/~bcpierce/....linux-gtkui.gz
gunzip unison.linux-gtkui.gz
chmod 755 unison.linux-gtkui
mv unison.linux-gtkui unison
with this
CODE
wget http://www.cis.upenn.edu/~bcpierce/unison/download/stable/latest/unison.linux-static-textui
mv unison.linux-static-textui unison
chmod +x unison
This is the text only version of the binary (which is all you are ever going to use on a server) and does not require the graphical libraries.

If you want a slightly more in-depth HOWTO on creating the SSH connection between two servers you want to go here.

Alex
james_madrid
How long will the replication take for a database? For example, I have 400,000 records. How long will it take to finish the replication?
alex.davies
Do not use unison to replicate a database!

Is your DB MySQL? Set up replication. Replication is fast as long as your nodes are not overloaded, but it is NOT instant so if you have lots of inserts into a table with a primary key you may hit big problems.

With best wishes,

Alex
webmaster4india
Hi Guys

Does unison or rsync support incremental replication? Will it be able to read files which are in use, like database files?

Suppose we sync a file from server A to server B. Will the file be deleted on server B if it is deleted on server A?

How frequently can we run this? I am planning to implement clustering using this and the load balancing provided by EV1. Will this work out? Do you think it's suited for shared hosting and shared mail services?

Thanks
Regards,
Webmaster
eike
QUOTE (webmaster4india)
Does unison or rsync support incremental replication? Will it be able to read files which are in use, like database files?


Yes. Note that unison and rsync SOLVE DIFFERENT PROBLEMS. Unison lets you modify the data on the replicated side and will try to merge changes made on either side. rsync only replicates one-way -- that is, changes from the receiving side will never be synced to the sending side.

But yeah, both offer incremental replication.

QUOTE
Suppose we sync a file from server A to server B. Will the file be deleted on server B if it is deleted on server A?


If you want it to, sure. Both tools can be made to do this.
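
For rsync, deletions are only propagated when you ask for it with --delete; Unison propagates deletions as part of a normal sync. A minimal sketch, where mirror is just a hypothetical helper name and the trailing slashes mean "the contents of" each directory:

```shell
#!/bin/sh
# mirror SRC DST: make DST an exact one-way copy of SRC.
# --delete removes files from DST that no longer exist in SRC.
mirror() {
    rsync -a --delete "$1/" "$2/"
}

# Between the two nodes this might look like:
#   mirror /var/www/html cluster2:/var/www/html
```

Without --delete, a file removed on server A simply stays around on server B.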

QUOTE
How frequently can we run this? I am planning to implement clustering using this and the load balancing provided by EV1. Will this work out? Do you think it's suited for shared hosting and shared mail services?


No, this will not work out, at least not the way you think. You can't "replicate" a database this way. Sure, you may copy the files, but you get no guarantee of atomicity and no working merge. You will be left with inconsistent databases on both sides using unison, and on the receiving side using rsync. Most decent databases have replication or clustering mechanisms of their own for precisely this purpose and reason.

In theory, if your mail system is set up right, this could be used for mail spools, sure. Assuming a Maildir storage method, there isn't much of a chance for collisions in the file namespace, so merges of two different emails will not happen.

Shared hosting might work in simple cases, too. With dynamic programs that need access to the filesystem, you can easily run into non-reconcilable changes if your programmer does not know he is going to be replicated. You can also easily lose consistency. Consider the simple example of a file-based counter script: Host A and Host B both receive hits that are counted, and both read and write the file counter.txt whenever the webpage is hit. When you run unison, you will have conflicting changes to that file; when you run rsync from A->B, all hits counted on B will be lost and overwritten with the data from A.
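
The rsync half of the counter example can be tried out locally. This is only a sketch: the two hosts are simulated by two scratch directories, cp stands in for a one-way rsync from A to B, and the hit helper is made up for illustration.

```shell
#!/bin/sh
# Two scratch directories play the roles of Host A and Host B.
work=$(mktemp -d)
mkdir "$work/A" "$work/B"
echo 0 > "$work/A/counter.txt"
echo 0 > "$work/B/counter.txt"

# hit HOST: read the counter, add one, write it back
hit() {
    n=$(cat "$work/$1/counter.txt")
    echo $((n + 1)) > "$work/$1/counter.txt"
}

hit A; hit A; hit A      # 3 hits land on host A
hit B; hit B             # 2 hits land on host B

# One-way sync A -> B (cp used here in place of rsync)
cp "$work/A/counter.txt" "$work/B/counter.txt"

echo "real hits: 5, counter on B after sync: $(cat "$work/B/counter.txt")"
```

Five hits were served in total, but after the one-way sync both copies say 3: the two hits counted on B are gone.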

As for frequency: as often as you like, but it will incur a sizable hit on processing power and the storage subsystem. A full rsync of a reasonably busy webhosting machine to another will take minutes or even an hour, EVEN if there were no actual changes. This is because rsync has to look at every file to check whether it has changed, which is a lot of work. The remaining services on your server will slow down considerably during this period.

If you want two hosts to access the exact same files, you'll need a distributed filesystem (like Coda), or a fileserver exporting its filesystems via a networked filesystem (like NFS) (and even in this case, you can't expect two instances of a database to share the same files without it blowing up in your face).

Good luck with your endeavor.
sen
Unison is a great utility for inter-server file synchronisation depending upon the purpose. A common example (and our first implementation of it) was for synchronising Maildir files between mail servers (40GB of frequently changing data, sync over ssh at 5 minute intervals between countries).

Works well with Courier maildrop / courierIMAP&POP services. One small problem (more an annoyance than a critical failure) we ran into was that the courier quota files would sometimes become corrupt (maybe there was an option for unison to disable the file-incremental-sync that I missed. Regardless, in the end we used a different approach for quotas).
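
If you run the sync from cron every few minutes, as with the 5 minute interval mentioned above, a slow run can still be going when the next one starts. One common way to guard against that is a lock directory, since mkdir either creates it or fails. A minimal sketch; run_locked, the lock path and the script path in the comments are made up for illustration:

```shell
#!/bin/sh
# run_locked LOCKDIR CMD...: run CMD only if no other run holds LOCKDIR.
# Returns 75 (an arbitrary "try again later" code) when the lock is taken.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous sync still running, skipping this run" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return $status
}

# In a wrapper script called from cron, for example:
#   run_locked /var/lock/unison-html.lock \
#       unison /var/www/html ssh://cluster2//var/www/html -batch
# with a crontab entry such as:
#   */5 * * * * /usr/local/bin/sync-html.sh
```

Note that if the machine crashes mid-sync the lock directory is left behind and has to be removed by hand.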

You can't do basic file-synchronisation with databases (at least not with the ones I've used). MySQL has some great 'clustering' / 'replication' features you should look into. We've used MySQL 5's Replication (in two directions) with great success for a while now on some fairly busy databases between different countries - but application of this depends upon the way you use your database.
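
For two-way MySQL replication like this, one setting that is often used is to give each node its own auto-increment offset, so that inserts on both sides never generate the same key. A sketch of the relevant my.cnf lines, printed by a hypothetical helper; the values assume exactly two nodes, and this does not set up the replication itself:

```shell
#!/bin/sh
# mysql_repl_fragment ID: print a my.cnf fragment for node ID (1 or 2).
# Hypothetical helper; the values assume a two-node setup.
mysql_repl_fragment() {
    cat <<EOF
[mysqld]
# Must be different on every node
server-id                = $1
# Binary log, needed on any node that others replicate from
log-bin                  = mysql-bin
# Step auto-increment keys by 2, starting at this node's ID
auto_increment_increment = 2
auto_increment_offset    = $1
EOF
}

# Example: mysql_repl_fragment 1    (and mysql_repl_fragment 2 on the other node)
```

With these values node 1 hands out odd auto-increment keys and node 2 even ones.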
funksta
When going through the above I did everything that was written but what I didn't understand was:

CODE
unison /var/www/html ssh://cluster2//var/www/html -batch


There is nothing in the /var/www/html directory. I'm assuming that on some operating systems, or on single-website servers, the HTML files are stored there.

When I ran this I ran:

CODE
unison /home ssh://cluster2//home -batch


which is where all our clients' websites route to. Is this the correct thing to do? Are there any implications with doing this?

Is this the best solution? We have a High Availability Cluster Load Balanced Pair which sits behind a public switch, i.e. not a private rack. Both servers are set up the same, with DNS Clustering set up in cPanel, and there are approx. 140 websites at the moment. What we want is for both servers to be synchronised for mail and http. We are going to look at clustering for MySQL.

Any help would be great