rsync is a file-synchronization tool (and the protocol behind it) built for Unix-like systems, and it is remarkably versatile for backing up and synchronizing data. It can be used locally to back up files to different directories or can be configured to sync across the Internet to other hosts.
It can be used on Windows systems but is only available through various ports (such as Cygwin), so in this how-to we will be talking about setting it up on Linux. First, we need to install or update rsync. On Red Hat and CentOS, run “yum install rsync” after logging in as root (note that some recent distributions of Red Hat support the sudo method). On Debian and Ubuntu, run “sudo apt-get install rsync.”
Using rsync for local backups
In the first part of this tutorial, we will back up the files from Directory1 to Directory2. Both of these directories are on the same hard drive, but this would work exactly the same if the directories existed on two different drives. There are several different ways we can approach this, depending on what kind of backups you want to configure. For most purposes, the following line of code will suffice:
$ rsync -av --delete /Directory1/ /Directory2/
The code above will synchronize the contents of Directory1 to Directory2, and leave no differences between the two. If rsync finds that Directory2 has a file that Directory1 does not, it will delete it. If rsync finds a file that has been changed, created, or deleted in Directory1, it will reflect those same changes to Directory2.
There are a lot of different switches that you can use for rsync to personalize it to your specific needs. Here is what the aforementioned code tells rsync to do with the backups:
1. -a = recursive (recurse into directories), links (copy symlinks as symlinks), perms (preserve permissions), times (preserve modification times), group (preserve group), owner (preserve owner), preserve device files, and preserve special files.
2. -v = verbose. The reason I think verbose is important is so you can see exactly what rsync is backing up. Think about this: What if your hard drive is going bad, and starts deleting files without your knowledge, then you run your rsync script and it pushes those changes to your backups, thereby deleting all instances of a file that you did not want to get rid of?
3. --delete = This tells rsync to delete any files that are in Directory2 that aren’t in Directory1. If you choose to use this option, I recommend also using the verbose option, for the reasons mentioned above.
With the verbose switch, rsync lists each file as it backs up Directory1 to Directory2. Without it, you wouldn’t receive such detailed information.
In our run, the output showed that File1.txt and File2.jpg were detected as either new or changed from the copies in Directory2, and so they were backed up. Noob tip: Notice the trailing slashes at the end of the directories in my rsync command. The one on the source matters: it tells rsync to copy the contents of Directory1 rather than the directory itself, so be sure to remember it.
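To see what the trailing slash actually changes, here is a small sketch you can run safely. It assumes nothing about your real data: the scratch directory from mktemp and the names with_slash and without_slash are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Scratch directories stand in for real paths (illustrative only).
work=$(mktemp -d)
mkdir "$work/Directory1" "$work/with_slash" "$work/without_slash"
echo "hello" > "$work/Directory1/File1.txt"

# Trailing slash on the source: copy the *contents* of Directory1.
rsync -a "$work/Directory1/" "$work/with_slash/"

# No trailing slash on the source: copy the directory itself,
# which creates without_slash/Directory1/.
rsync -a "$work/Directory1" "$work/without_slash/"

# List the resulting files to compare the two layouts.
find "$work/with_slash" "$work/without_slash" -type f
```

In the first case File1.txt lands directly in the destination; in the second it ends up one level deeper, inside a new Directory1 folder.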
We will go over a few more handy switches towards the end of this tutorial, but just remember that you can type “man rsync” to view a complete list of switches.
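One switch worth knowing right away is -n (--dry-run), which makes rsync report what it would copy and delete without touching anything. That is a handy safety net when you use --delete. The sketch below is only a demonstration: the scratch directories and the file names File1.txt and Old.txt are made up so that you can run it without risking real files.

```shell
#!/usr/bin/env bash
# Scratch directories stand in for /Directory1/ and /Directory2/ (illustrative only).
work=$(mktemp -d)
mkdir "$work/Directory1" "$work/Directory2"
echo "keep me" > "$work/Directory1/File1.txt"
echo "only in the backup" > "$work/Directory2/Old.txt"

# -n (--dry-run): list what would be copied and deleted, but change nothing.
preview=$(rsync -avn --delete "$work/Directory1/" "$work/Directory2/")
echo "$preview"

# Old.txt is still listed here because the dry run deleted nothing.
ls "$work/Directory2"
```

Once the preview looks right, run the same command without the n to perform the real sync.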
That about covers it as far as local backups are concerned. As you can tell, rsync is very easy to use. It gets slightly more complex when using it to sync data with an external host over the Internet, but we will show you a simple, fast, and secure way to do that.
Using rsync for external backups
rsync can be configured in several different ways for external backups, but we will go over the most practical (also the easiest and most secure) method: tunneling rsync through SSH. Most servers and even many clients already have SSH, and it can be used for your rsync backups. We will show you the process to get one Linux machine to back up to another on a local network. The process would be exactly the same if one host were out on the Internet somewhere; just note that port 22 (or whatever port you have SSH configured on) would need to be forwarded on any network equipment on the server’s side of things.
On the server (the computer that will be receiving the backups), make sure SSH and rsync are installed.
# yum -y install openssh-server rsync
$ sudo apt-get install ssh rsync
Other than installing SSH and rsync on the server, all that really needs to be done is to set up the directories on the server where you would like the files backed up, and make sure that SSH is locked down. Make sure the user you plan on using has a complex password, and it may also be a good idea to switch the port that SSH listens on (default is 22).
We will run the same command that we did for using rsync on a local computer, but include the necessary additions for tunneling rsync through SSH to a server on my local network. For user “geek” connecting to “192.168.235.137” and using the same switches as above (-av --delete) we will run the following:
$ rsync -av --delete -e ssh /Directory1/ geek@192.168.235.137:/Directory2/
If you have SSH listening on some port other than 22, you would need to specify the port number, such as in this example where I use port 12345:
$ rsync -av --delete -e 'ssh -p 12345' /Directory1/ geek@192.168.235.137:/Directory2/
The output given when backing up across the network is pretty much the same as when backing up locally; the only thing that changes is the command you use. Notice also that rsync prompts for a password. This is to authenticate with SSH. You can set up SSH keys (RSA, for example) to skip this prompt, which will also simplify automating rsync.
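Here is one way the key setup could look. Treat it as a sketch under a few assumptions: the key file name id_ed25519_backup and the function names are made up for illustration, an Ed25519 key is used instead of RSA (either works the same way), and the key has no passphrase so that an unattended job can use it. The ssh-keygen and ssh-copy-id tools ship with OpenSSH on most distributions.

```shell
#!/usr/bin/env bash
# KEY_FILE and REMOTE are illustrative values; adjust them for your setup.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519_backup}"
REMOTE="${REMOTE:-geek@192.168.235.137}"

make_key() {
    # -N '' creates the key without a passphrase so scheduled jobs can use it.
    # The key is only generated if it does not exist yet.
    [ -f "$KEY_FILE" ] || ssh-keygen -q -t ed25519 -N '' -f "$KEY_FILE"
}

install_key() {
    # Appends the public key to ~/.ssh/authorized_keys on the server.
    # This asks for the account password one last time.
    ssh-copy-id -i "$KEY_FILE.pub" "$REMOTE"
}

backup() {
    # Same rsync command as before, but pointing SSH at the dedicated key.
    # (Assumes KEY_FILE contains no spaces.)
    rsync -av --delete -e "ssh -i $KEY_FILE" /Directory1/ "$REMOTE":/Directory2/
}
```

Run make_key and install_key once from the client; after that, backup should complete without a password prompt. Because a passphrase-less key is as good as a password, keep the private key file readable only by your user.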
Automating rsync backups
Cron can be used on Linux to automate the execution of commands, such as rsync. Using Cron, we can have our Linux system run nightly backups, or however often you would like them to run.
To edit the cron table file for the user you are logged in as, run:
$ crontab -e
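On many systems this opens the file in vi (crontab uses whatever editor is set in your VISUAL or EDITOR environment variable), so you will need to be familiar with it. In vi, type “i” for insert, and then begin editing the cron table file.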
Cron uses the following syntax: minute of the hour, hour of the day, day of the month, month of the year, day of the week, command.
It can be a little confusing at first, so let me give you an example. The following entry will run the rsync command every night at 10 PM:
0 22 * * * rsync -av --delete /Directory1/ /Directory2/
The first “0” specifies the minute of the hour, and “22” specifies 10 PM. Since we want this command to run daily, we will leave the rest of the fields with asterisks and then paste the rsync command.
After you are done configuring Cron, press escape, and then type “:wq” (without the quotes) and press enter. This will save your changes in vi.
Cron can get a lot more in-depth than this, but to go on about it would be beyond the scope of this tutorial. Most people will just want a simple weekly or daily backup, and what we have shown you can easily accomplish that. For more info about Cron, please see the man pages.
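Since nobody is watching the terminal when Cron runs a job, it can help to send rsync's verbose output to a log file. The wrapper below is a rough sketch, not a finished tool: the function name run_backup, the script path /usr/local/bin/rsync-backup.sh, and the log location are all made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, e.g. saved as /usr/local/bin/rsync-backup.sh.
# Usage: run_backup SOURCE DEST LOGFILE
run_backup() {
    local src=$1 dest=$2 log=$3 status
    {
        echo "=== backup started: $(date) ==="
        rsync -av --delete "$src" "$dest"
        status=$?
        echo "=== backup finished with exit code $status ==="
    } >> "$log" 2>&1   # append both normal output and errors to the log
    return "$status"
}

# When saved as a script, add this as the last line to pass the
# command-line arguments straight through:
#   run_backup "$@"
#
# Example crontab entries calling the hypothetical script:
#   0 22 * * *  /usr/local/bin/rsync-backup.sh /Directory1/ /Directory2/ /var/log/rsync-backup.log
#   0 22 * * 0  /usr/local/bin/rsync-backup.sh /Directory1/ /Directory2/ /var/log/rsync-backup.log
# The first runs daily at 10 PM; the second runs only on Sundays (day of the week 0).
```

After a night or two, the log shows exactly which files each run copied or deleted, which is the same information the verbose switch gives you interactively.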
Other useful features
Another useful thing you can do is put your backups into a zip file. You will need to specify where you would like the zip file to be placed, and then rsync that directory to your backup directory. For example:
$ zip -r /ZippedFiles/archive.zip /Directory1/ && rsync -av --delete /ZippedFiles/ /Directory2/
The command above takes the files from Directory1, puts them in /ZippedFiles/archive.zip and then rsyncs that directory to Directory2. Initially, you may think this method would prove inefficient for large backups, considering the zip file will change every time the slightest alteration is made to a file. However, when syncing to a remote host, rsync’s delta-transfer algorithm sends only the parts of a file that changed, so if your zip file is 10 GB and you then add a text file to Directory1, rsync can typically send just the changed portion of the archive rather than all 10 GB. Note that for purely local copies like the one above, rsync copies the whole file by default, since comparing blocks saves nothing without a network in between.
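If you are curious how much data rsync really sends, the --stats switch will tell you. The sketch below is an experiment rather than a backup recipe: a 1 MiB scratch file named archive.bin stands in for the zip archive, and because rsync copies whole files by default when both paths are local, --no-whole-file is passed to force the delta algorithm that would normally kick in over a network.

```shell
#!/usr/bin/env bash
# A 1 MiB scratch file stands in for the archive (illustrative only).
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
head -c 1048576 /dev/urandom > "$work/src/archive.bin"

# First run: the whole file is new, so all of it is sent.
rsync -a "$work/src/" "$work/dest/"

# Simulate a small addition at the end of the file.
echo "a few more bytes" >> "$work/src/archive.bin"

# --no-whole-file forces the delta algorithm even for a local copy;
# --stats reports how much was sent literally and how much was matched
# against the copy already at the destination.
stats=$(rsync -a --no-whole-file --stats "$work/src/" "$work/dest/")
echo "$stats" | grep -E "Literal data|Matched data"
```

On the second run, the "Matched data" figure covers the part of the file rsync reused from the destination, while "Literal data" is what it actually had to send. How well this works for a real zip file depends on how much of the archive is rewritten when you update it.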
There are a couple of different ways you can encrypt your rsync backups. The easiest method is to encrypt the hard drive itself (the one that your files are being backed up to). Another way is to encrypt your files before sending them to a remote server (or other hard drive, whatever you happen to be backing up to). We’ll cover these methods in later articles.
Whatever options and features you choose, rsync proves to be one of the most efficient and versatile backup tools to date, and even a simple rsync script can save you from losing your data.