
Detecting File Moves & Renames with Rsync

Without patching, the rsync utility cannot detect that a file was renamed or moved between directories inside the synced tree. There is a --fuzzy option that saves bandwidth by building upon similar files on the target side, but only within the same directory.

You may need to synchronize a large file tree over a slow connection after a big reorganization since the last rsync run. A real-world example: Joe stores multiple GiBs of family photos and videos at home and periodically backs them up to a remote server.

$ rsync -avHP --delete-after ~/family/Photos remotebox:backups
One day, Joe decides he chose the wrong directory layout or file naming scheme and shuffles these gigabytes into a totally different directory structure, a quick local operation. Unfortunately, there is no apparent safe and quick way to mirror these changes to the remote backup disk without either manual labor or waiting for the entire tree to be transferred again (provided there is enough remote space for that). Moreover, the synchronization method should be aware of hard links that may exist in the tree.

There is, however, a trick to do this even without adding rename detection to rsync (patches for that exist). The only requirement is that both the local and remote systems support hard links. Start by doing the usual synchronization of the tree (same as above):
$ rsync -avHP --delete-after ~/family/Photos remotebox:backups
followed by:
$ cd ~/family
$ cp -rlp Photos Photos-work
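To check that such a copy really shares its data with the original, you can compare inode numbers on a throwaway tree. The following is a minimal sketch, assuming GNU cp (the -l switch is not available in every cp implementation); the temporary directory and the file names are made up for illustration.

```shell
#!/bin/sh
# Illustrative only: all paths and file names below are hypothetical.
set -e
demo=$(mktemp -d)
mkdir -p "$demo/Photos/2010"
echo "pretend this is a large photo" > "$demo/Photos/2010/img001.jpg"

cd "$demo"
# Recursive copy that hard-links files instead of copying their data.
cp -rlp Photos Photos-work

# The first column of `ls -i` is the inode number.
inode_orig=$(ls -i Photos/2010/img001.jpg | awk '{print $1}')
inode_work=$(ls -i Photos-work/2010/img001.jpg | awk '{print $1}')
echo "original inode: $inode_orig, work copy inode: $inode_work"

# Renaming inside Photos-work keeps the link and leaves Photos untouched.
mkdir Photos-work/holidays
mv Photos-work/2010/img001.jpg Photos-work/holidays/beach.jpg
inode_moved=$(ls -i Photos-work/holidays/beach.jpg | awk '{print $1}')
echo "inode after the rename: $inode_moved"
```

Equal inode numbers mean that both names point at the same data on disk, so no file content was duplicated.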
The cp finishes very quickly because of its switches: copy directories *r*ecursively, hard-*l*ink files instead of copying them, and *p*reserve mode, ownership and timestamps (for non-hard-linked content such as directories).

Do the reorganization in the Photos-work directory: you can rename, move, add and delete any files. But DON’T TOUCH the tree in Photos. This directory, with the same set of paths on both machines, allows rsync to quickly find the data to clone under Photos-work on the remote machine. When you’re done reorganizing, run this:
$ rsync -avHP --delete-after --no-inc-recursive ~/family/Photos ~/family/Photos-work remotebox:backups
  • As an rsync expert you are surely aware that trailing slashes on rsync paths have a strict meaning. If not, consult the manpage.
  • You may want to run it with the safety -n switch first to see what would happen. You will see => marking the files that would be hard-linked.
By transferring both trees at once and turning off incremental recursion, rsync collects all hard links before it transfers anything. It is now able to reconstruct Photos-work on the remote machine IN SECONDS. Next, you finalize by:
$ mv Photos Photos-OLD
$ mv Photos-work Photos
Do this on both the local and remote machines. You can keep the OLD directory around for as long as you want; the space it uses is usually negligible. Let us know if this tip was useful for you.
Vláďa Macek

About the author

Vláďa Macek

During his software engineering studies at Czech Technical University in Prague, Vlada’s love of things simple, straightforward and elegant attracted him to UNIX and led him towards a career as a full-time system administrator. Upon discovering …

Vláďa is no longer active with Lincoln Loop.