Mirror Management: How Often Should I Sync?

A CollabNet customer submitted the following question to us recently:

We set up mirrors of several Subversion repositories which CollabNet hosts for us. I have a question regarding the frequency of synchronization. How often should it be done? Should we synchronize often with a smaller number of changes, or only once in a while with a larger amount?

As it turns out, I’d been considering this very question myself recently. The dynamics of mirror management in Subversion are interesting, so fielding this question gave me an opportunity to put some of my recent musings on the matter into writing. And as there is nothing particularly unique about this customer’s Subversion deployment scenario, you, the reader, get to benefit from the generality and (now) publicity of the response that I offered to the inquirer.

My response was as follows:

This is an interesting question, and one I’ve been chewing on myself
lately. The answer I provide may not be satisfactory if you’re looking
for a simple "You need to sync every X minutes" type of response. The
answer is instead tied somewhat to the balance between your intended
purposes for the mirrors and the level of complexity you’re willing to endure while maintaining them.

Let’s start by examining the naive approach to synchronization, where
you fire off svnsync every so many minutes. Depending on how you wish
to use the mirrors, the exact number of minutes may vary. For a simple
nightly backup job, 60 x 24 = 1440 minutes works fine. But for a mirror
perhaps used by developers trying to stay atop the state of a rapidly
changing codebase, that’s not often enough. You might need to sync
every twenty minutes. Or five. Or one. You don’t want to poll the
original repository so often that you affect that server’s performance,
of course. (Launching denial-of-service (DoS) attacks against yourself
is not considered wise.) But the cost of attempting to sync an already
up-to-date mirror is small, since svnsync quickly discovers that there
are no new revisions to copy.
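To make the naive approach concrete, here is a minimal sketch of a script that cron could run every so many minutes. It assumes each mirror was already set up with svnsync initialize and that the host running it has credentials for the mirrors; the script name, path, and URLs are hypothetical placeholders.

```shell
#!/bin/sh
# sync-mirrors.sh (hypothetical name): naive periodic mirror sync.
# Usage: sync-mirrors.sh MIRROR_URL...
# Example crontab entry running it every 20 minutes (placeholder path/URL):
#   */20 * * * * /usr/local/bin/sync-mirrors.sh https://mirror.example.com/svn/project

sync_all_mirrors() {
  rc=0
  for url in "$@"; do
    # Pull down any revisions the mirror lacks. Against a mirror that is
    # already up to date, this finds nothing to copy and exits quickly.
    if ! svnsync sync --non-interactive "$url"; then
      echo "svnsync sync failed for $url" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Only do work when given URLs, i.e. when invoked as a script.
if [ "$#" -gt 0 ]; then
  sync_all_mirrors "$@"
fi
```

Picking the interval is then just a matter of editing the crontab entry; the script itself does not care how often it runs.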

Now, if versioned changes were the only bits of data maintained by
this synchronization task, the choice of how often to run svnsync sync
would be just as simple as the above. Unfortunately, there are also
the unversioned revision properties to pay attention to.
Because you can change a revision property at any time, and because
Subversion doesn’t record when you did so, complete synchronization of
Subversion repositories gets complicated. Say you have 100 revisions in
your master repository, and you’ve just caught your mirror up to date,
too. Later, one of the developers changes the log message for revision
50. svnsync will never realize that this change happened. Future
svnsync sync invocations will of course continue to pull down new
revisions that have been added (r101 and later), but the log message
for r50 in the original and the mirror will not be the same. The
svnsync copy-revprops subcommand is the tool for remedying this
discrepancy, but something has to tell that subcommand to run, and
against which revisions to do its thing.
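As a rough illustration of the r50 scenario, the sketch below compares one revision's log message on the master and the mirror and, if they differ, re-copies that revision's properties. It only looks at svn:log (other revision properties such as svn:author could drift too), and the function name and URLs are hypothetical.

```shell
#!/bin/sh
# Usage: fix_revprops_if_drifted MASTER_URL MIRROR_URL REV
fix_revprops_if_drifted() {
  master_url=$1 mirror_url=$2 rev=$3
  # Read the log message of REV from both repositories.
  master_log=$(svn propget --revprop -r "$rev" svn:log "$master_url") || return 1
  mirror_log=$(svn propget --revprop -r "$rev" svn:log "$mirror_url") || return 1
  if [ "$master_log" != "$mirror_log" ]; then
    # The mirror is stale: copy all revprops of REV from the master again.
    svnsync copy-revprops --non-interactive "$mirror_url" "$rev"
  fi
}
```

For example: fix_revprops_if_drifted https://svn.example.com/repos/project https://mirror.example.com/svn/project 50. The hard part, as noted above, is knowing which revisions to check in the first place.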

So the revision property synchronization angle on this adds
complexity. Most of the time, developers quickly realize mistakes
made in log messages, and fix them relatively soon after the commit
completes. As long as they make the fix before your sync job pulls down
that revision, all is well in the mirror. So that makes an argument for
doing synchronization less often (to allow time for post-facto log
message touch-ups). But how long is long enough? What about those cases
where somebody changes log messages on revisions committed months ago?
These questions can’t be answered without again looking to the purpose
of the mirrors. In your situation, does it matter if the revision
properties are out of sync so long as the core file/directory versioned
data is up to date? Maybe not. Maybe it’s okay if the revision
properties deviate for longer periods of time. Maybe in your situation,
a revision sync every ten minutes plus a nightly revision property sync
for all revisions in the repository is just what the sysadmin ordered.
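That last schedule might translate into a crontab on the mirror host along these lines. It is only a sketch: the URL is a placeholder, the */10 step syntax assumes a Vixie-style cron such as most Linux distributions ship, and the REV:REV2 range form of copy-revprops assumes a reasonably recent svnsync (check "svnsync help copy-revprops" on your version). If the two jobs happen to overlap, svnsync's lock on the mirror makes one of them fail rather than clobber the other.

```
# Hypothetical crontab on the mirror host (the URL is a placeholder).
# Every ten minutes: pull down any new revisions.
*/10 * * * * svnsync sync --non-interactive https://mirror.example.com/svn/project
# Nightly at 02:30: re-copy the revision properties of every revision
# the mirror holds.
30 2 * * * svnsync copy-revprops --non-interactive https://mirror.example.com/svn/project 0:HEAD
```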

(As promised, I’ve probably raised more questions than provided answers here.)

In my opinion, the best approach is a multi-faceted one, a
combination of real-time event-based triggering of sync actions and
scheduled just-in-case full synchronization jobs.

The first part of this is the part that, barring communication
errors between the master repository server and the servers housing the
mirrors, keeps those mirrors as up-to-date as possible. Ideally here,
your primary repository is able to push changes to your mirror(s), or
at least push notifications of changes to them. For example, your
primary repository might have post-commit and post-revprop-change hooks
that run svnsync to update the mirrors directly. Or, if that’s not
possible for reasons of firewalls and security and such, then perhaps
those post-commit and post-revprop-change hooks at least send email
notifications of changes, and the mirror machines have some automated
way of noticing those mails and triggering the relevant sync tasks. A
commit mail translates to running svnsync sync; a propchange mail to
running svnsync copy-revprops for the revision whose property was
changed.
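The push variant of this first facet might look roughly like the sketch below. The mirror URLs are placeholders, and the master host is assumed to be able to reach the mirrors and authenticate to them. Subversion invokes post-commit with the repository path and revision number, and post-revprop-change with the repository path, revision, user, property name, and action.

```shell
#!/bin/sh
# Master-side hook bodies that push changes to the mirrors.
# MIRROR_URLS is a hypothetical, space-separated list of mirror URLs.
MIRROR_URLS="https://mirror1.example.com/svn/project https://mirror2.example.com/svn/project"

# For hooks/post-commit, invoked as: post-commit REPOS-PATH REVISION [TXN-NAME]
on_post_commit() {
  for url in $MIRROR_URLS; do
    # Bring each mirror up to the master's youngest revision.
    svnsync sync --non-interactive "$url"
  done
}

# For hooks/post-revprop-change, invoked as:
#   post-revprop-change REPOS-PATH REVISION USER PROPNAME ACTION
on_post_revprop_change() {
  rev=$2
  for url in $MIRROR_URLS; do
    # Re-copy all revision properties of the revision that changed.
    svnsync copy-revprops --non-interactive "$url" "$rev"
  done
}
```

In the actual hook files you would likely call these in the background with output discarded, e.g. on_post_commit "$@" >/dev/null 2>&1 &, so committers are not kept waiting on the round trip to each mirror. A copy-revprops run triggered this way can still fail, for instance if a sync currently holds the mirror's lock or the mirror does not have that revision yet, which is one reason for the second facet below.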

The second facet covers the what-if cases. What if the mirror
machines didn’t get some of those email notifications? What if the sync
jobs themselves suffered network outages? To address this, you might
want to have some kind of regular scheduled task that attempts svnsync sync
(usually finding nothing to sync, because the event-based sync
triggers are working just fine), and also does svnsync copy-revprops
across ranges of revisions (usually rewriting the mirror’s revision
properties with the values they already had, for the same reason). Of
course the thing to avoid is any given svnsync job taking so long as
to cause contention with other svnsync jobs operating against the same
repository.
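A minimal sketch of such a fallback job follows, assuming cron runs it once per mirror. It runs the two steps one after the other rather than in parallel, so they never compete with each other for the mirror's lock, and copy-revprops only runs once the mirror has caught up. The range defaults to 0:HEAD (everything); on a large repository a narrower recent window may keep the job short enough to avoid contention. The function name is hypothetical, and the REV:REV2 range form assumes a reasonably recent svnsync.

```shell
#!/bin/sh
# Scheduled "just in case" fallback sync for one mirror.
# Usage: fallback_sync MIRROR_URL [REV:REV2]
fallback_sync() {
  url=$1
  range=${2:-0:HEAD}
  # Step 1: catch up on any revisions the event-driven hooks missed.
  # Normally this finds nothing to do.
  svnsync sync --non-interactive "$url" || return 1
  # Step 2: re-copy revision properties for the range. Normally this just
  # rewrites the mirror's revprops with the values they already have.
  # Here HEAD means the mirror's youngest revision.
  svnsync copy-revprops --non-interactive "$url" "$range"
}
```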

While not outright instructive, I hope this has been informative
enough for you to decide which implementation works best for you.

C. Michael Pilato

C. Michael Pilato is a core Subversion developer, co-author of Version Control With Subversion (O'Reilly Media), and the primary maintainer of ViewVC. He works remotely from his home state of North Carolina as a software engineer for CollabNet, and has been an active open source developer since early 2001. Mike is a proud husband and father who loves traveling, soccer, spending quality time with his family, and any combination of those things. He also enjoys composing and performing music, and harbors not-so-secret fantasies of rock stardom. Mike has a degree in computer science and mathematics from the University of North Carolina at Charlotte.

Posted in Subversion
7 comments on “Mirror Management: How Often Should I Sync?”
  1. Alex says:

    > Of course the thing to avoid is any given svnsync job taking so long as to cause contention with other svnsync jobs operating against the same repository.
    Hi,
    this is exactly what I was wondering about with my every-5-minutes svnsync cron job. If the previous svnsync is still running due to a very large commit, how will the next svnsync react?
    – immediately exit with an error (or without one)? This is what I would expect.
    – wait until the previous svnsync has finished? That would push the problem onto the following svnsync calls.
    – wreck the mirror?
    Thanks,

  2. svnsync uses custom properties, stored on r0 of the mirror repository, for bookkeeping. One of those properties is svn:sync-lock, and it exists to prevent multiple svnsync jobs from clobbering each other. Your sync of rN will see the property set on the repository by the sync of rN-1, will enter a retry loop for 10 seconds or so to give that lock some time to be cleared, but failing that will just error out with “Failed to get lock on destination repos, currently held by ‘??'”. Of course, that still leaves your sync out of date until the next commit comes in. But adding a regular cron-driven sync to the mix would reduce that out-of-dateness window.
    As it turns out, just last night I documented svnsync’s custom properties for the Version Control with Subversion book. You can find these notes in this section: http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.replication

  3. gbjbaanb says:

    Wouldn’t we potentially get issues if we change a revprop on a just-committed revision, regardless of whether we use a cron svnsync run or a post-commit hook to run svnsync?
    In one case, the revprop sync will be blocked by the post-commit hook’s sync and will fail. As you don’t have any ability to replay the revprop change then it will be permanently missing from the backup repository. In the case of the cron-based sync, the revision may not be copied yet so the revprop sync will fail (I think! I’d have to test this statement)
    So, I guess the solution is still to use the post-revprop-change hook, but to defer the sync. If the URL and revision were written to a file, then you could use that to sync the revprops after the regular sync completes, and all would be well (and syncing would happen in the background, so commits would be more responsive for the user).

  4. Fabien says:

    Hi,
    The most elegant way I found to synchronize in real time is to create a post-commit hook that appends a line in a text file. On their side, the mirror servers do a ssh @ “tail -F ” | while read myentry; do svnsync sync ; done
    If there is ever a network outage, the ssh process dies and so does the script. I then have a cron job running every 10 minutes that knows the PID of the script and checks whether it still exists.
    Of course, you need remote shell access to the main server.

  5. Fabien says:

    Sorry for the tagging issue. It should read: ssh login@mainserver “tail -F filename” | while read myentry; do svnsync sync path_to_repos; done

  6. jb says:

    Mike,
    There is a potential issue with using cron to svnsync to the mirrors.
    If you have a remote office using its own local PTP mirror which the master is syncing every X mins using a cron, then checkins from the remote office cannot be any more frequent than 1 every X mins – this can be a serious problem if it is a large development team.
    The reason is as follows:
    – remote users checkout their WCs from their local mirror
    – they make their changes and then svn up to update their WC before checking in
    – the problem is that the svn up does not necessarily update WCs to the latest rev, since there is a lag of up to X mins between a checkin to the master, and a sync of that checkin to the mirror
    – attempting to check in via an out-of-date proxy will fail with a “not latest baseline” error
    Users will keep retrying until successful, i.e. until the sync happens, thereby locking out the mirror for checkins until the next sync, X mins later. At busy times, I have seen this go on for hours with developers getting ever more frustrated. Even with syncing every minute (X=1) some users complained that it could take up to half an hour, retrying a commit over and over, to catch the brief window.
    Real-time sync from post-commit hook is the only realistic option for any sizable remote development.
    (The only alternative I can think of is to switch your WC to the master before your last svn up and checkin, but this is only possible if the master and mirror are set up with the same UUID, and in any case is a bit of a hassle, effectively bypassing the convenient svn PTP feature.)

  7. jb, you are, of course, correct. Delaying synchronization vastly
    hinders any collaboration which depends on up-to-date mirrors. I think
    the specific situation you mention — only being able to commit against
    up-to-date mirrors — was a bug in the Subversion client fixed in 1.6.9
    (see http://subversion.tigris.org/issues/show_bug.cgi?id=3561). But
    yes, in general, you want mirrors to be as up-to-date as possible.
    That’s why I prefer commit/revprop pushes from the master to the slaves,
    with a cron job in place really only as a fallback mechanism.
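The deferred revprop sync that gbjbaanb suggests in comment 3 could be sketched roughly as follows. It is only an illustration under simplifying assumptions: one mirror per queue file, a hypothetical QUEUE_FILE path writable by the hook's user, and a sync job that calls drain_revprop_queue right after its own svnsync sync finishes. Revisions that fail to copy are put back in the queue for the next run, so a collision with svn:sync-lock does not lose the change permanently.

```shell
#!/bin/sh
# Deferred revprop sync for a single mirror (simplifying assumption).
# QUEUE_FILE is a hypothetical path writable by the hook's user.
QUEUE_FILE=${QUEUE_FILE:-/var/tmp/svn-revprop-queue}

# For hooks/post-revprop-change: just record the revision number ($2)
# instead of contacting the mirror while the user waits.
queue_revprop_change() {
  echo "$2" >> "$QUEUE_FILE"
}

# For the sync job, called after its own "svnsync sync" completes.
drain_revprop_queue() {
  mirror_url=$1
  # Nothing queued: nothing to do.
  [ -s "$QUEUE_FILE" ] || return 0
  # Move the queue aside so changes recorded meanwhile land in a fresh file.
  work="$QUEUE_FILE.$$"
  mv "$QUEUE_FILE" "$work" || return 1
  rc=0
  # Copy each distinct revision once, in ascending order.
  for rev in $(sort -n -u "$work"); do
    if ! svnsync copy-revprops --non-interactive "$mirror_url" "$rev"; then
      # Requeue the failure (e.g. the mirror is locked by a running sync).
      echo "$rev" >> "$QUEUE_FILE"
      rc=1
    fi
  done
  rm -f "$work"
  return "$rc"
}
```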
