Packing FSFS Repositories

Subversion 1.5 introduced the idea of sharding for FSFS-backed repositories. For every commit to an FSFS repository, Subversion creates a single file which describes all the changes in that revision. Prior to 1.5, all of those files were stored in a single directory, which had several drawbacks: incremental backups took a long time, the repository could not be dynamically grown across different filesystems, and some filesystems perform poorly when the number of directory entries grows too large. With sharding, these revision files are split into separate subdirectories, which eliminates many of these problems.
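To make the sharded layout a little more concrete, here is a minimal sketch of how a revision number maps to the location of its revision file. It assumes the default shard size of 1000 and the db/revs/<shard>/<revision> layout; the helper name rev_file_path is made up for this example and is not part of Subversion.

```shell
#!/bin/sh
# Hypothetical helper: print the path (relative to the repository root)
# of the revision file for revision $1, assuming a shard size of $2
# (1000 when omitted).
rev_file_path() {
    rev=$1
    shard_size=${2:-1000}
    shard=$((rev / shard_size))   # integer division picks the shard directory
    printf 'db/revs/%s/%s\n' "$shard" "$rev"
}

# With the default shard size, revision 36123 lives in shard 36.
rev_file_path 36123
```

Run as-is, this prints db/revs/36/36123.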

Even with sharding, the filesystem still has some inefficiencies. For instance, due to the block size of the underlying filesystem, having many files can still lead to wasted space on disk, especially with many small commits. Subversion can open and read data from many revisions over the course of an operation, and using a large number of files means that Subversion cannot exploit various operating system-level caches. Backing up and restoring a repository, although quicker, can still take a long time because of the large number of files spread across the repository.
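The block-size point is easy to see with a little arithmetic. The sketch below rounds file sizes up to whole blocks and compares many small files against one combined file. The numbers (1000 files of 150 bytes, 4096-byte blocks) are assumptions chosen for illustration, and the calculation ignores directory metadata, any overhead inside the combined file, and filesystems that store tiny files inline.

```shell
#!/bin/sh
# Bytes actually allocated for a file of $1 bytes on a filesystem
# with a block size of $2 bytes (rounds up to whole blocks).
allocated_bytes() {
    bytes=$1
    blk=$2
    echo $(( (bytes + blk - 1) / blk * blk ))
}

# Assumed numbers for illustration only.
files=1000
size=150
block=4096

# Each small file occupies at least one full block.
unpacked=$(( files * $(allocated_bytes "$size" "$block") ))
# One combined file wastes at most part of a single block.
packed=$(allocated_bytes $(( files * size )) "$block")

echo "unpacked: $unpacked bytes"
echo "packed:   $packed bytes"
```

Under these assumptions the separate files take up 4096000 bytes on disk, while a single combined file takes 151552 bytes.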

One of the great ideas that came out of the 2008 Subversion Developers’ Summit was the notion that FSFS filesystems could be packed: all the files in a completed shard could be glued together to create a single monster revision file. This pack file would save space on disk, give the operating system a chance to do some caching, and generally improve the snappiness of the system.

To use FSFS packing, you simply need to ensure that the target repository has been upgraded to the latest format, and then pack the repository using svnadmin. Note that repositories do not automatically pack themselves, so for heavily used repositories, you may want to install a cron job or post-commit hook to do the packing. Users can continue to use the repository while it is being packed:

$ svnadmin upgrade repo
Repository lock acquired.
Please wait; upgrading the repository may take some time...
Upgrade completed.
$ svnadmin pack repo
Packing shard 0...done.
Packing shard 1...done.
Packing shard 2...done.
Packing shard 3...done.
Packing shard 4...done.
Packing shard 36...done.
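Since packing is not automatic, a post-commit hook is one way to trigger it. What follows is only a sketch under a few assumptions: that the shard size can be read from the "layout sharded N" line in db/format (falling back to 1000 if it is missing), and that a shard counts as complete once its last revision has been committed. The function names shard_size and completes_shard are invented for this example.

```shell
#!/bin/sh
# Sketch of a post-commit hook; Subversion invokes it with the
# repository path and the new revision number as $1 and $2.

# Read the shard size from db/format ("layout sharded N"); fall back to
# 1000 if the line is missing. The fallback is an assumption.
shard_size() {
    n=$(awk '$1 == "layout" && $2 == "sharded" { print $3 }' "$1/db/format" 2>/dev/null)
    echo "${n:-1000}"
}

# Succeed only when revision $1 is the last one in its shard of size $2.
completes_shard() {
    [ $(( ($1 + 1) % $2 )) -eq 0 ]
}

# Only act when called with hook arguments, so the functions above can
# also be sourced and tried out by hand.
if [ "$#" -ge 2 ]; then
    REPOS=$1
    REV=$2
    if completes_shard "$REV" "$(shard_size "$REPOS")"; then
        # Run in the background with output discarded so the committing
        # client is not kept waiting for the pack to finish.
        svnadmin pack "$REPOS" >/dev/null 2>&1 &
    fi
fi
```

The check keeps the hook cheap: most commits do nothing, and svnadmin pack only runs when a commit fills a shard. A nightly cron job that simply runs svnadmin pack on each repository is a simpler alternative if you would rather not touch your hooks.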

To give an idea of the potential space savings, on my local 1.5-era copy of Subversion’s own repository I get the following results:

$ du -sh svnrepo-1.5/
659M	svnrepo-1.5/

While on a packed 1.6 copy of the same repository, with rep-sharing enabled, I see the following:

$ du -sh svnrepo-1.6/
593M	svnrepo-1.6/

That’s more than a 10% decrease in space, at no cost in performance. These space savings will vary depending upon your own repository and usage habits, but we’re excited about the improvements in the FSFS backend in Subversion 1.6.

Posted in Subversion
4 comments on “Packing FSFS Repositories”
  1. Simonn says:

    Great post, but it’s a bit long and most people like short and sweet posts!

  2. Great feature, I just updated my servers from 1.5.5 to 1.6.1 and upgraded/packed all repos. So far so good, nothing blew up. 🙂
    Two (obvious) RFEs though:
    1) Automatic packing of shards that reach the 1000th version. For the time being I’ll update my post-commit scripts, but that’s annoying. Not a critical issue, rather a “nice” item.
    2) Packing of the /db/revprops shards. These are still accumulating hundreds of thousands of TINY files (avg 150 bytes) on my poor Windows server (NTFS really doesn’t like small files)… with packing, each of these 1000 prop files would be replaced by a single ~150 KB blob.
    Any chance we can get these improvements? Should I file bugs?

  3. Hyrum Wright says:

    Glad you like the feature. A couple of comments:
    1) The shard size is configurable, so it wouldn’t be after 1000th revision, per se, but whatever the shard size is. However, packing could be a long process, depending on the size of the revisions. I’d be hesitant to unconditionally do it inside Subversion proper, but I think you’ve hit upon the right solution with a post-commit hook script.
    2) Revprops are mutable, and as such their size may change. Modifying a packed revprop would cause the entire shard to be rewritten, not just the modified value. Aside from the performance issues, this also causes race conditions when multiple revprops are being edited at the same time. All of these concerns mean that packing of revprops probably won’t happen any time soon. What might happen is a migration of revprops to a better storage mechanism, such as SQLite, though there are no current plans for that.
    That being said, if you want to more thoroughly pursue these issues, please post to the mailing list.

  4. Guti says:

    How much time did it take to pack the 600 meg repository?
