Sparse Directories in Subversion 1.5

Over the last few weeks we have blogged a lot about the Merge Tracking feature in Subversion 1.5. Of course there are several other great new features coming. Let’s look at what else is new.

The Subversion 1.5 release notes (which are not final of course) mention these new features for 1.5:

  • Merge Tracking
  • Sparse checkouts
  • WebDAV transparent write-through proxy
  • Cyrus SASL support for ra_svn and svnserve
  • Copy/move improvements: peg revisions, ‘svn mv file1 file2; svn mv file2 file3’, ‘svn cp *.c dir’
  • Cancellation improvements
  • Changelist support
  • FSFS sharding
  • Command-line client improvements
  • JavaHL bindings improvements
  • Many improved APIs

If you are new to this blog and want to find out more about Merge Tracking, check out our Merge Tracking Early Adopter Program. Over the next few weeks we’ll blog about some of the other new Subversion features, though not about everything: other people are blogging about 1.5 as well, and we’ll link to them. For example, Malcolm Rowe blogged about mod_dav_svn improvements, tree-structured FSFS repositories, and backing up FSFS repositories, Subversion 1.5 style.

In this post we’ll talk about sparse directories.

Sparse Directories

When you first check out a Subversion repository, or a directory within that repository, you get the whole directory with everything underneath it. In large projects that can be a problem because all the files are copied over the network, which can take some time. This is at odds with how Subversion subsequently sends only small deltas over the network, with minimal use of network resources. Also, do you want all these files cluttering your disk?

Subversion 1.5 introduces Sparse Directories, giving you more control over what to check out and how svn update works. You can read the specs here. But I don’t learn from reading; I need to play. So let’s play (but let’s be mindful of the fact that SVN 1.5 is not feature complete yet).

I started by downloading the Subversion 1.5 pre-release binaries from the Merge Tracking Early Adopter Program and set up a repository:

  • Download the binaries (Windows in my case).
  • Copy the .exe and .dll files to c:\svn and add c:\svn to %PATH%.
  • Create the repository (svnadmin create repo) in c:\svn (the repository is c:\svn\repo).
  • Download the repository dump file that comes with the Merge Tracking beta.
  • Load the dump file (svnadmin load c:\svn\repo < c:\svn\mergetracking.dump).
  • Create directory for working copy.
  • Checkout the repository (svn checkout file:///c:/svn/repo/trunk)
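
The setup steps above can be sketched as a small shell function. This is only a sketch: it assumes a POSIX shell (on Windows, for example, under Cygwin), and the function name setup_sample_repo and its three arguments are placeholders of mine, not part of the beta package.

```shell
# Create a repository, load a dump file into it, and check out its trunk.
#   $1 = directory in which to create the repository
#   $2 = dump file to load (the one shipped with the Merge Tracking beta)
#   $3 = URL of the trunk to check out afterwards
setup_sample_repo() {
    svnadmin create "$1" &&
    svnadmin load "$1" < "$2" &&
    svn checkout "$3"
}
```

With the paths used in this post, the call would look something like setup_sample_repo /c/svn/repo /c/svn/mergetracking.dump file:///c:/svn/repo/trunk.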

The main directory of the trunk of the repository contains one file and a few sub-directories:

index.html
about
jobs
news
products
support

Suppose I’m another developer (I mimicked that with working copy wc2) and I have no need for the subdirectories. With current releases of SVN, I can use the -N switch (for: non-recursive) to check out only the main directory of the trunk and not the sub-directories:
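
A minimal sketch of such a non-recursive checkout, wrapped in a shell function so that the URL and target directory stay explicit. The function name is a placeholder of mine, and a POSIX shell is assumed.

```shell
# Check out only the given directory and the files directly inside it.
#   $1 = repository URL, e.g. file:///c:/svn/repo/trunk
#   $2 = working copy directory, e.g. wc2
checkout_non_recursive() {
    svn checkout -N "$1" "$2"
}
```

For the example in this post that would be checkout_non_recursive file:///c:/svn/repo/trunk wc2.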

I now have the topmost directory of trunk and the file in it, but not the sub-directories. For a big repository, this reduces the checkout time and keeps the working copy cleaner, and subsequent updates will only pull in files added to the directory you checked out. But -N is all the control you have.

SVN 1.5 will be more flexible: it makes the -N switch redundant and replaces it with a --depth option. According to the spec, the possible values for --depth are:

  • --depth=empty: Updates will not pull in any files or subdirectories not already present.
  • --depth=files: Updates will pull in any files not already present, but not subdirectories.
  • --depth=immediates: Updates will pull in any files or subdirectories not already present; the subdirectories will have depth=empty.
  • --depth=infinity: Updates will pull in any files or subdirectories not already present; the subdirectories will have depth=infinity. Equivalent to today’s default update behavior.

Now I am developer number 3 with wc3 as working copy. Let’s check out the repository with --depth=empty:
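
A minimal sketch of a checkout at a chosen depth. The helper name sparse_checkout is hypothetical; it only accepts the four depth values named in the spec and passes everything else straight to svn. A POSIX shell is assumed.

```shell
# Check out a URL at one of the four depths from the spec.
#   $1 = depth (empty, files, immediates or infinity)
#   $2 = repository URL
#   $3 = working copy directory
sparse_checkout() {
    case "$1" in
        empty|files|immediates|infinity) ;;
        *) echo "unknown depth: $1" >&2; return 1 ;;
    esac
    svn checkout --depth="$1" "$2" "$3"
}
```

For this step the call would be sparse_checkout empty file:///c:/svn/repo/trunk wc3; the later experiments only change the first argument.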


A trunk directory was added to wc3 as well as the .svn administrative directory (in other words: a real working copy was created) but other than that, the trunk is empty.

I added a file (test1.txt) and a subdir to the repository using wc2. According to the spec, if I now do svn update, no files should be added to wc3:


Well, no surprises: no files.

Let’s create a file (test2.txt) in wc3, then add and commit it. Back in wc1, I run svn update, make some changes to test2.txt, add a new file test3.txt, and commit. Now update wc3.


test2.txt has the change, but test3.txt was not added to wc3 (remember: --depth=empty means svn update will only touch files or subdirectories already present).

Now let’s play with --depth=files.


All files from the top directory of the trunk are pulled in, but not the sub-directories. Now I added test4.txt and test5.txt to the trunk of wc1 and added a directory with a file in it, then committed.


An svn update on wc4 pulls in test4.txt and test5.txt, but not the sub-directory and the file in it.

At this point I had a question. Suppose I only want test4.txt. I’d like an option to just pull in test4.txt and not get test5.txt as well. Does Subversion 1.5 support that?
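
One way to get just a single file, in line with the use case described at the end of this post, is to check out the directory at depth empty and then update only the file you want. The sketch below assumes a POSIX shell; the function name and argument order are placeholders of mine.

```shell
# Create an empty working copy, then pull in exactly one file.
#   $1 = repository URL of the directory that contains the file
#   $2 = working copy directory
#   $3 = name of the file to fetch, e.g. test4.txt
checkout_single_file() {
    svn checkout --depth=empty "$1" "$2" &&
    svn update "$2/$3"
}
```

Here that would be checkout_single_file file:///c:/svn/repo/trunk wc test4.txt, which leaves test5.txt out of the working copy.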


Nice!

Let’s play with --depth=immediates. I guess “immediates” means “neighbors”: if you are a file in the same directory where I am, you are a neighbor; if you are a subdirectory in my directory, you are a neighbor too; but if you are a file or directory inside a subdirectory, you are not an immediate neighbor. Let’s try that.

I created wc5 with svn checkout file:///c:/svn/repo/trunk --depth=immediates

It checks out the files in trunk, the subdirs but not the files in these subdirs. Next, in wc1 I added test6.txt to the highest directory of the trunk, a subdir and test7.txt in that subdir.


The new file in the main directory of the trunk and the subdirectory come in, but the file in the subdirectory does not; that directory is in fact created at --depth=empty.

The last value of --depth is “infinity”:

svn checkout file:///c:/svn/repo/trunk --depth=infinity

It is the same as svn checkout file:///c:/svn/repo/trunk. So, why have it? Consistency between the commands. The --depth option does not only apply to checkout but also to update. For instance, if you have checked out a directory at --depth=empty you can still update it with everything that is in the repository by using:

svn update --depth=infinity


Oh, so that did not work. Well, the beta of SVN 1.5 from the Merge Tracking Early Adopter Program is a few weeks old and apparently updating at deeper depths is fixed by now (we’ll update the beta soon).

This did work though in the build I used:


The other way around also works. I created wc7 and did a checkout with depth infinity and then used wc1 to commit a file and directory with file in it (you know the drill by now). Updating wc7 like this:

svn update --depth=immediates

includes the file and the subdir (the immediates), but not the file in that subdir.
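
A note on deepening a working copy: in the released Subversion 1.5 client, the depth that a working copy is kept at is changed with the --set-depth option of svn update, while --depth only limits a single operation. That is based on the released client, not on the beta build used in this post. A minimal sketch, assuming a POSIX shell and a helper name of my own:

```shell
# Change the depth recorded for an existing working copy.
#   $1 = new depth (empty, files, immediates or infinity)
#   $2 = working copy path
set_working_copy_depth() {
    svn update --set-depth "$1" "$2"
}
```

For example, set_working_copy_depth infinity wc3 would turn the empty checkout from earlier into a full one.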

There are a couple of other commands affected by sparse directories:

  • svn info will list the depth of the working copy.
  • svn switch can take the --depth option and updates the URL that the working copy points to, at the depth you specify.
  • svn status will only report the status of files and directories within the depth specified. E.g.: if you create a subdir and a file in it, then svn status --depth=immediates only reports on the new directory (a neighbor), not the file in the subdir.
  • svn help commit tells me that commits will also take --depth and allow you to e.g. only commit files in the current directory: svn commit --depth=files. It does not work yet but that is a defect.
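
The first and third items in this list can be tried together. The sketch below simply runs svn info and a depth-limited svn status on one working copy; the function name is a placeholder of mine and a POSIX shell is assumed.

```shell
# Show working copy details, then report status for the directory and
# its immediate children only.
#   $1 = working copy path
inspect_shallow() {
    svn info "$1" &&
    svn status --depth=immediates "$1"
}
```

Running inspect_shallow wc5 after creating a subdir with a file in it should list the new directory but not the file inside it.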

Let me close with a note on compatibility. Subversion client and server release levels could be an issue with sparse directory support. However, I understand that by the time Subversion 1.5 releases the client will be clever enough to figure out what to ignore. In other words, if the server is pre-1.5 it will throw the whole kitchen sink with all the dirty dishes at the client but the client will determine what it needs for the checkout. Sparse directory support will work just fine.

What’s a use case for sparse directories? Checking out a single file (with --depth=empty and then doing an update on the one file you want) will certainly be useful for many people. And here is a use case that I will use. We use Subversion to manage the html content on openCollabNet. The pages are distributed over a number of repositories on a single server (each project has its own repository). Within a repository I typically manage the top directory and might give subdirs to others, for instance to store binaries for download, articles or other stuff. By checking out the repositories with --depth=files, I’ll only get what I need, not the files that other people manage in their subdirectories.
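
A sketch of that last use case: checking out the top-level files of several repositories in one go. The function name and the convention of naming each working copy after the last part of its URL are assumptions of mine, as are any URLs you pass in; a POSIX shell is assumed.

```shell
# Check out only the top-level files of each repository URL given.
# Each working copy is named after the last component of its URL.
checkout_top_level_files() {
    for url in "$@"; do
        svn checkout --depth=files "$url" "$(basename "$url")" || return 1
    done
}
```

Called with one URL per project repository, this gives a directory per project containing only the files I manage myself.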

Trivia: when were sparse directories first discussed?
Go here for the answer.

8 comments on “Sparse Directories in Subversion 1.5”
  1. Matt Doar says:

    Nice summary, thanks. Two questions:
    1. Can I check out selected subdirectories as well as files? E.g
    I get the top-level directories and files using
    svn co --depth=files
    and then check out just the two subdirectories I want with
    cd wc
    svn update subdirA subdirB
    but subdirs C, D, E, .. never appear
    and as a bonus, is there any way to do that operation as a single svn command?
    2. Any plans for abbreviations, e.g. svn co --depth=i as an abbreviation for svn co --depth=immediates?

  2. Yes, when you check out with --depth=files you can later get directories with svn update subdirA subdirB
    I don’t know what the plans are with regards to abbreviations.
    Best,
    Guido

  3. Mika says:

    Hi,
    First of all, thanks for the article …
    One thing that caught my eye: “Checking out a single file (with --depth=empty …”.
    This far, I’ve gathered that checking out a single file just is not possible, am I wrong or right?
    Rgds,
    Mika

  4. Jens Troeger says:

    Nice article, thanks. Some questions about using a sparse working copy for merging:
    Is the merge command able to deal with such working copy? Does it automatically pull missing files or directories as needed? Do I need to (fully) update my working copy prior to merge to ensure that folders are available for potential merge candidates?
    Best regards,
    Jens

  5. Best practices for merge require the following:
    1) A complete working copy (non-sparse)
    2) No switched children
    3) No local modifications
    4) A single revision (svn up prior to merge)
    The svn merge --reintegrate option will actually check all of these and a few others and error out if any of them are not true. A “normal” svn merge can handle sparse working copies, but it will cause additional merge tracking information to be created to tell SVN that an incomplete merge was performed. This information will prevent you from being able to use the merge --reintegrate option later on if you want to merge the branch back to its source.
    Mark

  6. It’s nice that Subversion added this feature in 1.5 — comparing Subversion to Perforce, one of the main downsides to using Subversion has been its lack of anything comparable to Perforce’s “clientspec” feature.
    Comparing Subversion sparse checkouts to Perforce clientspecs, Subversion still hasn’t quite caught up to Perforce. Perforce clientspecs give more powerful control over which files to include and exclude, a “p4 sync” after a clientspec change will add and delete all of the necessary files, and once you’ve set up a good clientspec, other people can set up a similar clientspec via copy and paste, or by cloning from yours as a template.
    Unfortunately, both Subversion sparse checkouts and Perforce clientspecs suffer from one key deficiency: they require you to manually select which files you need. This becomes increasingly impractical as projects grow larger. For example, on large Perforce-based projects I’ve worked on, occasionally it was necessary to add a new directory to everyone’s clientspec working on the project. This is difficult, to say the least, on a 500-person project — you can send out a spam email to the team, but some people won’t get the email, and others will get it but it will be buried behind 1000 other emails and will never be seen. Then you get occasional reports of strange build errors for the next month or two due to the missing files…
    A better approach is to integrate the version control system with the file system, as is done in Cascade: http://www.conifersystems.com/cascade/. This gives “perfect” sparse checkouts without any need for manual user selection of which files are needed.

  7. Kindly provide your suggestion to achieve my below requirement.
    1. I am using TortoiseSVN 1.4.0, Build 7501 – 32 Bit client for Windows, and my server (svn, version 1.4.2 (r22196), compiled Dec 6 2008, 17:07:19) is running on Linux. I want to download a folder (without subdirectories) using my client for Windows. At the same time, if I specify the entire path, then it should download that directory and its subdirectories.
    Kindly advise in this regard.
    Thanks.
    Piramanayagam M
