What's behind Subversion's dominance?

As Jeffrey Hammond, of Forrester, blogged recently, Subversion is the overwhelmingly predominant version control system today. Yet if you read the open-source blogs and discussions, you see loads of growth in the "Distributed Version Control Systems" (DVCS), such as git, Mercurial, Bazaar, and the like. What's going on here?

There's an interesting event just now in the Linux kernel that sheds some light. This week, according to Greg Kroah-Hartman, some Google-contributed, Android-related code was removed from the Linux kernel. The reason: the code in question is both incomplete on its own and specific to the Android device. Either property makes it constitute a "fork," and in this case there seemed no prospect of ever healing the fork.

Now, you might think a "fork" is something that happens to the code, and certainly that's part of it. But as Kroah-Hartman points out, code-forking happens all the time. The problem here is that "no one cared about the code, so it was removed." This doesn't mean no one cares about the code that's going into Android–certainly not! It means two other things:

  • No one was taking care of the code that had been contributed into the Linux kernel tree
  • No one was consuming, using, extending, or applying the code for new purposes

It had been, as the phrase is, "thrown over the wall," and left.

That–abandonment–is a big problem for open-source code, both in that it's harmful and wasteful, and also in that it's common. But it's a cultural problem, not a technical one: there's no claim that this was "bad" code, just that it was a dead end.

Open-source communities are actually pretty robust against this sort of harm. This event illustrates one such defense: if it's dead, dump it. Another commonly applied defense is simply to let projects die when no one cares. The notorious number of open-source projects that never got anywhere is actually a sign of health: the open-source community has a powerful defense called "cheap failure," which allows ideas to start out sounding good, then peter out harmlessly. One of the cool things about DVCS is that it's very easy to start a fork, very easy to reintegrate it if you choose to–and very easy to chop it off and toss it away if it ends up a dead end, as we see here.
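That fork lifecycle is cheap enough to sketch end to end. Here is a minimal, self-contained demo using git as a representative DVCS (the repository and branch names are invented for illustration): an upstream tree, a fork started with one clone, reintegrated with one push, and discarded with one rm.

```shell
set -e
# Simulate an upstream project and a cheap fork of it, entirely on local disk.
tmp=$(mktemp -d)
cd "$tmp"

# "Upstream": a bare repository standing in for the project's public tree.
git init -q --bare upstream.git
git clone -q "$tmp/upstream.git" seed
(cd seed \
  && git -c user.name=seed -c user.email=s@example.invalid \
       commit -q --allow-empty -m "initial" \
  && git push -q origin HEAD)

# Starting a fork costs one clone; the experiment lives on a local branch.
git clone -q "$tmp/upstream.git" my-fork
(cd my-fork \
  && git checkout -q -b wild-idea \
  && echo "experiment" > idea.txt \
  && git add idea.txt \
  && git -c user.name=fork -c user.email=f@example.invalid \
       commit -q -m "try a wild idea")

# Reintegrating, if the idea pans out, is one push back upstream ...
(cd my-fork && git push -q origin wild-idea)

# ... and abandoning a dead end is just deleting the clone.
rm -rf my-fork
```

Nothing here touches a server budget or a project plan; the whole fork, including its disposal, is a handful of local file operations, which is exactly what makes failure cheap.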

But if we recast this whole experience into an enterprise context, it begins to look very different indeed. The essential difference is that failure within an enterprise is never "cheap." If the Linux kernel community, including Google/Android, were an enterprise, then all the work that has been spent on this fork would have come out of the corporate wallet, and the decision to toss it would be a recognition of wasted resources: salaries, equipment, time, plans, marketing, process–endless kinds of costs, sunk with no payback.

The great contrast between the low cost of open-source failure and the high cost of enterprise failure has far-reaching effects, even down to details like tool choice.

  • An open-source project can usefully tell prospective contributors "clone the tree, work up your idea, and get back to us when you have something to show," because there's no cost to the project if the idea never bears fruit. An enterprise, however, does have such costs, and at the least has little incentive to allow that sort of work style.
  • An open-source project can allow parallel, semi-independent, persistent forks, such as the several Linux distributions: as long as there are enough people (whether sponsored or volunteer) interested in working the fork, as long as things stay controlled enough that work can be merged back and forth, there's no problem. But an enterprise may need to match its centralized budgeting and resource management with some centralized supervision, planning, marketing, and management: making it easy to disappear from the radar is not a good thing.

And that, I think, is at least part of why the centralized Subversion model remains so strong.

Posted in Subversion
10 comments on “What's behind Subversion's dominance?”
  1. Guy Martin says:

Interesting post, Jack – I do wonder though how the concept of ‘innersourcing’ plays into this?
    Certainly, there are plenty of enterprise shops that will use traditional heavyweight, top-down development processes. However, isn’t there a large movement to utilize more ‘Agile’-style development processes, where failure *is* cheap and/or easy to remedy?
    I think there can still be a use case for centralized SCM there. My biggest beef with DVCS is that it requires the kind of project management discipline that successful projects like the Linux kernel have, and that is not easy to replicate within corporate environments.
    Also, DVCS gives (IMHO) the developer too much rope to hang themselves with in terms of ‘siloing’ their code until it is ‘just right’, instead of forcing them to play nicely with the rest of the development team (this becomes a community issue). It also makes continuous integration (also another good dev practice) harder to deal with.
Heck, maybe we need you to write another blog post on distributed vs. centralized SCM from a cultural perspective. 🙂

  2. The implications of all this for innersourcing are subtle, and I was saving that for another post.
First, as to the cost saving of Agile process: this is real and important, but I don't think it changes the fundamental balance in this situation. Agile process is a significant way to reduce the costs sunk in failure, but not in the same sense as open source does. Agile has a fail-early effect, and that's a cost reduction, but its real impact on cost is that you spend your investments wisely: features and frills fail early, but it's rare that the Agilista will scrub the whole project. In the enterprise, you've definitely decided you need a new web site, or a new widget, or whatever; those requirements come from fundamental business and market needs that aren't subject to Agile pruning. Agile process reduces the sunk cost, but never to zero. Open-source cheap failure is much closer to zero-cost, and includes a very high incidence of complete abandonment of the project.
What DVCS really adds is cheap forks, and that's not an Agile idea (Agile is about doing the right thing, not doing everything imaginable and hoping something works), nor an enterprise idea (enterprise is about doing the best we can afford).
Similarly, innersourcing is a way of spreading costs throughout the enterprise, and holds the possibility of finding spare cycles among the consuming groups that approach zero-cost contribution. But it all still comes out of one company's payroll, and the community size is never enough to completely conceal the costs. Well, maybe if you have the whole DoD to play with … but we're not all that fortunate, Guy! So central management remains critical for enterprise work, and cheap forking remains inefficient, and a little scary.
The bottom line, I guess, is this: if you're paying the bills one way or another, as is true within an enterprise, then you likely have a need to monitor centrally, and a central system is a useful management tool. If, on the other hand, you have significant hope of volunteers (or corporate contributors with independent budgets), as in the highest-profile open-source projects, then decentralization might be a useful way to encourage it. Really large enterprises might have a sufficient ecosystem to approach the really large open-source dynamics (although with the rise of compliance requirements like Sarbanes-Oxley, it's getting harder all the time); most (smaller) open-source projects are somewhere in the gray area where personal preference might just as well rule.
As for requiring appropriate project management to match the tools … well, that's true of any process/tooling shift. You know … new wine, old wineskins?

  3. Trent Fisher says:

    Recently my VP asked me to look at Mercurial, and my conclusion was that DVCS don’t make as much sense in the enterprise setting for a different reason than what you mention. In the enterprise there is a level of trust amongst all the engineers and often a similar level of access to the VCS, so they can easily create branches without having to ask for access to the repository (as you always have to do with your typical open-source project). In our case, we keep the release and integration branches locked up, but everyone is free to create branches whenever and wherever they want, which seems like the key selling point of DVCS. Given that, I didn’t see the point to DVCS (for us).

  4. To restate your point a bit: open-source work needs a degree of community management not necessary in the enterprise. Is that close? I put it that way in order to lead up to another point: DVCS seems to encourage communities to neglect their community development.
In the central-VC open-source world, there's a watchword: “Patches welcome!” When used as a response to someone complaining about a missing feature, it means both more and less than it appears. More, because if the complainer goes on to actually provide a decent patch for the problem, then you've won another contributor. But less, in that the most probable response is to walk away grumbling. It's a put-up-or-shut-up kind of response, and it's part of the process of developing users into contributors.
There's a similarly popular response in a DVCS community, but it means something rather different. “Clone the tree and do it yourself” is an easy response to end a debate going nowhere, but it lacks the implicit promise to review the result, should it appear. Of course, the community still *can* review a contribution made in this way–that's what DVCS is all about, the reason Linus gives for inventing Git. But it can also mean “go ahead and make your changes; they'll just sink below the flood of private forks.” The technical details of DVCS make a small but not insignificant contribution to this altered dynamic; the culture that's growing up around them looks a bit more to blame. But whatever the roots, one net effect of switching to a DVCS is a lessened expectation of community development. I'm not worried about Linux or JBoss or Python or any of the other high-profile projects–they have excellent community leadership in place, and well-established history and culture. But for the broad sweep of open-source projects, projects run by folks with less rooting in community development, this worries me.

  5. Trent Fisher says:

True, the “community management” is fundamentally different in the Enterprise, i.e. it is implicit in the management structure, etc.
    However, I was making a slightly different point, or, rather, I was addressing a different point in the workflow.
Suppose a person sees a problem with a product and wants to make a fix. In the Enterprise environment, they probably already have source control access and can just create a branch. But in the open-source world they would need to ask for permission to create a branch (as I had to with p42svn recently). Barring that, they can create a patch and then the maintainer would have to work out which version to patch. But if the person had simply been able to create a branch, the maintainer would simply have to merge their changes. Having the branch in the same system allows ancestry to be tracked, and merges can be done with less effort. As far as I can tell, DVCS is intended to fix this problem.
    But creating branches is only half the problem. You have to have a plan for pulling them back together, which is, I believe, your point. Without “community leadership” in place, the whole thing would, indeed, become a mess of forks/branches. In the enterprise the “community leadership” is explicitly embodied by a project/program manager who sees to it that all the fixes get included in the appropriate release.

  6. Andreas Krey says:

It’s funny how the ‘svn is still useful’ arguments always ride on points that actually aren’t different.
    A fork is nothing else but a branch. The only difference is that it is carried in another repository, and yet, in DVCSes, it can easily be merged back when the need arises. In the enterprise setting the only thing required to do a fork is the permission to create a branch. And if you don’t have that and yet you need to do it, the answer is git-svn. Or tar & patch.
    The centralized model isn’t exclusive to subversion; you can just as well use git in that way. As long as there is no trouble with the central server approach, svn is basically as capable as git, except when it comes to merging. (It still hasn’t quite caught up with CVSNT in that regard.) However, once you are a second (of round trip time) away from your server, you really start to appreciate the fact that you *can* do everything locally in your typical DVCS.
As I see it, much of the pro-subversion argument goes along the lines of “you *can’t* do a lot of things, which is good for an enterprise setting”. Unfortunately, what bad users do instead is no better: not committing changes at all for weeks instead of at least having a local commit history. Or the quite popular way of git-svnning … but if we do that, why not go all the way?
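Andreas's point, that git can serve as the hub of a perfectly svn-like centralized workflow, is easy to sketch. A minimal demo (all names invented): one "blessed" bare repository, and two developers whose svn-commit equivalent is simply a local commit plus a push.

```shell
set -e
# One central "blessed" repository, used svn-style from two clones.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare blessed.git

# Developer one: the svn-commit equivalent is commit + push.
git clone -q "$tmp/blessed.git" alice
(cd alice \
  && echo "v1" > file.txt \
  && git add file.txt \
  && git -c user.name=alice -c user.email=a@example.invalid commit -q -m "first" \
  && git push -q origin HEAD)

# Developer two sees the central state on clone/pull, just as with svn update.
git clone -q "$tmp/blessed.git" bob
(cd bob \
  && echo "v2" >> file.txt \
  && git -c user.name=bob -c user.email=b@example.invalid commit -q -am "second" \
  && git push -q origin HEAD)
```

The difference from svn shows up only when the server is down: the local commit still succeeds, and the push simply follows once blessed.git is reachable again.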

  7. I think you missed my point: it's not that one *can't* operate in a centralized manner using a DVCS tool–indeed, most such projects do, to some degree. The concern raised by some is that you *needn't*: centralization depends on the cooperation and memory of the developers at the edges. In the enterprise context I was describing, the shippable product often ships out of a point in the hierarchy a level or two below the top / center. If implemented in a DVCS, this means a separate push up-tree is needed. Pushing changes all the way to the top / center is not technically necessary to release the product, so there's no functional reminder to do it. If developers, project leads, and even managers forget, get too busy, or resist, then the delivery to the central repo might not happen.

  8. Andreas Krey says:

I still don’t think that it is a point. On the one hand, you could make your release scripts check that the local copy is actually present in the ‘blessed’ central repository (fetch and look).
On the other hand, there isn’t necessarily a functional check of whether the sandbox you release out of is even clean and does not contain uncommitted files copied over from somewhere else.
It is simply a matter of culture and/or release procedures. As a case in point, I have seen a set of build scripts in svn that simply take the version number for the release out of a build.properties file instead of checking whether they are executed in a sandbox that is (a) clean and (b) checked out from a tag. It happens often enough that the tagged build.properties does not match the svn ‘tag’. In contrast, similar build scripts under git actually use ‘git describe’ to get the tag info, meaning that the tag needs to be there (but not yet, alas, present in the ‘blessed’ repository).
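The `git describe` discipline, plus the "fetch and look" check from earlier in the thread, fits in a few lines. A hypothetical release gate (the repository layout and tag name are invented for the demo): refuse to build unless the sandbox is clean, HEAD sits exactly on a tag, and that tag is already present in the blessed repository.

```shell
set -e
# Build a tiny repo with a blessed remote and a tagged release to gate on.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare blessed.git
git clone -q "$tmp/blessed.git" work
cd work
echo "name=widget" > build.properties
git add build.properties
git -c user.name=rel -c user.email=r@example.invalid commit -q -m "release prep"
git tag v1.0
git push -q origin HEAD --tags

# The gate itself:
# (a) clean sandbox: no modified or untracked files
[ -z "$(git status --porcelain)" ] || { echo "dirty sandbox"; exit 1; }
# (b) HEAD is exactly a tag; the version comes from git, not a properties file
tag=$(git describe --exact-match --tags HEAD)
# (c) "fetch and look": the tag must already be in the blessed repository
git ls-remote --exit-code --tags origin "refs/tags/$tag" >/dev/null \
  || { echo "tag $tag missing from blessed repo"; exit 1; }
echo "releasing $tag"
```

With `set -e`, any failed check aborts the build, so the released state and the blessed repository cannot silently drift apart the way a hand-maintained build.properties can.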

  9. Andreas – At the risk of repetition: your point, as I understand it, is that developers *can* use a DVCS in ways that provide the benefits I claim for central systems. I wasn't challenging that. My point was that they *might* *not*, which concerns enterprises, both because of the complexities of large organizations (such as conflicting goals), and because of the magnified risk of large scale (if each product team only makes such an error once a year, and there are 1000 product groups in the enterprise, how often does the error happen?). And, what particularly concerns me about the current crop of DVCS-based community sites: they seem *not* to be taking these measures; they seem rather to be concentrating on the liberties DVCS grants, with the consequence that they lose the characteristic benefits of central systems.

  10. Andreas Krey says:

My point is not (just) that DVCS can be used in a centralized way (read: you’re only getting money when I find the release tag in the ‘blessed’ repo) and may not be, when changes are pushed directly from developer to release builder, but also that svn can be subverted in similar (but uglier) ways. You need the discipline to have releases tagged in a well-known place in either case.
    The absolutely cool thing about DVCS is that you don’t lose version control coverage when the central point is unavailable for any reason up to server outage. Copying files between svn sandboxes is outright dangerous, pulling between repos isn’t. This is, depending on the context, good or bad. It makes circumventing the central point less dangerous and thus more probable.
    And what community sites do is pretty irrelevant to what we do in the enterprise. Beside, Sturgeon’s Law seems to apply in both cases. I wish I’d see the review quality at work that I see on the git mailing list.
    Another cultural thing I see in selected places is that people actually start to care about what goes into individual commits instead of just doing the occasional global commit, and I consider the relative quality of the git command line tools a major factor in this. You *can* do that in svn, but it’s a lot more of manual shuffling, and so people are a lot less likely to do it.
The relative usability of what comes with svn vs. with git makes me so much more productive that I hope my PHB will not fall for your ‘central is good, so you must enforce it by baking it into the infrastructure’ point.
