A generation question

I must admit that I’m new to blogging, and these lines were written on plain old paper first – yes, I was a COBOL developer in the early days of my professional career, and I still write a lot of things in upper case.

When I made the move to a small start-up named Atria Software back in 1996, the world of software development looked quite different: small to mid-sized teams, typically co-located around a specially designated project server. Today’s broadband connections are faster than most intranet lines were back then, and different sites are connected through leased lines with gigabit capacity. In those days, if you were lucky, you had a channel-bundled ISDN dial-up connection.

Back in those days companies started to ramp up distributed teams to work on highly complex projects, and of course they needed an infrastructure that supported this. The answer of most commercial SCM systems was replication and synchronization of the repositories: you duplicate the archive, figure out the deltas at the different sites from time to time, exchange these deltas, and apply them to the different replicas – done! The biggest advantage of this approach was that it worked over the thin connections. The biggest disadvantage: conflicts due to the time gap between change and sync. If the same object is changed at two sites, a potential conflict has to be resolved during the synchronization. You might argue that this can be solved with mastership concepts that prevent the same object from being changed simultaneously. You are right, but this implies that you are introducing branches not because the project needs them, but purely because of the replication.
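
To make that concrete, here is a minimal sketch of periodic delta synchronization between two replicas. It is a toy model in Python; the Replica class, the site names, and the file path are all made up for illustration, and real replication engines are of course far more sophisticated:

    # Toy model of periodic replica synchronization (hypothetical,
    # for illustration only).

    class Replica:
        def __init__(self, name):
            self.name = name
            self.files = {}        # path -> content
            self.changed = set()   # paths modified since the last sync

        def commit(self, path, content):
            self.files[path] = content
            self.changed.add(path)

    def synchronize(a, b):
        """Exchange deltas between two replicas; report conflicting paths."""
        conflicts = a.changed & b.changed        # changed at both sites
        for path in a.changed - conflicts:       # ship a's clean changes to b
            b.files[path] = a.files[path]
        for path in b.changed - conflicts:       # and b's clean changes to a
            a.files[path] = b.files[path]
        a.changed.clear()
        b.changed.clear()
        return conflicts                         # must be merged by hand

    munich, boston = Replica("munich"), Replica("boston")
    munich.commit("core/main.c", "change made in Munich")
    boston.commit("core/main.c", "change made in Boston")  # same object!
    print(synchronize(munich, boston))           # -> {'core/main.c'}

The same object changed at two sites between syncs surfaces as a conflict that somebody has to resolve by hand, which is exactly the cost described above.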

Modern approaches try to solve this challenge the way RAID systems do, with write-through technology. If you commit a change to an archive, the change is propagated immediately to the other replicas, without any mastership definition. This tiny time gap eliminates most of the risk – almost like a central archive!
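
Sticking with the toy model from above, a write-through variant could look like this sketch (again purely hypothetical): every commit is pushed to the peers immediately, so conflicts can only sneak in while a peer is unreachable.

    # Toy write-through variant: every commit is pushed to all peer
    # replicas right away, shrinking the conflict window to the
    # propagation delay, until a peer becomes unreachable.

    class WriteThroughReplica:
        def __init__(self, name):
            self.name = name
            self.files = {}
            self.peers = []
            self.online = True

        def commit(self, path, content):
            self.files[path] = content
            for peer in self.peers:
                if peer.online:
                    peer.files[path] = content  # propagated immediately
                # an unreachable peer silently falls behind, and the
                # conflict risk from the previous section returns

    munich = WriteThroughReplica("munich")
    boston = WriteThroughReplica("boston")
    munich.peers, boston.peers = [boston], [munich]

    munich.commit("core/main.c", "v1")   # both sites now agree
    boston.online = False                # the line between the sites drops
    munich.commit("core/main.c", "v2")   # Boston is out of date again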

But – just "almost". First, if the connection between the replicas is down, the risk of conflicts rises again. Second, it reduces the flexibility of organizations to react to change, because the setup of the infrastructure reflects the status quo (if you’ve ever moved replicas between different sites, you know what I mean …). Third, it makes IP (Intellectual Property) governance more complicated. I know of a ClearCase VOB that was replicated across 28(!) sites, most of them with read-only access. Imagine the cost of establishing the same level of security against unauthorized access at each site.

A common argument against a centralized approach in SCM is the question of what to do if you (or your site) are offline and you need data that is not in your working copy right now. Well, in today’s world there is no longer any need to be offline (you can even go online in an airplane these days); the technology is there, and it is way cheaper than burdening a large developer team with a highly complex branching strategy. And if you want to isolate work for a while before sharing it with the other project team members, Subversion (or an embedded Subversion) allows you to do that as well.
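
For readers who have not used it that way: the usual Subversion recipe for isolating work is a cheap branch copy. The repository URL below is made up for illustration, but the commands are stock svn (depending on your Subversion version, the final merge back may require the --reintegrate option):

    # The repository URL is hypothetical, for illustration only.
    svn copy https://svn.example.com/repo/trunk \
             https://svn.example.com/repo/branches/my-experiment \
             -m "Branch off trunk to isolate my work"

    # Point the working copy at the private branch and commit there freely.
    svn switch https://svn.example.com/repo/branches/my-experiment

    # When the work is ready to share, fold it back into trunk.
    svn switch https://svn.example.com/repo/trunk
    svn merge https://svn.example.com/repo/branches/my-experiment
    svn commit -m "Merge the experiment back into trunk"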

Don’t get me wrong, there are still some scenarios for replication, such as extreme data volumes. Then, and only then, is it the right approach. But don’t let it be your silver bullet for meeting the challenge of distributed development, and don’t do it just because you have always done it!

I must admit I had my fair share in seeding the concept of replication into organizations. But – replication as an SCM concept to support distributed teams is more than 15 years old; in our market space that is almost two generations of technology. It is time to try something new, although it is not that new at all, because the mainframe folks have done it for at least five generations …

This is posted by an excited NKOTB (New Kid on the Blog) coming from the pre-internet generation.

Posted in Subversion
2 comments on “A generation question”
  1. Jean says:

    Hi,
    I would think that replication is coming back in force these days, though. Maybe not (yet?) in the “enterprise” world, but there are lots of examples of replicating SCM out there. If you take a quick look at “next-gen” open source SCM, you will find darcs, git, monotone, arch, and svk (built on top of Subversion).
    They all seem quite replication-oriented to me, but they were also built with the replication problems in mind. They support a fully decentralized paradigm.
    As part of their design they propose specific merging algorithms to minimize conflicts and track merges between branches.
    If I understand correctly, the most advanced algorithm is the one proposed by darcs, which tracks patch dependencies and seems able to reorder them during a merge.
    They are not perfect, but they do get close.
    So I don’t think decentralized is that much out of fashion or even out of touch, and ultra-fast internet links are only making it even more usable.

  2. Rainer Heinold says:

    I agree with you that there are many activities out there around distributed CM. And the projects you’ve mentioned are solving a lot of issues that traditional CM systems had, no question about that.
    But IMHO there are still two questions remaining. First, what is the overhead of syncing up in a large project with a double- or triple-digit number of developers? Yes, you can do a lot around the project setup to limit the number of people you have to sync with to get your work done, but a central repository solves this naturally.
    And second, at least in an enterprise context, you need to bring all of the different bits together in a single place at some point to release your software.
    Again, don’t get me wrong. I think the approaches are interesting and might be the right solution in specific scenarios. But the more fast broadband replaces bumpy connections, the fewer benefits I see for a distributed approach in commercial development.
