Migrating Subversion Repositories to Git – The definitive Guide for TeamForge Users

 

Many software projects are moving from a centralized version control system (CVCS) to a distributed version control system (DVCS), and Git is by far the most widely used DVCS. My team at CollabNet recently migrated one of our flagship open source projects, the CollabNet Connector Framework, from Subversion to Git. This post is a step-by-step guide to migrating SVN repositories to Git repositories with git svn while preserving the complete version history.

This blog will cover two use cases:

1) Migration of a single project SVN repository to a single Git repository

2) Migration of a multi-project SVN repository to multiple Git repositories

In a single-project SVN repository, or any SVN repository that follows the standard layout, each repository contains one project with its own trunk, branches and tags. It looks similar to the layout below, where each colour symbolizes a project.

A multi-project or non-standard SVN repository does not follow those rules – many projects or project components are placed under the /trunk, /branches and /tags top-level directories. Some tags refer to all projects, some don’t:

A single Git repository is less suitable for hosting multiple projects than a single Subversion repository is. In contrast to Subversion, you cannot simply check out certain paths of a Git repository or have commands operate on a subdirectory only. A Git commit is a snapshot of all files in the entire repository. Consequently, if you migrate a multi-project Subversion repo to one Git repo, the many commits that affected only one project will show up in the history of every other project too, even though they have no relevance there. The same applies to tags: you will see all migrated tags in all projects, even if they have nothing to do with a particular project.

Hence, it is recommended to split up a multi-project SVN repository to separate Git repositories (one per project). If you take this extra effort, you will be rewarded with a much cleaner commit history and appropriate tags.

Now let’s walk step by step through the migration of a project named Red, including its trunk, branches, tags and their history, from SVN to a Git repository, for both a single-project and a multi-project SVN repository.

a) Fetch the list of all Subversion committers (Optional)

This step is optional if you do not require proper committer email addresses for every contributor to the SVN repo. If you skip this step and convert a repository without an authors file, the commits will belong to a synthetic user of the form <SVN username>@<SVN uuid>.

SVN records only a username for every commit, whereas a Git commit author needs both a name and an email address. We therefore fetch the list of SVN committers for the repo and map each one to the corresponding name and email address of the Git commit author. git svn can then use this list to turn plain SVN usernames into proper Git committers (name with email address).

$ svn log -q https://svnnonstandardrepo.com/svn/repos/myrepo | grep -e '^r' | awk -F '|' '{print $2}' | sort -u | sed 's/^ *\(.*\) $/\1 = \1 <USER@collab.net>/g' > svnauthors.txt

Running the above command in Git bash grabs all the log messages, extracts the usernames, sorts them, eliminates any duplicates and places the result into a “svnauthors.txt” file. Each line of the generated authors file maps an SVN author to the corresponding Git format, i.e. a username with an email address, like this:

skumarvs = skumarvs <USER@collab.net>
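
To see what each stage of that pipeline contributes, here is a minimal sketch that runs the same filters over a few hard-coded lines in the format printed by svn log -q, so it needs no SVN server. The sample revisions, the second username and the dates are made up for illustration, and extract_authors is a hypothetical helper name.

```shell
#!/bin/sh
# Turn `svn log -q` output (read from stdin) into authors-file lines.
extract_authors() {
    grep -e '^r' |                # keep only the "r123 | user | date" lines
    awk -F '|' '{print $2}' |     # the second field is " user "
    sort -u |                     # one line per distinct user
    sed 's/^ *\(.*\) $/\1 = \1 <USER@collab.net>/'
}

# Made-up sample of what `svn log -q` prints.
sample_log='------------------------------------------------------------------------
r3 | skumarvs | 2013-05-02 10:15:00 +0000 (Thu, 02 May 2013)
------------------------------------------------------------------------
r2 | jdoe | 2013-05-01 09:00:00 +0000 (Wed, 01 May 2013)
------------------------------------------------------------------------
r1 | skumarvs | 2013-04-30 08:30:00 +0000 (Tue, 30 Apr 2013)
------------------------------------------------------------------------'

authors=$(printf '%s\n' "$sample_log" | extract_authors)
printf '%s\n' "$authors"
```

The separator lines are dropped by grep, and the two commits by the same user collapse into a single entry, so the sketch prints one line per committer.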

Now edit each line in the file, replacing the placeholder with the real email address of the Git committer. For example:

skumarvs = skumarvs <selvakumar@collab.net>
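
Before running the import, it can be worth checking that no placeholder address survived the manual edit. The following is a minimal sketch, assuming the placeholder is the USER@collab.net string generated above (matched case-insensitively). find_placeholders is a hypothetical helper name, and the file content is sample data.

```shell
#!/bin/sh
# Print every authors-file line that still carries the placeholder address.
find_placeholders() {
    grep -i '<USER@collab\.net>' "$1"
}

# Sample authors file: one line already edited, one line still untouched.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
skumarvs = skumarvs <selvakumar@collab.net>
jdoe = jdoe <USER@collab.net>
EOF

leftover=$(find_placeholders "$tmpfile")
printf 'Still to fix: %s\n' "$leftover"
rm -f "$tmpfile"
```

In real use you would pass the path of your own authors file to the function; an empty result means every entry has been edited.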

If your SVN/Git repository is hosted in CollabNet TeamForge, you can automate the lookup of each SVN author's email address with a TeamForge Command Line Interface (CLI) script. The CLI script takes the svnauthors.txt file generated by the previous command as input, parses all of the SVN author names and generates an authors.txt file that maps each SVN username to a Git user name and email address. For example: skumarvs=skumarvs<selvakumar@collab.net>.

expr unlink("authors.txt")
output authors.txt
connect server connect to <URL> as <user> with password <passwd>
try
    do
load -properties DATA svnauthors.txt
for ITEM in $DATA
do
ctf api CollabNet->getUserData $ITEM
ctf set uname `print username`
ctf set mail `print email`
echo $uname=$uname<$mail>
  done
catch
    echo $ERROR
done
    output

b) Initialize/Clone the Git repository

The next step is to initialize the Git repo for pulling from svn.

Open Git bash and create a working directory:

$ mkdir redgitrepo

$ cd redgitrepo

For a single-project SVN repository, you can clone the SVN repo directly with the command below, then skip the remaining steps and continue with step d).

$ git svn clone --stdlayout --authors-file=authors.txt https://svnstandardrepo.com/svn/repos/myrepo/Red redgitrepo

If you skipped step a), omit the --authors-file parameter.

For a multi-project SVN repository, we need to use git svn init to pick out the trunk, branches and tags that belong to the project being migrated.

$ git svn init https://svnnonstandardrepo.com/svn/repos/myrepo -T trunk/Red -b branches/Redbugfix -b branches/Redspike -b branches/Redadminfix -t tags/Red1.0 -t tags/Red2.0 -t tags/Red2.5 --no-minimize-url

This initializes an empty Git repository in the current directory with additional metadata directories for git svn. It takes the Subversion repository URL together with the -T (trunk), -b (branches) and -t (tags) arguments that name the paths to migrate.

--no-minimize-url makes git svn accept the URL as-is without attempting to detect a higher-level directory.

$ git config svn.authorsfile ../authors.txt

This tells Git about the authors file by setting the svn.authorsfile config property to the location of the file. Skip this config step if you skipped the optional step a).
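
When a project has many branches and tags, typing the argument list by hand is error-prone. The sketch below assembles the same kind of git svn init command line from whitespace-separated lists and only prints it, so you can review it before running anything. build_init_command is a hypothetical helper, the paths are taken from the example above, and paths are assumed to contain no whitespace.

```shell
#!/bin/sh
# Assemble a `git svn init` command line from lists of branch and tag paths.
# Arguments: URL, trunk path, branch paths, tag paths (lists are space-separated).
build_init_command() {
    url=$1; trunk=$2; branches=$3; tags=$4
    cmd="git svn init $url -T $trunk"
    for b in $branches; do cmd="$cmd -b $b"; done   # one -b per branch path
    for t in $tags; do cmd="$cmd -t $t"; done       # one -t per tag path
    printf '%s --no-minimize-url\n' "$cmd"
}

init_cmd=$(build_init_command \
    https://svnnonstandardrepo.com/svn/repos/myrepo \
    trunk/Red \
    "branches/Redbugfix branches/Redspike" \
    "tags/Red1.0 tags/Red2.0")
printf '%s\n' "$init_cmd"
```

The printed line can be pasted into Git bash once it looks right.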

c) Fetch all files

For a single-project SVN repository you may proceed directly with step d).

$ git svn fetch

This fetches the trunk and all the branches and tags that you named in git svn init (see step b). At this point, the Red project hosted in the multi-project SVN repo has been imported into the local Git repository.

This command imports the SVN branches as remote branches and the SVN tags as remote branches prefixed with tags/. If you run git branch -r, you’ll find all of the fetched branches and tags from your SVN repository.

Note that git svn does not carry empty directories over into the Git history, because Git tracks file content and has no way to record a directory that contains no files.

Adding the --preserve-empty-dirs flag to the clone and fetch operations will detect empty SVN directories and create a placeholder file within them. This allows “empty” directories to exist in the history of a Git repository.

d) Clean up the SVN references

Before pushing the repository to the Git server we should remove remote SVN references from the Git project so that these are not pushed to the server (they are no longer needed or desired).

Moving remote tags to local tags

$ git for-each-ref refs/remotes/tags | cut -d / -f 4- | grep -v @ | while read tagname; do git tag "$tagname" "tags/$tagname"; git branch -r -d "tags/$tagname"; done
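
The name-extraction part of this pipeline is easier to follow on sample data. The sketch below feeds made-up git for-each-ref output through the same cut and grep filters; the hashes, the tag names and the tag_names helper are assumptions for illustration. It also assumes the refs sit directly under refs/remotes/tags/. If your version of git svn stores them under a prefix such as refs/remotes/origin/tags/, the field number passed to cut and the ref names used in the loop have to be adjusted accordingly.

```shell
#!/bin/sh
# Reduce `git for-each-ref refs/remotes/tags` output (stdin) to plain tag names.
tag_names() {
    cut -d / -f 4- |   # drop "<hash> <type><TAB>refs", "remotes" and "tags"
    grep -v @          # skip refs such as "Red1.0@45" that carry a revision suffix
}

# Made-up sample; the real command prints "<hash> <type><TAB><refname>".
tab=$(printf '\t')
sample_refs="1111111111111111111111111111111111111111 commit${tab}refs/remotes/tags/Red1.0
2222222222222222222222222222222222222222 commit${tab}refs/remotes/tags/Red1.0@45
3333333333333333333333333333333333333333 commit${tab}refs/remotes/tags/Red2.0"

names=$(printf '%s\n' "$sample_refs" | tag_names)
printf '%s\n' "$names"
```

Only Red1.0 and Red2.0 remain, and these are the names the while loop then turns into local tags.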

Moving remote branches to local branches

$ git for-each-ref refs/remotes | cut -d / -f 3- | grep -v @ | while read branchname; do git branch "$branchname" "refs/remotes/$branchname"; git branch -r -d "$branchname"; done

There is usually no point in keeping the trunk branch, as we now have master.

$ git branch -D trunk

e) Push to a remote Git repository

To push your local Git repository to a Git server, you first have to create an upstream repository there. Our CollabNet TeamForge Git Integration provides easy Git repository management from TeamForge.

In order to create a Git repository in TeamForge and import your local repository contents, make sure you are part of a TeamForge project role with source code admin permissions.

The following screenshot shows how to create a new Git repository in TeamForge

If you click on the link for the newly created repository, you will see a page similar to the screenshot below, with a URL section where you will find the ssh and http clone/fetch/push URLs.

If you would like to clone using your public/private key pair, don’t forget to register your public key under TeamForge -> My Settings -> Authorization Keys.

Now it is time to set this URL as the remote origin of the local Git repository you want to push to the server:

$ git remote add origin ssh://<username>@<remoteurl>/redgitrepo

Finally, we have to push all branches and tags upstream by running:

$ git push -f --tags origin 'refs/heads/*:refs/heads/*'

The -f option is necessary as your local master branch is not a fast forward of the initial commit in the just created upstream repository.

Now we have our Red project migrated with complete history from both single-project and multi-project SVN repositories.

 Next Steps

Newly migrated Git repos can be easily managed with our CollabNet TeamForge Git Integration, which is powered by Gerrit, an open source Git server that enforces access control on Git repositories and supports powerful review features and integration points for continuous integration tools like Jenkins.

To learn how you can use Gerrit’s review feature with your new Git repo and hook up Jenkins with it, read our other blog post TeamForge Git/Gerrit with Jenkins CI. We would also recommend CollabNet GitEye, a free desktop client that combines a simple-to-use graphical Git client with central visibility into essential developer tasks such as defect tracking, agile planning, code reviews and build services.

Thank you !

Posted in Git, Subversion, TeamForge
