Git Repository Replication with Gerrit and TeamForge

Update: Git Replication is now a built-in feature in TeamForge 8.1

Why do we need replication in general?

There are several scenarios in which it is quite handy to have a repository whose version history is exactly the same as the one it mirrors. Perhaps the most obvious one is the maintenance of a simple backup repository, used when the primary repository has become inaccessible due to a hardware failure, network outage, or other annoyances. Other scenarios include deploying mirror repositories to distribute heavy load across multiple servers.

We’re using Git, a DVCS. We don’t need replication. Do we?

Depending on the Git workflow model your organization uses, replication can make a lot of sense. In enterprise environments, centralized workflows are quite common.

A distributed version control system (DVCS) like Git gives you the certainty that each repository is a complete copy of the upstream repository, or, to be precise, a copy of the shared repository including branches, tags, etc. The only caveat is that each copy a developer keeps has to be synchronized with the shared repository. Developers often work in a detached fashion and only occasionally sync with the shared repository, so at any given point in time it is quite possible that all copies are out of sync. In a centralized workflow, the shared repository is the only one deemed authoritative. If it gets compromised or corrupted, the chances of a full recovery from developers’ copies are slim. Even if such a recovery succeeds, enterprise standards and compliance rules are unlikely to accept restoring a production repository from a developer’s copy. In this situation, having a reliable and accessible copy of the last state of the shared repository makes recovery much easier. Keeping the shared repository continuously replicated to another location addresses two of the scenarios mentioned at the beginning of this post: backup and reliable fail-over.

To understand how replication helps with the third scenario, load balancing, we’ll have to take a peek at Git’s internals. To retrieve the latest state of a shared repository, developers perform a git fetch operation. On the server side, fetch is a CPU- and memory-intensive operation (a server can handle roughly one parallel fetch per CPU). Network bandwidth and latency matter as well, especially since Git fetches all missing revisions, not just the latest one. A growing user base on a given Git server, and hence a growing number of fetch requests, can generate a tremendous amount of load on that server. To take some load off a heavily loaded main Git server, its replicated counterpart can serve some or all of the fetch operations. Some or all developers can fetch from the replicated repository while continuing to push (send updates) to the main shared repository, which ultimately allows a smoother end-user experience.
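To make the fetch-from-replica, push-to-primary split concrete, here is a minimal shell sketch based on Git’s standard remote.<name>.pushurl setting, which, when present, is used for push instead of remote.<name>.url. The helper name point_fetch_at_mirror and the user name jdoe are hypothetical; the host names follow the example used later in this post, and 29418 is Gerrit’s default SSH port.

```shell
#!/bin/sh
# Sketch (hypothetical helper): make an existing clone fetch from the
# replica while pushes still go to the primary Gerrit server.
#
#   $1  path to the local clone
#   $2  fetch URL (the replica, e.g. targetGerrit.box)
#   $3  push URL  (the primary, e.g. sourceGerrit.box)
point_fetch_at_mirror() {
    repo_dir=$1
    mirror_url=$2
    primary_url=$3
    # remote.origin.url is what fetch/pull use ...
    git -C "$repo_dir" remote set-url origin "$mirror_url" || return
    # ... while remote.origin.pushurl, once set, overrides it for push.
    git -C "$repo_dir" remote set-url --push origin "$primary_url"
}

# Example (the user name "jdoe" is a placeholder):
# point_fetch_at_mirror ~/work/reprepo \
#     ssh://jdoe@targetGerrit.box:29418/reprepo \
#     ssh://jdoe@sourceGerrit.box:29418/reprepo
```

Afterwards, git remote -v should list the replica as the (fetch) URL and the primary as the (push) URL. Keep in mind that replication is asynchronous, so right after a push the replica may briefly lag behind the primary until Gerrit has replicated the change.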

How is it possible to achieve Git replication using TeamForge?

CollabNet TeamForge ships with a Git Integration that is based on Gerrit. Apart from being an efficient code review system for Git repositories, Gerrit is a full-featured standalone Git server. Gerrit has been designed to serve a large number of Git repositories and is used in communities like Google Code or the Android Open Source Project (check it out here). Gerrit can replicate the Git repositories it hosts to Git repositories hosted on another system (not necessarily Gerrit). All Gerrit versions supported by CollabNet (2.1.10 and 2.6.1) support Gerrit’s replication feature. In Gerrit 2.1.10 it is part of the Gerrit core; in Gerrit 2.6.1 it is a separate plugin you can download from here.

If you are on the more recent Git Integration v8.2.x (based on Gerrit 2.8.x), download a compatible version of the replication plugin from here.

Likewise, if you’re on Git Integration v8.3.x (based on Gerrit 2.9.x), the compatible plugin binary is here.

How to configure Gerrit to replicate a repository?

The open source Gerrit project documentation describes how to configure replication for Git repositories. However, using TeamForge’s Git Integration makes it easier to host a Gerrit server, among other benefits. Assuming you use TeamForge with our Git Integration, let’s look at the following use case:

“I have a TeamForge site (tf.box) with two different Git Integration servers already set up. Our first Git Integration server, let’s call it sourceGerrit.box, has a Git repository for which I want to set up replication to a different Git Integration server hosted in a different geography. Let’s call it targetGerrit.box.”

Our Objective: On sourceGerrit.box there is a repository named reprepo. We would like to set up replication for this repository to another Git Integration server, targetGerrit.box, integrated with the same TeamForge site (tf.box).

Pre-Requisites:

On tf.box: We need the credentials of a TeamForge user which will be used to push Git objects to the mirrored repositories on targetGerrit.box. In this exercise, we refer to this user as tfReplicationUser. Furthermore, we need the credentials of a TeamForge user who is part of the Gerrit Administrators group. In this exercise, we refer to this user as sourceGerritAdmin. You could use the same account for both if you want to.

On sourceGerrit.box: You need root access to sourceGerrit.box because we need to modify a file owned by the gerrit Unix user.

Once you have satisfied all pre-requisites, let’s follow the steps below.

On tf.box

  1. Go to the TeamForge web UI on tf.box, choose the Git Integration adapter hosted on targetGerrit.box, and create a repository with exactly the same name, reprepo, in a TeamForge project.

  2. Make sure that on tf.box the user tfReplicationUser has a TeamForge role with the delete/view or SCM admin source code permission in that TeamForge project.

On sourceGerrit.box

  1. In a terminal on sourceGerrit.box, run the following commands to add the targetGerrit.box host key to the ~/.ssh/known_hosts file of sourceGerrit.box’s gerrit system Unix user. Answer yes when ssh asks whether you want to continue connecting.
    su gerrit
    ssh -p 29418 targetGerrit.box
  2. If you are using Gerrit 2.6.1, download the replication plugin from here. If you are on Git Integration v8.2.0 or later, download a compatible version of the replication plugin from here. Copy the downloaded jar file into the /opt/collabnet/gerrit/plugins directory of sourceGerrit.box, make sure the file is owned by the Unix user gerrit, and restart the Gerrit service. With Gerrit 2.1.10, replication is part of the Gerrit core and you do not have to download anything.
  3. To make sure the replication plugin is installed (Gerrit 2.6.1 only), log in to the Gerrit web UI with the sourceGerritAdmin credentials, go to the Plugins tab, and check that the replication plugin is listed there.
  4. Create the file /opt/collabnet/gerrit/etc/replication.config with the content below and save it. This tells Gerrit to push all repositories on which the Gerrit group ToBeReplicatedToTargetGerritBox has READ access. It will push all branches and tags but no special refs (like refs/changes), as pushing change sets from one Gerrit to another creates problems. Target repositories are assumed to have the same name, but one could change that behavior by providing a different URL. One could even specify a command to automatically create the target repository. To learn more about the various options in this file, have a look here.

      [remote "targetGerrit.box"]
      url = ssh://tfReplicationUser@targetGerrit.box:29418/${name}.git
      authGroup = ToBeReplicatedToTargetGerritBox
      push = +refs/heads/*:refs/heads/*
      push = +refs/tags/*:refs/tags/* 

  5. Go to Gerrit’s web UI and log in with the sourceGerritAdmin credentials.
  6. Go to the People tab, press Create New Group, name it ToBeReplicatedToTargetGerritBox (the same name as authGroup in replication.config), and save the changes.

  7. Add tfReplicationUser to this group.

  8. Go to the Projects tab, select “reprepo”, open the Access tab, grant the Read access right on refs/* to the group ToBeReplicatedToTargetGerritBox, and save the changes.

    This enables replication for the repository reprepo.

  9. Get shell access to the sourceGerrit.box host and associate the public key of its gerrit Unix user with tfReplicationUser. To do so, run the following command and copy its output to the clipboard.
    cat ~gerrit/.ssh/id_rsa.pub 
  10. Log in to the TeamForge web UI on tf.box as tfReplicationUser, go to My Settings, append the key from the clipboard to the Authorization keys section, and save.
  11. On sourceGerrit.box, make Gerrit pick up the new configuration: restart the Gerrit service if you are using Gerrit 2.1.10, or, if you run Gerrit 2.6.1, reload the replication plugin by running the following commands in a terminal:
    su gerrit
    ssh -p 29418 sourceGerritAdmin@localhost gerrit plugin reload replication

    This step is only necessary if you have created or changed the replication.config file since Gerrit was last (re)started.

  12. Start replication:
    su gerrit
    ssh -p 29418 sourceGerritAdmin@localhost replication start --all

    This step is only needed if you’d like to trigger replication immediately. Otherwise, Gerrit will not replicate until something is pushed to a repository that is to be mirrored. You can also trigger replication for a specific repository by passing its name instead of the --all switch.
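If you prefer to script step 4 instead of editing the file by hand, the sketch below writes the same replication.config. The helper name write_replication_config is hypothetical. The detail worth noticing is the quoting: ${name} is a placeholder that Gerrit replaces with the repository name, so it has to reach the file literally. In an unquoted shell here-document the shell would expand it itself (most likely to an empty string), which is why it is escaped as \${name} below.

```shell
#!/bin/sh
# Sketch (hypothetical helper): write the replication.config from step 4.
#
#   $1  output file, e.g. /opt/collabnet/gerrit/etc/replication.config
#   $2  target host (also used as the remote name), e.g. targetGerrit.box
#   $3  TeamForge user that pushes to the target, e.g. tfReplicationUser
#   $4  Gerrit group that selects the repositories to replicate
write_replication_config() {
    cfg=$1
    host=$2
    user=$3
    group=$4
    # The here-document is unquoted so $host, $user and $group expand,
    # but \${name} is escaped: Gerrit, not the shell, must replace it.
    cat > "$cfg" <<EOF
[remote "$host"]
  url = ssh://$user@$host:29418/\${name}.git
  authGroup = $group
  push = +refs/heads/*:refs/heads/*
  push = +refs/tags/*:refs/tags/*
EOF
}

# Example (as root on sourceGerrit.box, then hand the file to gerrit):
# write_replication_config /opt/collabnet/gerrit/etc/replication.config \
#     targetGerrit.box tfReplicationUser ToBeReplicatedToTargetGerritBox
# chown gerrit /opt/collabnet/gerrit/etc/replication.config
```

Because the file uses Git’s config syntax, git config -f <file> --get remote.targetGerrit.box.url is a quick way to confirm that the URL was written with the placeholder intact.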

Once the steps above are done, reprepo on targetGerrit.box will be populated with the current state of reprepo from sourceGerrit.box. From then on, any creation, deletion, or update of commits, branches, and tags will also be replicated to targetGerrit.box seamlessly.
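To check that the mirror has actually caught up, one option is to compare the branches and tags advertised by both servers with git ls-remote. This is a sketch of one possible check, not a TeamForge or Gerrit feature; the helper name refs_in_sync and the user jdoe are hypothetical, and that user needs read access to reprepo on both servers. Only refs/heads and refs/tags are compared, matching the push refspecs in replication.config, since special refs such as refs/changes are intentionally not replicated.

```shell
#!/bin/sh
# Sketch (hypothetical helper): return 0 if two repositories advertise
# identical branches and tags, 1 if they differ, and 2 if either
# repository cannot be listed.
refs_in_sync() {
    src_refs=$(git ls-remote --heads --tags "$1") || return 2
    dst_refs=$(git ls-remote --heads --tags "$2") || return 2
    # Sort both listings so the comparison does not depend on the order
    # in which each server advertises its refs.
    src_refs=$(printf '%s\n' "$src_refs" | sort)
    dst_refs=$(printf '%s\n' "$dst_refs" | sort)
    # The status of this test (0 or 1) is the function's return value.
    [ "$src_refs" = "$dst_refs" ]
}

# Example:
# if refs_in_sync ssh://jdoe@sourceGerrit.box:29418/reprepo \
#                 ssh://jdoe@targetGerrit.box:29418/reprepo; then
#     echo "reprepo is in sync"
# else
#     echo "reprepo differs (or replication is still in progress)"
# fi
```

Since replication runs asynchronously, a mismatch immediately after a push is expected; it is a persistent mismatch that points to a problem, for example a missing key or missing Read access.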

Do I have to repeat all these steps if I want to enable replication for another Git repository?

To enable replication for further Git repositories, you don’t have to repeat the whole exercise. You only have to follow steps 1 and 2 on tf.box and steps 5 and 8 on sourceGerrit.box; step 12 is optional and only needed if replication should start immediately. If you want to replicate all repositories on sourceGerrit.box, add Read access for refs/* on All-Projects for the group ToBeReplicatedToTargetGerritBox, as in step 8. Also make sure that all of these repositories exist on targetGerrit.box, created from tf.box using the Git Integration adapter hosted on targetGerrit.box.
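When many repositories are involved, one way to spot those that exist on sourceGerrit.box but not yet on targetGerrit.box is to compare the project lists of both servers. The sketch below assumes each list was saved to a file with Gerrit’s gerrit ls-projects SSH command, which only shows projects the calling account can see; the helper name missing_on_target, the file names, and the account name admin are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the project names that appear in
# the first list (source server) but not in the second (target server).
# Each input file holds one project name per line.
missing_on_target() {
    src_sorted=$(mktemp) || return
    dst_sorted=$(mktemp) || return
    # comm(1) needs sorted input; -u also drops duplicate names.
    sort -u "$1" > "$src_sorted"
    sort -u "$2" > "$dst_sorted"
    # -23 hides lines unique to the target and lines common to both,
    # leaving only names unique to the source.
    comm -23 "$src_sorted" "$dst_sorted"
    rm -f "$src_sorted" "$dst_sorted"
}

# Example ("admin" stands for an account that can see all projects):
# ssh -p 29418 admin@sourceGerrit.box gerrit ls-projects > source.txt
# ssh -p 29418 admin@targetGerrit.box gerrit ls-projects > target.txt
# missing_on_target source.txt target.txt
```

Each name it prints is a repository that would still have to be created on targetGerrit.box through tf.box (step 1) before replication to it can succeed.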

Feedback wanted

We realize that needing 12 steps to set up replication is not exactly easy. Please tell us if you encounter any problems during those steps or need further explanation. Also drop a note if you’d like to share ideas on how we could make the replication feature easier to use.

Dharmesh Sheta

Dharmesh is a Senior Integration Engineer at CollabNet and works with the Gerrit/Git product solutions team in Germany. He has 8+ years of experience in Continuous Integration, SW build and release processes, test automation, SW development tooling, and infrastructure. Prior to CollabNet, he spent ~4 years at Nokia and was part of the core team responsible for CI processes and infrastructure for global SW development teams. Dharmesh holds an M.Sc. in SW Engineering from Leuphana University of Lüneburg, Germany, and graduated from VNSGU, India.

Posted in Git, TeamForge
6 comments on “Git Repository Replication with Gerrit and TeamForge”
  1. Vivek Singh says:

    Where can I download the plugin (for gerrit 2.6.1) from? I do not see a link in your post.

  2. Henry says:

    Thanks for the post! I have a couple of questions:

    1. Will the replication always override the target repository? What if a conflict happens during the replication? How do you resolve the conflict between the two repositories?
    2. Can we define a replication from the target repository to the source? How will this work?

    Thanks,
    Henry
