Gerrit Productivity Hack – Handling Large Binary Files with Gerrit, Artifactory and Git LFS

Handling large binary files with Git hurts performance. You can work around the problem by tuning Gerrit and restructuring your build scripts so that they fetch binaries from an artifact repository instead of keeping them in the Git repository. Git LFS offers another approach that does not require any changes to your build process or Git server configuration.

The Gerrit and JGit communities are still working on built-in Git LFS support, but I thought it would make sense to show how Gerrit can be used with a separate Git LFS backend – Artifactory – right now. At first I was concerned that a separate Git LFS backend would require every end user to explicitly point to a different URL, but as my example will show, that is fortunately not the case. So without further ado, let’s jump into the example.

Example – Versioning large Prezi files

My team loves to illustrate ideas we get while talking to all kinds of people using Prezi, an awesome presentation software. Prezi lets us capture ideas very nicely. One example is a presentation on how to motivate users to do more code reviews, developed in a workshop by PO DOJO at betahaus Berlin.

While Prezi is awesome, it does not have a versioning feature built in and its files are pretty big (the Prezi in question is 120 MB), which makes it a perfect example for versioning with Git LFS. If you follow the next steps, you should be able to have your own Gerrit / Artifactory / LFS setup running in less than 20 minutes.

Step 1: Installing Git LFS extensions

Git LFS is not yet a built-in part of Git, so you have to download the extensions from GitHub. All major versions of Linux, Windows and Mac are supported. Once you have downloaded and run the installer, all you have to do is type git lfs install into your Git shell to complete the installation:

$ git lfs install
Git LFS initialized.

This is also the only step all users of your repository would have to do to use Git LFS.
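
If git lfs install fails with "git: 'lfs' is not a git command", Git most likely cannot find the git-lfs executable. The following is a minimal sketch of a check, assuming a POSIX shell (for example Git Bash on Windows). The name lfs_available is a hypothetical helper, not part of Git or Git LFS:

```shell
# Hypothetical helper: report whether Git can find the git-lfs executable.
# Git resolves "git lfs ..." by looking for a program named git-lfs,
# first in its own exec path (see `git --exec-path`) and then on PATH.
lfs_available() {
    command -v git-lfs >/dev/null 2>&1 ||
        [ -x "$(git --exec-path 2>/dev/null)/git-lfs" ]
}

if lfs_available; then
    echo "git-lfs found; 'git lfs install' should work"
else
    echo "git-lfs not found; add its install directory to PATH"
fi
```

If the check fails right after installation, opening a new shell so that the updated PATH is picked up is usually the first thing to try.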

Step 2: Cloning from Gerrit and tracking your large files

For our example, we assume that the Prezi files should be part of a repository called ShinyApp, so let’s clone this repository from Gerrit:

[Screenshot: clone URL help from GitEye]

$ git clone ssh://jnicolai@showcase.collab.net:29418/shinyapp && cd "shinyapp"
Cloning into 'shinyapp'...
remote: Counting objects: 2, done
remote: Finding sources: 100% (2/2)
remote: Total 2 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (2/2), 238 bytes | 0 bytes/s, done.

Let’s assume our Prezi is called incentivize_code_review.zip and has not been added to the index yet. Before we actually do that, we should tell Git LFS that this is one of the large binary files that should be treated specially:

$ git lfs track incentivize_code_review.zip
Tracking incentivize_code_review.zip

If you wanted to treat all zip files as large binaries, you could also type:

$ git lfs track "*.zip"

(The quotes keep the shell from expanding the pattern before Git LFS sees it. We don’t do this as part of this example.)

It is important that you track binary files BEFORE you add them to the index as otherwise the staged file will still be stored in Git’s native database.
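
One way to double-check that a file will be handled by Git LFS before staging it is to ask Git which filter attribute applies to its path. The sketch below is self-contained and runs in a throwaway repository. It writes the .gitattributes line by hand, as an assumption standing in for the output of git lfs track, so that it works even without Git LFS installed:

```shell
# Throwaway repository; the temporary directory is only for illustration.
demo_dir="$(mktemp -d)"
cd "$demo_dir" && git init -q .

# Stand-in for what `git lfs track incentivize_code_review.zip` writes.
echo 'incentivize_code_review.zip filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter applies to the path (the file need not exist yet).
git check-attr filter -- incentivize_code_review.zip
# prints: incentivize_code_review.zip: filter: lfs
```

If the output says "filter: unspecified" instead, the file would still be stored in Git’s native database when added.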

If you type git status, you will notice that a .gitattributes file has appeared as well:

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
   .gitattributes
   incentivize_code_review.zip

The .gitattributes file lists everything that is tracked by Git LFS and should be added to the index as well:

$ git add .gitattributes incentivize_code_review.zip
warning: LF will be replaced by CRLF in .gitattributes.
The file will have its original line endings in your working directory.

(You can safely ignore any warning about line ending replacements.)

Before we can craft the commit and push it to Gerrit, we have to tell Git LFS where its backend is. This step has to be performed only once per repository.

Step 3: Setting up a Git LFS repository in Artifactory

If you do not have an Artifactory installation yet, you can set up a free trial on JFrog’s web site in less than 3 minutes. Git LFS support for Artifactory is currently only available as part of the Pro and Cloud versions.

Once you have created an Artifactory admin account and logged into Artifactory, it should look like this:

[Screenshot: Artifactory start page]

Git LFS should show as available; otherwise you are probably using an older Artifactory version or not the Pro/Cloud version. Next, navigate to Admin -> Repositories -> Local from the left side bar and create a new local repository:

[Screenshot: create new Git LFS repository]

In the subsequent dialog, select Git LFS as package type and decide on a repository key for your repository. I used ShinyApp in this example:

[Screenshot: repository details]

Once you have clicked Save & Finish, use the left side bar to navigate back to Main -> Artifacts, find your newly created repo and click on it:

[Screenshot: Artifactory Set Me Up button]

The last thing we have to do in Artifactory is to click on the highlighted Set Me Up button so that the following dialog appears:

[Screenshot: .lfsconfig details]

Artifactory suggests putting the highlighted snippet into a file called .gitconfig inside your Git repository. Starting from LFS 1.1, this file naming convention is actually obsolete, so let’s put this snippet into the file .lfsconfig instead.

Step 4: Pointing Gerrit to Artifactory and pushing the commit

Let’s create .lfsconfig in our ShinyApp repository and paste the two suggested lines in:

$ git config -e -f .lfsconfig
<paste content and save>

Now, anybody who clones or pushes to the repository will know where files stored via Git LFS will live. We have to add .lfsconfig to the index and craft a commit:

$ git add .lfsconfig
$ git commit -m "Added info about LFS backend and first large binary file"
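
As an alternative to pasting the snippet in an editor, the same two lines can be written non-interactively with git config. This is a minimal sketch that runs in a temporary directory. The URL is the one used in this example's repository and has to be replaced with the one from your own Set Me Up dialog:

```shell
# Temporary directory for illustration; in practice, run this in your clone.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# URL taken from this example; substitute your own Artifactory LFS URL.
lfs_url="https://gerrit.artifactoryonline.com/gerrit/api/lfs/ShinyApp"

# Writes an [lfs] section with a url entry into .lfsconfig.
git config -f .lfsconfig lfs.url "$lfs_url"

# Read the value back to confirm what Git LFS clients will see.
git config -f .lfsconfig --get lfs.url
cat .lfsconfig
```

Because the file is written by git config itself, the section and key syntax are valid by construction, which avoids copy and paste mistakes.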

Let’s examine the commit:

$ git show HEAD
commit 185d2713b6a6146a36b6e192e3f1b7166de022f9
Author: Johannes Nicolai <jnicolai@collab.net>
Date:   Tue Jan 26 17:44:25 2016 +0100
   Added info about LFS backend and first large binary file
   Change-Id: I8bacfabe74ec75a14617288ba71f8023ae4f5e8d
diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..a8646ff
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1 @@
+incentivize_code_review.zip filter=lfs diff=lfs merge=lfs -text
diff --git a/.lfsconfig b/.lfsconfig
new file mode 100644
index 0000000..5507816
--- /dev/null
+++ b/.lfsconfig
@@ -0,0 +1,2 @@
+[lfs]
+url = "https://gerrit.artifactoryonline.com/gerrit/api/lfs/ShinyApp"
diff --git a/incentivize_code_review.zip b/incentivize_code_review.zip
new file mode 100644
index 0000000..e7668b3
--- /dev/null
+++ b/incentivize_code_review.zip
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:44a63a694240bd7834c9dd0d83c48a2989ec0117762a5e9bb11886a98398f419
+size 125839268

The commit shows three things: .gitattributes contains the tracked file name, .lfsconfig tells Git LFS clients where to retrieve and store binaries, and the 120 MB zip file is not stored in the Git repository itself. Only a pointer to it (its sha256 and size) is.
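
To make the pointer format more tangible, the sketch below rebuilds the three pointer lines for an arbitrary file. It is an illustration of the format seen in the diff above, not how Git LFS itself is implemented. make_lfs_pointer is a hypothetical helper name, and the sketch assumes sha256sum from GNU coreutils (on macOS, shasum -a 256 would be the closest equivalent):

```shell
# Hypothetical helper: print Git LFS style pointer text for a file.
make_lfs_pointer() {
    file="$1"
    # The oid is the SHA-256 of the file content, in lowercase hex.
    oid="$(sha256sum "$file" | cut -d' ' -f1)"
    # The size is the file length in bytes.
    size="$(wc -c < "$file" | tr -d ' ')"
    printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Small stand-in file instead of the 120 MB Prezi.
sample="$(mktemp)"
printf 'hello\n' > "$sample"
make_lfs_pointer "$sample"
```

The pointer stays a few lines long no matter how large the tracked file is, which is why the push to Gerrit in the next step transfers only a few hundred bytes of Git objects.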

Finally, let’s push the commit to Gerrit:

$ git push origin HEAD:master
Username for 'https://gerrit.artifactoryonline.com': jonico
Password for 'https://jonico@gerrit.artifactoryonline.com':
Git LFS: (1 of 1 files) 120.01 MB / 120.01 MB
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 664 bytes | 0 bytes/s, done.
Total 5 (delta 0), reused 0 (delta 0)
remote: Processing changes: refs: 1, done
To ssh://jnicolai@showcase.collab.net:29418/shinyapp
   35c8eee..185d271  master -> master

During the push, you will be asked for your Artifactory credentials. If you do not want to enter your password every time, Git LFS works well with the Git credential helper. If we refresh our repository in Artifactory, we can see that our Prezi just arrived there with the same sha256 as the one referenced in the Git commit:

[Screenshot: uploaded Prezi in Artifactory]
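
As a minimal sketch of the credential helper setup mentioned above, the following enables Git's in-memory cache helper for a single repository. It assumes a Unix-like system, since the cache helper is not available on every platform. The one-hour timeout is an arbitrary choice, and the throwaway repository only serves to keep the sketch self-contained:

```shell
# Throwaway repository; in practice, run the config command in your clone.
demo_dir="$(mktemp -d)"
cd "$demo_dir" && git init -q .

# Keep entered credentials in memory for one hour (3600 s, arbitrary).
git config credential.helper 'cache --timeout=3600'

# Show the configured helper.
git config --get credential.helper
```

Adding --global to the git config call would apply the setting to all repositories of the current user instead of just this one.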

Step 5: Cloning from a different host/user

If a team member now wants to access our Prezi, all they have to do is install the Git LFS extensions as shown in step 1 and clone the repository (credentials required). Because the .lfsconfig file is present, they do not have to know anything about the Artifactory URL.

$ git clone https://potsdam@showcase.collab.net/gerrit/shinyapp && cd "shinyapp"
Cloning into 'shinyapp'...
remote: Counting objects: 7, done
remote: Finding sources: 100% (7/7)
remote: Total 7 (delta 0), reused 5 (delta 0)
Unpacking objects: 100% (7/7), done.
Downloading incentivize_code_review.zip (120.01 MB)
Username for 'https://gerrit.artifactoryonline.com': potsdam
Password for 'https://potsdam@gerrit.artifactoryonline.com':
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 100     9  100     9    0     0      9      0  0:00:01 --:--:--  0:00:01     9

The file size of our Prezi shows that Git LFS automatically replaced the pointer with the actual file stored in Artifactory:

$ du -h incentivize_code_review.zip
121M    incentivize_code_review.zip
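
Beyond the file size, the content can also be checked against the sha256 recorded in the pointer. The sketch below uses small stand-in files so that it runs without a real LFS clone. In an actual clone, the pointer text stored in Git could be obtained with git cat-file -p HEAD:incentivize_code_review.zip. verify_lfs_object is a hypothetical helper name, and sha256sum from GNU coreutils is assumed:

```shell
# Hypothetical helper: succeed if data_file matches the oid in pointer_file.
verify_lfs_object() {
    pointer_file="$1"
    data_file="$2"
    # Extract the hex digest from the "oid sha256:<hex>" line.
    expected="$(sed -n 's/^oid sha256://p' "$pointer_file")"
    actual="$(sha256sum "$data_file" | cut -d' ' -f1)"
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Stand-in files so that the sketch is self-contained.
data="$(mktemp)"
pointer="$(mktemp)"
printf 'stand-in for the Prezi\n' > "$data"
{
    echo "version https://git-lfs.github.com/spec/v1"
    echo "oid sha256:$(sha256sum "$data" | cut -d' ' -f1)"
    echo "size $(wc -c < "$data" | tr -d ' ')"
} > "$pointer"

if verify_lfs_object "$pointer" "$data"; then
    echo "content matches pointer"
else
    echo "content does NOT match pointer"
fi
```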

That concludes our mini example. Hopefully you can adapt it to your needs.

Advanced topics: Access right management and ssh support

The example above has not covered access right and user management within Artifactory. Artifactory can control read/write/administer access to its repositories on a per-user and per-group basis. Users can be synced with your corporate LDAP or SAML provider as well. Some companies using Gerrit we talked to also allow anonymous access to stored Git LFS content as long as the developer is within the company network. You might want to be careful with that option, though, if you are not hosting Artifactory yourself and will be charged per GB transferred. The screenshot below shows how this feature can be turned on in Artifactory:

[Screenshot: granting anonymous access]

My example used the https protocol to interact with Artifactory. It is also possible to use ssh if you host your own server or have a dedicated server hosted at JFrog (in other words, it is not supported in their Cloud version). More details on access right / user setup and the use of the ssh protocol can be found here.

Acknowledgements

Last but not least, I would like to thank Sebastian Schuberth (for the .lfsconfig hint), Ilmari Kontulainen and my colleagues at CollabNet for the nice discussions that helped shape this blog post. As Ilmari pointed out, there are also other Git LFS backends available that could be used together with Gerrit, but Artifactory made a very good impression, both for its usability and functionality and for the responsiveness of their support folks (kudos to Mor from JFrog).

Johannes Nicolai

Johannes Nicolai is CollabNet’s Development Manager leading all Git and Gerrit related development efforts. Furthermore, he is responsible for CollabNet Connect /synch, CollabNet’s platform to integrate TeamForge with third party ALM platforms. Johannes holds a Master of Science in IT Systems Engineering from Hasso Plattner Institut Potsdam and is a Certified Scrum Master. Before joining CollabNet five years ago, he was doing consulting on user centric design, developing cryptographic software and architecting SAP integrations. He is an Open Source enthusiast and contributes to many projects (check out https://www.ohloh.net/accounts/10619 for details).

6 comments on “Gerrit Productivity Hack – Handling Large Binary Files with Gerrit, Artifactory and Git LFS”
  1. Great post!

    I would only add something for those who struggle with installing the Git LFS client extension under Windows, like me ;). I have git version 2.7.0.windows.1 installed, and after installing the extension (into the default location) I was not able to succeed with:
    git lfs install

    which resulted in:
    git: 'lfs' is not a git command. See 'git --help'.

    I noticed that, apart from the uninstaller, the installer also provides a git-lfs.exe file. I tried installing to different locations (in which case git-lfs.exe did not exist at all). I finally resolved the issue by copying the git-lfs.exe file to
    [git home]\mingw64\bin\

    After that I was able to run `git lfs install` successfully, and the client works fine.

    Regards
    Jacek

  2. Starting with Git LFS 1.2, there will be a major change to the Windows installer, which should be more aligned with the Windows installer of Git. I am assuming this will bring better integration between Git and the LFS extension.

  3. Starting with Git LFS 1.2, the Windows installer will be aligned with the installer used by Git. I am assuming this should resolve such conflicts.

  4. roy Zanbel says:

    Thanks for bringing .lfsconfig to our attention.
    Issue was fixed and will be available in the next release of Artifactory.

  5. roy Zanbel says:

    Thanks for bringing .lfsconfig to our attention.
    Set me up was updated and will be available in the next release of Artifactory.

  6. Matthieu says:

    Hello,

    What plugin did you use to setup lfs on gerrit?

    I am only able to find the plugin for aws s3 or local storage.

    Cheers
