As we get closer to the end of the release process for Subversion 1.7, one of the things we have been trying to do is assess the performance characteristics of the release. This is a big release in which we have essentially rewritten the working copy library. One aim of this rewrite was to make the code (the oldest in all of Subversion) more understandable and better designed, which will allow us to implement new features in future releases without the hindrance the old library has caused in the past. The other goal, and the subject of this post, was to greatly improve the performance of working copy operations, and with them the overall performance of Subversion.
Consolidating to a Single Administration Folder for Performance Gains
The biggest change in the working copy design is moving from the current system, where every folder has a “.svn” folder inside it, to a single “.svn” folder for the entire working copy, with all of the administration data stored in a SQLite database. The original hope was that once the code was modified to use a single administration folder we would see immediate performance gains across the product, and that we could then perform additional SQL optimizations in the Subversion 1.7.x and 1.8 releases that would really blow the old performance out of the water. The reality is that it did not pan out that way: we actually saw significant performance decreases when the work of consolidating the data into a single location was complete. The good news is that as we dive into each area of the code and optimize its use of the SQL database, we are seeing the overall performance improvements we were hoping for. The problem is that this is making the release take longer than anticipated, because we have to do much more restructuring and optimization than we originally planned for the initial 1.7.0 release.
When is the Performance ‘Good Enough’ to Release?
One of the frequent topics of conversation among the developers is when the performance will be good enough to release. We cannot answer that question without controlled benchmarks that measure the performance of the previous releases and the current code so that we can compare them. These benchmarks can also help the developers focus on the areas with the biggest problems. Most developers have their own ad hoc tests they have been running to look for performance problems, but we need something more formalized in order to make decisions.
Being a Java programmer, I do not personally get to contribute a lot to Subversion core development as it is all written in C. I mainly just build and test changes to provide feedback and contribute to the Java bindings as well as test them from Subclipse. Adding a set of performance benchmarks seemed like an area where I could help the Subversion project. I have started an open-source project to do this and you can access what I have done so far on the openCollabNet web site. I am using the project’s wiki to document what I have been doing, how the tests are run, what they do and also a place to record results.
The sort of tests I have written so far are mainly useful for doing comparisons on a single computer. In other words, it does not really matter how my results and your results compare to each other, what matters is how my results compare with each other (and how your results compare with each other) as I run them using different builds of Subversion.
How Can You Help?
Thanks for asking 🙂 Taking the time to download and run the tests, and reporting the results, would be a big help. Even if you are just establishing a baseline of how older releases perform on your system, that is useful. As an aside, I think it would be great if someone took the time to run these tests over a series of Subversion releases to see whether there were any releases where performance noticeably changed for better or worse. Obviously, if you can build the Subversion trunk source code and test it too, that would be spectacular, as that is the ultimate goal. I realize that is not always possible, so if I see interest in this project I can work to make binaries available for download from this site. Of course, as we get closer to the final release we will be doing this anyway. If you have already established and recorded baselines for different releases, you will be able to just grab those binaries and run the tests again to confirm the results on your system.
Review and feedback on the tests I have written would be helpful. I have written some documentation in the wiki on what the tests are doing. Perhaps someone reading this has worked on benchmarks for their own software and can share some ideas and knowledge. There are some areas I would like to do better:
- I currently only report the elapsed time for a command. I could not find a cross-platform way to track things like the amount of CPU time used by the command. I suspect we will have to instrument Subversion itself to really get this information.
- Subversion performance is heavily impacted by the operating system’s disk cache. I do nothing in these tests to try to manipulate that cache.
- I think the code for the tests is easy enough to understand that it would be relatively easy to write new tests. That said, if I see interest I will spend more time on the documentation and code comments.
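To make concrete what the first point above means, here is a minimal Java sketch of the elapsed-time approach: wall-clock timing around a spawned command. The class name and the example command are illustrative, not taken from the benchmark project, and as noted this captures only elapsed time, not CPU time.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: report a command's wall-clock duration,
// which is the kind of measurement the current tests make.
public class CommandTimer {
    public static long timeMillis(String... command) throws Exception {
        long start = System.nanoTime();
        Process p = new ProcessBuilder(command)
                // Discard output so console I/O does not skew the timing.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        p.waitFor();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Substitute any Subversion working-copy command here,
        // e.g. "svn", "status" run inside a working copy.
        long ms = timeMillis("java", "-version");
        System.out.println("elapsed ms: " + ms);
    }
}
```

Per-process CPU time, by contrast, has no portable Java API, which is why instrumenting Subversion itself may be the only way to get at it.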
Contributing ideas for new tests would be helpful. These tests are not designed to verify behavior, Subversion already has tons of tests for that. So we are mainly looking for ideas that might be relevant for working copy performance. For example, maybe there are certain commands you use a lot that are not covered and that you have ideas or concerns about. Maybe you think there is something unique about the structure of your application that is not covered. As an example, folders with thousands of files or projects with thousands of folders can reveal different performance problems. I plan to try to test for these scenarios, but maybe you have more ideas I have not thought of. The project has a Tracker and Discussion forum you can use. Take your pick!
Just testing in your own environment would help a lot as it will give us coverage on more operating systems and file systems. Do you normally have your working copy stored on a network mount? Test that. We need to know where there are serious regressions and also where we have made serious improvements.
Finally, the current tests are focused on commands that impact the working copy. I am not trying to measure the performance of the network or server, so there are no tests for things like the log command. That said, the tests do issue a number of commands that talk to the server, and they are designed so that you could run them against a server, which means you could use them to compare the performance of different server access methods or of different Subversion server releases. That is not my personal focus at the moment, but there is no reason these tests could not expand in that direction.
Update: I have posted the development build I used for Windows testing. Feel free to download it and test the current state of 1.7 on your own system.
photo credit: felicemcc