Are Detailed Log Messages Really Necessary?

The Subversion open source project has earned a reputation for paying close attention to its processes, and doing its best to find that "sweet spot" where as much information as possible is harvested from its contributors without frustrating them with overbearing rules and regulations. One area where this plays out is in the composition of log messages attached to commits and would-be commits (patches). The project has some pretty picky guidelines about the content and format of its log messages. These guidelines (which can be found at http://subversion.tigris.org/hacking.html#log-messages) are among the lengthiest bits of Subversion process documentation to be found. They require, among other things, documentation of the change made at the granularity of a single function (or symbol).

Why? Why not just use a short one-liner log message that points folks to an issue tracker artifact or an archived email thread where the full background of the change is more likely to be found? Is this just an old habit from the Subversion community’s CVS days that didn’t die, or are there benefits?

I was asked these questions recently, and I try to answer them here.

I think it’s safe to say that most folks would prefer to not jump around in various tools to find out what they need to know. Why should a developer have to launch a web browser and connect to an issue tracker, page through potentially hundreds of comments — some of which are valuable, and some of which are just folks saying, "Yeah, I want this bug fixed, too" — all just to figure out how a particular function has changed over time? Why should a user have to dig around in version control history trying to discern whether or not a particular bug has been fixed or feature added, and trying to derive from log message comments and branch names the likely release in which that fix or feature will see the light of day?

They shouldn’t. You see, the two tools have totally different purposes. They are complementary, to be sure. But the intended audience is generally quite different.

Version control systems are all about the code. Developers live in version control. Writing software without version control is like… well, I’ve heard it’s pretty bad. But users are often blissfully unaware that version control even exists, happy to float from release package to release package without a moment’s consideration that the differences between them are the product of many individual little changes stored in a version control system somewhere. The issue tracker is where the users live.

An issue tracker follows the lifespan of a defect or feature at a very high level. You see, users don’t care that function doit() in file srcfile.c needed a tweak on line 823 which corrected the error handling logic around a call to the doit_helper() private function. They just want to know that their bug is gone, and in which version of the software the fix will be released. Developers, on the other hand, want to know — need to know — all those gritty details. What functions or classes were changed in a revision? In what other revisions was that function or class changed, and in what way? Unfortunately, to keep this kind of information in an issue tracking tool is suboptimal.

I and several other developers I know keep ChangeLog files in the tops of our working copies, generated from the output of svn log -v. (Some projects even version their ChangeLog files, though that is almost certainly a CVS throwback.) Why do I have these files? Because I’m constantly trying to answer questions like, "In which revisions was srcfile.c changed? Why? Which of those revisions touched doit(), and again, why?" If I had to aggregate that information from both the version control and issue tracker tools — especially if the tracker tool didn’t offer command-line-scriptable access with both human- and machine-parsable output — I and many developers like me would be crippled.

However, for that ChangeLog to be really useful, I need to be able to find changes keyed on function name or class. Unfortunately, most version control tools don’t really "understand" the contents of the files they house. Sure, you can see changes to individual lines of text if you ask for a diff between two versions of a file. But the version control tool itself doesn’t know the difference between one function and another. And that’s why you have to record information at that level of detail in a way that the tool will report to you, such as in the revision log message. Naturally, if a log message has information at that level of detail, you want to visually arrange it so that changes are grouped by path, and be consistent in your formatting so that you can script the parsing and reporting of that information. So it is these very common developer needs and widely practiced activities that have driven the Subversion community leaders to demand well-formed log messages as described in the project’s process documentation.
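As a rough illustration of that kind of scripted reporting, here is a minimal sketch that lists the revisions whose log entries mention a given symbol. It assumes the file holds plain svn log -v output, where each entry starts with a line of 72 dashes followed by a header line. The function name revs_touching is made up for this example, and the match is a simple substring test rather than real parsing of the log message.

```sh
#!/bin/sh
# revs_touching SYMBOL [LOGFILE]
# Print the header line (revision, author, date) of every log entry that
# mentions SYMBOL. Hypothetical helper, not part of Subversion.
revs_touching() {
  awk -v sym="$1" '
    # A line of exactly 72 dashes starts a new entry in svn log output.
    length($0) == 72 && /^-+$/ {
      if (hit) print header
      header = ""; hit = 0; want_header = 1
      next
    }
    # The first line after the separator is the entry header.
    want_header { header = $0; want_header = 0; next }
    # Plain substring match anywhere in the changed paths or the message.
    index($0, sym) > 0 { hit = 1 }
    # Handle a final entry that is not followed by a separator.
    END { if (hit) print header }
  ' "${2:-./ChangeLog}"
}
```

Run from the top of a working copy that has a ChangeLog file, something like revs_touching doit would print one header line per revision whose entry mentions doit. Because it is only a substring match, a consistent convention for naming symbols in log messages is what keeps the results meaningful.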

Of course, there are also some legacy reasons for some of the finer details of various log message policies. Some folks use particular text formatting for identifying issue tracker artifact IDs or contributor information. This is, again, done largely to aid machine-parsability, cross-tool integration, and reporting. Subversion’s support for arbitrary revision properties should serve as motivation to store information like that in custom properties (such as myproject:issue-ids or myproject:contributors), which would greatly simplify the post-processing of that data. And I believe that TortoiseSVN actually facilitates this behavior to some degree. I suspect two things need to happen before this becomes common practice, though. First, Subversion has to allow you to set arbitrary revision properties as you perform the commit. (Fortunately, this feature should be released in Subversion 1.5. How do I know that? Because the right tool for the job told me so.) Second, Subversion needs to support better client-side interaction with, and searching of, those custom properties.
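To make the machine-parsability point concrete, here is a minimal sketch that pulls issue numbers out of log text. The "issue #NNN" spelling and the function name issue_ids are assumptions for illustration only; a project would substitute whatever format its own guidelines prescribe.

```sh
#!/bin/sh
# issue_ids [LOGFILE]
# Print each distinct issue number referenced as "issue #NNN" in LOGFILE,
# one per line, in numeric order. Hypothetical helper; the "issue #NNN"
# format is an assumed convention, not something Subversion enforces.
issue_ids() {
  awk '{
    line = $0
    # Walk along the line so that several references on one line are found.
    while (match(line, /[Ii]ssue #[0-9]+/)) {
      id = substr(line, RSTART, RLENGTH)
      sub(/^[^#]*#/, "", id)        # keep only the digits after "#"
      print id
      line = substr(line, RSTART + RLENGTH)
    }
  }' "${1:-./ChangeLog}" | sort -n -u
}
```

A reference written any other way is silently missed by a pattern like this. That weakness is the argument for moving such data into dedicated revision properties.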

Here is the script I run occasionally to generate those ChangeLog files. It notes the basename of the Subversion repository URL associated with your current working directory. If you have a working copy of the trunk, it generates the ChangeLog file with log data for the trunk and all branches. Otherwise, it generates the ChangeLog file with log data for just the versioned directory you are in. It only works if your repositories follow the recommended practice of having trunk, tags, and branches directories in the root of the repository, but that is common for the projects I work on. Obviously, there’s nothing magical about this script. I could just wield svn log manually when I needed to find something. But I don’t always work where a network connection exists, so having this local cache of the revision log information is extremely handy. Anyway, here’s the script:

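#!/bin/sh

# Bail out unless the current directory is part of a working copy.
# (Asking svn itself also works for clients that keep a single .svn
# directory at the working copy root.)
if ! svn info > /dev/null 2>&1 ; then
  echo "ERROR: Not in a Subversion working copy" 1>&2
  exit 1
fi

LOGFILE=./ChangeLog
# Basename of the working copy URL. Anchoring on "^URL" avoids matching
# other svn info fields whose names contain "URL".
TRUNK=`svn info | grep "^URL" | sed 's|.*/||'`
if [ "${TRUNK}" = "trunk" ] ; then
  # Trunk working copy: log trunk and all branches under the repository root.
  URL=`svn info | grep "^Repository Root" | cut -c18-`
  echo "Generating ChangeLog for ${URL} (trunk, branches)"
  svn log -v "${URL}" trunk branches > "${LOGFILE}"
else
  # Anything else: log only the versioned directory we are in.
  URL=`svn info | grep "^URL" | cut -c6-`
  echo "Generating ChangeLog for ${URL}"
  svn log -v .@HEAD > "${LOGFILE}"
fi

C. Michael Pilato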

C. Michael Pilato is a core Subversion developer, co-author of Version Control With Subversion (O'Reilly Media), and the primary maintainer of ViewVC. He works remotely from his home state of North Carolina as a software engineer for CollabNet, and has been an active open source developer since early 2001. Mike is a proud husband and father who loves traveling, soccer, spending quality time with his family, and any combination of those things. He also enjoys composing and performing music, and harbors not-so-secret fantasies of rock stardom. Mike has a degree in computer science and mathematics from the University of North Carolina at Charlotte.

2 comments on “Are Detailed Log Messages Really Necessary?”
  1. dave says:

    Excellent! For some reason I have the hardest time convincing people that detailed log messages are actually useful. This article will really help to make the argument.
    Although, personally, rather than put function-level info in the log messages, I usually stick with svn blame. But to each their own.

  2. C. Michael Pilato says:

    ‘svn blame’ is a powerful tool, and I use it a lot myself. Unfortunately, using blame to follow the history of changes to a function over many revisions can be really painful, often causing you to use multiple invocations of ‘svn blame’ with successively older windows of revision history. Nothing beats (for me, anyway) an Emacs incremental search on the function’s name in my ChangeLog file.
