The Legend of the Honey Badger

If you were participating in the beta test of the CloudForge application, you may have seen a honey badger.  Or two.  What you may not have known is that we saw all of them as well.  This is the story of that honey badger…

“No battle plan survives contact with the enemy.” — Helmuth von Moltke  (he was a German military strategist, by the way…)

Test-driven development (or “TDD”) is a great way to help guard against quality problems in your code, but it’s not a guarantee things will actually work the way you want.  Developing test cases is a bit of an art, and some people are naturally better at thinking about edge cases and details than others.

Let’s take a quick and simple example.  Say you were testing a Ruby method:

def divide(a, b)
  a / b
end

So how would you test this?

Maybe you’d start with some very simple tests:

  assert_equal 2, divide(2, 1)
  assert_equal(-2, divide(-2, 1))
  assert_equal 0, divide(0, 2)
  assert_equal 1.5, divide(4.5, 3)

These assertions would all pass.  You might think you’re in pretty good shape.  You’d deploy your code, and things would start to explode.  One reason might be easy to guess:  What if ‘b’ was set to zero?

On the other hand, a lot of the errors that might come out of this function may not be obvious — they may be decisions about design, rather than obvious errors:

  • What’s the expected outcome of divide(1,2)?  0? or 0.5?  This code will do simple integer division if both parameters are integers — the code would need to be changed to ensure float values were returned when necessary.
  • Is it correct that divide(-2,1) should be -2?  Or should it be 2?  Or should it throw an exception if either parameter is negative? Maybe the function wasn’t supposed to handle negatives and that’s indicative of some other error.
  • Is it correct that an exception is raised if b is 0?  Or should this function quietly return 0?  If the code is supposed to raise an exception in that case, we should test that it does — in case someone else decided to “fix it” by squelching the exception.
  • What happens if either of the parameters is nil?  Should it raise an exception?  Or should it coerce nil to 0?
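One way to pin these decisions down is to write them as tests.  The following is a minimal sketch using Minitest (an assumption; the assertions above would read the same under Test::Unit) that records what the two-line divide actually does in Ruby today.  The class name DivideBehaviorTest is made up for illustration.

```ruby
require "minitest/autorun"

def divide(a, b)
  a / b
end

# Hypothetical test class: documents the method's current behavior
# so that any later change to it is a deliberate decision.
class DivideBehaviorTest < Minitest::Test
  def test_integer_arguments_use_integer_division
    assert_equal 0, divide(1, 2)
  end

  def test_a_float_argument_gives_a_float_result
    assert_equal 0.5, divide(1.0, 2)
  end

  def test_integer_division_by_zero_raises
    assert_raises(ZeroDivisionError) { divide(1, 0) }
  end

  def test_float_division_by_zero_does_not_raise
    # Float division follows IEEE 754, so there is no exception here.
    assert_equal Float::INFINITY, divide(1.0, 0)
  end

  def test_nil_dividend_raises_no_method_error
    assert_raises(NoMethodError) { divide(nil, 1) }
  end

  def test_nil_divisor_raises_type_error
    assert_raises(TypeError) { divide(1, nil) }
  end
end
```

None of these tests says the behavior is right, only that it is known.  If the answer to the first question turns out to be 0.5, for example, switching the body to a.fdiv(b) is one option, and the first test would then be changed on purpose rather than broken by accident.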

When developers are pushing hard to get things done, we can’t always iterate on all of these questions, so we have to make some guesses.  Or maybe there’s some combination of events or data that we never expected, particularly in our case, where we’re dealing with several years of legacy data!  The important thing, then, is to track down the failures.

We use various tools to make sure we’re made aware of failures in the code.  Whenever you saw one of those “honey badger” pages, several of our key developers would get an email with a complete backtrace of the code and the parameters that were sent to the server (passwords and such having been excised already).  That very often gave us everything we needed to go fix something.  (We started out with the exception_notification gem during initial roll-out and beta; we’ve since added New Relic to help capture the error data as well.)
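To make the idea concrete, here is a rough sketch in plain Ruby of what such a report contains.  It is not the exception_notification gem’s API; the names SENSITIVE_KEYS, filter_params, and build_error_report are invented for illustration, and a real setup would email the report rather than print it.

```ruby
# Hypothetical list of parameter names that must never leave the server.
SENSITIVE_KEYS = %w[password password_confirmation].freeze

# Replace the values of sensitive parameters before reporting them.
def filter_params(params)
  params.each_with_object({}) do |(key, value), filtered|
    filtered[key] = SENSITIVE_KEYS.include?(key.to_s) ? "[FILTERED]" : value
  end
end

# Build a plain-text report from an exception and the request parameters.
def build_error_report(error, params)
  [
    "#{error.class}: #{error.message}",
    "Parameters: #{filter_params(params).inspect}",
    "Backtrace:",
    *Array(error.backtrace).map { |line| "  #{line}" }
  ].join("\n")
end

def divide(a, b)
  a / b
end

# Hypothetical request parameters that trigger the failure.
params = { "a" => 1, "b" => 0, "password" => "secret" }

begin
  divide(params["a"], params["b"])
rescue StandardError => e
  # In production this is where the email would be sent.
  puts build_error_report(e, params)
end
```

The point of the sketch is the ordering: sensitive values are filtered before the report is assembled, so they never reach anyone’s inbox.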

Our practice is to reproduce the failure scenario as a test case.  This helps us be sure we understand the reproduction case, and when that test starts passing, we know we’ve fixed the code.  Plus, we get a further bit of ammunition for our test suite: regression tests for edge cases going forward.  Sometimes, once we’ve figured out the scenario that caused the error, we have to go back to product management to find out what the correct behavior should be.
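As a sketch of that workflow, suppose (purely as an assumption for this example) that an error email showed divide being called with a nil divisor from a legacy record, and that the agreed behavior was to raise a descriptive ArgumentError.  The test class name and the error message are hypothetical; Minitest is assumed as the test framework.

```ruby
require "minitest/autorun"

# Fixed version. Assumed decision: a missing value is a caller error,
# so raise a descriptive ArgumentError instead of a confusing TypeError.
def divide(a, b)
  if a.nil? || b.nil?
    raise ArgumentError, "divide needs two numbers, got #{a.inspect} and #{b.inspect}"
  end
  a / b
end

class DivideRegressionTest < Minitest::Test
  # Reproduces the (hypothetical) production failure: a legacy record
  # with a missing divisor. Written first, it fails against the old code,
  # which raised TypeError instead.
  def test_missing_divisor_from_legacy_record
    error = assert_raises(ArgumentError) { divide(7, nil) }
    assert_match(/got 7 and nil/, error.message)
  end

  # Guards existing behavior so the fix does not break the normal case.
  def test_normal_division_still_works
    assert_equal 2, divide(4, 2)
  end
end
```

The regression test stays in the suite after the fix, so the same legacy-data case cannot quietly come back.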

We call this TDD, too — Test-Driven Debugging!


Paul Clegg

Director of Engineering, CloudForge at CollabNet, Inc.
