While doing Scrum at Danube, we’ve long espoused the value of using relative story point estimates over estimates based on strict chronology. We’ve written papers on why macro metrics are better than granular, task-based estimates, given the uncertainty inherent at the task level. And we eat our own dog food: the ScrumWorks team uses relative estimation units (labeled “headaches”, for fun) when estimating stories and backlog items.
Recently, though, a new team member (whom we’ll call “Ed”) was baffled by the team’s estimation scale. He had trouble acclimatizing to it, and at a recent retrospective he pointed out some major problems with the way our team seemed to be estimating backlog items.
The team had been using a scale that started with “1” for trivial changes, “2” through “6” for work manageable inside a single sprint, and an “8” for the biggest item we’d want to take on in a sprint without breaking down further. The team used “10”, “12”, “16” and so forth to estimate items that were typically too big for a single sprint and needed to be broken down. The team’s velocity was pretty consistently in the neighborhood of 40 headaches per sprint.
Ed noticed, though, that our scale seemed to be logarithmically correlated to the effort the team perceived would be necessary to complete the items. That is, a “4” was not four times the size of a “1”, and an “8” was not twice the size of a “4”. The team might be able to do 60 “1”s in a sprint, but only 10 “4”s. Likewise, the team could only handle three or four “8”s in a sprint.
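To make the mismatch concrete, here is a minimal Python sketch that turns those per-sprint counts into implied effort. The function names are ours, invented for illustration, and “three or four” 8s is taken as 3.5, which is an assumption rather than a measured figure.

```python
# Items of a single size the team could finish in one sprint, using the
# counts quoted above. 3.5 stands in for "three or four" (an assumption).
ITEMS_PER_SPRINT = {1: 60, 4: 10, 8: 3.5}


def relative_effort(estimate, baseline=1):
    """How many baseline-sized items one item of this estimate really costs."""
    return ITEMS_PER_SPRINT[baseline] / ITEMS_PER_SPRINT[estimate]


def points_per_sprint(estimate):
    """Velocity the team would record if a sprint held only this size."""
    return estimate * ITEMS_PER_SPRINT[estimate]


if __name__ == "__main__":
    for size in ITEMS_PER_SPRINT:
        print(size, round(relative_effort(size), 1), points_per_sprint(size))
```

Under these numbers a “4” costs about as much as six “1”s and an “8” about as much as seventeen, while the points completed per sprint fall from 60 to 40 to 28 as the items get bigger. On a linear scale, that last figure would stay roughly constant.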
This wasn’t an after-the-fact realization either. The team knew at the time of estimation that a “4” is way larger than four “1”s. A frequent statement during sprint planning was “Well, we could do eight more points of smaller stuff, but not a single item estimated at 8!”
Why does this matter? If the team produces a stable velocity each sprint, isn’t that enough to forecast accurately? Actually, it isn’t. The team was getting a stable velocity because we usually had a mixed bag of estimate sizes each sprint: there would be some 1s, some 2s, some 4s, etc. However, our product backlog was mostly composed of large items with bigger estimates (8s, 10s, 12s, etc.).
This means that the 8s, 10s, and 12s on the backlog are actually much larger than their estimates reveal. So while there might be 100 points outstanding on the release backlog, the estimates are in effect “low”, and any forecast that divides those points by our usual velocity comes out too optimistic.
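A rough Python sketch of how far off such a forecast can be follows. The backlog mix (twelve 8s and one 4, totaling 100 points) is hypothetical, the per-sprint capacities reuse the counts mentioned earlier with “three or four” 8s taken as 3.5, and the function names are ours. We leave out 10s and 12s only because we have no capacity figures for them.

```python
VELOCITY = 40  # headaches per sprint, from our mixed-size sprints

# Items of a single size the team can finish in one sprint (3.5 is an assumption).
ITEMS_PER_SPRINT = {4: 10, 8: 3.5}

# Hypothetical release backlog: estimate -> number of items (100 points total).
BACKLOG = {8: 12, 4: 1}


def naive_sprints(backlog, velocity):
    """Forecast that treats every point as equal: total points / velocity."""
    total_points = sum(estimate * count for estimate, count in backlog.items())
    return total_points / velocity


def size_aware_sprints(backlog, items_per_sprint):
    """Forecast that charges each item the share of a sprint it really uses."""
    return sum(count / items_per_sprint[estimate]
               for estimate, count in backlog.items())


if __name__ == "__main__":
    print(round(naive_sprints(BACKLOG, VELOCITY), 2))
    print(round(size_aware_sprints(BACKLOG, ITEMS_PER_SPRINT), 2))
```

With these illustrative numbers, the point-based forecast says 2.5 sprints while the size-aware one says roughly 3.5, a full sprint of slippage on a 100-point release.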
Once the truth of Ed’s observations sank in, we immediately took action to create a new estimation scale that was completely linear. The Fibonacci sequence is a popular scale, but we actually settled on powers of two. We didn’t want there to be much granularity at the low end of the scale, nor at the high end, and powers of two helped us achieve that goal.
Now the team is re-learning to estimate in a linear way. Yes, it takes some practice and getting used to, but I’m happy that our ability to forecast has been greatly improved!