Story Point Estimates – Under-Estimating Large Items

While doing Scrum at Danube, we’ve long espoused the value of relative, story-point estimates over time-based estimates. We’ve written papers on why macro metrics beat granular, task-based estimates because of the uncertainty inherent at the task level. And we eat our own dog food: the ScrumWorks team uses relative estimation units (labeled “headaches,” for fun) to estimate stories/backlog items.

Recently, though, a new team member (we’ll call him “Ed”) was baffled by the team’s estimation scale, and at a recent retrospective he pointed out some major problems with the way we seemed to be estimating backlog items.

The team had been using a scale that started with “1” for trivial changes, “2 through 6” for work manageable inside a single sprint, and an “8” for the biggest item we’d want to take on in a sprint without breaking down further. The team used “10”, “12”, “16” and so forth to estimate items that were typically too big for a single sprint and needed to be broken down. The team’s velocity was pretty consistently in the neighborhood of 40 headaches per sprint.

Ed noticed, though, that our scale seemed to be logarithmically correlated to the effort the team perceived would be necessary to complete the items. That is, a “4” was not four times the size of a “1,” and an “8” was not twice the size of a “4.” The team might be able to do 60 “1”s in a sprint, but only 10 “4”s. Likewise, the team could only handle three or four “8”s in a sprint.
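To make the nonlinearity concrete, here’s a small sketch (a back-of-the-envelope calculation, not anything the team actually ran) that derives how much real work each estimate implied from the items-per-sprint figures above, using one “1”-sized item as the unit of work and 3.5 to split the difference on “three or four 8s”:

```python
# Items of each estimate the team could finish in one sprint
# (figures from the post; 3.5 splits "three or four 8s").
items_per_sprint = {1: 60, 4: 10, 8: 3.5}

# Real capacity per sprint, measured in "1"-sized units of work.
capacity = items_per_sprint[1]  # 60 units

# Implied real effort of a single item at each estimate.
effort = {pts: capacity / n for pts, n in items_per_sprint.items()}

for pts, units in sorted(effort.items()):
    print(f"a {pts}-point item is about {units:.1f} units, "
          f"or {units / pts:.2f} units per point")
```

Effort per point grows with the estimate (roughly 1.0, 1.5, and 2.1 units per point for a “1,” a “4,” and an “8”), which is exactly the signature of a compressed, logarithmic-style scale rather than a linear one.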

This wasn’t an after-the-fact realization, either. The team knew at the time of estimation that a “4” was far larger than four “1”s. A frequent statement during sprint planning was, “Well, we could do eight more points of smaller stuff, but not a single item estimated at 8!”

Why does this matter? If the team produces a stable velocity each sprint, isn’t that enough to forecast accurately? Actually, it isn’t. The team was getting a stable velocity because we usually had a mixed bag of estimate sizes each sprint: some 1s, some 2s, some 4s, and so on. However, our product backlog was mostly composed of large items with bigger estimates (8s, 10s, 12s, etc.).

But this means the 8s, 10s, and 12s on the backlog are actually much larger than their estimates suggest. So while there might be 100 points outstanding on the release backlog, because those estimates are in effect “low,” our ability to forecast is thrown off course.
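Here’s a hypothetical illustration of how badly this skews a forecast. Assume a 100-point release backlog made entirely of large items (twelve 8s plus one 4), the 40-headache velocity from above, and the real per-item effort implied by “60 ones per sprint,” “10 fours per sprint,” and “three or four 8s per sprint”:

```python
# Naive forecast: divide backlog points by nominal velocity.
velocity = 40          # headaches per sprint, from the post
backlog_points = 100   # hypothetical release backlog of large items
naive_sprints = backlog_points / velocity  # 2.5 sprints

# Reality check, using the effort implied by items-per-sprint counts.
capacity_units = 60            # "1"-sized units of work per sprint
units_per_8 = 60 / 3.5         # ~17 units per 8-point item
units_per_4 = 60 / 10          # 6 units per 4-point item

# Twelve 8s plus one 4 makes exactly 100 points (12 * 8 + 4).
actual_units = 12 * units_per_8 + 1 * units_per_4
actual_sprints = actual_units / capacity_units  # ~3.5 sprints

print(f"naive forecast:  {naive_sprints:.1f} sprints")
print(f"actual workload: {actual_sprints:.1f} sprints")
```

Under these assumed numbers, the naive forecast promises the release in 2.5 sprints when the backlog really holds about 3.5 sprints of work, an error of roughly 40% before a single sprint has been run.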

Once the truth of Ed’s observations sank in, we immediately created a new estimation scale that was genuinely linear. The Fibonacci sequence is a popular scale, but we settled on powers of two instead. We didn’t want much granularity at the low end of the scale, nor at the high end, and powers of two helped us achieve that goal.

Now the team is re-learning to estimate in a linear way. Yes, it takes some practice and getting used to, but I’m happy that our ability to forecast has been greatly improved!

CollabNet Team

CollabNet helps enterprises and government organizations develop and deliver high-quality software at speed. CollabNet is the winner of a 2016 Best of Interop Award, recognizing TeamForge for its innovation. Also recognized for 13 consecutive years as an SD Times 100 “Best in Show” winner in the ALM and Development Tools category, CollabNet offers innovative solutions, provides consulting and Agile training services, and proudly supports more than 10,000 customers with 6 million users in 100 countries. Our flagship product, TeamForge®, is the industry’s #1 open platform for enterprise software development, delivery, and collaboration. Leading companies and government agencies leverage TeamForge to accelerate application delivery with Agile, continuous integration (CI), continuous delivery (CD), and DevOps—and reduce costs through a governed adoption of open source tools, streamlined compliance, and the reuse of existing assets, resources, and processes in new projects.

3 comments on “Story Point Estimates – Under-Estimating Large Items”
  1. When you said linear scale, I thought you meant the gaps between the allowable numbers, not the atomic unit of measure. This is a good post; I just wanted you to know that the term might confuse people due to its alternate use.

    You triggered me to put up a thorough post about what makes a good story point scale including this clarification.

  2. Victor Szalvay says:

    The title shouldn’t be “linear scale”, I’ll fix it up. Glad you found it useful.

  3. Michael James says:

    I think the underlying reason is that the technical risk increases with larger effort items, and humans are psychologically uncomfortable calling this out.

    It’s more comfortable to go from 8 to 13 to 21 (the Fibonacci scale) than from 8 to 16 to 32 (powers of two) even though the latter will turn out to be more accurate.

