Estimating Software Development
I’m a big believer in agile software development. One major part of that is being able to determine a team’s development velocity (a.k.a. how fast the team works) in a period of time. The naive approach is to use straight-up time estimates. Task A will take me 4 hours, task B will take me 2 days, task C will take me 5 minutes, etc. Decades of late projects have served as empirical evidence that this does not work. Any software worth building is usually complex enough that there is ALWAYS something you don’t expect. Maybe some complexity is discovered during development. Maybe the code review sets off a long discussion. Maybe after showing the completed feature to the rest of the team, everyone realizes the product needs to change. It is hard to put a time estimate on the unknown. Some people will argue “Just pad your estimates!” However, how do you know how much padding to add for something that is unknown? If you knew, then there wouldn’t actually be any unknowns.
Agile software development introduced a concept called story points to resolve these issues. The basic premise is that these are abstract and relative units of measurement. If one feature is given a baseline of 8 points, then an easier task is 3 or 5 points and a harder one is 13. Story points do not map back to time. The practice requires that a team not assign 1 point to 1 hour or anything like that. It sounds strange because in the end we use story points to determine how much work a team can get done in a period of time. However, disassociating story points from time is critical to making them work. It allows teams to think more abstractly and better account for the unknowns. Once a task the team has done in the past is given a story point value, they can compare all future work against that past work. Feature A is arbitrarily given 5 points and Feature B is definitely more complex, so we give it 8 story points. If we know a team completed 30 story points’ worth of work last week, then we can expect them to handle 6 features like Feature A every week.
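The arithmetic behind that last sentence can be written down as a minimal sketch. The function name features_per_week is made up for illustration, and the numbers are just the ones from the example above (30 points completed, 5 points for Feature A, 8 for Feature B).

```python
def features_per_week(velocity_points: int, points_per_feature: int) -> int:
    """Whole features of a given size that fit into one week's velocity."""
    if points_per_feature <= 0:
        raise ValueError("points_per_feature must be positive")
    # Integer division: a partially finished feature does not count.
    return velocity_points // points_per_feature


# Numbers from the example: 30 points completed last week.
last_week_velocity = 30
feature_a_fit = features_per_week(last_week_velocity, 5)  # Feature A = 5 points
feature_b_fit = features_per_week(last_week_velocity, 8)  # Feature B = 8 points

print(f"Features like A per week: {feature_a_fit}")
print(f"Features like B per week: {feature_b_fit}")
```

Note that nothing in this calculation refers to hours. The only inputs are relative sizes and what the team actually finished.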
“Wait a minute! Can’t you just divide a week’s worth of hours by 30 and get time per story point?” We could… but then we would just be using time estimates again and would run into all the issues we have with time estimates. The point of an abstract unit of measurement like story points is to give a team freedom to make a mistake when estimating a task. A 5-point task in the past had all sorts of unknowns creep up that required more time to resolve than expected. A new task looks like it COULD have a similar number of unknowns, so we also give it 5 points. If we have two 5-point tasks, one task could be completed quicker than expected because nothing bad pops up, while everything that could possibly go wrong with the second task does. In the end, the time taken by the two tasks is wildly different, yet the two average out to what our abstract estimate was. This averaging makes accurate predictions over the course of 1-4 weeks possible even though the team’s estimates on an individual task are most likely inaccurate.
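The averaging argument can be illustrated with a small simulation. This is only a sketch under assumptions that are not from any real project: the actual hours of a 5-point task are drawn from a log-normal distribution (a common stand-in for "usually fine, occasionally terrible"), tasks are independent of each other, and a sprint holds 12 such tasks. The function names and parameters are hypothetical.

```python
import random
import statistics


def simulate_task_hours(rng: random.Random) -> float:
    """Actual hours for one hypothetical 5-point task.

    Assumption: log-normal with a wide spread, so some tasks finish
    quickly and a few blow up badly.
    """
    return rng.lognormvariate(2.0, 0.6)


def relative_spread(samples: list) -> float:
    """Standard deviation divided by the mean (coefficient of variation)."""
    return statistics.stdev(samples) / statistics.mean(samples)


rng = random.Random(42)  # fixed seed so the run is repeatable
tasks_per_sprint = 12    # assumed sprint size, purely illustrative
trials = 2000

# How much a single task varies from one task to the next.
single_tasks = [simulate_task_hours(rng) for _ in range(trials)]

# How much the total of a whole sprint varies from one sprint to the next.
sprint_totals = [
    sum(simulate_task_hours(rng) for _ in range(tasks_per_sprint))
    for _ in range(trials)
]

task_spread = relative_spread(single_tasks)
sprint_spread = relative_spread(sprint_totals)

print(f"Relative spread of one task:     {task_spread:.2f}")
print(f"Relative spread of a full sprint: {sprint_spread:.2f}")
```

Under these assumptions the sprint total should be noticeably more predictable, relative to its size, than any single task. If tasks on a real team are not independent (for example, one surprise delays several tasks at once), the effect will be weaker.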
The theory is sound, but it comes with a fatal flaw: thinking in the abstract is really hard. Everyone thinks differently and has a different perspective. Abstract units of measurement change with that perspective. Teams with disagreements about story point values are supposed to talk through their reasons and come to an agreement. Unfortunately, the winners of debates tend to be the most persuasive, and persuasiveness does not necessarily correlate with good estimation. The only team I’ve been on that applied story points well was one where we were together for over a year. That was how long it took for us to get used to each other’s thought processes. Try telling your boss “I need a year with my team before I can give you any predictability around our work.” You will have as much fun as I did.
A number of very late projects made me think that something else had to be tried. I started to think more about time estimates. The nice thing about time estimates is that they give a programmer something concrete to measure: “It will take me 2 hours to write this algorithm.” I wanted to keep that concrete measurement. The thing that had to give was the expectation that every member of the team could complete 40 hours of estimated work a week. We ran one 2-week period (a.k.a. a development sprint) with 120 hours of estimated work against 200 available working hours across the team. Sounds like a good amount of padding? Wrong. We completed 60 hours of estimated work. While initially discouraging, this had several benefits.
The first is that it prevented us from being overly optimistic the next time. The next sprint was to complete the remaining 60 hours of work. One team member responded with “that doesn’t sound like a lot, we can do more!” History had shown that we couldn’t, so we shouldn’t expect to. As sad as it sounded, the number helped ground the team in reality.
That leads to the second benefit: it gave us a baseline for getting better. If we could do 60 hours of estimated work in one sprint, then we should aim to do 66 hours the next sprint. When we hit 66 hours, we should aim for 72. Rinse and repeat.
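The sprint numbers above can be laid out as a short sketch. The variable and function names are invented for illustration. The fixed 6-hour step is my reading of the 60, 66, 72 sequence and is an assumption, not a rule. Likewise, the 200 available hours are simply the figure quoted for that sprint.

```python
def next_targets(baseline_hours: int, step_hours: int, sprints: int) -> list:
    """Targets for the coming sprints, raising the bar by a fixed step each time."""
    return [baseline_hours + step_hours * i for i in range(1, sprints + 1)]


# Figures from the first sprint described above.
planned_estimated_hours = 120   # estimated work pulled into the sprint
available_hours = 200           # working hours available across the team
completed_estimated_hours = 60  # estimated work actually finished

# Share of the plan that got done.
completion_ratio = completed_estimated_hours / planned_estimated_hours

# Estimated hours finished per available working hour.
hours_done_per_available_hour = completed_estimated_hours / available_hours

print(f"Completed {completion_ratio:.0%} of the planned estimate")
print(f"{hours_done_per_available_hour:.2f} estimated hours per available hour")

# Assumed improvement step of 6 hours per sprint (60 -> 66 -> 72).
print(next_targets(completed_estimated_hours, 6, 2))
```

The useful number for planning is the completed figure, not the available one: the next sprint is sized from what the team actually finished.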
This way of time estimation also maintains the averaging benefits of story points. One 4-hour task may end up taking 6 hours, while another 4-hour task may take 3 hours. In fact, my team really is treating 1 hour as a “story point”, but instead of pretending to think abstractly, we are up front about using a concrete method of estimating.
The thing I like most is that it lets programmers continue to estimate in a way that feels natural while still being accurate. All debates around estimates are based on tangible pieces of work that programmers can clearly visualize in their minds, not abstract concepts. There is also no hand-wavy magic (e.g. “try padding all your estimates by 2-3 times what you put down”) to try to make a time estimate fit into a two-week period. We make the two-week period fit into our time estimates.