This got me thinking in the vein of The Black Swan, by Nassim Nicholas Taleb, whose philosophy is that we really know nothing about the future; therefore we cannot predict much until we know something. And the past does not always indicate the future. So how can we proceed in this world based on projections that are subject to the whims of randomness?
In my twenty-year consulting career, and in the various businesses I have been involved in, I have made many predictions. They appeared in business plans, pro forma projections, project plans, and other documents that my clients paid me to produce. The strange thing was that there was never a major penalty for being wrong, even though I was supposed to be the expert. The most important deliverables were the projections or predictions themselves, not how accurate or reliable they were.
Many things hinder a prediction from coming true, but is there a way to track how accurate any prediction is? Is there a rule of thumb, or a gross discounting equation? Is there a way to normalize someone's educated guesses to more accurately reflect reality? Is there a tool to do this? These damn black swans keep getting in the way of making accurate predictions. Is there a possibility of Mandelbrotian mathematics coming to the aid of assessing predictions -- especially numeric predictions?
The Gaussian method would be to go back, calculate averages and standard deviations, and then try to ascertain a degree of certainty or a coefficient of correlation. But Gaussian methods do not work where there are two possible directions, or vectors, of truth and no direct correlation between the two except for the existential solution to the function. The easiest example: suppose you have a function whose answer lies either in a Venn set A or a Venn set B, and the only thing tying the two sets together in any way is that the result of the function could be a member of either set.
This is like the problem of estimating, predicting, or putting down pro forma numbers. Either you over-estimate and the ultimate reality is less than your estimate, or you under-estimate and the ultimate real figures are way above what you predicted. This roughly corresponds to A for the over-estimation and B for the under-estimation. Needless to say, an answer or reality lying either in A or B doesn't translate well into Gaussian math -- especially when there is no relationship between A and B such as a reciprocal, multiplication by -1, or any other visible mathematical relationship. This is where the mathematics of the situation becomes what Taleb calls Mandelbrotian, and I call Ulamian. (Click on the Stanislaw Ulam label at the bottom of this post for an explanation.)
So you have a sort of dipole representation of the outcome of the normalizing function used to assess predictions. Remember, what I am looking for is an equation or series of equations that takes someone's most educated, erudite, and most thought-out predictions or guesses, and processes them mathematically to get some sort of realistic assessment of those predictions.
I am a data pack rat, so I went back to the many, many predictions that I have made over my consulting career. Needless to say, none of my predictions or estimates was bang on. It was my nature, and perhaps universal nature, to over- or under-estimate in almost every case.
When it came to revenues, I would grossly over-estimate. When it came to units sold, units produced, or units delivered, I would grossly over-estimate. I was over-confident about how long things would take, and I over-estimated our production capacity to get things done. If I estimated that we could write 500 lines of production code a day, we would actually write something closer to the one-hundred mark.
Another amazing factor was that the consequences of the estimates or predictions were overwhelmingly on the negative or unpleasant side. If I were estimating possible revenues, the reality would land on the "bad news" side of the estimate. I came to the conclusion that it was just like the universe -- entropy was at work. Things are much easier to destroy than to build. Bad happens more often than good. In short, it was Murphy's Law all over again -- "If something can go wrong -- it will!"
So was there a rule of thumb to correct estimations? I came up with the dipole concept, which is shown in the graph below:
There are two curves connected by a straight line at the zero x-axis. The A Venn set is the leftmost curve. The B Venn set is the rightmost curve. There exists another incarnation of this graph that is an isomer, if you will, where the curves are mirror-image reversed in y-axis magnitude. Let's go with the variant pictured above.

I went back and did some analysis of all of my predictions. They fell into two camps -- one where I over-estimated things and they did not come to pass, and one where I under-estimated things and they came to be in a much greater numerical quantity than I expected. I found an interesting thing.
When there was a problem or negative outcome attached to over-estimating (such as over-estimating possible revenues in a cash-starved start-up company), there was an 80 percent probability that the bad outcome would be the reality. In only 20 percent of the cases was the surprise a pleasant one. And the converse is true as well: if I under-estimated and the actual figure was much higher, it was usually a time to completion or some other number with negative or nasty implications.
I fired up my mathematics and came up with the Cosmological Cabbage Prediction Normalization Formula, or the CCPNF. It is a dipole type of thing where you get two ranges of answers. You get a set of A answers, where the real value will appear somewhere in the set if the originator over-estimates, and a set of B answers if the originator under-estimates. The probability is 80 percent for whichever outcome would cause the most harm or the most negative consequences.
So let's take the hypothetical case that you are a venture capitalist, and someone says that they will sell 100,000 widgets. You suspect that this is a gross over-estimation. So you take your handy Cosmological Cabbage Prediction Normalization Formula, or CCPNF, and do the following calculations:
x = the predicted value
A(upper) = 0.056x + (0.341 * (0.056x))
A(lower) = 0.056x - (0.341 * (0.056x))
Plugging in x = 100,000, the upper limit of A is roughly 7,510 and the lower limit is roughly 3,690. The reality has an 80 percent chance of being between those two figures.
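The A-set arithmetic can be sketched in a few lines of Python. The function and parameter names here are my own; the 0.341 band is presumably one standard deviation (34.1%) of a normal curve:

```python
def ccpnf_over(x, scale=0.056, band=0.341):
    """CCPNF set-A bounds: the likely range of reality when the
    originator has grossly over-estimated the value x.
    (Sketch only -- names and structure are assumptions.)"""
    centre = scale * x       # shrink the prediction to ~5.6% of x
    spread = band * centre   # widen the centre point by +/- 34.1%
    return centre - spread, centre + spread

lower, upper = ccpnf_over(100_000)
print(round(lower), round(upper))  # 3690 7510
```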
Now, let's suppose that they under-estimated. According to my formula, the real figure will lie between the lower and upper values of B.
x = the predicted value
B(upper) = 1.68x + (0.341 * (1.68x))
B(lower) = 1.68x - (0.341 * (1.68x))
In this case, B(upper) is 225,288 and B(lower) is 110,712. But if this would be a pleasant thing, then there is only a 20 percent chance of it happening.
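The B-set calculation is the same shape with a different scale factor; again, the names below are my own sketch rather than anything canonical:

```python
def ccpnf_under(x, scale=1.68, band=0.341):
    """CCPNF set-B bounds: the likely range of reality when the
    originator has grossly under-estimated the value x.
    (Sketch only -- names and structure are assumptions.)"""
    centre = scale * x       # inflate the prediction to 168% of x
    spread = band * centre   # widen the centre point by +/- 34.1%
    return centre - spread, centre + spread

lower, upper = ccpnf_under(100_000)
print(round(lower), round(upper))  # 110712 225288
```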
How is this useful? Let's go back to the case of the hypothetical venture capitalist going over a business plan. If the business plan projects that they will sell 100,000 widgets, then the formula says that if things go wrong, they will in fact sell between roughly 3,700 and 7,500 widgets instead of the predicted 100,000. That is roughly how much people over-estimate by. And if their numbers are sound, and their plan is sound, and they have under-estimated, then they should sell between roughly 110,700 and 225,300 widgets.
So there you have it -- a rough rule of thumb to discount numerical predictions. I look forward to real world validations of the CCPNF.