Friday, November 2, 2012

Predictions are Hard (Especially About the Future)

While famous malapropist Yogi Berra is most often cited for the quote, “Prediction is very difficult, especially about the future,” it appears that the source was actually Danish physicist and Nobel Prize laureate Niels Bohr. Bohr, whose pioneering work in quantum physics would naturally equip him with a keen sense of the limits of knowledge, also had a sense of humor. (He also said, “An expert is a man who has made all the mistakes that can be made in a very narrow field.”)

Bias and Accuracy

In my long study of cognitive biases on this blog and in my compilation Random Jottings 6: The Cognitive Biases Issue, I was struck again and again by how many of the biases had to do with perceptions of probability. From ambiguity aversion to the base rate fallacy to the twin problems of the gambler’s fallacy and the ludic fallacy, we have repeatedly shown ourselves to be incapable of judging probabilities with any degree of precision or understanding. When people rate their own decisions as "95% certain," research shows they're wrong approximately 40% of the time.

With the 2012 presidential election only four days away as I write this, the issue of prediction and forecasting is uppermost in the mind of every partisan and pundit. Who will win, and by how much? As I check the polls, the RealClearPolitics average gives President Obama a 0.1-point lead over Governor Romney (47.4% to 47.3%). Rasmussen has Romney up by 2 (49% to 47%), Gallup has him up by 5 (51% to 46%), and NPR by 1 (48% to 47%). On the other hand, ABC/Wash Post and CBS/NY Times both have Obama leading by 1 (49% to 48% for ABC, 48% to 47% for CBS), and the National Journal has Obama up by 5 (50% to 45%). No matter what your politics, you can find polls to encourage you and polls to discourage you about the fate of your preferred candidate.

Some polls come with well-known qualifications. Rasmussen traditionally leans Republican; PPP often skews Democratic. That doesn't mean either poll is irrelevant or useless. Accuracy and bias are two different things. Bias is the degree to which a poll or sample consistently leans in a certain direction. If a study comparing Rasmussen or PPP polls to the actual election results shows that Rasmussen's results tend to run 2 points toward the Republican candidate (or PPP's toward the Democrat), both polls are quite useful: you just have to adjust for the historical bias. If, on the other hand, a poll overestimates the Democratic vote by 10% in one election and then overestimates the Republican vote by 10% in another, there's no consistent bias, but the poll's accuracy is quite low. In other words, a biased poll can be a lot more valuable than an inaccurate one.
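
To make the distinction concrete, here is a minimal Python sketch. The two lists of past poll errors are invented for illustration (they are not real Rasmussen or PPP figures), and the helper names `describe` and `adjust` are my own. It measures bias as the average error and accuracy as the scatter around that average, then corrects a new poll for the historical lean.

```python
from statistics import mean, pstdev

# Hypothetical past errors in percentage points (poll margin minus actual
# margin; positive = leaned Republican). These numbers are made up.
steady_lean = [2.1, 1.8, 2.2, 1.9, 2.0]        # consistent 2-point lean
wild_swings = [10.0, -10.0, 9.0, -9.5, 0.5]    # no lean, large misses


def describe(errors):
    """Return (bias, spread): the average error and the scatter around it."""
    return mean(errors), pstdev(errors)


def adjust(poll_margin, historical_errors):
    """Correct a new poll by subtracting the pollster's average past error."""
    return poll_margin - mean(historical_errors)


for name, errors in [("steady lean", steady_lean), ("wild swings", wild_swings)]:
    bias, spread = describe(errors)
    print(f"{name}: bias {bias:+.1f} points, spread {spread:.1f} points")

# A new poll from the steady pollster showing R+3, corrected for its lean.
print(f"adjusted margin: {adjust(3.0, steady_lean):+.1f}")
```

The steady pollster can be corrected with one subtraction. The erratic one has almost no average lean to subtract, so nothing rescues its large misses.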

Selection Bias

Of course, political polls (or polls of any sort) are subject to all sorts of error. My cognitive biases entry on selection bias summarizes common concerns. For instance, there’s a growing argument that land-line telephone polls, once the gold standard of scientific opinion surveys, are becoming less reliable. Cell phone users are increasingly common and skew toward a different demographic. There's also a sense that people are over-polled. More and more people are refusing to participate, meaning that the actual sample becomes to some extent self-selected: a random sample of people who like to take polls. People who don’t like to take polls are underrepresented in the results, and there’s no guarantee that they feel the same way as the people who answer. (I myself usually hang up on pollsters, and I've often thought it might help our political process if we agreed to lie to pollsters at every opportunity.)

Selection bias can happen in any scientific study requiring a statistical sample that is representative of some larger population: if the selection is flawed, and if other statistical analysis does not correct for the skew, the conclusions are not reliable.

There are several types of selection bias:

  • Sampling bias. Systematic error resulting from a non-random population sample. Examples include self-selection, pre-screening, and discounting test subjects that don’t finish.
  • Time interval bias. Error resulting from a flawed selection of the time interval. Examples include starting on an unusually low year and ending on an unusually high one, terminating a trial early when its results support your desired conclusion, or favoring longer or shorter intervals in measuring change.
  • Exposure bias. Error resulting from amplifying trends. When one disease predisposes someone to a second disease, the treatment for the first disease can appear correlated with the appearance of the second disease. An effective but not perfect treatment given to people at high risk of getting a particular disease could potentially result in the appearance of the treatment causing the disease, since the high-risk population would naturally include a higher number of people who got the treatment and the disease.
  • Data bias. Rejecting “bad” data on arbitrary grounds, ignoring or discounting outliers, or partitioning data with knowledge of the partitions’ contents and then analyzing them with tests designed for blindly chosen ones.
  • Studies bias. Earlier, we looked at publication bias, the tendency to publish studies with positive results and ignore ones with negative results. If you put together a meta-analysis without correcting for publication bias, you’ve got a studies bias. Or you can perform repeated experiments and report only the favorable results, classifying the others as calibration tests or preliminary studies.
  • Attrition bias. A selection bias resulting from people dropping out of a study over time. If you study the effectiveness of a weight loss program only by measuring outcomes for people who complete the whole program, it’ll often look very effective indeed — but it ignores the potentially vast number of people who tried and gave up.
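
The weight loss example in the last item is easy to simulate. The sketch below is a toy model, not data from any real program: the true average effect (1 kg lost), the scatter, and the dropout rule are all assumptions I picked to illustrate the mechanism.

```python
import random


def simulate_program(n_participants=10_000, seed=1):
    """Toy model of attrition: people who are not losing weight tend to quit.

    Returns (average change for everyone, average change for finishers), in kg.
    """
    rng = random.Random(seed)
    everyone, finishers = [], []
    for _ in range(n_participants):
        change_kg = rng.gauss(-1.0, 4.0)   # assumed true average effect: -1 kg
        everyone.append(change_kg)
        # Assumed dropout rule: 80% of people who gained weight quit,
        # versus 20% of people who lost weight.
        quit_probability = 0.8 if change_kg > 0 else 0.2
        if rng.random() > quit_probability:
            finishers.append(change_kg)
    return sum(everyone) / len(everyone), sum(finishers) / len(finishers)


all_avg, finisher_avg = simulate_program()
print(f"average change, everyone enrolled: {all_avg:.1f} kg")
print(f"average change, finishers only:    {finisher_avg:.1f} kg")
```

Measured on finishers alone, the same modest program looks considerably more effective, purely because of who was left to be measured.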

Unskewing the Polls

In general, you can’t overcome a selection bias with statistical analysis of existing data alone. Informal workarounds examine correlations between background variables and a treatment indicator, but what’s missing is the correlation between unobserved determinants of the outcome and unobserved determinants of selection into the sample that create the bias. What you don’t see doesn’t have to be identical to what you do see. That doesn't stop people from trying, however.

With that in mind, the UnSkewedPolls website, developed by Dean Chambers, a Virginia Republican, attempts to correct what he sees as a systematic bias in the proportion of Republicans and Democrats in the electorate. By adjusting poll results that in Chambers’ view are oversampling Democrats, he concludes (as of today) that Romney leads Obama nationally by 52% - 47%, a five-point lead, and that Romney also leads in enough swing states to produce a landslide in the electoral college of 359 to 179, with 270 needed for victory.
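
I don't have Chambers' actual procedure in front of me, so what follows is only a generic sketch of reweighting a poll by party identification, with every number invented for illustration. The crosstabs in `support`, both party mixes, and the `topline` helper are hypothetical.

```python
# Candidate support within each party-ID group (hypothetical poll crosstabs).
support = {
    "Democrat":    {"Obama": 0.92, "Romney": 0.06},
    "Republican":  {"Obama": 0.05, "Romney": 0.93},
    "Independent": {"Obama": 0.44, "Romney": 0.48},
}


def topline(party_shares):
    """Weight each group's candidate support by that group's assumed share."""
    result = {"Obama": 0.0, "Romney": 0.0}
    for party, share in party_shares.items():
        for candidate, pct in support[party].items():
            result[candidate] += share * pct
    return result


# Party mix as it turned up in the (hypothetical) sample, and the mix an
# analyst believes the electorate "should" have.
as_sampled = {"Democrat": 0.38, "Republican": 0.32, "Independent": 0.30}
reweighted = {"Democrat": 0.33, "Republican": 0.37, "Independent": 0.30}

for label, mix in [("as sampled", as_sampled), ("reweighted", reweighted)]:
    result = topline(mix)
    print(f"{label}: Obama {result['Obama']:.1%}, Romney {result['Romney']:.1%}")
```

The respondents' answers never change; only the assumed party mix does, and that alone flips the leader. The result is only as good as the assumption fed into it, and that assumption is exactly the kind of unobserved quantity the data can't confirm.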

Chambers argues that other pollsters and analysts who show an edge for Obama are living in a “fantasy world.” In particular, he trains his disgust on Nate Silver, who writes the blog FiveThirtyEight on the New York Times website, describing him as “… a man of very small stature, a thin and effeminate man with a soft-sounding voice that sounds almost exactly like the ‘Mr. New Castrati’ voice used by Rush Limbaugh on his program. In fact, Silver could easily be the poster child for the New Castrati in both image and sound. Nate Silver, like most liberal and leftist celebrities and favorites, might be of average intelligence but is surely not the genius he's made out to be. His political analyses are average at best and his projections, at least this year, are extremely biased in favor of the Democrats.” (You may notice a little bit of ad hominem here. Clearly a short person with an effeminate voice can’t be trusted.)

A quick review of the types of selection bias above will identify several problems with the unskewed poll method. Indeed, it's hard to find anyone not wedded to the extreme right who's willing to endorse Chambers' methodology. The approach is bad statistics, and would be equally bad if done on behalf of the Democratic candidate.

Nate Silver and FiveThirtyEight

Other views of Nate Silver are a bit more positive. Silver first came to prominence as a baseball analyst, developing the PECOTA system for forecasting performance and career development of Major League Baseball players, then won some $400,000 using his statistical insights to play online poker. Starting in 2007, he turned his analytical approach to the upcoming 2008 election, and predicted the winner of 49 out of 50 states. This resulted in his being named one of the world’s 100 most influential people by Time magazine, and his blog was picked up by the New York Times. (He's also got a new book out, The Signal and the Noise: Why So Many Predictions Fail — But Some Don't. I recommend it.)

As of today, Nate Silver’s predictions on FiveThirtyEight differ dramatically from the UnSkewedPolls average. Silver predicts that Obama will take the national popular vote 50.5% to 48.4%, and the electoral college by 303 to 235. One big difference between Dean Chambers and Nate Silver is that Chambers is certain, and Silver is not. Silver currently gives Obama an 80.9% chance of winning, which means that he gives Romney a 19.1% chance of victory using the same data.

This 80% - 20% split is a forecast probability rather than a certainty. It plays much the same role as what statisticians call a confidence interval, a measure of the reliability of an estimate. In other words, Silver knows that the future is best described as a range of probabilities. Neither he, nor Chambers, nor you, nor I “know” the outcome of the election that will take place next Tuesday, and we will not “know” until the votes have been counted and certified (and any legal challenges resolved).

Predictions vs. Knowledge

In other words, when we predict, we do not know.

Keeping the distinction straight is vital for anyone whose job includes the need to forecast what will happen. Lawyers don’t “know” the outcome of a case until the jury or judge renders a verdict and the appeals have all been resolved. Risk managers don’t “know” whether a given risk will occur until we’re past the point at which it could possibly happen. Actuaries don’t “know” how many car accidents will take place next year until next year is over and the accidents have been counted. But lawyers, risk managers, actuaries — and pollsters — all predict nonetheless.

A statistical prediction, by its very nature, contains uncertainty and should therefore be expressed in terms of the degree of confidence that the forecaster has determined. “The sun’ll come out tomorrow,” sings Annie in the eponymous musical, and she’s almost certainly right. But that’s a prediction, not a fact. While the chances of the Sun going nova are vanishingly small, they aren’t exactly zero.

Confidence Level and Margin of Error

Poll results usually report both a confidence level and a margin of error, such as “95% confidence with an error of ±3%.” The margin of error is the uncertainty of the measurement itself. If we flip a coin 100 times, the expected result is 50 heads and 50 tails, but if it came out 53 heads and 47 tails (or vice versa), no one would be surprised. That’s a miss of 3 points, the same size as an error of ±3%. In other words, a small wobble in the final number should come as a shock to no one.

The confidence level, on the other hand, is the degree of confidence you have that your final number will stay within the error range. The probability that an honest coin flipped 100 times would produce 70 heads and 30 tails is low, but it’s within the realm of possibility. In other words, the “95% confidence” measurement tells us that 95% of the time, the actual result should be within the margin of error — but that 5% of the time, it will fall outside the range. (There’s a bit of math that goes into measuring this, but it's outside the scope of this piece.)
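
For readers who do want a peek at that math, here is a small Python sketch using the usual normal approximation for a proportion. The function names are my own, and the approximation assumes a simple random sample, which real polls only approach.

```python
from math import comb, sqrt

Z_95 = 1.96  # standard normal multiplier for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Normal-approximation margin of error for a proportion from n samples."""
    return z * sqrt(p * (1 - p) / n)


def sample_size_for(margin, p=0.5, z=Z_95):
    """Approximate sample size needed to reach a given margin of error."""
    return (z / margin) ** 2 * p * (1 - p)


def prob_at_least(heads, flips):
    """Exact probability of at least `heads` heads from a fair coin."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips


print(f"margin of error for 100 flips: ±{margin_of_error(100):.1%}")
print(f"sample size for a ±3% margin:  {sample_size_for(0.03):.0f}")
print(f"chance of 70+ heads in 100:    {prob_at_least(70, 100):.6f}")
```

Under this approximation, 100 coin flips carry a 95% margin closer to ±10 points than ±3. It takes a sample of roughly a thousand to get down to ±3%, which is why so many national polls interview about that many people.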

Winning at Monte Carlo

Nate Silver’s 80% confidence number comes from using a modeling technique known as a Monte Carlo simulation, which is also used in project management as a modern and superior alternative to the old PERT calculation, a weighted average of optimistic, pessimistic, and most likely outcomes. In a Monte Carlo simulation, a computer model runs a problem over and over again in thousands of iterations, choosing random numbers from within the specified ranges, and then calculates the result. If the polls are right 95% of the time within a ±3% margin of error, the program chooses a random number within the error range 95% of the time, and 5% of the time chooses a number outside the range, representing the probability that the polls could be all wet. In running five or ten thousand simulations, the results gave the victory to Obama 80.9% of the time, and to Romney 19.1% of the time.
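
The core idea fits in a few lines of Python. To be clear, this is a one-number toy and not Silver's model. The function name `win_probability`, the 4-point margin of error, and the choice of a normal distribution for polling error are all my assumptions; only the 2.1-point lead comes from his popular vote forecast quoted above.

```python
import random


def win_probability(lead, margin_of_error, runs=10_000, seed=538):
    """Estimate how often the polling leader wins across simulated elections.

    lead: the leader's polled margin, in percentage points.
    margin_of_error: assumed 95% margin of error on that lead, in points.
    """
    rng = random.Random(seed)
    # About 95% of draws from a normal distribution fall within 1.96 sigma,
    # so this sigma puts 95% of simulated results inside the margin of error.
    sigma = margin_of_error / 1.96
    wins = sum(1 for _ in range(runs) if rng.gauss(lead, sigma) > 0)
    return wins / runs


# A 2.1-point lead (50.5% to 48.4%) with an assumed ±4-point margin on the lead.
print(f"leader wins {win_probability(2.1, 4.0):.1%} of simulated elections")
```

Each run draws one possible "true" result from around the polled number, and the share of runs the leader wins becomes the headline probability. Change the lead or the assumed error and the probability moves with it.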

Tomorrow, the answer may be different. Silver will enter new data, and the computer will run five or ten thousand more simulations. Each day, the probability of winning or losing will change slightly, until the final results are in and the answer is no longer a matter of probability but a matter of fact.

The Thrill of Victory and the Agony of Defeat

Astute readers may notice the parallels here to Schrödinger's Cat, which is mathematically both alive and dead until the box is opened. Personally, I put a lot of credence into Silver’s analysis; his approach is in line with my understanding of statistics. That means I think Obama is very likely to win next Tuesday — but only within a range of probability.

I will also note that Nate Silver seems to feel the same way. He's just been chided by the public editor of the New York Times for making a $2,000 bet with "Morning Joe" Scarborough that Obama will win. Given his estimate of an 80% - 20% chance of an Obama victory, that sounds like a pretty good bet to me.
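
Here is the arithmetic behind "pretty good bet," as a short Python sketch. I'm assuming an even-money wager in which each side puts up $2,000; the exact terms aren't spelled out above, so treat the stakes as hypothetical.

```python
def expected_value(p_win, amount_won, amount_lost):
    """Average payoff per bet if the same wager could be repeated many times."""
    return p_win * amount_won - (1 - p_win) * amount_lost


# Silver's own estimate of an Obama win, with assumed even-money $2,000 stakes.
ev = expected_value(0.809, 2000, 2000)
print(f"expected value of the bet: ${ev:,.0f}")
```

At those odds the bet is worth about $1,236 on average to the person backing Obama. It still loses outright roughly one time in five, which is the whole point about predictions.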

But we won't know until Tuesday night at the earliest. So be sure to vote.
