Libertarian candidate Gary Johnson’s website boasts that the third-party candidate is “polling nationally from 2.4% to 9% and various states have him polling up to 15%”. Like the poll numbers of countless minor political candidates in the past, these figures are almost certainly wrong, for an intriguing statistical reason.
Imagine a poll about a candidate named Smith who represents a major party and has, in truth, 40% support in the population. Imagine further that the poll is accurate 90% of the time. The other 10% of the time (due to leading questions, pollster error, voter confusion, etc.) the poll gets a respondent wrong: it records someone who will in fact vote for Smith as a non-supporter, or someone who will in fact vote for someone else as a Smith supporter.
To keep the example simple, assume that the poll is only concerned with whether people will vote for Smith or not, where the non-Smith category includes voting for candidates Jones, Green or Wilson, or not voting at all. Again for simplicity, assume the poll is of 100 voters, so the number of voters equals the percentage of predicted support for Smith. The table below shows what the poll will conclude.
Actual intention | Counted as Smith | Counted as not Smith | Total
Will vote for Smith | 36 | 4 | 40
Will not vote for Smith | 6 | 54 | 60
Poll result | 42 | 58 | 100
The poll predicts that Smith will garner 42% of the vote (i.e., the poll will count 42 of the 100 voters it surveys as Smith supporters). These 42 votes are counted in the poll correctly in 36 cases and incorrectly in 6 cases (10% of the 60 voters who really aren’t going to vote for Smith got counted as supporting Smith). The 42% result is wrong, but it’s not bad at all as an estimate, whether you compare raw numbers (40% vs. 42% support) or compare the size of the error to the base rate of support (2 points is only 5% of Smith’s true support of 40%).
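The arithmetic behind the 42% figure can be sketched in a few lines of Python. This is only an illustration of the simplified model described above; the function name `poll_breakdown` and its parameters are hypothetical, and it assumes every respondent is misrecorded with the same probability regardless of whom they support.

```python
def poll_breakdown(true_supporters, respondents=100, accuracy=0.90):
    """Split a poll's reported count for Smith into correct and mistaken parts.

    Assumes (as in the article's simplified example) that each respondent
    is recorded in the wrong category with probability 1 - accuracy.
    """
    non_supporters = respondents - true_supporters
    error_rate = 1 - accuracy

    # Real supporters who are recorded as supporters.
    correct = true_supporters * accuracy
    # Real supporters who are wrongly recorded as non-supporters.
    missed = true_supporters * error_rate
    # Non-supporters who are wrongly recorded as supporters.
    false_positive = non_supporters * error_rate

    return {
        "correct": round(correct, 2),
        "missed": round(missed, 2),
        "false_positive": round(false_positive, 2),
        "reported": round(correct + false_positive, 2),
    }


print(poll_breakdown(40))
```

For 40 true supporters out of 100, this gives 36 correctly counted supporters plus 6 mistaken ones, for a reported total of 42, matching the figures in the text.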
The estimate is in the right ballpark because Smith’s level of support is near 50%. Indeed, if his support were in fact 45%, the poll would be even more accurate (it would report 46%), despite its 10% error rate. In contrast, imagine that Smith’s true level of support is far from the midpoint, for example 10%. Here are the same poll results with the same number of respondents and the same error rate.
Actual intention | Counted as Smith | Counted as not Smith | Total
Will vote for Smith | 9 | 1 | 10
Will not vote for Smith | 9 | 81 | 90
Poll result | 18 | 82 | 100
The poll predicts 18% support for Smith. This is highly inaccurate whether one looks at the raw number of points (8 percentage points different from reality, versus only 2 points when Smith was at 40% support) or compares the predicted versus the actual support (the predicted support of 18% is almost double the true support of 10%). Why did the same poll that proved fairly accurate for Smith at 40% prove so misleading when he was at 10%?
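One way to see where the gap comes from is to write the example as a formula. The symbols here are not from the original example and are introduced only for illustration: let p be the candidate's true share of support, a the poll's accuracy, and r the share the poll reports, assuming every respondent is misrecorded with the same probability 1 - a.

```latex
\begin{align*}
r     &= a\,p + (1 - a)(1 - p) \\
r - p &= (1 - a)(1 - 2p)
\end{align*}
```

With a = 0.9, the gap is 0.1 × 0.2 = 2 points at p = 0.40 but 0.1 × 0.8 = 8 points at p = 0.10. Under these assumptions the gap vanishes only at p = 0.5, where the two kinds of mistake cancel out.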
What is going on here is less complex than it may seem. It is simply harder to predict events that are unlikely than events that are likely. If a fair coin is being flipped over and over and you have to guess on which particular flip it will come up heads, you’ve got a 50-50 shot of winning the game. But if the same game is played with an unbalanced coin that comes up heads only 1% of the time, you will almost certainly not guess the right flip, even if you are allowed to play many times. Indeed, any system you might use to predict when the elusive heads result will occur will be less accurate over time than simply predicting that the coin will never come up heads no matter how many times it is flipped, a prediction that is correct 99% of the time. This same phenomenon explains why so many seemingly sage predictions about how politicians with certain characteristics can’t be elected president are in fact vapid.
The problem gets worse the further a candidate departs from 50% support. If Smith were actually at 1% support, the result of the 90% accurate poll would put Smith at a completely misleading 11%. More accurate polling helps but does not solve this problem: a 95% accurate poll of a candidate who actually has 1% support will still typically report about 6% support, overstating it nearly sixfold.
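To check how the distortion changes with the candidate's true support and the poll's accuracy, a short Python sketch can tabulate the expected reading. The function name `expected_reading` is hypothetical, and the model is the same simplified one used in the Smith example: every respondent is misrecorded with probability equal to one minus the accuracy.

```python
def expected_reading(true_support, accuracy):
    """Expected reported share of support under the simplified error model.

    true_support and accuracy are fractions between 0 and 1. Each respondent
    is assumed to be recorded in the wrong category with probability
    1 - accuracy, whichever candidate they actually favor.
    """
    counted_correctly = true_support * accuracy
    counted_by_mistake = (1 - true_support) * (1 - accuracy)
    return counted_correctly + counted_by_mistake


for accuracy in (0.90, 0.95):
    for true_support in (0.45, 0.40, 0.10, 0.01):
        reading = expected_reading(true_support, accuracy)
        print(
            f"accuracy {accuracy:.0%}  true {true_support:.0%}  "
            f"reported {reading:.1%}  ratio {reading / true_support:.1f}x"
        )
```

At 1% true support this yields a reading of 10.8% for the 90% accurate poll and 5.9% for the 95% accurate poll, so better accuracy shrinks the distortion without removing it.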
Finally, one might assume that averaging the results of different polls can surmount the challenges of estimating support for minor political candidates. That does indeed improve accuracy if you have a third-party candidate such as Theodore Roosevelt with a fairly high level of support, but not when a candidate is at the typical American third-party level of 1% or 2% support. A poll that overstates that candidate’s support can multiply it manyfold, as shown above, but a poll that understates it can’t go lower than 0%. The average of all error-ridden polls will thus still tend to overstate the candidate’s level of support because the errors have more room to grow on the upside than on the downside.
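A small simulation can illustrate why averaging does not rescue the estimate. This is a sketch under assumptions, not a model of real polling: it reuses the simplified error model from the Smith example (each respondent is misrecorded with a fixed probability), treats every poll as independent and identically flawed, and uses hypothetical function names and sample sizes chosen only for illustration.

```python
import random


def simulate_poll(true_support, accuracy, respondents, rng):
    """Return the share of respondents one noisy poll records as supporters."""
    counted = 0
    for _ in range(respondents):
        is_supporter = rng.random() < true_support
        recorded_correctly = rng.random() < accuracy
        # A misrecorded respondent lands in the opposite category.
        counted_as_supporter = is_supporter if recorded_correctly else not is_supporter
        if counted_as_supporter:
            counted += 1
    return counted / respondents


def average_of_polls(true_support, accuracy, respondents=1000, polls=200, seed=0):
    """Average the reported support across many independent simulated polls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    readings = [
        simulate_poll(true_support, accuracy, respondents, rng)
        for _ in range(polls)
    ]
    return sum(readings) / len(readings)


average = average_of_polls(true_support=0.01, accuracy=0.90)
print(f"true support 1.0%, average of 200 polls: {average:.1%}")
```

Under these assumptions, averaging removes the random scatter between polls but leaves the systematic overstatement in place: the average settles near the distorted reading of roughly 11% instead of the true 1%.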