Dean had the support of 24 percent and Clark had the backing of 20 percent in the CNN-USA Today-Gallup poll out today. The poll of 465 Democrats and those who lean Democratic had a margin of sampling error of plus or minus 5 percentage points, meaning Dean and Clark are essentially tied for the lead nationally.
No, dammit, no! [Pounds the lectern in sheer frustration.] Being 4 points behind with a 5-point margin of error isn’t being “essentially tied.” It’s being 4 points behind, plus or minus 5 points. That’s a lot better than being 21 points behind, plus or minus 5 points, but I’d rather be ahead, thanks.
Of course, the reported margin of error reflects sampling error only, and ignores all the sources of systematic error. All it means is that, if I’d called another 465 people at the same time, using the same algorithm to select them, using the same weighting formula to adjust the sample to the assumed population of actual voters, and having the same interviewers ask the same questions, there’s a 95% chance that the results of the second sample would have been within 5 points of the results of the first sample.
But the interviewers might not be perfectly impartial, and the questions, the sampling algorithm, and the weighting formulas might all embody imperfect models of actual voting behavior. The extent of those “systematic” errors cannot be estimated by simply taking the reciprocal of the square root of the sample size. So the reported margin of error is an underestimate of the actual uncertainties involved.
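For anyone who wants to check the sampling-error arithmetic, here is a minimal sketch in Python. It assumes a simple random sample and the worst case of a 50-50 split; the function name and variables are mine, for illustration only, and say nothing about how the polling firm actually computed its figure.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of sampling error for a proportion p
    estimated from a simple random sample of size n (an assumption)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 465  # sample size reported for the poll

# The back-of-the-envelope shortcut: reciprocal of the square root of n.
rule_of_thumb = 1 / math.sqrt(n)

# The textbook formula at p = 0.5, which is where the shortcut comes from
# (1.96 * 0.5 is roughly 1).
worst_case = margin_of_error(n)

print(f"1/sqrt(n):            {100 * rule_of_thumb:.2f} points")
print(f"1.96*sqrt(p(1-p)/n):  {100 * worst_case:.2f} points")
```

Both figures come out a little under 5 points, which is presumably why the poll reports "plus or minus 5." Note that nothing in this calculation knows anything about the interviewers, the questions, or the weighting.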
But there’s no such thing as a “statistical tie,” and it’s better to be ahead than behind.
[Yes, this will be on the exam.]
Update
A reader points out a confusion in the above.
A sampling error of +/- 5 points means there is a 95% chance that the sampled proportion is within 5 points of the true population proportion. But if you sample another 465 people, as you suggest, and compare the results, then you’re dealing with a different animal: the difference between two binomial population proportions.
After running through the calculation, he concludes:
Therefore, there is only an 83.5 percent chance that the “results of the second sample would have been within 5 points of the results of the first sample.”
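The reader's working isn't reproduced here, but one way to land near that figure can be sketched in Python. The assumptions are mine: a normal approximation, two independent samples of the same size, and the reported 5 points treated as exactly 1.96 standard errors of a single sample.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

margin = 0.05   # reported margin of sampling error
z95 = 1.96      # z-score for a two-sided 95% interval

# Standard error of one sample's proportion, backed out of the margin.
se_one = margin / z95

# The difference between two independent samples has sqrt(2) times
# the standard error of either one alone.
se_diff = math.sqrt(2) * se_one

# Chance that the two sample results differ by less than 5 points.
z = margin / se_diff
prob_within = 2 * normal_cdf(z) - 1

print(f"P(|sample 1 - sample 2| < 5 points) = {100 * prob_within:.1f}%")
```

Under those assumptions the probability comes out between 83 and 84 percent, in line with the reader's number and well short of 95.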
That calculation allows me to rise to a challenge from another reader, who writes:
“The poll of 465 Democrats and those who lean Democratic had a margin of sampling error of plus or minus 5 percentage points, meaning Dean and Clark are essentially tied for the lead nationally.”
You just might be reincarnated as an Associated Press writer. Please rewrite the above in a way that meets the following criteria.
1. It cannot be much longer; it has to be suitable for use, with minor variations, almost daily.
2. It must be clear enough to give a conscientious reader with, say, a ninth grade education a reasonable chance of understanding what you are saying.
3. It must be accurate enough not to get you damned by a social sciences professor with a statistical bent.
It will be on the exam.
Which is a fair enough challenge. Here’s my proposed rewrite:
The actual size of Dean’s lead over Clark is uncertain due to sampling error. There’s about a one-in-six chance that a different sample would have shown Clark actually tied, or even ahead.
To be fair, the one-in-six number is one that the reporter couldn’t have found without doing a calculation, but if reporters insisted, the polling firms would supply the error bands in the appropriate format.
The paragraph as rewritten doesn’t deal with non-sampling error, and I don’t know how to do so without using more words.
Blog experts and amateurs
UCLA professor Mark A. R. Kleiman uses some pointed blogger critiques of New York Times columnist Paul Krugman and Chicago professor Daniel Drezner to make interesting points about the clash of expertise and amateur observation on the Web. Separately, …