Well, it’s coming up on election day and the news is wall-to-wall polls, so it’s time for the biennial lecture on polling errors.
Seven polls have come out in October. Kean was ahead only in one, and that one was released ten days ago. If you toss out the Zogby poll which has Menendez up by ten points (I’m having less and less confidence in Zogby’s numbers), Menendez’s average lead is only like 3 or 4 points. The statisticians will note that that’s probably not a statistically significant margin. But when all the polls are coming up with the same narrow margin, I think you can say that Menendez is now back on top.
No, the statisticians probably won’t note any such thing. Sampling error shrinks in proportion to the square root of the sample size: to cut the error in half, you have to quadruple the sample. A poll with 2000 respondents has half the sampling error of a poll with 500 respondents.
Four polls with 500 respondents each can be treated as a single poll with 2000 respondents. So if we assume that the six polls have sample sizes that produce sampling errors of between plus/minus 3 and plus/minus 4, which is the usual range, then the sampling error for the collection of six polls would be roughly plus/minus 3.5 divided by the square root of 6, which is plus/minus 1.4. A lead of three to four points is therefore (barely) outside the range of sampling error. (If you had the actual polls in front of you, you’d want to weight each one by its sample size and figure out a weighted average lead, and then figure a sampling error from the pooled sample sizes.)
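For anyone who wants to check the arithmetic, here is a rough Python sketch of the pooling. It assumes simple random sampling and the usual worst-case 50/50 split, and the six sample sizes and leads are made up for illustration; they are not the actual New Jersey polls.

```python
import math

Z95 = 1.96  # normal critical value for a 95% band


def margin_of_error(n, p=0.5, z=Z95):
    """Plus/minus amount, in percentage points, for one candidate's share
    in a simple random sample of n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def pooled_lead(polls):
    """polls is a list of (sample_size, lead_in_points) pairs.
    Returns the sample-size-weighted average lead and the pooled sample size."""
    total_n = sum(n for n, _ in polls)
    weighted = sum(n * lead for n, lead in polls) / total_n
    return weighted, total_n


# Hypothetical polls: (respondents, lead in points). Not real data.
polls = [(500, 3.0), (500, 4.0), (800, 2.0), (600, 5.0), (700, 3.0), (900, 4.0)]

lead, n = pooled_lead(polls)
print(f"pooled lead: {lead:.2f} points from {n} respondents")
print(f"pooled plus/minus: {margin_of_error(n):.2f} points")
print(f"500 vs 2000 respondents: {margin_of_error(500) / margin_of_error(2000):.1f}x the error")
```

The plus/minus here is for one candidate's share, which is how polls are usually reported; it is not the margin on the lead itself.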
That’s good news if you want Menendez to win.
But it’s only modified good news.
First, there’s nothing magic about the estimated error band. Some journalists write as if any margin outside the band is certain, and any margin inside it is meaningless (the phrase you see is “virtually tied”). That’s just wrong. What the error band means is that if you took the same poll over and over (same questions, same sampling frame, same interviewers), nineteen times out of twenty the result would land within the plus/minus amount of the answer you’d get by interviewing everyone in the sampling frame.
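A quick simulation makes the nineteen-in-twenty figure concrete. This is only a sketch: it assumes simple random sampling, a made-up true support of 50%, and a sample of 1,067 (my choice, because it gives a plus/minus of about 3). It counts how often a simulated poll lands within 3 points of the true figure.

```python
import random


def coverage(n, true_p, band, trials=2000, seed=1):
    """Fraction of simulated polls of size n whose measured share
    falls within +/- band of the true share true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent independently supports the candidate with probability true_p.
        supporters = sum(rng.random() < true_p for _ in range(n))
        if abs(supporters / n - true_p) <= band:
            hits += 1
    return hits / trials


print(f"share of polls inside the band: {coverage(1067, 0.5, 0.03):.3f}")
```

The printed share should come out close to 0.95, which also means that about one poll in twenty misses by more than the band through sampling luck alone.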
So if we’re told that X is ahead of Y in the latest poll by 50%-45%, plus or minus 3%, that means each candidate’s number could easily be off by as much as 3 points in either direction: nineteen times out of twenty, the real state of the race is somewhere between 53-42 and 47-48 the other way. So being up 4 points in a survey with a sampling error of plus/minus 3 isn’t a lead outside the margin of error: your lead needs to be about twice the plus/minus amount to give you that 95% assurance.
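The factor of two is an approximation. Here is a sketch of the plus/minus on the lead itself when both candidates’ shares come out of the same sample. It assumes simple random sampling; the sample size of 1,067 is my assumption, chosen because it gives a reported plus/minus of about 3.

```python
import math


def share_margin(n, p=0.5, z=1.96):
    """Plus/minus, in points, on a single candidate's share."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def lead_margin(p1, p2, n, z=1.96):
    """Plus/minus, in points, on the lead p1 - p2 when both shares come
    from one sample of n respondents (variance of a multinomial difference)."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return 100 * z * math.sqrt(variance)


n = 1067  # hypothetical sample size
print(f"plus/minus on each share: {share_margin(n):.1f}")
print(f"plus/minus on the 50-45 lead: {lead_margin(0.50, 0.45, n):.1f}")
```

With these inputs the band on the lead comes out a little under twice the band on each share, so a 5-point lead in this poll sits just inside it.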
On the other hand, a lead within the confidence interval is still, statistically, a lead: you’d rather be up 2 plus/minus 3 than down 2 plus/minus 3. If anyone wanted to bother, you could compute the actual probability that sampling error alone would produce a lead that big in a race that was really tied: that’s the p-value so beloved of the social science journals. But I’ve never seen it done for polling results.
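Since nobody seems to bother, here is a minimal sketch of that calculation. It uses a normal approximation and assumes simple random sampling; the 48-46 split and the sample size of 1,067 are invented for illustration.

```python
import math


def lead_p_value(p1, p2, n):
    """One-sided p-value: the chance of seeing a lead at least as large as
    p1 - p2 in a poll of n respondents if the race were really tied."""
    # Standard error of the lead under the tied-race assumption.
    se_tied = math.sqrt((p1 + p2) / n)
    z = (p1 - p2) / se_tied
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical poll: up 2 points, plus/minus about 3.
print(f"p-value for a 48-46 lead: {lead_p_value(0.48, 0.46, 1067):.2f}")
```

For these made-up numbers the result is roughly one in four: nowhere near the 5% standard, but a long way from a coin flip too.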
Second, and more important, sampling error is only one kind of error, and rarely the largest. Systematic, or non-sampling, errors are the sources of inaccuracy that won’t go away if you do the same poll twice. The sampling frame may not accurately reflect the population that will actually vote. The non-response group, people selected to be sampled but who couldn’t be reached or who wouldn’t answer, may be tilted toward one candidate or the other. The questions or the interviewers may impart some bias. (If you ask about the Iraq war before you ask about who the respondent intends to vote for this year, that’s going to give the Democrat a couple of extra points.) The voter might misreport his intentions or change his mind. And of course the votes as counted may differ from the votes as cast, by an unknowable margin.
So a polling lead “outside the margin of error” doesn’t mean that your guy is actually 95% likely to win. Non-sampling error is also why polls tend to use small samples: a bigger sample shrinks only the sampling error, so there’s not much point in increasing sample size much past 1000 if all you want to do is guess the horserace. The great advantage of larger samples isn’t increased precision about the end result: it’s that the subgroups (suburban women with children, for example) get big enough so you can start having confidence in the story they’re telling.
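To put rough numbers on the subgroup point, here is a sketch that assumes simple random sampling and a subgroup making up 10% of respondents. The 10% share is a made-up figure, not a real estimate for any particular group.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Plus/minus, in points, on a share measured from n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def subgroup_margin(total_n, share):
    """Plus/minus for a subgroup that makes up `share` of the full sample."""
    return margin_of_error(round(total_n * share))


SUBGROUP_SHARE = 0.10  # hypothetical share of the sample

for total in (1000, 4000):
    print(
        f"n={total}: full sample +/-{margin_of_error(total):.1f}, "
        f"subgroup +/-{subgroup_margin(total, SUBGROUP_SHARE):.1f}"
    )
```

Quadrupling the sample moves the topline band by only a point or two, while the subgroup band is cut in half from a much larger starting value.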