One of Silver’s complaints is that Blumenthal asks “how useful” pollster rankings are but, in Silver’s view, doesn’t really answer the question. Silver sees them as indisputably useful, particularly because the reputation of pollsters has largely been based on impressionistic criteria.
So how useful are they? Here is my view:
Yes, we should have pollster ratings. Why? Because data are better than impressions. Are Silver’s ratings the gold standard? I don’t think there is such a thing, and I suspect Silver doesn’t either. What there are (or could be) are different attempts at ratings, each with its own assumptions. Blumenthal critiques some of Silver’s assumptions, as well he should. Silver would defend them, I gather, as well he should. We need to have this debate. If other people came up with pollster ratings and were transparent about their data and assumptions, that would be fine, although I don’t think it’s a high priority (see below).
What do pollster ratings tell us about the “skill” of pollsters? Let’s assume for the moment that other ratings would generate findings similar to 538’s. If that were the case, then pollster ratings tell us that most pollsters aren’t really that different in terms of their “skill.” (I am leaving aside the deeper question of what “skill” means and whether it is actually captured by such ratings.) Blumenthal points this out. Scanning 538’s rankings, I agree. What does it mean to me that, with one exception (Zogby Interactive), the “best” and “worst” pollsters are about 1.8 points apart in their “pollster-induced error” (as calculated based on pre-election polls predicting election outcomes)? It doesn’t mean much. Rare is the election in which pollsters with a 1-point pollster-induced error are going to tell me something much different than pollsters with a 2-point error.
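For a sense of scale, here is a rough sketch that sets the 1-point gap between those two hypothetical pollsters against the ordinary sampling margin of error of a single poll. The sample size of 1,000, the 95% confidence level, and the simple-random-sample formula are illustrative assumptions, not figures or methods taken from 538's ratings, and sampling error is a different quantity from "pollster-induced error."

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of sampling error, in percentage points,
    for a proportion p estimated from a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll size, chosen only for illustration.
sample_size = 1000
moe = margin_of_error(sample_size)

# Gap between the 1-point and 2-point pollsters in the example above.
error_gap = 2.0 - 1.0

print(f"Sampling margin of error (n={sample_size}): +/-{moe:.1f} points")
print(f"Gap in pollster-induced error: {error_gap:.1f} point")
print(f"Gap is smaller than sampling margin: {error_gap < moe}")
```

Under these assumptions the sampling margin alone is roughly three points, so a 1-point difference in average pollster error would rarely change what a single poll tells us.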
I can also speak to “are pollster ratings useful?” from this perspective: when I teach public opinion courses to undergraduate students, will I talk about these ratings, whether 538’s or anyone else’s? No, and the reason is also why I don’t think pollster ratings are a high priority: pollsters cover a vast terrain within which we cannot rate them even if we wanted to, namely polling about things other than trial-heat election match-ups. This terrain includes other aspects of political candidates and leaders and especially political issues. The best pollsters want to do more than just tell us whether Candidate X will beat Candidate Y, and by how much. But we’ll never be able to rate them on their work in other areas because there is no event, no election, that will determine whether a pollster’s findings with regard to opinion on abortion reflect “skill.”
Moreover, there are many things that affect the responses that pollsters get: sampling strategy, mode of interview, question wording, question order, etc. On average, variations in these things will elicit far more variation in response than does the “skill” of any pollster. I’m not saying anything new here—certainly not anything that Silver and Blumenthal don’t know (they spend a great deal of time writing on these issues, and deserve credit for it) or that even any reasonably thoughtful layperson doesn’t know. But the enduring importance of these factors is why I will focus far more on them in my classes than on pollster ratings per se.
A final note. When I am cranky about polls, it often has nothing to do with “skill,” sampling, mode, question wording, etc. It has to do with interpretation. Pollsters sometimes send out press releases hyperventilating about changes that are well within the margin of error. They report on only their own polls, even when other polls don’t show the same trends. Etc. Pollsters do a lot more violence to our understanding of public opinion with their slipshod and self-serving interpretations than they do with any “error” they induce. I suppose a set of pollster ratings could take this into account, however subjective such judgments may be.
But until then, I will be glad that 538’s ratings exist, even as I continue to evaluate polls primarily on dimensions that those ratings cannot capture.
UPDATE: David Shor tries his hand at pollster ratings and also finds little difference among pollsters. There are house effects, but little evidence of pollster-induced error.