On Pollster Ratings

by John Sides on June 20, 2010 · 3 comments

in Public opinion

538’s new pollster ratings are getting noticed, of course. Mark Blumenthal of Pollster weighed in on Friday with this lengthy post. Nate Silver replies here.

One of Silver’s complaints is that Blumenthal asks “how useful” are pollster rankings, but, in Silver’s view, doesn’t really answer the question. Silver sees them as indisputably useful, particularly because the reputation of pollsters has largely been based on impressionistic criteria.

So how useful are they? Here is my view:

Yes, we should have pollster ratings. Why? Because data are better than impressions. Are Silver’s ratings the gold standard? I don’t think there is such a thing, and I suspect Silver doesn’t either. What exists (or could exist) is a range of attempts at ratings, each resting on different assumptions. Blumenthal critiques some of Silver’s assumptions, as well he should. Silver would defend them, I gather, as well he should. We need to have this debate. If other people came up with pollster ratings and were transparent about their data and assumptions, that would be fine, although I don’t think it’s a high priority (see below).

What do pollster ratings tell us about the “skill” of pollsters? Let’s assume for the moment that other ratings would generate findings similar to 538’s. If that were the case, then pollster ratings tell us that most pollsters aren’t really that different in terms of their “skill.” (I am leaving aside the deeper question of what “skill” means and whether it is actually captured by such ratings.) Blumenthal points this out. Scanning 538’s rankings, I agree. What does it mean to me that, with one exception (Zogby Interactive), the “best” and “worst” pollsters are about 1.8 points apart in their “pollster-induced error” (as calculated based on pre-election polls predicting election outcomes)? It doesn’t mean much. Rare is the election in which pollsters with a 1-point pollster-induced error are going to tell me something much different than pollsters with a 2-point error.

I can also speak to “are pollster ratings useful?” from this perspective: when I teach public opinion courses to undergraduate students, will I talk about these ratings, whether 538’s or anyone else’s? No. This is why I don’t think pollster ratings are a high priority. It’s because of the vast terrain that pollsters cover but within which we cannot rate pollsters even if we wanted—namely, polling about things other than trial heat election match-ups. This terrain includes other aspects of political candidates and leaders and especially political issues. The best pollsters want to do more than just tell us whether Candidate X will beat Candidate Y, and by how much. But we’ll never be able to rate them on their work in other areas because there is no event—no election—that will determine whether a pollster’s findings with regard to opinion on abortion reflect “skill.”

Moreover, there are many things that affect the responses that pollsters get: sampling strategy, mode of interview, question wording, question order, etc. On average, variations in these things will elicit far more variation in response than does the “skill” of any pollster. I’m not saying anything new here—certainly not anything that Silver and Blumenthal don’t know (they spend a great deal of time writing on these issues, and deserve credit for it) or that even any reasonably thoughtful layperson doesn’t know. But the enduring importance of these factors is why I will focus far more on them in my classes than on pollster ratings per se.

A final note. When I am cranky about polls, it often has nothing to do with “skill,” sampling, mode, question wording, etc. It has to do with interpretation. Pollsters sometimes send out press releases hyperventilating about changes that are well within the margin of error. They report on only their own polls, even when other polls don’t show the same trends. Etc. Pollsters do a lot more violence to our understanding of public opinion with their slipshod and self-serving interpretations than they do with any “error” they induce. I suppose a set of pollster ratings could take this into account, however subjective such judgments may be.

But until then, I will be glad that 538’s ratings exist, even as I continue to evaluate polls primarily on dimensions that those ratings cannot capture.
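To make the margin-of-error complaint concrete, here is a minimal sketch of the arithmetic. It assumes simple random sampling and two independent polls, which real polls only approximate (weighting and design effects usually widen the margin). The poll numbers and the function names are hypothetical, chosen only for illustration.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level


def margin_of_error(share, n):
    """95% margin of error (as a proportion) for one poll's estimated share."""
    return Z_95 * math.sqrt(share * (1.0 - share) / n)


def change_margin_of_error(share1, n1, share2, n2):
    """95% margin of error for the change between two independent polls."""
    var1 = share1 * (1.0 - share1) / n1
    var2 = share2 * (1.0 - share2) / n2
    # Variances of independent estimates add, so the margin on a change
    # is wider than the margin on either poll alone.
    return Z_95 * math.sqrt(var1 + var2)


def change_is_distinguishable(share1, n1, share2, n2):
    """True if the observed change exceeds its own margin of error."""
    moe = change_margin_of_error(share1, n1, share2, n2)
    return abs(share2 - share1) > moe


# Hypothetical example: a candidate moves from 46% to 49% in two polls of 800.
single = margin_of_error(0.46, 800)
change = change_margin_of_error(0.46, 800, 0.49, 800)
print(f"single-poll margin: +/-{100 * single:.1f} points")
print(f"margin on the change: +/-{100 * change:.1f} points")
print("distinguishable:", change_is_distinguishable(0.46, 800, 0.49, 800))
```

Under these assumptions, the margin on a change between two polls is larger than the margin usually quoted for a single poll, so a three-point "shift" between two polls of 800 respondents is not distinguishable from sampling noise.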

UPDATE: David Shor tries his hand at pollster ratings and also finds little difference among pollsters. There are house effects, but little evidence of pollster-induced error.
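For readers unfamiliar with the term, a house effect is a pollster's consistent lean relative to other pollsters. The sketch below is one simple way to estimate and remove it. It is not Shor's model or 538's method: it just averages each pollster's deviation from the all-pollster mean in each period, and it assumes every pollster polls in every period. The pollster names and margins are made up.

```python
from collections import defaultdict
from statistics import mean


def house_effects(polls):
    """Estimate each pollster's house effect as its mean signed deviation
    from the average of all polls in the same period.

    polls: list of (pollster, period, margin) tuples, margin in points.
    """
    by_period = defaultdict(list)
    for _, period, margin in polls:
        by_period[period].append(margin)
    period_avg = {period: mean(vals) for period, vals in by_period.items()}

    deviations = defaultdict(list)
    for pollster, period, margin in polls:
        deviations[pollster].append(margin - period_avg[period])
    return {pollster: mean(devs) for pollster, devs in deviations.items()}


def remove_house_effects(polls, effects):
    """Subtract each pollster's estimated house effect from its margins."""
    return [(pollster, period, margin - effects[pollster])
            for pollster, period, margin in polls]


# Hypothetical data: candidate's lead (in points) from three pollsters
# over two periods. Pollster A leans toward the candidate, C leans against.
polls = [
    ("A", 1, 6.0), ("B", 1, 4.0), ("C", 1, 2.0),
    ("A", 2, 3.0), ("B", 2, 1.0), ("C", 2, -1.0),
]

effects = house_effects(polls)
adjusted = remove_house_effects(polls, effects)
print(effects)
print(adjusted)
```

In this toy data the three pollsters disagree by up to four points before adjustment and agree exactly afterward, which is the sense in which a consistent lean is easy to correct. Whatever error remains after the adjustment is the part a rating of "skill" would have to explain.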

{ 3 comments }

Nazgul35 June 20, 2010 at 10:26 pm

I’d just be happy if they reinserted the four questions designed to test the respondent’s knowledge of the issue they are responding to.

But that made the media unhappy, so…

David Shor June 21, 2010 at 3:39 am

I’ve actually described a generalization of these methods to non-horse-race time series like Obama Presidential approval on my site.

Most of the effects you’re talking about (question order, sampling bias, etc.) would show up in such a model as a house effect.

Taking house effects into account, it’s actually relatively trivial to adjust for the effect of a consistently biased pollster.

Simon Jackman June 22, 2010 at 8:18 pm

I’ve got a reasonably straightforward model that does this too; in the Bayes world the model is a DLM (dynamic linear model).

The first write-up was in an article in the Australian Journal of Political Science. See also Ch. 9 of my book.

An extension to variables like presidential approval appears in a 2006 working paper with Neal Beck and Howard Rosenthal.

Jeff Lewis and I developed something similar in 2008, with a hierarchical model over a set of battleground states and the national polls, with offsets for house effects etc.

