Apropos of my Q&A on early polling, Chris Wlezien sends the following, which is forthcoming in a book with Robert Erikson. They take all of the polls from up to 300 days before the presidential elections from 1956 to 2008 (except for 1968, which did not have polling that far back). They then forecast each presidential election outcome with the polls, starting 300 days before the election and continuing day-by-day until Election Eve. The figure below plots the trend in the r-squared from the forecasting equation. If polls were perfect predictors of the outcome, the r-squared would be 1.0. If the polls were pretty much useless predictors, the r-squared would be 0. The graph shows that polls 300 days out have little predictive value at all. The r-squared values increase sharply during the next 3 months or so, when the eventual nominees are becoming better known during the primaries, and then increase more sharply again in the 3 months before the election, when the general election campaign is underway.

I interpret this as a useful cautionary tale about over-interpreting early polls.

I’m fascinated by the dramatic dips in that graph. Why the sharp drops at ~100 days out, and then a smaller one at ~60 days out?

Are the dips due to convention bounces? So, there’s a convention, driving up a candidate’s poll numbers. And polls from this period do a worse job predicting?

That would be my guess too. Lately conventions have been < 100 days out, but for most of their data I believe they were right around that 110-120 day mark where there is a dip.

How about the early dip in the 230-240 days out range? Is that related to challengers getting a bump when they wrap up the nomination during the later primaries?

Forgive the ignorant nature of this question, but can someone clarify what it means for a poll to have no predictive value? Does this mean that the polls show someone leading who turned out not to win? Or does it mean that the polls are so volatile and/or variable as to not systematically favor anyone in particular? Or … ?

The convention explanation for the dip at ~100 days out occurred to me too. Are most conventions closer to ~60 days out now, and would that explain the other dip?

The other thing I find interesting is that there are several instances of “runs”, where the values seem to display a strong degree of autocorrelation (~250 and ~160 days out are the most noticeable ones). I’d be curious if there’s something different about the polls that were incorporated for those values.

I’m inclined to think that the blip at 100 days is noise (i.e. the curve is overfitted), but I don’t know enough statistics to know how to go about estimating how sure we can be from these data that the blip is not noise.
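One rough way to put a number on that uncertainty is to bootstrap over elections: resample the (poll, result) pairs with replacement, recompute the R-squared each time, and look at how widely it spreads. The sketch below is only an illustration of that idea, not anything taken from the Erikson and Wlezien analysis. The function names and the simulated data are made up for the example, and the 13 "elections" are random draws rather than real polls or results.

```python
import random

def r_squared(x, y):
    """R-squared from a one-variable least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None  # degenerate resample: no variation to explain
    return (sxy ** 2) / (sxx * syy)

def bootstrap_r2_interval(polls, results, draws=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for R-squared, resampling elections."""
    rng = random.Random(seed)
    n = len(polls)
    stats = []
    for _ in range(draws):
        idx = [rng.randrange(n) for _ in range(n)]
        r2 = r_squared([polls[i] for i in idx], [results[i] for i in idx])
        if r2 is not None:
            stats.append(r2)
    if not stats:
        raise ValueError("every resample was degenerate")
    stats.sort()
    lo = stats[int((1 - level) / 2 * len(stats))]
    hi = stats[max(int((1 + level) / 2 * len(stats)) - 1, 0)]
    return lo, hi

# Simulated stand-in for 13 elections (NOT real data): results scatter
# around 50, and each poll is the eventual result plus polling noise.
rng = random.Random(1)
sim_results = [50 + rng.gauss(0, 5) for _ in range(13)]
sim_polls = [r + rng.gauss(0, 5) for r in sim_results]

low, high = bootstrap_r2_interval(sim_polls, sim_results)
print(f"90% bootstrap interval for R-squared: {low:.2f} to {high:.2f}")
```

With only a dozen or so elections, an interval like this tends to be wide, so a dip would need to be large relative to that width before it could be distinguished from noise. Comparing two nearby days properly would also need to account for the fact that their R-squared values come from the same elections.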

Also, @jme: I would attribute the second phenomenon you mention to the sentence “Daily polls were interpolated where missing.” in the figure caption.
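To see why interpolation would produce smooth runs, here is a minimal sketch. The caption says only that missing days were interpolated, not how, so the linear fill below is an assumption, and the function name and poll values are invented for the example.

```python
def interpolate_daily(known):
    """Fill in missing days by straight-line interpolation.

    known maps a day index to a poll value for the days that actually
    had a poll. Returns a dict covering every day from the first known
    day to the last. Linear interpolation is an assumption here.
    """
    days = sorted(known)
    filled = {}
    for d0, d1 in zip(days, days[1:]):
        for d in range(d0, d1):
            frac = (d - d0) / (d1 - d0)
            filled[d] = known[d0] + frac * (known[d1] - known[d0])
    filled[days[-1]] = known[days[-1]]
    return filled

# Invented example: polls on day 0 and day 4, nothing in between.
daily = interpolate_daily({0: 50.0, 4: 54.0})
for day in sorted(daily):
    print(day, daily[day])
```

Every filled-in day is a weighted average of the same two real polls, so neighbouring days carry nearly the same information, and the R-squared computed from them moves in a smooth run rather than jumping around.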

Thanks, I missed the interpolation bit, that makes sense.

@Joel: The R-squared is the % of the variance in one variable explained by the other. In other words, if you had a collection of election results (i.e., all the elections), and a collection of all the polls taken around, oh, 150 days out, those polls would help explain 50% of the variation in the final results. So, the outcome in this case is expressed as % voting Dem in the general election, and it varies. In 2008, for example, it was 52.87%, and in 2004, it was 48.27% and so on. That number varies. If you know the poll results 150 days out, you can “explain” half of the variation. The other half is, essentially, random noise.
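For anyone who wants to see the arithmetic, here is a minimal sketch of that calculation for a single day's polls, assuming a plain one-variable least-squares fit. The function names and all of the numbers are invented for illustration. They are not the Erikson and Wlezien data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """Share of the variance in y accounted for by the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_resid / ss_total

# Invented numbers: one poll reading and one final result per election.
polls_150 = [46.0, 51.0, 49.0, 54.0, 47.5, 52.5]   # % Dem in polls
dem_share = [48.5, 50.0, 51.5, 52.0, 47.0, 53.0]   # % Dem on Election Day

print(f"R-squared: {r_squared(polls_150, dem_share):.2f}")
```

Repeating this for each day from 300 days out to Election Eve, with that day's polls in place of `polls_150`, would give one R-squared per day, which is the kind of curve the figure plots.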

Really, for me, a more interesting statistic is the standard error of the estimate, which would tell you how far off your predictions are, generally. But, John’s absolutely right: if the R-squared for polls 300 days out is essentially 0, and the graph clearly indicates that it only got worse as we got further out, then polls now are, generally speaking, worthless for predicting election results. However, if you tell me what the poll results are going to be on November 1, 2012, I’ll tell you who’s going to be president with a great deal of accuracy.
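For reference, the standard error of the estimate comes from the same regression: it is the square root of the residual sum of squares divided by n - 2. A minimal sketch, with a made-up function name and invented numbers:

```python
import math

def standard_error_of_estimate(x, y):
    """Typical size of the prediction miss, in the units of y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # n - 2 degrees of freedom: two parameters (intercept, slope) were fitted.
    return math.sqrt(ss_resid / (n - 2))

# Invented numbers, not real polls or results.
polls = [46.0, 51.0, 49.0, 54.0, 47.5, 52.5]
results = [48.5, 50.0, 51.5, 52.0, 47.0, 53.0]

see = standard_error_of_estimate(polls, results)
print(f"Standard error of the estimate: {see:.2f} percentage points")
```

Unlike the R-squared, this is in percentage points of the vote, so it says directly how far off a poll-based prediction would typically be.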

Point taken, but keep in mind what the explanatory variable is here: the mean % Democratic in generic polls. This doesn’t mean that “polls” in general are uninformative (which would, quite honestly, be a shocking finding), but just that political party support (or leanings) doesn’t predict presidential outcomes very well. That’s interesting, but not all that surprising. Now, if we had data – at least for incumbent presidents – on support for the president or support for specific candidates (or even generic ones), I imagine that the R-squared would be a good deal higher, even 300 days out (albeit likely still far from 1).

Why isn’t this obvious? 300 days from the elections you don’t know who the Republican or Democratic candidates are. In 2008 you’d have a bunch of people saying “Hillary Clinton” and “Mike Huckabee” but then as the primaries sorted out the fields you had people switching their choices to “Barack Obama” or “John McCain.” Of course as the number of choices and days until the election decreased the polls became more predictive.

Without understanding the methodology (does this include 300 days of McCain vs Obama polls and 300 days of Bush vs Kerry polls?) all you’re doing is affirming the fact that it’s harder to predict the future the farther out you are. I suspect you’d get the same type of graph if you looked at who people thought would win the World Series as the baseball season progressed.

Yeah, I’ve looked at something similar. There are “runs” in daily poll results in part because different polling firms have different polling intervals and each firm has a different house effect.

In some sense, a more interesting question is whether one type of poll can tell you something about a different election. For example, how well do presidential approval polls predict change in congressional composition at the mid-terms (i.e., to what extent are mid-terms a referendum on the President?) You can see that here, up to the 2006 mid-terms:

http://sowhatyouresayingis.blogspot.com/search/label/polls

Predictive? Of course not. Early polls track trends over time.

When you go to a ball game, do you show up for the last quarter or inning?