From my online NYT column:

For months now people I know have been stopping me on the street to ask who I think will win the election. And for months I’ve always said, either side could win it. . . . Different models use different economic and political variables to predict the vote, but these predictions pretty much average to 50-50. . . .

Last week I was interviewed by a reporter from France, who asked me who I thought would win the election. I said, it’s too close to call. He said, but Nate Silver’s FiveThirtyEight blog on NYTimes.com currently gives Obama a 72.9 percent chance. I think Nate is great, and I was thrilled to have the opportunity to contribute to his blog for a while. But I’d still say the election is too close to call. . . .

The online betting service Intrade gives Obama a 62 percent chance of winning, and I respect this number too, as it reflects the opinions of people who are willing to put money on the line. The difference between 63 percent and 75 percent may sound like a lot, but it corresponds to something like a difference of half a percentage point in Obama’s forecast vote share. Put differently, a change in 0.5 percent in the forecast of Obama’s vote share corresponds to a change in a bit more than 10 percent in his probability of winning. Either way, the uncertainty is larger than the best guess at the vote margin.
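The link between a half-point shift in forecast vote share and a ten-point shift in win probability can be checked with a rough sketch. It assumes the forecast of Obama's two-party vote share is normally distributed; the standard deviation of 1.5 points and the function name `win_probability` are illustrative assumptions, not numbers taken from any forecaster's model.

```python
import math

def win_probability(forecast_share, forecast_sd):
    """P(two-party vote share > 50%) under an assumed normal forecast distribution."""
    z = (forecast_share - 50.0) / forecast_sd
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

FORECAST_SD = 1.5  # assumed forecast uncertainty, in percentage points

for share in (50.0, 50.5, 51.0):
    prob = win_probability(share, FORECAST_SD)
    print(f"forecast share {share:.1f}% -> P(win) = {prob:.3f}")
```

Under these assumptions, moving the forecast from 50.5 to 51.0 raises the win probability from about 0.63 to about 0.75. A different assumed standard deviation changes the exact figures but not the qualitative point.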

Where is this uncertainty coming from? First, public opinion can shift quickly during the last week of campaigning, as news arrives and as voters make their final decisions. Second, the polls aren’t perfect. Nonresponse rates continue to rise and, although pollsters are working on more and more sophisticated methods for correcting for this problem — an area of my research — we cannot always catch up. Even an average of polls can be wrong, but it’s hard to know in advance of the election how wrong they will be, or in which direction.

What I’m saying is that I can simultaneously (a) accept that Obama has a 72 percent chance of winning and (b) say the election is too close to call. . . .

More at the link, including a discussion of why it’s *not* correct to say that “Romney’s a touchdown behind with five minutes to play.”

{ 28 comments }

Yes, read Andrew’s linked piece, then scroll a few inches up: “Some polls will be clearly right, and others will be clearly wrong; some analysts will be vindicated, and some will look overconfident or hackish. Obviously nobody’s going to be hounded out of punditry if their predictions are mistaken, but we will at least be able to declare this particular argument settled, once and for all, in favor of liberals or conservatives.” Clearly Ross Douthat does not hang around at the Larchmont library.

People do not have a good intuitive sense of probability and statistics, and that is why I have argued that Nate’s ‘probability’ assessment is not a great one, even if his model is quite good (which I believe it is). But think for a moment. If you had a jar with seven white marbles and three black ones, and you blindly drew one out, MOST of the time you would pull out a white marble. However, if you pulled out a black one, you might be surprised, but not astonished. After all, there were a perceivable number of black marbles in the jar to begin with. And that’s all Nate is saying, in my view. Given a set of circumstances that he is observing in the polling data, it is similar to a 71% probability of an Obama win. The circumstances could change, and there is a distinct possibility that Romney could win, in spite of the odds, but that’s the way it looks right now with this environment.

If Romney wins, it doesn’t mean that Nate was wrong. It only means that, in spite of the odds, the lower probability outcome prevailed.

Joe:

Yes, that is exactly what I wrote:

Andrew, Sorry for repeating your point well made.

I would not respect Intrade that much. After the last debate Obama fell by 10 points for no particular reason, then took a couple of days to recover. Whatever you think of all the fluctuations in Nate Silver’s model, at least they aren’t caused by someone buying 20,000 shares of Romney to try to juice the numbers.

What’s the data on other presidential elections described in this way (Obama has a 70% chance)? “The Obama-Romney race is like the Bush-Gore race,” or something like that, might be an interesting way to frame things.

What presidential elections were over 90? What presidential elections were in the 50-60 range? How “close” do presidential elections tend to be?

There seems to be vote rigging in presidential elections that can be seen by looking at statistics.

A computer program fraudulently flips a percentage of votes from one candidate to another, and the percentage of votes flipped varies with the size of the precinct. The rules of the fraudulent vote flipping algorithm are:

Very small precincts don’t have any votes flipped.

The percentage of votes that are flipped is small (such as .01%) for small precincts and large (such as 5%) for large precincts with a *gradual* change in percentage “flipped”.

The reason perpetrators don’t flip as many votes in small districts is because a recount will check a random number of precincts and a smaller precinct is *more* likely to be audited because there are more of them. So if fraudulent vote flipping flips 5% of votes from Democrats to Republicans, a random recount of precincts would show a smaller error – such as 2%.

The authors of the paper show that the effect does not happen in data from some counties, presumably because the perpetrators did not have access to those tabulating computers, and that it is not seen in Democratic primaries. The authors of the study look at income and poverty rates, which are highly correlated with voter choice but do not correlate with precinct size. This rules out more Republicans living in large precincts as being the cause of the anomalous data.

To find the article, do a google on the following words: vote flipping large precincts central tabulator

I’d only observe that we have a hard enough time running a basic, non-rigged election. It’s hard to accept that the large conspiracy needed for broad systematic rigging could even be done, much less kept secret.

The central computers that tabulate all the precincts in each state are compromised. It does not take a big workforce in each state to do this – just one group skilled in hacking. States don’t have the resources to dig through the computer code to look for hacks. They rely on the companies that build the system. And those companies are targets for the hackers.

It is the public’s job to “out” the data and criminal activity since the officials won’t do their job because it could cost them their job.

Professor Gelman, this election obviously entails a lot of complex things, but I think one that’s worth keeping in mind: turnout and likely voter screens. People have varying ideas about who’ll show up on election day. Republicans are on many measures more enthusiastic about voting than Democrats, and some findings suggest that these differences are sizable and substantial. Moreover, some Dem-leaning constituencies like youth and Hispanics have been hit hard by the recession and aren’t too pumped.

This has been discussed and explored extensively, but figuring out the screen is a challenge. Today, one Democratic pollster tweeted that ninety percent of voters cast a ballot in November, but Nate Silver had an exchange addressing the question and said that seventy five percent of RVs actually vote. Some polls show a normal or sub-normal LV-RV gap, whereas others show a sizable one.

“Silver was predicting an approximate 50.3 percent of the two-party vote share for Obama”

Note — if this comes from the top-line numbers on Nate’s site, those aren’t two-party vote share. For instance, right now Nate’s site predicts a 50.5-48.6 popular vote, which actually gives Obama a 51.0 percent share of the two-party vote.

Sorry if this isn’t the point you were making, since I’m teaching my grandmother to suck eggs here, but I wasn’t sure where else the number might be coming from.

Adding, looking at Nate’s graph of his popular vote predictions, the closest prediction he’s ever made seems to be Obama 49.8-Romney 49.1 on Oct. 12 (when he was giving Obama a 61.1% chance of electoral college victory), which translates to 50.35% of two-party vote share.
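The conversion from top-line popular vote numbers to two-party vote share used in this comment is simple arithmetic, sketched here. The function name `two_party_share` is just an illustrative label; the input numbers are the ones quoted above.

```python
def two_party_share(candidate_pct, opponent_pct):
    """Candidate's share of the votes cast for the two major candidates, in percent."""
    return 100.0 * candidate_pct / (candidate_pct + opponent_pct)

# Top-line forecast of 50.5 to 48.6
print(round(two_party_share(50.5, 48.6), 1))  # 51.0

# Closest forecast on the graph: 49.8 to 49.1
print(round(two_party_share(49.8, 49.1), 2))  # 50.35
```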

Matt:

Silver was predicting a 50.8% share. The 50.3% was the implicit share corresponding to the estimate that he had a 62% chance of winning.

Thanks for the explanation — but wasn’t he predicting a 72.9% chance of winning?

Matt:

Yes, it was Intrade that was giving 62%. I was saying that it might look like there was a big difference between Intrade’s 60% and Nate’s 70%, but this apparently huge difference in probabilities corresponds to a tiny and unmeasurable difference in expected vote shares.

I agree that people have difficulty understanding probabilistic forecasts. And that’s likely even more true of emotionally charged issues like elections. But, I don’t think you’re helping here.

Imagine you were taking your family to New Hampshire for a weekend trip to include one day at Storyland for your four year old. The weather forecast is 30% chance of rain on Saturday and 70% chance of rain on Sunday. Your wife asks which day the family should plan to go to the amusement park. Do you say “too close to call”?

No, you do not. You say Saturday is our better bet. People aren’t THAT hopeless with odds.

Ogt:

Sure, but you’ve changed the problem. There’s no Saturday and Sunday here, there’s only one election day. To translate to your story, you have to say that the weather forecast is 30% chance of rain on Saturday and . . . that’s it. I agree that if you have the choice between the two days, you’d choose Saturday, but that’s not what’s happening for the election. As I wrote in the linked article, given the information available, I’d bet on Obama, but the election is still too close to call, as the predicted margin of victory is much less than any reasonable level of uncertainty.

I guess I don’t get the value of the appellation of “too close to call” in this context. If you would clearly bet on an outcome, then you’re making a call, at least, I would argue, as is commonly understood.

From an academic perspective, I understand that any evidence this weak would be very inconclusive. But, I think the betting perspective is much better for communicating probabilistic forecasts to the general public.

In my opinion it would be both more accurate and clearer communication to say that Obama is a ‘Slight Favorite’ rather than it is ‘Too Close to Call’

Got:

I don’t think the general public understands these odds, hence the silliness (to me) of it being considered newsworthy when a forecast probability goes from 65% to 70% (which corresponds to something like a meaningless 0.2% shift in forecast vote share). Also note the public policy professor linked to in my article: he read “2:1 odds” and translated it to “ahead by a touchdown with 5 minutes to go,” which is actually 9:1 odds. There’s a big difference between 2:1 and 9:1, and I think that people interpret these 70% claims much more deterministically in an election context than they would in the context of Pr(rain).
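For readers who want the gap between 2:1 and 9:1 as probabilities, here is a small sketch of the conversion. The helper names are illustrative only.

```python
def odds_to_probability(favorable, unfavorable):
    """Convert odds of favorable:unfavorable into a probability."""
    return favorable / (favorable + unfavorable)

def probability_to_odds(probability):
    """Convert a probability into odds in favor, expressed as x:1."""
    return probability / (1.0 - probability)

print(f"2:1 odds -> {odds_to_probability(2, 1):.3f}")   # 0.667
print(f"9:1 odds -> {odds_to_probability(9, 1):.3f}")   # 0.900
print(f"70% chance -> {probability_to_odds(0.70):.2f}:1 odds")  # 2.33:1
```

So a roughly 70% forecast sits near 2:1, well short of the 90% implied by 9:1.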

But, yes, I agree with you that 60% != 50%.

We are relying on the scientific capabilities of the pollsters to do a good job. This keeps the hackers who steal elections at bay somewhat. But when the pollsters and officials fall down on the job, we have to rely on ourselves to make sure this election is not hacked by computer programmers who will steal the election – as has been done in the past.

I’m not really even sure that Nate’s model says that “Obama” has a 72% chance of winning. That’s something of a simplification. Obama’s real chances of winning are 100% or 0% because there will only be a single event, and probability is defined as number of successful outcomes divided by number of possible outcomes.

What Nate does is run a computer simulation of the single election 10,000 times (“Monte Carlo”) and counts the number of times Obama wins versus the number of times Romney wins, but clearly the real election will not be repeated 10,000 times.
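To make the idea of "running the election 10,000 times" concrete, here is a deliberately toy sketch. It is not the FiveThirtyEight model: it assumes a single national two-party vote share drawn from a normal distribution, and the forecast of 50.8, the standard deviation of 1.5, and the function name are all illustrative assumptions.

```python
import random

def simulate_win_fraction(forecast_share, forecast_sd, n_sims=10_000, seed=0):
    """Fraction of simulated elections in which the candidate's share exceeds 50%."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    wins = 0
    for _ in range(n_sims):
        # Draw one hypothetical election outcome from the assumed forecast distribution.
        simulated_share = rng.gauss(forecast_share, forecast_sd)
        if simulated_share > 50.0:
            wins += 1
    return wins / n_sims

print(simulate_win_fraction(forecast_share=50.8, forecast_sd=1.5))
```

With these assumed inputs the fraction comes out near 0.70. The reported "probability" is that fraction of simulated outcomes, which summarizes uncertainty about one election rather than describing a real series of repeated elections.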

Not to get too geeky about it, poll aggregation is a great way of telling us succinctly, accurately, and honestly “what the collective polls say” (today). It’s much, much better, more succinct, and more accurate than pundit analysis, which cherry picks polls and then creates noise. But what the aggregate polls say X days from election day doesn’t necessarily tell us what the aggregate polls will say on election day, and it doesn’t necessarily tell us how accurate they will be at predicting the outcome on election day.

However, there is an accumulating database of this information, which can be applied.

To oversimplify, state poll aggregates are believed to predict actual outcomes with fairly high confidence when a properly constructed polling average shows a lead by either candidate of 2.5 pts or more on election day. The accuracy of the polling average decreases more quickly when the average lead is less than 2.5 pts.

Put another way, based on past data, a properly aggregated state poll on election day has almost always predicted the correct state winner whenever the aggregate shows a lead of 2.5% or more. It often predicts the winner even when the aggregate margin is less than 2.5%, but it is wrong more often.

It’s also not clear whether state by state polling error on election day is “independent” for each state or “systemic” across all states in that election. So that’s another complication. Assuming independence, it would be rare for ALL state poll averages to be wrong by the same amount.
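One way to see why the independent versus systemic question matters is a back-of-the-envelope sketch. It assumes each state's polling error is the sum of a shared national component and an independent state-specific component, so the error of the average across n states has standard deviation sqrt(shared^2 + independent^2 / n). The specific numbers (10 states, 3 points, 2 points) are hypothetical.

```python
import math

def sd_of_average_error(n_states, independent_sd, shared_sd):
    """SD of the mean polling error across states under the assumed two-component model."""
    # Independent errors average out across states; the shared error does not.
    return math.sqrt(shared_sd ** 2 + independent_sd ** 2 / n_states)

# Hypothetical inputs: 10 states, 3-point state-specific error.
print(round(sd_of_average_error(10, independent_sd=3.0, shared_sd=0.0), 2))  # purely independent
print(round(sd_of_average_error(10, independent_sd=3.0, shared_sd=2.0), 2))  # with a 2-point shared error
```

In the purely independent case the average error shrinks as more states are added, while any shared component puts a floor under it no matter how many states are polled.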

Simon Jackman probably describes it more succinctly when he talks about converting a poll average to a forecast. http://www.huffingtonpost.com/simon-jackman/converting-a-poll-average_b_2044222.html?utm_hp_ref=@pollster

Maybe someone should gently beat up Nate for using the word “probability” of winning the election. It’s more accurate to focus on the probability that the poll averages (X days in advance) accurately forecast the outcomes.

BTW, I have a question about how poll aggregators model/compute win “probability” from poll averages. It’s one thing to directly convert the poll average to a coinflip probability, say 51% Obama 49% Romney, say for Iowa, and another to weaken the poll average by its probability of being wrong, and then convert the number to a coinflip probability. It’s not clear to me which one Nate does. If the former, then your position is even more accurate. In Jackman’s model, the confidence that the poll average is the final outcome is much lower than the confidence in the poll average. A poll average can accurately summarize a group of polls with great confidence, but predict an actual outcome with much less confidence.
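The two approaches described here can be contrasted with a rough sketch. It does not claim to reproduce what Nate or Jackman actually does. It assumes a normal model in which the sampling error of the poll average and the historical error of poll averages as predictors are independent and add in variance; the 2-point lead, 1-point sampling error, and 3-point historical error are hypothetical numbers.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lead_probability(poll_lead, sampling_sd, historical_error_sd=0.0):
    """P(leader actually wins) given a poll-average lead, in points, under the assumed model."""
    # Variances of independent error sources add.
    total_sd = math.sqrt(sampling_sd ** 2 + historical_error_sd ** 2)
    return normal_cdf(poll_lead / total_sd)

# Approach 1: treat the poll average as the outcome, with sampling error only.
print(round(lead_probability(2.0, sampling_sd=1.0), 3))

# Approach 2: also allow for poll averages missing the final result.
print(round(lead_probability(2.0, sampling_sd=1.0, historical_error_sd=3.0), 3))
```

With these made-up inputs the first approach gives about 0.98 and the second about 0.74, which is the sense in which confidence in the poll average can far exceed confidence in the outcome.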

Paul:

As I wrote in my earlier blog on this topic, Nate’s in a tough position. If his goal is to forecast the election, then each new poll provides approximately zero information. But people want news news news! There are two ways for Nate (and others) to supply news when none exists. The first is to report polls, or poll aggregates, or whatever. The second is to report numbers to meaningless precision, e.g., a “65.7% chance” Obama will win. These probabilities can jump all over the place based on tiny bits of noise. An undetectable 0.4% shift in Obama’s forecasted vote share maps to a seemingly large 10% shift in his probability of winning.

Three unrelated thoughts:

1. You say Hibbs’ model “doesn’t directly factor in incumbency”. Of course it does. It says the incumbent needs x% growth to win. Incumbency advantages are built into x–if we suddenly awoke in a world where incumbency were a detriment, then Hibbs’ regression (if he reran it) would find incumbents needing a much higher growth rate to win reelection. The whole concept is built around incumbency.

2. People jumping on Andrew and Nate saying you can’t simultaneously give 72% odds and say “too close to call” are seriously missing the point. 72% IS too close to call. If you “call” it, that means you know flat out who will win. When Nate throws a 100% up on his site (for example, he currently says there is a 100% chance that the Democrats will win/hold 46 or more Senate seats), THEN he is saying it is not too close to call.

3. I’ve never been a fan of significant figures. (That is a doctrine of giving an answer with more or fewer decimal places depending on how precise your measurement is.) If your model gives a result of 53.5424 +/- 1, sig figs enthusiasts say you should write that as 54 (the +/- 1 is optional), because adding in the extra digits implies a level of precision greater than what you really have. I say if you write 54 then you’re throwing away accuracy–you don’t know that 53.5 is really the right figure, but your research says it’s a better guess than 54.0. I would write 53.5 +/- 1. So I don’t begrudge Nate his 73.5%. I’d probably round that to 74, and Andrew would say 70, but if you state your level of precision separately then you can put whatever digits you want. In this field the concept of precision is kind of hazy for me, since we’re not making a measurement–really the 73.5% itself is an estimate of uncertainty, so everyone’s butt is covered.

On second thought, I should retract point 1. If an incumbent advantage exists, then Hibbs’ model does include it, but it hands it out indiscriminately. That is, if we assume that sitting presidents get an advantage (but new candidates from the incumbent party do not) then Hibbs’ model gives incumbents some but not all of that advantage*, and gives new candidates some advantage when they should get none.

Wouldn’t it have been better if Silver said feeding the polling data into his algorithm suggests that Obama will win, but there is 25% chance (or whatever) that this prediction is wrong? Same point, just slightly different framing, but somehow, this seems far less incendiary.

Hs:

Good idea. I like it!

Obviously the “average” person has trouble intuiting probability. Otherwise, there would be no casinos.
