As John suggested in his post this morning, he and I have been working with Ben Highton of UC Davis to develop some predictions for congressional elections. We’re taking a very particular approach here: we want to know how far the “fundamentals” of elections can get us toward an accurate read on congressional outcomes. By fundamentals I mean anything that is largely unaffected by the back-and-forth of the fall campaign in each district.

We developed a model that includes two types of fundamentals: 1) national factors like the economy that affect every race to some extent, and 2) district-level factors like whether an incumbent is running for reelection. We then estimated this model on all House elections from 1952 through 2010 and used the resulting estimates to predict 2012.

**The bottom line: our model predicts Democrats will win 194 seats (44.6%), one more than they currently hold, with a one in four chance that they will take back the House.** Democrats do better in terms of vote share, at 48.9%, or almost two percentage points more than they got in 2010, but the extra votes do little to raise their expected seat total.
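To make the arithmetic concrete: 194 of 435 seats is 44.6%, and a "one in four chance" can be read as the share of simulated elections in which Democrats reach the 218 seats needed for a majority. The sketch below assumes the forecast is summarized as a list of simulated seat totals; the function names and the toy draws are hypothetical, and this is not necessarily how the probability above was produced.

```python
HOUSE_SEATS = 435
MAJORITY = HOUSE_SEATS // 2 + 1  # 218 seats are needed for control


def seat_share(seats: int, total: int = HOUSE_SEATS) -> float:
    """Percentage of the chamber held by a party."""
    return 100.0 * seats / total


def majority_probability(simulated_seats: list[int], majority: int = MAJORITY) -> float:
    """Fraction of simulated elections in which the party reaches a majority."""
    if not simulated_seats:
        raise ValueError("need at least one simulated seat total")
    wins = sum(1 for s in simulated_seats if s >= majority)
    return wins / len(simulated_seats)


print(round(seat_share(194), 1))  # 44.6
# Toy draws, purely illustrative (not model output):
draws = [185, 190, 194, 199, 205, 212, 219, 224]
print(majority_probability(draws))  # 0.25
```

With real output, `draws` would be thousands of simulated seat totals, and the probability is simply the fraction that clear 218.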

When building such a model for congressional elections, there are a couple of general approaches. The first uses only national-level fundamentals; there were several such models (gated) for the 2010 midterm House elections. The second comes from handicappers such as Charlie Cook and Congressional Quarterly, who take a race-by-race approach. This offers invaluable information about those races, but it also incorporates information other than the fundamentals, and it can sometimes make it hard to see how the broader political winds are blowing. A few models from 2010 offered a mix of these approaches, relying on both fundamentals and handicapping or polls: for example, the 538 blog, or Bafumi, Erikson, and Wlezien (gated).

For our predictions, we decided to chart a course between the two. We wanted to constrain ourselves to the fundamentals, so we included national factors like those in the first approach. But the race for Congress is 435 different campaigns with as many different sets of candidates, so we also wanted to say something about each district in the context of that national climate.*

Our model includes the following national-level factors for the years 1952-2010:

- GDP: change over the previous two quarters, measured at the second quarter of the election year.
- Presidential approval: from Gallup, as of June of the election year.
- An indicator for midterm elections: to capture the typical seat and vote loss for the president’s party.
- The party of the president.
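The four national factors can be bundled into one record per election year. In this minimal sketch the GDP window (Q4 of the prior year to Q2 of the election year, not annualized) is an assumed reading of "change over the previous two quarters," and the class and function names are hypothetical. How these factors enter the model (for example, whether approval is interacted with the president's party when predicting the Democratic vote) is not specified here.

```python
from dataclasses import dataclass


@dataclass
class NationalFactors:
    year: int
    gdp_change: float    # % change in GDP over the two quarters ending in Q2 (assumed window)
    approval: float      # Gallup presidential approval as of June, in percent
    midterm: bool        # True for non-presidential election years
    dem_president: bool  # party of the sitting president (True = Democrat)


def two_quarter_gdp_change(gdp_prev_q4: float, gdp_q2: float) -> float:
    """Percent change from Q4 of the prior year to Q2 of the election year (not annualized)."""
    if gdp_prev_q4 <= 0:
        raise ValueError("GDP level must be positive")
    return 100.0 * (gdp_q2 - gdp_prev_q4) / gdp_prev_q4


def national_factors(year: int, gdp_prev_q4: float, gdp_q2: float,
                     june_approval: float, dem_president: bool) -> NationalFactors:
    """Bundle one election year's national predictors into a single record."""
    return NationalFactors(
        year=year,
        gdp_change=two_quarter_gdp_change(gdp_prev_q4, gdp_q2),
        approval=june_approval,
        # House elections are biennial; presidential years are divisible by 4.
        midterm=(year % 4 != 0),
        dem_president=dem_president,
    )


# Toy inputs, purely illustrative:
print(national_factors(2010, 200.0, 202.0, 50.0, True))
```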

These factors are similar to the presidential prediction model John, Lynn Vavreck, and Seth Hill created for the Washington Post.

We then augmented this model with the following district-level factors:

- Incumbency: Republican incumbent / open seat / Democratic incumbent.
- District-level presidential vote: the deviation from the average presidential vote that year, lagged one election for non-presidential years. Two HUGE shout-outs to Gary Jacobson of UC San Diego, who provided most of the data for this time series, and to Daily Kos for calculating presidential vote for the redrawn districts and making the results available to anyone who wants them.**
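Both district-level factors are simple transformations. This sketch assumes a -1/0/+1 coding for incumbency and takes the "average presidential vote that year" to be the unweighted mean across districts; both choices, the function names, and the district labels are illustrative assumptions.

```python
from statistics import mean

INCUMBENCY_CODES = {"R": -1, "open": 0, "D": 1}  # assumed coding


def incumbency_code(status: str) -> int:
    """Map 'R' (Republican incumbent), 'open', or 'D' (Democratic incumbent) to -1/0/+1."""
    try:
        return INCUMBENCY_CODES[status]
    except KeyError:
        raise ValueError(f"unknown incumbency status: {status!r}") from None


def mean_deviated_pres_vote(dem_pres_share: dict[str, float]) -> dict[str, float]:
    """Each district's Democratic presidential share minus the unweighted cross-district mean."""
    avg = mean(dem_pres_share.values())
    return {district: share - avg for district, share in dem_pres_share.items()}


# Three hypothetical districts:
print(mean_deviated_pres_vote({"d1": 60.0, "d2": 50.0, "d3": 40.0}))
# {'d1': 10.0, 'd2': 0.0, 'd3': -10.0}
```

Centering each year on its own mean means a district's value reflects how it leans relative to other districts that year, not how the national race went.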

Incumbency captures the well-known boost that incumbents get (for a variety of reasons), while the presidential vote captures the underlying partisan complexion of the district relative to others. Both variables also tell us something about a party’s “exposure”: how many seats it holds, and how many it probably should *not* hold.
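"Exposure" is not given a formula here, so the following is one simple operationalization, assumed for illustration: count the seats a party holds, and how many of those sit in districts whose mean-deviated presidential vote leans toward the other party. The `District` type and the toy seats are hypothetical.

```python
from typing import NamedTuple


class District(NamedTuple):
    name: str
    holder: str      # "D" or "R": party currently holding the seat
    pres_dev: float  # mean-deviated Democratic presidential vote, in points


def exposure(districts: list[District], party: str) -> tuple[int, int]:
    """Return (seats `party` holds, seats it holds where the district leans toward the other party)."""
    if party not in ("D", "R"):
        raise ValueError("party must be 'D' or 'R'")
    held = [d for d in districts if d.holder == party]
    # A positive deviation leans Democratic; a negative one leans Republican.
    leaning_away = [d for d in held if (d.pres_dev < 0 if party == "D" else d.pres_dev > 0)]
    return len(held), len(leaning_away)


seats = [
    District("d1", "R", 3.0),
    District("d2", "R", -8.0),
    District("d3", "D", -2.0),
    District("d4", "D", 12.0),
    District("d5", "R", 1.5),
]
print(exposure(seats, "R"))  # (3, 2)
print(exposure(seats, "D"))  # (2, 1)
```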

How well did this model perform? Here is a scatterplot of the predicted and actual Democratic share of the vote for the House using this model, with the 2012 prediction marked in red:

The model’s predictions are quite accurate, missing the actual result by an average of only one percentage point. The predictions are still pretty good if we estimate the model on 1952-1992 alone and use it to predict 1994-2010. So the basic dynamics of House elections have been reasonably stable over this period, at least once they’re aggregated to a single national number.
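The two checks in that paragraph, average miss and out-of-sample prediction, can be sketched as a mean absolute error plus a split by year. This is a minimal illustration: the rows are made-up numbers, the function names are hypothetical, and fitting the model on the training years is omitted (the holdout predictions are assumed to come from such a fit).

```python
def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute miss, in percentage points."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def split_by_year(rows: list[tuple[int, float, float]], last_training_year: int = 1992):
    """Split (year, predicted, actual) rows into training years and later holdout years."""
    train = [r for r in rows if r[0] <= last_training_year]
    holdout = [r for r in rows if r[0] > last_training_year]
    return train, holdout


# Made-up rows; holdout predictions are assumed to come from a model fit only to `train`.
rows = [(1990, 52.0, 53.0), (1992, 50.5, 50.0), (1994, 48.0, 46.5), (1996, 49.5, 49.8)]
train, holdout = split_by_year(rows)
print(mean_absolute_error([r[1] for r in holdout], [r[2] for r in holdout]))
```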

Next, we use the predicted vote in each district to call wins and losses, and then aggregate those into an overall seat share. Here’s the predicted-vs.-actual graph for that:
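Turning district vote predictions into seats amounts to counting the districts where the predicted Democratic share clears 50%. This sketch assumes the prediction is a two-party share and arbitrarily awards an exact 50.0 to Republicans; the function names are hypothetical.

```python
def predicted_seats(dem_vote_preds: list[float], threshold: float = 50.0) -> int:
    """Count districts where the predicted Democratic share is above the threshold."""
    return sum(1 for v in dem_vote_preds if v > threshold)


def predicted_seat_share(dem_vote_preds: list[float]) -> float:
    """Predicted Democratic seats as a percentage of all districts."""
    if not dem_vote_preds:
        raise ValueError("need at least one district")
    return 100.0 * predicted_seats(dem_vote_preds) / len(dem_vote_preds)


print(predicted_seat_share([55.0, 48.0, 51.0, 49.9]))  # 50.0
```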

The prediction error is larger for seats. The model does especially badly in big swing years like 1958, 1964, 1966, 1994, and 2010, when it tends to understate the winning party’s performance. That’s not because it gets the vote wrong: the vote prediction is pretty close in all of those years. More likely, a bunch of close contests broke the winning party’s way, pushing its gains well beyond what the model expected. Since 2012 is probably *not* going to be a big swing year, we might have a little more confidence in the prediction this time around.
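Why can seats miss badly while the vote is close? When many contests sit near 50%, a small shift that breaks them all the same way moves a lot of seats. The sketch below uses a uniform swing across hypothetical district predictions as a deliberately crude stand-in for close races breaking together; it is an illustration, not part of the model.

```python
def seats_under_swing(dem_vote_preds: list[float], swing: float) -> int:
    """Democratic seats if every district moves `swing` points toward Democrats (negative = toward Republicans)."""
    return sum(1 for v in dem_vote_preds if v + swing > 50.0)


preds = [50.5, 49.6, 49.8, 60.0, 40.0]  # hypothetical district predictions
for swing in (-0.6, 0.0, 0.5):
    print(swing, seats_under_swing(preds, swing))
# seats: 1, 2, 4
```

A half-point shift in the vote doubles the seat count here, because three of the five districts are within a point of 50%.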

Still, if Democrats can eke out some narrow wins in key districts, their performance could be a lot better than expected. Republicans are very “exposed,” in the sense that they hold a lot of seats, many of them in uncomfortable territory, and this year will likely be a better one for Democrats than 2010. Containing losses is clearly the name of the game for the Republican Party. Models that take into account the ongoing dynamics of individual races, whether through spending, polls, or general handicapping, might do a better job of picking up that sort of development.

One last thought. It’s hard to miss the gap of more than 4 points between predicted vote share (48.9%) and seat share (44.6%) in 2012. A gap like this is not uncommon in House elections, but it is on the high side (the average gap over the past 60 years has been about 3 points). Why is it so large this time around? Could it be redistricting? Or something else? We’ll address that more directly in a future post, so stay tuned.

*Because we include both national and district-level factors, we run a multi-level model with random intercepts for years and districts.
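As a rough guide, a model of this kind can be written as below, where $v_{dt}$ is the Democratic vote share in district $d$ in year $t$, $u_t$ is the year intercept, and $w_d$ is the district intercept. The linear additive form, the absence of interactions, and the notation are simplifying assumptions for illustration, not the exact specification. In a model of this form, the year effect for a new election such as 2012 is unobserved, so its variance $\sigma_u^2$ feeds directly into forecast uncertainty.

```latex
\begin{aligned}
v_{dt} &= \beta_0 + \beta_1\,\mathrm{GDP}_t + \beta_2\,\mathrm{Approval}_t + \beta_3\,\mathrm{Midterm}_t + \beta_4\,\mathrm{DemPres}_t \\
       &\quad + \gamma_1\,\mathrm{Inc}_{dt} + \gamma_2\,\mathrm{PresDev}_{dt} + u_t + w_d + \varepsilon_{dt}, \\
u_t &\sim \mathcal{N}(0, \sigma_u^2), \qquad w_d \sim \mathcal{N}(0, \sigma_w^2), \qquad \varepsilon_{dt} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
```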

**The careful reader will note that we don’t really have a pure forecasting model, since we use the *contemporaneous* presidential vote in our model (e.g., the 2008 outcome is predicted with the 2008 presidential vote, the 2004 outcome with the 2004 vote, etc.). We do it this way because it minimizes the number of elections where using the last presidential vote would cross a redistricting year. Nonetheless, it’s a legitimate concern. It’s partly why we use the mean-deviated presidential vote: we’re trying to capture how partisan a district is *relative to other districts*, which is something that changes less over time. To be safe, in years where the last presidential election didn’t cross a redistricting, we compared our predictions to predictions using the last presidential vote (for example, using 2004 to predict 2008). The predictions were virtually identical.
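The redistricting logic can be checked mechanically. This sketch assumes new district lines take effect only at elections in years ending in 2 (it ignores mid-decade redraws), and the function names are hypothetical. Under that assumption, the contemporaneous rule crosses a redistricting only in the midterms of 1962, 1982, and 2002, while always using the last presidential vote crosses in nine elections between 1952 and 2010. The 2012 forecast itself sidesteps the issue through the Daily Kos recalculation of presidential vote on the new lines, which this sketch does not model.

```python
def redistricting_year(election_year: int) -> int:
    """Most recent election (year ending in 2) at which the current lines took effect.

    Assumes decennial redistricting only; mid-decade redraws are ignored.
    """
    return election_year - (election_year - 2) % 10


def pres_vote_year(election_year: int, contemporaneous: bool = True) -> int:
    """Presidential election whose district-level vote is used for a given House election."""
    if election_year % 4 == 0:  # presidential year
        return election_year if contemporaneous else election_year - 4
    return election_year - 2  # midterm: lag to the previous presidential election


def crosses_redistricting(election_year: int, contemporaneous: bool = True) -> bool:
    """True if the presidential vote used was cast under older district lines."""
    return pres_vote_year(election_year, contemporaneous) < redistricting_year(election_year)


years = range(1952, 2011, 2)
print([y for y in years if crosses_redistricting(y)])
print([y for y in years if crosses_redistricting(y, contemporaneous=False)])
```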