Why the Stevens Op-Ed is Wrong

A rather lengthier response to Jacqueline Stevens’ op-ed, speaking to various points in turn.

the government — disproportionately — supports research that is amenable to statistical analyses and models even though everyone knows the clean equations mask messy realities that contrived data sets and assumptions don’t, and can’t, capture.

The claim that real politics is messier than the statistics are capable of capturing is obviously correct. But the implied corollary – that the government shouldn’t go out of its way to support quantitative research – doesn’t follow. Jacqueline Stevens doesn’t do quantitative research. Nor, as it happens, do I. But good qualitative research equally has to deal with messy realities, and equally has to adopt a variety of methodological techniques to minimize bias, compensate for missing data and so on. Furthermore, it is also extremely difficult to do at large scale – this is where the big projects that the NSF funds can be very valuable. I agree that it would be nice to have more qualitative research funded by NSF – but I also suspect that qualitative scholars like myself are a substantial part of the problem (if we don’t propose projects, they aren’t going to get funded).

It’s an open secret in my discipline: in terms of accurate political predictions (the field’s benchmark for what counts as science), my colleagues have failed spectacularly and wasted colossal amounts of time and money.

The claim here – that “accurate political prediction” is the “field’s benchmark for what counts as science” – is quite wrong. There really isn’t much work at all by political scientists that aspires to predict what will happen in the future – off the top of my head, all that I can think of are election forecasting models (which, as John has noted, are more about figuring out good theories of what drives politics, rather than prediction as such), and some of the work of Bruce Bueno de Mesquita. It is reasonable to say that the majority position in political science is a kind of soft positivism, which focuses on the search for law-like generalizations. But that is neither a universal benchmark (I, for one, don’t buy into it), nor indeed, the same thing as accurate prediction, except where strong covering laws (of the kind that few political scientists think are generically possible) can be found.

As best as I can decipher her position from her blog, and from a draft paper which she links to, Stevens’ underlying position is a quite extreme Popperianism, in which probabilistic generalizations (which are the only kind that social scientists aspire to find) don’t count as real science. Even one disconfirming instance is enough to refute a theory. Hence, Stevens argues in her paper that Fearon and Laitin’s account of civil wars has been falsified, because there are a couple of specific cases that have been interpreted as saying something that disagrees with Fearon and Laitin’s findings, and ergo, the entire literature is useless. I’m not going to get stuck into a debate which others on this blog and elsewhere are far better qualified to discuss than I am, but suffice to say that the Popperian probability-based critique of social scientific models is far from a decisive refutation of the social scientific enterprise. Furthermore, Stevens’ proposed alternative – an attempted reconciliation of Popper, Hegel and Freud – seems to me to be unlikely in the extreme to provide a useful social-scientific research agenda.

What about proposals for research into questions that might favor Democratic politics and that political scientists seeking N.S.F. financing do not ask — perhaps, one colleague suggests, because N.S.F. program officers discourage them? Why are my colleagues kowtowing to Congress for research money that comes with ideological strings attached?

I’m not quite clear what the issue is here. What does Stevens mean by ‘Democratic politics’? If the claim is that the NSF should be funding social science that is intended to help the Democrats in their struggle with other political groupings (the usual meaning in the US of the word Democratic with a capital D), that’s not what the NSF is supposed to be doing. If it’s that the NSF doesn’t fund projects that support Stevens’ own ideal understanding of what democratic politics is, then that’s unfortunate for her – but the onus is on her to demonstrate the broader social scientific benefits (including to people who don’t share her particular brand of politics) of the project. More generally, the standard of evidence here is unclear. A colleague “suggests” that NSF program officers discourage certain kinds of proposals. Does this colleague have direct experience himself or herself of this happening? Has this colleague credible information from others that this has happened? Or is the colleague just letting off hot air? Frankly, my money is on the last of these, but I’d be happy to be corrected if wrong.

Many of today’s peer-reviewed studies offer trivial confirmations of the obvious and policy documents filled with egregious, dangerous errors. My colleagues now point to research by the political scientists and N.S.F. grant recipients James D. Fearon and David D. Laitin that claims that civil wars result from weak states, and are not caused by ethnic grievances. Numerous scholars have, however, convincingly criticized Professors Fearon and Laitin’s work. In 2011 Lars-Erik Cederman, Nils B. Weidmann and Kristian Skrede Gleditsch wrote in the American Political Science Review that “rejecting ‘messy’ factors, like grievances and inequalities,” which are hard to quantify, “may lead to more elegant models that can be more easily tested, but the fact remains that some of the most intractable and damaging conflict processes in the contemporary world, including Sudan and the former Yugoslavia, are largely about political and economic injustice,” an observation that policy makers could glean from a subscription to this newspaper and that nonetheless is more astute than the insights offered by Professors Fearon and Laitin.

It would certainly have been helpful if Stevens had made it clear that Cederman, Weidmann and Gleditsch were emphatically not arguing that quantitative approaches to civil war are wrong. Indeed, just the opposite – Cederman, Weidmann and Gleditsch are themselves heavily statistically oriented social scientists. The relationships that they find are not obvious ones that could be “gleaned” from a New York Times subscription – they are dependent on the employment of some highly sophisticated quantitative techniques. The “which are hard to quantify” bit that Stevens interpolates between the two segments of the quote is technically true but rather likely to mislead the casual reader. The contribution that Cederman, Weidmann and Gleditsch seek to make is precisely to quantify the relationship between inequality-driven grievances and civil war outcomes.

The G-Econ data allow deriving ethnic group–specific measures of wealth by overlaying polygons indicating group settlement areas with the cells in the Nordhaus data. Dividing the total sum of the economic production in the settlement area by the group’s population size enables us to derive group-specific measures of per capita economic production, which can be compared to either the nationwide per capita product or the per capita product of privileged groups.
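
To make the arithmetic in that passage concrete, here is a minimal sketch of the calculation it describes. It is not Cederman, Weidmann and Gleditsch's code: the `Cell` class, the function names and the toy numbers are all hypothetical, and the polygon overlay is reduced to a plain set of cell identifiers standing in for the cells that fall inside a group's settlement area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One grid cell with its economic output (hypothetical units)."""
    cell_id: str
    production: float


def group_per_capita(cells, settlement_cell_ids, group_population):
    """Sum production over the cells in a group's settlement area,
    then divide by the group's population."""
    if group_population <= 0:
        raise ValueError("group_population must be positive")
    total = sum(c.production for c in cells if c.cell_id in settlement_cell_ids)
    return total / group_population


def relative_wealth(group_pc, reference_pc):
    """Ratio of a group's per capita product to a reference value, such as
    the nationwide figure or that of a privileged group."""
    return group_pc / reference_pc


# Toy data, invented purely for illustration.
cells = [Cell("a", 100.0), Cell("b", 300.0), Cell("c", 600.0)]
group_pc = group_per_capita(cells, {"a", "b"}, group_population=200)
national_pc = sum(c.production for c in cells) / 1000  # assumed national population
ratio = relative_wealth(group_pc, national_pc)
```

A ratio above one marks a group that is richer than the reference, and a ratio below one a group that is poorer. The point of the sketch is only that the grievance measure is itself a quantitative construct.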

This is emphatically not a debate showing that quantitative social science is wrong – it is a debate between two different groups of quantitative social scientists, with different sets of assumptions.

How do we know that these examples aren’t atypical cherries picked by a political theorist munching sour grapes? Because in the 1980s, the political psychologist Philip E. Tetlock began systematically quizzing 284 political experts — most of whom were political science Ph.D.’s — on dozens of basic questions, like whether a country would go to war, leave NATO or change its boundaries or a political leader would remain in office. … Professor Tetlock’s main finding? Chimps randomly throwing darts at the possible outcomes would have done almost as well as the experts.

Under the very kindest interpretation, this is sloppy. Quite obviously, one should not slide from criticisms of quantitative academic political scientists to criticisms of people with political science Ph.D.s without making it clear that these are not at all the same groups of people (lots more people have Ph.D.s in political science than are academic political scientists; there are lots more academic political scientists than quantitatively oriented academic political scientists). Rather worse: Stevens’ presentation of Tetlock’s research is highly inaccurate. As Tetlock himself describes his test subjects (p.40):

Participants were highly educated (the majority had doctorates) and almost all had postgraduate training in fields such as political science (in particular, international relations and various branches of area studies), economics, international law and diplomacy, business administration, public policy and journalism [HF: my emphasis].

In other words, where Stevens baldly tells us that “most of [Tetlock’s experts] were political science Ph.D.s,” Tetlock himself tells us that a majority (not most) of his experts had Ph.D.s in some field or another, and that nearly all of them had postgraduate training in one of a variety of fields, six of which Tetlock names, and one of which was political science. Quite possibly, political science was the best represented of these fields – it’s the first that he thought to name – but that’s the most one can say, without access to the de-anonymized data. This is very careless writing on Stevens’ part, and she really needs to retract her incorrect claim immediately. Since it is a lynchpin of her argument – in her own words, without it she could reasonably be accused of being a cherry-picking sour-grape-munching political theorist – her whole piece is in trouble. Tetlock’s book simply doesn’t show what she wants and needs it to show for her argument to be more than impressionistic.

The rest of the piece rehashes the argument from Popper, and proposes that NSF funding be distributed randomly through a lottery, so as to dethrone quantitative social science. Professor Stevens surely knows quite as well as I do that such a system would be politically impossible, so I can only imagine that this proposal, like the rest of her op-ed, is a potshot aimed at perceived enemies in a very specific intra-disciplinary feud. I have some real sympathy with the people on Stevens’ side in this argument – as I and Marty Finnemore have argued, knee-jerk quantificationism has a lot of associated problems. But the solution to these problems (and to the parallel problems of qualitative research) mostly involves clearer thinking about the relationship between theory and evidence, rather than the abandonment of quantitative social science.

24 Responses to Why the Stevens Op-Ed is Wrong

  1. Jacqueline Stevens June 24, 2012 at 6:30 am #

    I’ll let others debate your interpretations but I do want to clear up the information on the background of the Tetlock survey respondents and the relevance of this to my analysis.

    You’re right about one point: I read the passage on the backgrounds too quickly. As for this destroying my argument, please read the rest of the book. Tetlock writes: “It made virtually no difference whether participants had doctorates, whether they were political scientists, journalists, or historians, whether they had policy experience or access to classified information, or whether they had logged many or few years of experience in their line of work. As noted in Chapter 2, the only consistent predictor was, ironically, fame, as indexed by a Google count: better-known forecasters–those more likely to be feted by the media–were less well calibrated than their lower-profile colleagues.” In other words, we don’t have to guess about whether among this group political scientists were fabulous forecasters and the overall quality of Tetlock’s expert sample was brought down by, say, the journalists. We know that political scientists didn’t do any better than anyone else.

    I have alerted my editor to both of these points.

  2. Henry Farrell June 24, 2012 at 7:53 am #

    But this doesn’t help you very much, does it? If you are making a case for the unique awfulness of quantitative political science, the purported fact that quantitative political scientists do no better than a variety of experts with other disciplinary backgrounds in predicting political outcomes is moderately embarrassing, but no more than that, since so few political scientists claim to be in the prediction game in the first place. Furthermore, it is a purported fact. Tetlock suggests that his political science trained people tended to have backgrounds in either international relations, or in various branches of area studies. Given the timing of the study, it’s plausible that many of the international relations people had little in the way of a quantitative background – the quantification of the discipline really only started taking over in the late 1990s and early 2000s. It’s prima facie highly probable that the people with area studies training were not quantitatively oriented – after all, the most bitter disputes of this whole long war have been over whether or not area studies people are, in some sense, ‘unscientific’ because they don’t buy into rational choice and quantitative methodologies. It is very sloppy not only to completely mischaracterize Tetlock’s study as a study of the failings of political science, but also not to acknowledge (a) that there is no reason to believe that the specific political scientists involved in the study exemplified the approaches you wanted to criticize, and (b) that there is some tentative reason to believe that they were systematically likely to underrepresent these approaches. While I’m glad that you have alerted your editor to your basic error, I’m sorry (albeit not especially surprised) to see that you are failing to see the fundamental damage this does to your broader argument.
    Tetlock (whom I have read, as it happens) has some interesting things to say about the way that hedgehogs try to reinterpret past mistakes in ways that insulate themselves from disconfirmation.

  3. Andrew Gelman June 24, 2012 at 8:14 am #


    I still think my response was pithy and to the point (following the classic “Don’t feed the trolls” advice) but I suppose your more serious reply is helpful too. . . .

  4. Daragh McDowell June 24, 2012 at 8:55 am #

    As an area studies geek myself I’d just like to add the addendum that there’s a potential apples and oranges problem to comparing my lot of unscientific chancers to the theory-obsessed ivory tower dwellers in IR. When I got sorted into the IR dept. for my PhD I was basically doing something very different to the rest of my colleagues – they were working on broader theories and problems with existing theories. I was engaged in post-Soviet ‘soaking and poking’ to see what I could say about how that area of the world ‘works.’ Both, I believe, are thoroughly valuable in different ways and for different reasons. But they are very different processes IMHO – yet my degree will still say DPhil International Relations when I’m done, for largely bureaucratic reasons. That alone makes me suspicious of the Tetlock citation.

  5. Henry Farrell June 24, 2012 at 8:56 am #

    Finally, I don’t think that the specific and repeated attacks against Fearon and Laitin’s work on civil wars (I see that you have gone after them yet again in comments on your own blog) are unrelated to the fact that Laitin was one of the main protagonists on the other side of the ‘real science versus area studies’ battles which you still, clearly, have not gotten over a decade or more after they happened.

  6. David Karger June 24, 2012 at 9:35 am #

    As a hard scientist, I’d like to point out that even Biology has moved away from a Popperian approach. A huge amount of biology is now devoted to *probabilistic models* that attempt to characterize the probability that certain reactions will happen in certain conditions, or the correlations between different phenomena. Our ability to make predictions that are *never* falsified is extremely limited. As one example, just look at the ambiguities that emerge from many clinical trials.

  7. Rebecca Gill June 24, 2012 at 10:29 am #

    I could do a decent job predicting judicial behavior. I could also probably predict scores on judicial performance evaluations. I can’t say, though, that I would be able to predict whether two countries will go to war. I have a political science PhD, but I don’t think that this fact ought to impugn the usefulness of political science. I don’t tend to go around prognosticating on questions that are outside my area of expertise. As such, I’m not sure I understand this part of the argument made in the original piece.

    • Richard Matland June 24, 2012 at 1:56 pm #

      Of course, Stevens would take the judicial vote models and point out they completely got the Bush v. Gore case wrong and then she would ignore the 80-90% of cases you can get right. Most Supreme Court models would have identified Bush v. Gore as a states’ rights case and predicted votes for Gore from Scalia, Thomas, Kennedy, etc. Well, it didn’t work out that way. If Popper is the standard then this is proof the model is a failure. If we use a little sanity and say predicting 80-90% of the votes is pretty darn good and a whole lot better than chimps would do, I think we’re both being realistic and pointing out we do know something.

  8. Prison Rodeo June 24, 2012 at 10:29 am #

    DNFTT. Seriously.

  9. Jim Johnson June 24, 2012 at 12:27 pm #


    My two cents. I am pretty critical of standard practice and its justification in the discipline. I am on the record at length in various journals on this matter. But the Stevens piece is pretty much incoherent.

  10. RobC June 24, 2012 at 2:47 pm #

    Perhaps it would be useful to explain the difference between prediction and re-testing one’s hypothesis on subsequent evidence, because without the latter the prize for most explanatory hypothesis would always seem to go to the model that most overfits the data points.

    • PBR June 24, 2012 at 11:10 pm #

      I agree, it would be great to hear more on this from authors of the blog.

  11. JB June 24, 2012 at 3:44 pm #

    “Tetlock himself tells us that a majority (not most) of his experts had Ph.D.s in some field or another…”


  12. Martin June 24, 2012 at 4:24 pm #

    In the comments of Stevens’s hard hitting self-interview, she says that Tetlock says that 34%-43% of those surveyed were polisci PhDs.

    I wonder what % of these had expertise, or even any experience, modeling and predicting behaviors/events relevant to the questions posed by Tetlock.

  13. Marcus Kreuzer June 24, 2012 at 5:22 pm #

    It would be useful had Professor Stevens given a more accurate rendition of Tetlock. She overlooks two important points:
    a) Tetlock as a cognitive psychologist is interested in the inherent difficulties of making proper causal inferences given our cognitive limitations. He used the quality of predictions to test the degree of our cognitive biases and not the quality of our scientific results. This would have required replications. His book – despite his focus on prediction – in other words was not about the quality of the scientific method but about the limitations of the human mind. He makes this clear when his predictors are more often policy advisors and the talking heads in the media and NOT political scientists.
    b) She also ignores his positive findings that scholars with a particular cognitive outlook had prediction rates far higher than the dart throwing chimps. He in turn attributed their superior predictive capacity to a distinct cognitive style which is grounded in the very ex post explanations that dominate political science and the very principles of the scientific method which Professor Stevens holds in such low regard. Some of these principles are avoiding logical fallacies with the help of counterfactual thinking, treating alternative positions fairly rather than caricaturing them, and supporting arguments with facts that are not cherry-picked. These are the very standards that are woefully missing in Professor Stevens’ analysis.

  14. Matthew Kocher June 24, 2012 at 5:52 pm #

    My recollection of Tetlock is that simple quantitative algorithms beat area experts in his predictive tournaments.

  15. Paul Gronke June 24, 2012 at 5:57 pm #

    I get DNFTT. But this is an op-ed in the NY Times, folks, by someone in an internationally recognized university. Even apparently idiotic quantitative political scientists should be able to predict that this editorial will have deleterious consequences, and its misstatements and misrepresentations need to be pointed out.

  16. Jim June 25, 2012 at 11:08 am #

    Excellent rebuttal, Henry, to what is one of the worst and most irresponsible op-eds I have read in the New York Times in a LONG time. The Times’ editorial board should really be ashamed of itself for printing this. They should have done more background, and talked to more people in the field (and ideally Tetlock) before approving this piece.

    • Gregory Koger June 25, 2012 at 12:27 pm #

      Jim, from what I hear, the NYT’s fact-checking on wedding announcements is much more thorough. So, at least there’s that.

      • Adam Berinsky June 25, 2012 at 9:26 pm #

        I recall having a 15 minute conversation with a fact-checker about my two-paragraph wedding announcement, so I can confirm your sources. And see, now I’ve fact-checked your statement more thoroughly than the NYT checked Stevens’ op-ed.

  17. Anon June 25, 2012 at 5:17 pm #

    Prof. Stevens did not win the vote of this qualitative pol. scientist. I don’t think qual. people have been better at explaining important outcomes. I don’t know anyone (qual. or quant.) who tries to predict the future. Most are humble about their tests and conclusions. Other disciplines are not much better, if at all, at predicting events. Also, if one was systematic and looked at a range of political outcomes, I don’t think that non-academics would be better at accurately explaining or trying to predict events. Therefore, I think that pol. sci. should continue to receive government money, as long as there is an NSF in existence.

  18. Sanford Schram June 26, 2012 at 9:10 am #

    The funniest thing I heard in response is that Tetlock was evidently funded by the NSF.

  19. Sanford Schram June 26, 2012 at 11:44 am #

    I have been thinking that what I think is most wrong about Jacqueline Stevens’ political science riposte is that it is mortgaged to an overly narrow positivistic notion of science, including even buying into the idea that there is one scientific method and all sciences need to be judged by the extent to which they conform to that. This is most ironic since everything Stevens has written before that I know about suggests she does not believe this. And she shouldn’t!

  20. c.l. ball June 29, 2012 at 2:35 pm #

    >probabilistic generalizations (which are the only kind that social scientists aspire to find)

    These are also the kind that much of modern medical research relies on. Smoking is not a sufficient cause of lung cancer but smoking makes it much more likely that you’ll get it.