Journal of Experimental Political Science

by Andrew Gelman on July 16, 2013 · 9 comments

in Methodology, Science

The American Political Science Association is coming out with a new journal:

The Journal of Experimental Political Science features research – be it theoretical, empirical, methodological, or some combination thereof – that utilizes experimental methods or experimental reasoning based on naturally occurring data. We define experimental methods broadly: research featuring random (or quasi-random) assignment of subjects to different treatments in an effort to isolate causal relationships between variables of interest. JEPS embraces all of the different types of experiments carried out as part of political science research, including survey experiments, laboratory experiments, field experiments, lab experiments in the field, natural and neurological experiments.

We invite authors to submit concise articles (around 2500 words) that immediately address the subject of the research (although in certain cases initial submissions can be longer than this limit with the understanding that if accepted the paper will be shortened within the word constraints). We do not require lengthy explanations regarding and justifications of the experimental method. Nor do we expect extensive literature reviews of pros and cons of the methodological approaches involved in the experiment unless the goal of the article is to explore these methodological issues. We expect readers to be familiar with experimental methods and therefore to not need pages of literature reviews to be convinced that experimental methods are a legitimate methodological approach. We also consider more lengthy articles in appropriate cases, as in the following examples: when a new experimental method or approach is being introduced and discussed, when a meta-analysis of existing experimental research is provided, or when new theoretical results are being evaluated through experimentation and the theoretical results are previously unpublished. Finally, we strongly encourage authors to submit null or inconsistent results from well-designed, executed, and analyzed experiments as well as replication studies of earlier experiments.

This looks good to me. There’s only one thing I’m worried about. Regular readers of the sister blog will be aware that there’s been a big problem in psychology, with top journals publishing weak papers that generalize to the population from Mechanical Turk samples and college students, with enough researcher degrees of freedom to guarantee that statistical significance will be found, and with sample sizes so small that any statistically significant finding is likely to be noise, so there is no particular reason to expect that patterns in the data will generalize to the larger population. A notorious recent example was a purported correlation between ovulation and political attitudes.
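To see concretely what the combination of small samples and the significance filter does, here is a minimal simulation sketch; the effect size, noise level, and sample size below are illustrative assumptions, not numbers from any of the papers in question:

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.1    # assumed small true difference between groups (arbitrary units)
sd = 1.0             # assumed within-group standard deviation of the outcome
n_per_group = 20     # assumed small sample size per group
n_sims = 20_000      # number of hypothetical replications of the study

# Standard error of a difference in means between two groups of size n_per_group
se = sd * np.sqrt(2 / n_per_group)

estimates = rng.normal(true_effect, se, size=n_sims)  # estimated effects across replications
significant = np.abs(estimates) > 1.96 * se           # roughly "p < 0.05", two-sided

print(f"share of replications reaching significance: {significant.mean():.2f}")
print(f"mean estimate among significant replications: {estimates[significant].mean():.2f}")
print(f"true effect: {true_effect}")
```

In runs like this, only a few percent of replications reach significance, and the ones that do overestimate the true effect severalfold, sometimes with the wrong sign. That is the sense in which a statistically significant finding from a noisy, underpowered study is itself mostly noise.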

For some reason I seem to hear more about these sorts of papers in psychology than in poli sci (there was this paper by some political scientists, but it was not published in an actual poli sci journal).

Just to be clear: I’m not saying that the scientific claims being made in these papers are necessarily wrong, it’s just that these claims are not supported by the data. The papers are essentially exercises in speculation, “p=0.05” notwithstanding.

And I’m not saying that the authors of these papers are bad guys. I expect that they mostly just don’t know any better. They’ve been trained that “statistically significant” = real, and they go with that.

Anyway, I’m hoping this new journal of experimental political science will take a hard line and simply refuse to publish small-n experimental studies of small effects. Sometimes, of course, small-n is all you have, for example in a historical study of wars or economic depressions or whatever. But there you have to be careful to grapple with the limitations of your analyses. I’m not objecting to small-n studies of important topics. What I’m objecting to is fishing expeditions disguised as rigorous studies. In starting this new journal, we as a field just have to avoid the trap that the journal Psychological Science fell into, of seeming to feel an obligation to publish all sorts of iffy stuff that happened to combine headline-worthiness with (spurious) statistical significance.

P.S. I wrote this post last night and scheduled it to appear this morning. In the meantime, Josh posted more on this new journal. I hope it goes well.

{ 9 comments }

anon July 16, 2013 at 11:42 am

This is great. Two issues:

1. Short article length introduces an issue of reproducibility. Indeed, I believe Nature is now allowing for longer methods sections. Online appendixes are another option.

2. I believe any serious experimental journal should only publish pre-registered studies.

Rick Wilson July 16, 2013 at 12:38 pm

You’ve put your finger on the key problem with using convenience samples. If researchers are aiming to generalize to the population, then MTurk or other convenience samples are a problem. On the other hand, if the aim of an experiment is to establish causal direction, then I worry far more about internal validity. A convenience sample, randomized into control and treatment groups, allows inferences about causal direction. Note I have not said anything about effect sizes for populations or subpopulations. I hope that small-n studies are not barred across the board. AJPS recently published a study with 8 subjects — 4 matched controls and 4 subjects with localized brain lesions. The experiment was designed to test a specific set of neural pathways. The sample was appropriate to the question, and I doubt anyone would claim that the findings were limited to a small set of lesion patients.

Experiments are useful for many forms of inference. But, they are not a panacea. An experiment is about research design to answer a well specified question. I often object that I am not an experimentalist. Rather I am a social scientist who uses a host of different tools to answer questions. Oftentimes my questions are best answered, in part, through a series of experiments.

The editors of JEPS are serious about replication. Replication is a wonderful way to avoid the pitfalls that you note in your post — speculative findings dependent on p < .05. Social psychology could have avoided a lot of recent embarrassments by encouraging replication. Registration is another path, especially to discourage fishing for findings. But sometimes fishing is not so bad — especially if you're in a very good lake.

anon July 16, 2013 at 3:28 pm

There are better ways to engage in exploratory, as opposed to confirmatory, data analyses than unstructured fishing expeditions — especially if you have a big lake.

Fishing does not inform us whether data reject a hypothesis, but whether there exists a particular test procedure and data configuration that rejects the hypothesis of interest. The latter is almost always true, especially in large samples, and hence completely uninformative. In addition, it is misleading (some would say unethical) to present findings from fishing expeditions without reporting the amount of fishing that was done.
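To make the point concrete, here is a small simulation sketch (the number of subjects and the number of tests tried are made-up numbers, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_subjects = 200
n_tests = 20      # assumed number of subgroups / outcome codings / specifications tried
n_sims = 2_000    # number of simulated null experiments

found_something = 0
for _ in range(n_sims):
    # Random treatment assignment with no real effect on any outcome
    treated = rng.integers(0, 2, size=n_subjects).astype(bool)
    outcomes = rng.normal(size=(n_subjects, n_tests))  # pure-noise outcomes
    pvals = [stats.ttest_ind(outcomes[treated, j], outcomes[~treated, j]).pvalue
             for j in range(n_tests)]
    if min(pvals) < 0.05:
        found_something += 1

print(f"share of null experiments with at least one p < 0.05: {found_something / n_sims:.2f}")
```

With 20 independent tests the answer is roughly 1 - 0.95**20 ≈ 0.64: the "finding" tells us only that some test procedure rejects, not that the hypothesis of interest is false.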

Unstructured fishing, as commonly practiced, is an inefficient use of data, whether it is done in the Caspian Sea or my backyard pond.

burt July 16, 2013 at 1:12 pm

I’m not so keen on the short length of articles. 2500 words is probably not enough space for any detail on theory, results, and conclusions (even if methods sections are to be very short).
Indeed, the choice of 2500 words seems to suggest that the editors think that political science theories are straightforward, that authors need not expend much effort developing theory, and/or that they don’t expect any of the papers to be theoretically (as opposed to empirically) innovative.

RobC July 16, 2013 at 1:21 pm

The concerns you express about convenience samples, small sample sizes, etc., are important. Since it may be unrealistic to expect that what you refer to as weak papers will not be published, and since such papers may often usefully point the way toward studies with better samples or toward replications, here’s a suggestion.

How about when somebody cites a study in a literature review or elsewhere, he or she notes in the text or a footnote that the findings may be suspect because (1) the study was based on a convenience sample or a small sample size, or (2) used a questionnaire that was ambiguous or subject to framing, or (3) has not been replicated by any other studies, or (4) is subject to any other caveats or methodological or logical problems? (Note, btw, that often the authors of a study themselves point out deficiencies, but these are lost in the mists of time when their articles are cited and re-cited and take on a life of their own.)

This would require, of course, that folks read the studies they’re citing with a critical eye and that they be willing to express what may be unwelcome opinions and judgments. But that seems like a small enough cost to bear for the benefit of preventing weak findings from becoming received truth.

Ted Brader July 16, 2013 at 1:33 pm

I had a couple of reactions touching on some of the same themes as Rick’s comment. I know Josh, Becky, and the entire Section want this to be an outlet for high-quality scientific papers, and in that spirit, your reminder to guard against fishing expeditions disguised as experiments, or papers that substitute dazzling headlines for scientific rigor and sound inferences, is good advice.

However, I wouldn’t want the editors or anyone else to interpret this as saying the journal should not publish data from certain kinds of samples (students, MTurk, etc.) or that the journal should become hyper-focused on generalizability. Having read many of your other blog posts, I’m confident this is not what you were saying. Generalizability is a “problem” — in the sense that we can’t be completely sure the relationship holds in all times, under all circumstances, for all people — for just about every empirical paper ever written. The key for authors (and editors) is to make sure the paper is thoughtful about potential limitations in this regard, and thus that the claims are tethered appropriately to the reach of the data.

With Rick, I think replication is the best way to establish whether some initial set of findings, based on small and/or convenience samples, is “real” or holds more broadly. Obviously individual papers still need to be held to reasonable standards for clearly explaining procedures, carrying out appropriate analyses, and making proper inferences. But political science would benefit from more replication, even of research that looks brilliant and technically virtuous from the get-go. This requires not only authors, but political science reviewers and especially editors, to place a value on replication rather than over-emphasizing novelty — a refrain we’ve all heard: “we knew that already” or “so and so already showed that result.” And by replication here I mean re-running the study from data collection forward, not just re-analyzing the same data, a task that has some value but doesn’t address the issues you raise about small or “weak” samples.

For all their recent highly publicized issues with weak or bogus findings, one thing psychologists as a field do better than political scientists, in my view, is value replication work. Ultimately, it is what best exposes some results as circumscribed, happenstance, or bogus, and shows others to be remarkably robust and persistent. I hope the new journal will encourage not only quality papers but also replication (something, incidentally, with which convenience samples can help, because they make data collection more accessible for scholars with limited resources — a concern underscored by the fact that basic science funding for political science is under threat).

another anon July 17, 2013 at 7:56 am

I get some of the issues with taking the results of a small-sample convenience study and trying to say that those results apply to larger populations. My question, though, is: what if a lot of those types of studies appear to show similar results? Is it reasonable to think that there might be something there? I’m a student, so I’m still learning methods and statistics, and I would appreciate some insight from some of you.

Joshua Tucker July 19, 2013 at 9:03 am

Thanks to all for a stimulating discussion! Just wanted to offer a few quick comments in response to some of the issues that have been raised.

First, the journal is the official publication of the Experimental Research Section (#42) of APSA. It is not an “APSA journal” in the sense that APSR or Perspectives is, but rather a section journal in the same sense that other sections have affiliated journals. So it should not in any way be taken to mean that APSA as a whole is suddenly favoring one particular style of research, just that a new section has decided to launch a journal.

Second, the short article length was a mandate from the Section as part of establishing the journal, and not a decision that Becky and I made. Personally, though, I am very supportive of trying to shake up the standard format by which we report academic research, and was in part attracted to the idea of editing the journal precisely because I think we need to experiment with different formats for communicating the results of our research (I write for The Monkey Cage, after all!). Anon mentioned that Nature is now allowing longer methods sections, but most of the publications in journals like Science are short, snappy articles that non-specialists in the field can read and digest. We really don’t have anything like this in political science, and I think we need much more of it. So hopefully JEPS can be a showcase for an alternative style of presenting research results.

That being said, of course we’ll have online appendices. The idea – at least as I understand it – is to use the 2500 words to communicate the primary reason for the experiment (i.e., what you are hoping to test with it), the experimental design, and the primary results. You can put lots of other information in the appendix if you like.

Also, we don’t want to deter anyone from submitting to the journal because you have a 30-page version of your article and don’t want to rewrite it as a 2500-word version without knowing whether it has a chance of being published in JEPS. So, at least in the initial years, we are going to be willing to take more “standard” length articles for the first round of submissions, and only require that the paper be cut to the 2500-word format once you have received an R&R; in other words, if you want to hold off on putting in the time to revise the article into the shorter format until you think you have a decent chance of getting it published in JEPS, you can.

Additionally, we specifically note that we will consider longer articles that are also trying to lay out new theoretical arguments.

However, I should note that I disagree a bit with Burt – I tend to think one of the problems with political science research is that we expect every published article to have some sort of novel theoretical contribution. I actually think we would be better served by more attention to testing some of the theories (and retesting and expanding on some of the results) we already have than by always striving to develop yet another new theory.

So another goal of JEPS is to take both replication and null results seriously. As Ted noted, by replication we don’t mean simply reanalyzing someone else’s data, but rather running experiments again with, for example, different subject pools. To address the discussion of generalizability, one way we can see if results from a study of undergraduates in the United States are indeed generalizable is to see what happens if we run the same experiment on adults, or even adults in a different country. For most journals, this would attract the dreaded “this is not novel enough to justify publishing in a journal of X’s stature,” but our hope is that such work can be published in JEPS.

We also hope to publish findings with null results. Alan Gerber and others have noted the biases inherent in only publishing positive results. Of course, there is always the danger of “garbage in, garbage out” when publishing null results, which is why a journal predicated on the use of a shared methodological tool seems like the perfect place to start: if the reviewers can agree that the experiment was well designed and well implemented, and that the null results are interesting, then we would like to publish those findings. In the future, we may even try to accept articles without the results already known (which would be the best way to address these types of biases), but we haven’t decided to go down that road yet (and likely this will be a task for the next editors…).
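As a rough illustration of the bias Gerber and others describe, consider this sketch (the numbers are made up for the example, not anything we have estimated):

```python
import numpy as np

rng = np.random.default_rng(2)

true_effect = 0.0   # assume the treatment genuinely does nothing
se = 0.20           # assumed standard error of each study's estimate
n_studies = 10_000  # hypothetical studies run across the discipline

estimates = rng.normal(true_effect, se, size=n_studies)
published = estimates / se > 1.96   # only positive, significant results appear in print

print(f"studies run: {n_studies}, studies published: {published.sum()}")
print(f"average effect in the published record: {estimates[published].mean():.2f}")
```

Even with a true effect of exactly zero, a couple hundred studies clear the significance bar in the positive direction, and their average estimate sits well above zero, looking like a consistent positive effect. Publishing well-designed null results removes that filter.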

Anyway, hope these comments are helpful. We’ve tried to lay a lot of this information out on the CUP website for the journal – see the instructions for contributors. We’re excited about trying some new things with this journal, and hope many of you will choose to submit your work there. But rest assured, publishing high quality research remains our ultimate goal.

josh busby July 20, 2013 at 9:50 pm

I’d love to hear Gelman’s thoughts on the Berinsky, Huber, and Lenz paper that reproduced a number of findings in political psychology using Mechanical Turk. I understand the limitations for external validity, but this seems like a pretty specious knock, particularly for students starting out who don’t have thousands of dollars to run studies, or who have limited funds but want to see whether their arguments have plausibility in some preliminary studies with groups that are at least non-student samples.

Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk. Political Analysis, forthcoming (with Gregory Huber and Gabriel Lenz).
We examine the trade-offs associated with using Amazon.com’s Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.
http://pan.oxfordjournals.org/content/20/3/351.full.pdf?ijkey=U6VhhY1Y45Sqfmx&keytype=ref
