Post-publication peer review: How it (sometimes) really works

by Andrew Gelman on September 1, 2013

in Methodology

In an ideal world, research articles would be open to criticism and discussion in the same place where they are published, in a sort of non-corrupt version of Yelp. What is happening now is that the occasional paper or research area gets lots of press coverage, and this inspires reactions on science-focused blogs. The trouble here is that it’s easier to give off-the-cuff comments than detailed criticisms.

Here’s an example. It starts a couple years ago with this article by Ryota Kanai, Tom Feilden, Colin Firth, and Geraint Rees, on brain size and political orientation:

In a large sample of young adults, we related self-reported political attitudes to gray matter volume using structural MRI. We found that greater liberalism was associated with increased gray matter volume in the anterior cingulate cortex, whereas greater conservatism was associated with increased volume of the right amygdala. These results were replicated in an independent sample of additional participants. Our findings extend previous observations that political attitudes reflect differences in self-regulatory conflict monitoring . . .

My reaction was a vague sense of skepticism, but I didn’t have the energy to look at the paper in detail so I gave a sort of sideways reaction that did not criticize the article but did not take it seriously either:

Here’s my take on this. Conservatives are jerks, liberals are wimps. It’s that simple. So these researchers can test their hypotheses by more directly correlating the brain functions with the asshole/pussy dimension, no?

A commenter replied:

Did you read the paper? Conservatives are more likely to be cowards/pussies as you call it – more likely to jump when they see something scary, so the theory is that they support authoritarian policies to protect themselves from the boogieman.

The next month, my coblogger Erik Voeten reported on a similar paper by Darren Schreiber, Alan Simmons, Christopher Dawes, Taru Flagan, James Fowler, and Martin Paulus. Erik offered no comments at all, I assume because, like me, he did not actually read the paper in question. In our blogging, Erik and I were publicizing these papers and opening the floor for discussion, although not too much discussion actually happened.

A couple years later, the paper by Schreiber et al. came out in a journal and Voeten reblogged it, again with no reactions of his own. This time there was a pretty lively discussion with some commenters objecting to interpretations of the results, but nobody questioning the scientific claims. (The comment thread eventually became occupied by a troll, but that’s another issue.)

More recently, Dan Kahan was pointed to this same research article on “red and blue brains,” blogged it, and slammed it to the wall:

The paper reports the results of an fMRI (“functional magnetic resonance imaging”) study that the authors describe as showing that “liberals and conservatives use different regions of the brain when they think about risk.” . . .

So what do I think? . . . the paper supplies zero reason to adjust any view I have—or anyone else does, in my opinion—on any matter relating to individual differences in cognition & ideology.

Ouch. Kahan writes that Schreiber et al. used a fundamentally flawed statistical approach in which they basically went searching for statistical significance:

There are literally hundreds of thousands of potential “observations” in the brain of each study subject. Because there are constantly varying activation levels throughout the brain at all times, one can always find “statistically significant” correlations between stimuli and brain activation by chance. . . .

Schreiber et al. didn’t discipline their evidence-gathering . . . They did initially offer hypotheses based on four precisely defined brain ROIs in “the right amygdala, left insula, right entorhinal cortex, and anterior cingulate.” They picked these, they said, based on a 2011 paper [the one mentioned at the top of the present post] . . .

But contrary to their hypotheses, Schreiber et al. didn’t find any significant differences in the activation levels within the portions of either the amygdala or the anterior cingulate cortex singled out in the 2011 Kanai et al. paper. Nor did Schreiber et al. find any such differences in a host of other precisely defined areas (the “entorhinal cortex,” “left insula,” or “Right Entorhinal”) that Kanai et al. identified as differing structurally among Democrats and Republicans in ways that could suggest the hypothesized differences in cognition.

In response, Schreiber et al. simply widened the lens, as it were, of their observational camera to take in a wider expanse of the brain. “The analysis of the specific spheres [from Kanai et al.] did not appear statistically significant,” they explain, “so larger ROIs based on the anatomy were used next.” . . .

Even after resorting to this device, Schreiber et al. found “no significant differences . . . in the anterior cingulate cortex,” but they did manage to find some “significant” differences among Democrats’ and Republicans’ brain activation levels in portions of the “right amygdala” and “insula.”
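To get a feel for how easily a wide search turns up “significant” results, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: 40 subjects, 2,000 voxels of pure noise, an outcome that is also pure noise, and an approximate two-sided p < 0.05 cutoff for a correlation with 40 observations. None of these numbers come from Schreiber et al., and the code is not a reconstruction of their analysis.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical study dimensions, chosen only for illustration.
N_SUBJECTS = 40
N_VOXELS = 2000
# Approximate |r| cutoff for two-sided p < 0.05 with 40 observations (38 df).
R_CRIT = 0.312

rng = random.Random(1)

# The "outcome" is noise, so no voxel has any real relationship to it.
outcome = [rng.gauss(0, 1) for _ in range(N_SUBJECTS)]

hits = 0
for _ in range(N_VOXELS):
    voxel = [rng.gauss(0, 1) for _ in range(N_SUBJECTS)]
    if abs(pearson(voxel, outcome)) > R_CRIT:
        hits += 1

print(f"{hits} of {N_VOXELS} pure-noise voxels pass the p < 0.05 cutoff")
```

With a 5% false-positive rate per test, something on the order of 100 of the 2,000 noise voxels should clear the cutoff, and a real scan has far more voxels to search through than this toy example.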

And it gets worse. Here’s Kahan again:

They selected observations of activating “voxels” in the amygdala of Republican subjects precisely because those voxels—as opposed to others that Schreiber et al. then ignored in “further analysis”—were “activating” in the manner that they were searching for in a large expanse of the brain. They then reported the resulting high correlation between these observed voxel activations and Republican party self-identification as a test for “predicting” subjects’ party affiliations—one that “significantly out-performs the longstanding parental model, correctly predicting 82.9% of the observed choices of party.”

This is bogus. Unless one “use[s] an independent dataset” to validate the predictive power of “the selected . . . voxels” detected in this way, Kriegeskorte et al. explain in their Nature Neuroscience paper, no valid inferences can be drawn. None.
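The circularity described here can be sketched in a small simulation. The setup below is entirely hypothetical: 40 “training” subjects with balanced made-up party labels, 2,000 voxels of pure noise, a rule that keeps the 10 voxels most correlated with the labels, and a simple signed-sum classifier. It is not the method used by Schreiber et al. or by Kriegeskorte et al.; it only illustrates why accuracy measured on the same data used for selection tells us little, and why an independent dataset is needed.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def noise_voxels(rng, n_subjects, n_voxels):
    """Voxel-major table of pure noise: voxels[j][i] is voxel j, subject i."""
    return [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]


def scores(voxels, chosen):
    """Signed sum of the chosen voxels for each subject."""
    n = len(voxels[0])
    return [sum(sign * voxels[j][i] for j, sign in chosen) for i in range(n)]


def accuracy(score_list, labels, cutoff):
    """Share of subjects whose score falls on the side matching their label."""
    hits = sum((s > cutoff) == (y == 1) for s, y in zip(score_list, labels))
    return hits / len(labels)


rng = random.Random(2013)
N_VOXELS, TOP_K = 2000, 10  # hypothetical sizes, for illustration only

# "Training" data: 20 + 20 made-up party labels, voxels unrelated to them.
train_labels = [0] * 20 + [1] * 20
rng.shuffle(train_labels)
train = noise_voxels(rng, len(train_labels), N_VOXELS)

# Circular step: keep the voxels that happen to correlate with the labels.
r = [pearson(v, train_labels) for v in train]
top = sorted(range(N_VOXELS), key=lambda j: abs(r[j]), reverse=True)[:TOP_K]
chosen = [(j, 1 if r[j] > 0 else -1) for j in top]

# Cutoff halfway between the two groups' mean scores in the training data.
train_scores = scores(train, chosen)
mean1 = sum(s for s, y in zip(train_scores, train_labels) if y == 1) / 20
mean0 = sum(s for s, y in zip(train_scores, train_labels) if y == 0) / 20
cutoff = (mean0 + mean1) / 2

in_sample = accuracy(train_scores, train_labels, cutoff)

# Independent data: fresh subjects, fresh noise, same chosen voxels and cutoff.
test_labels = [rng.randint(0, 1) for _ in range(400)]
test = noise_voxels(rng, len(test_labels), N_VOXELS)
held_out = accuracy(scores(test, chosen), test_labels, cutoff)

print(f"in-sample accuracy: {in_sample:.2f}")
print(f"held-out accuracy:  {held_out:.2f}")
```

Because the voxels are noise by construction, the held-out accuracy should hover around a coin flip, while the in-sample figure looks impressive simply because the voxels were picked for agreeing with the labels.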

Kahan follows up on one of my favorite points, the way in which multiple comparisons corrections exacerbate the statistical significance filter:

Pushing a button in one’s computer program to ramp up one’s “alpha” (the p-value threshold, essentially, used to avoid “type 1” errors) only means one has to search a bit harder; it still doesn’t make it any more valid to base inferences on “significant correlations” found only after deliberately searching for them within a collection of hundreds of thousands of observations.

Wow. Look what happened. Assuming Kahan is correct here, we all just accepted the claimed results. Nobody actually checked to see if they all made sense.

I thought a bit and left the following comment on Kahan’s blog:

Read between the lines. The paper originally was released in 2009 and was published in 2013 in PLOS-One, which is one step above appearing on Arxiv. PLOS-One publishes some good things (so does Arxiv) but it’s the place people place papers that can’t be placed. We can deduce that the paper was rejected by Science, Nature, various other biology journals, and maybe some political science journals as well.

I’m not saying you shouldn’t criticize the paper in question, but you can’t really demand better from a paper published in a bottom-feeder journal.

Again, just because something’s in a crap journal, doesn’t mean it’s crap; I’ve published lots of papers in unselective, low-prestige outlets. But it’s certainly no surprise if a paper published in a low-grade journal happens to be crap. They publish the things nobody else will touch.

Some of my favorite papers have been rejected many times before finally reaching publication. So I’m certainly not saying that appearance in a low-ranked journal is definitive evidence that a paper is flawed. But, if it’s really been rejected by 3 journals before getting to this point, that could be telling us something.

One of the problems with traditional pre-publication peer review is that it’s secret. What were the reasons that those 3 journals (I’m guessing) rejected the paper? Were they procedural reasons (“We don’t publish political science papers”), or irrelevant reasons (“I just don’t like this paper”), or valid criticisms (such as Kahan’s noted above)? We have no idea.

As we know so well, fatally flawed papers can appear in top journals and get fawning press; the pre-publication peer-review process is far from perfect. Post-publication peer review seems like an excellent idea. But, as the above story indicates, it’s not so easy. You can get lots of “Andy Gelmans” and “Erik Voetens” who just post a paper without reading it, and only the occasional “Dan Kahan” who examines it in detail.

P.S. The above post is unfair in three ways.

1. It’s misleading to call PLOS-One a “crap journal.” Yes, it publishes articles that other journals won’t publish. But that doesn’t make it crap. As various commenters have pointed out, PLOS-One has a different publication model compared to traditional journals. “Different” doesn’t mean “crap.”

2. I have no particular reason to think that the paper above was rejected by others before being submitted to PLOS-One.

3. Just because the methods in this paper have problems, that doesn’t mean its conclusions are wrong. The data analysis provides some support for the conclusions, even if the evidence isn’t quite as strong as claimed.

I recognize that this sort of study is difficult and costly, and I have great respect for researchers who work in this area. If I can contribute some statistical scrutiny, my goal is not to shoot all this down but rather to help these resources be used more effectively.
