How (My) Science Works

by Larry Bartels on October 31, 2012


A group called EGAP (Experiments in Governance and Politics) is considering a proposal to establish a “Pilot Registry for Research Designs” where scholars could register new research projects, specifying in advance the topic, data to be collected, hypotheses to be tested, data analysis to be conducted, and conditions under which the hypotheses would be accepted or refuted. Once the research was conducted and written up, journal editors and referees would have access to the corresponding prospectus in order to verify that the results reported in the paper were not instances of “publication bias” or mindless “fishing” for statistically significant results. Upon publication, or after some pre-specified period of time, the corresponding research prospectus would be in the public domain.

The focus of the pilot proposal is on “Prospective Research” designs, whether experimental or observational, “for which outcomes have not yet been realized.” That is mostly not what I do. Nevertheless, a friend—perhaps inspired by the proposers’ interest in expanding the system to include “retrospective studies,” and in using experience with the proposed pilot registry to decide “whether to make registration mandatory for some kinds of research”—asks, as a “thought experiment,” how such a system would affect my work, suggesting as an example my 1996 article on “Uninformed Votes.” My response is below the fold.

I’ve done some papers—including that one—that are sufficiently simple-minded that they would not be much affected by a process of the sort you describe. On the other hand, that paper is also an instance in which effective enforcement would seem to be impossible. I have the NES cumulative file on my computer; what’s to stop me from ransacking it night and day for random correlations, then submitting the best 5% (suitably dressed up as “hypotheses to be tested”) to the registry? I suppose I could be made to wait for new data, but in this case that moratorium would still be in effect (since I used six presidential elections—and NES might not last long enough to provide six more).
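To put a rough number on that worry, here is a purely illustrative sketch (simulated noise, not the NES data, and not anything I have actually run): generate a large batch of random “predictors” and count how many clear the conventional significance threshold by chance alone.

```python
# Hypothetical illustration of "fishing": with enough purely random predictors,
# roughly 5% will look significant at p < .05 even though nothing is there.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_respondents, n_predictors = 1000, 2000

outcome = rng.normal(size=n_respondents)                  # outcome: pure noise
noise = rng.normal(size=(n_predictors, n_respondents))    # candidate "predictors": also noise

p_values = np.array([pearsonr(x, outcome)[1] for x in noise])
print(f"'significant' at p < .05: {(p_values < 0.05).mean():.1%}")   # about 5% by construction
```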


More often, my projects start with questions rather than “hypotheses” and end with findings of varying credibility rather than “accept or refute” decisions. I think of that as a kind of science—indeed, as the most fruitful kind of science we can do given where we are in our understanding of politics. And I think of the credibility of the findings as depending much more on the quality of the data and analysis than on their theoretical provenance or predictability (or, for that matter, their “statistical significance”). I recognize that many other scholars are more rigid in their views (and many others less so), which is fine with me, as long as we don’t have to waste a lot of potentially valuable time debating, or legislating, epistemology. (I assume there would be some journals that would not opt into the “registry” system, and I would stick with those rather than “constrain my implementation of the projects” I work on.)


But you want examples. Since I hosted a workshop on Unequal Democracy a few weeks ago (I am just starting to work on a revised edition), I’ve thought recently about those analyses and can provide a brief, chapter-by-chapter run-down:


1. (Introduction) All pre-existing data in public domain and previously analyzed in similar ways by other scholars. Not sure whether or how this would be covered, since it is “merely” descriptive analysis.


2. The Partisan Political Economy. All pre-existing data in public domain. I set out to look for partisan patterns of income growth, but without any very strong preconceptions about what I would find. Having found them, I tried to make them go away, using a variety of data I didn’t know existed when I started. In the course of attempting to understand where these partisan differences came from, I also discovered some unexpected secondary patterns in the data that seem to me to shed light on that question (honeymoon years vs. non-honeymoon years; first terms following partisan turnover vs. others).


3. Class Politics and Partisan Change. All pre-existing data in public domain. This was an adaptation of previously published work, and shaped in a variety of ways by discussion and criticism of that work (e.g., differing implications of alternative measures of “class”; more elaborate analysis of a variety of potential “wedge” issues in 2004 NES data).


4. Partisan Biases in Economic Accountability. All pre-existing data in public domain. I had worked on “myopia” in economic voting in a separate project with Chris Achen, but did not think to connect it with partisan patterns of income growth until the two papers had sat near each other on my desk for a couple years. I had thought to look at economic voting by income class using NES data, then thought to examine class-specific growth, then thought to examine the effect of high-income growth on other income groups. Your editor and referees would have to decide whether the caveats in the text (“rather remarkably suggest,” “not impossible that the apparent electoral significance of high-income growth is merely a statistical fluke”) and robustness checks in the notes (dropping elections, considering various sub-samples of the data, comparing growth for other groups) were sufficient to make this pattern eligible for reporting.


5. Do Americans Care About Inequality? Mix of old and new data. Entirely descriptive except (perhaps) for interactions between information and ideology (which John Zaller would take as evidence of “polarization,” except that the concept had not previously been applied to “objective” facts).


6. Homer Gets a Tax Cut. Mix of old and new data. The NES module I helped design included some items reflecting my interests and expectations and others suggested by collaborators with their own agendas. The concept of “unenlightened self-interest,” which is the most novel theoretical underpinning of the analyses, was induced from the data rather than derived from anywhere. The analyses of partisanship and information, egalitarian values, and trade-off preferences among taxes, spending, and deficits were all added in response to suggestions from readers of earlier versions.


7. The Strange Appeal of Estate Tax Repeal. Ditto 6, with some historical analysis added subsequent to data collection that made the implications of the chapter in the context of the book quite different from what I had in mind at the start.


8. The Eroding Minimum Wage. All publicly available data (aside from some income breakdowns of public polls provided by Marty Gilens). Largely descriptive. Partisan differences (Table 8.2) anticipated; partisan interactions (Table 8.3) unanticipated (and not “statistically significant”); effects of constituency opinion and partisanship on roll call vote anticipated, but only after it occurred to me that I had overlapping data (from Chapter 9).


9. Economic Inequality and Political Representation. All publicly available data. All of the analyses originally assumed a linear effect of income on political influence; the non-parametric specification with three income groups was suggested by readers of an early draft. The analysis of mechanisms (turnout, knowledge, contact) in Table 9.11 was an unanticipated elaboration; not sure where that came from.


10. (Conclusion) No data. Sections headed “Who Governs?” and “Political Obstacles to Economic Equality” could have been anticipated when I started the project; sections headed “Partisan Politics and the ‘Have-Nots’” and “The City of Utmost Necessity” could not (at least by me).


You should feel free to share any of this, if any of it seems helpful.


All the best,


Larry

Comments

David Karger October 31, 2012 at 7:10 pm

I made a related proposal: that papers submitted for (preliminary) review contain only the experimental design, and not the results. The idea is to leverage author laziness—they won’t have to do the study until the article is provisionally accepted.
http://groups.csail.mit.edu/haystack/blog/2011/02/17/a-proposal-for-increasing-evaluation-in-cs-research-publication/

Macartan Humphreys November 2, 2012 at 2:27 am

Larry: this is very interesting. I am part of the group discussing this proposal (tomorrow) so your comments are well timed. A few reactions to your post (apologies for length).

1. EXPLORATION.
A lot of what we do is exploratory, and has to be, and registration, if done right, should not prevent that. The problem as I see it in political science is not that we do a lot of exploratory work but that we don’t admit to it, either in the write-ups or in the statistics. That muddies inference, and it’s this muddying that registration is meant to address (for some striking evidence on the incidence of fishing by researchers or journals in political science, see Gerber and Malhotra 2008).

As I see it, if there were a recognized registry, then a researcher wanting to do exploration in an area where priors or theory are very weak could:
A. Declare that they want to do soak and poke exploration (by which I mean exploration that might be valuable but is not itself amenable to ex ante description) and signal that that is what they are doing by not registering.
B. Do principled exploration and register the process used for discovery. Alternatively, split the data in two: do unregistered exploration in one set, then form hypotheses, register, and test them in the second set (see the brief sketch after this list). There are lots of other principled methods, however, that could be described prior to the realization of outcomes.
C. Declare that they are really interested in the estimation of various quantities, not tests of particular claims about quantities, and register that. I think that is very often what people are interested in anyway, and if registration shifts people away from classical hypothesis testing, that might be a good thing.
D. Register some weakly motivated hypotheses because they feel they have to register something concrete, and then not listen to what the data is trying to tell them when they see things working very differently.
In cases where exploration is needed, I think option D is the worst; but options A-C are all on the table, and all, I think, have benefits over the current approach.
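To make option B concrete, here is a minimal sketch under entirely made-up data (the variables and the selection rule are my own illustration, not anything from the proposal): explore freely in one half of the data, then register and test only the pre-selected hypothesis on the untouched half.

```python
# Sample-splitting sketch: unregistered exploration in one half,
# a registered confirmatory test in the held-out half.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n, k = 2000, 50
X = rng.normal(size=(n, k))            # hypothetical covariates
y = rng.normal(size=n)                 # hypothetical outcome (pure noise here)

explore, confirm = slice(0, n // 2), slice(n // 2, n)

# Exploration half: fish for whichever covariate looks strongest.
best = min(range(k), key=lambda j: pearsonr(X[explore, j], y[explore])[1])

# Confirmation half: the registered test touches only held-out observations,
# so the exploratory search no longer inflates its false-positive rate.
r, p = pearsonr(X[confirm, best], y[confirm])
print(f"registered test of covariate {best}: r = {r:.3f}, p = {p:.3f}")
```

The obvious cost is power (the confirmatory test uses only half the observations), but the registered result can then be taken at face value.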

2. REGISTRATION AS A COMMUNICATION DEVICE, NOT A COMMITMENT DEVICE.
From the above, I think the value of registration is transparency: knowing what procedures were used to develop conclusions, and whether or not conclusions are being reported as a function of findings. If that’s what we are after, then the critical thing is communication, not constraint. For this reason the proposal we are discussing suggests a non-mandatory and non-binding approach, at least until we know more about what this would do to the field. People can opt out entirely. Or they could register what they plan ex ante, but still do analysis on things that they didn’t plan but that come to them after seeing the data and results; registration can help tell those two different parts of an analysis apart. That would allow a process much like what you describe in your work. There are now a handful of studies in political economy that have registered first, and I think in every case the researchers have deviated in some way from the original plan, and were right to do so; they maintain transparency by indicating when and why they deviate from the plan.

3. PROSPECTIVE DATA, HISTORICAL DATA, AND RESEARCHER HONESTY.
I am stumped on the merits of registering analyses of historical data. My concern is not that with retrospective data people can be more dishonest. It is possible with historical data that people would fish and then register, but I expect that much of the problem we have with fishing is not that people deliberately cheat but that we have norms of letting the data speak to us and then presenting the result like a test. I think it’s also possible to fish without being conscious of it: you have two measures for something and one “works,” in the sense of producing significant results, and the other doesn’t. You use the one that works on the (possibly correct) assumption that it is a better measure. But then your criterion for accepting the test has been influenced by the conclusions of the test. The problem with retrospective data is more subtle, because with retrospective data people don’t actually know whether something has been fished *even before they start analysis*. If I run a regression of democracy on wealth using historical data, I do it already having seen hundreds of past analyses, quantitative and qualitative, of the exact same events. Even if my measure is new, in some way it is still reflecting the same data that gave rise to my hypothesis. That is obviously a large problem (at least if you are not a Bayesian), and one that goes beyond registration. Neither of these problems—of intentional or unintentional fishing—implies that there is no point in registering; one could still register analyses using historical data and describe them as such.
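A rough simulation of that two-measures scenario (entirely hypothetical data, just to illustrate the mechanism): even when the construct is unrelated to the outcome, reporting whichever of two measures “works” pushes the false-positive rate above the nominal 5%.

```python
# Unintentional fishing: pick whichever of two noisy measures of the same
# construct happens to produce a significant result.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n, trials, hits = 500, 2000, 0

for _ in range(trials):
    latent = rng.normal(size=n)            # the construct being measured
    m1 = latent + rng.normal(size=n)       # two noisy measures of it
    m2 = latent + rng.normal(size=n)
    y = rng.normal(size=n)                 # outcome unrelated to the construct
    p1, p2 = pearsonr(m1, y)[1], pearsonr(m2, y)[1]
    hits += min(p1, p2) < 0.05             # report the measure that "works"

print(f"false-positive rate when keeping the better measure: {hits / trials:.1%}")
```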

