Causal inference, extrapolating from sample to population

In a new paper titled “Does Regression Produce Representative Estimates of Causal Effects?”, Peter Aronow and Cyrus Samii write:

It is well-known that, with an unrepresentative sample, the estimate of a causal effect may fail to characterize how effects operate in the population of interest. What is less well understood is that conventional estimation practices for observational studies may produce the same problem even with a representative sample. Specifically, causal effects estimated via multiple regression differentially weight each unit’s contribution. The “effective sample” that regression uses to generate the causal effect estimate may bear little resemblance to the population of interest. The effects that multiple regression estimate may be nonrepresentative in a similar manner as are effects produced via quasi-experimental methods such as instrumental variables, matching, or regression discontinuity designs, implying there is no general external validity basis for preferring multiple regression on representative samples over quasi-experimental methods. We show how to estimate the implied multiple-regression weights for each unit, thus allowing researchers to visualize the characteristics of the effective sample. We then discuss alternative approaches that, under certain conditions, recover representative average causal effects. The requisite conditions cannot always be met.
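
To make the “effective sample” idea concrete, here is a minimal simulation sketch. It is not the authors' code. It assumes, as I understand the paper's result, that when the conditional mean of the treatment D_i is linear in the controls X_i, each unit's implied regression weight is the squared residual from regressing D_i on X_i. The two-stratum population, the treatment probabilities, the effect sizes, and all variable names below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical population: one binary covariate x defines two strata.
x = rng.integers(0, 2, size=n)

# Treatment is much rarer in stratum x == 1 (assumed propensities).
p_treat = np.where(x == 1, 0.1, 0.5)
d = rng.binomial(1, p_treat).astype(float)

# Unit-level effects differ by stratum (assumed values).
tau = np.where(x == 1, 3.0, 1.0)
y = 0.5 + 2.0 * x + tau * d + rng.normal(size=n)

# Controls: intercept plus the stratum indicator.
X = np.column_stack([np.ones(n), x])

# Residualize treatment on the controls; the squared residuals are the
# implied regression weights under the linearity assumption stated above.
gamma, *_ = np.linalg.lstsq(X, d, rcond=None)
d_resid = d - X @ gamma
weights = d_resid ** 2

# Ordinary multiple regression of y on the controls and treatment.
beta, *_ = np.linalg.lstsq(np.column_stack([X, d]), y, rcond=None)
ols_coef = beta[-1]

# Three summaries to compare.
ate = tau.mean()                                            # population-average effect
weighted_effect = np.sum(weights * tau) / np.sum(weights)   # effective-sample average
weight_share_x1 = weights[x == 1].sum() / weights.sum()     # stratum 1's share of weight

# Poststratified alternative: within-stratum differences in means,
# averaged using each stratum's share of the sample.
strat_est = 0.0
for s in (0, 1):
    in_s = x == s
    diff = y[in_s & (d == 1)].mean() - y[in_s & (d == 0)].mean()
    strat_est += in_s.mean() * diff

print(f"average effect in the sample:   {ate:.2f}")
print(f"regression coefficient on d:    {ols_coef:.2f}")
print(f"weight-averaged effect:         {weighted_effect:.2f}")
print(f"poststratified estimate:        {strat_est:.2f}")
print(f"share of weight in stratum x=1: {weight_share_x1:.2f}")
```

In this setup the two strata are about equally sized, yet the stratum where treatment is rare contributes much less of the total weight, so the regression coefficient tracks the weight-averaged effect rather than the simple average. The poststratified estimate should land near the simple average here only because treatment is assigned at random within each stratum.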

They work within a poststratification-like framework, which I like, and I agree with their message. Here’s what I wrote on the topic a couple years ago:

It would be tempting to split the difference in the present debate [between proponents of field experiments and observational studies] and say something like the following: Randomized experiments give you accurate estimates of things you don’t care about; Observational studies give biased estimates of things that actually matter. The difficulty with this formulation is that inferences from observational studies also have to be extrapolated to correspond to the ultimate policy goals. Observational studies can be applied in many more settings than experiments but they address the same sort of specific micro-questions. . . . I recommend we learn some lessons from the experience of educational researchers, who have been running large experiments for decades and realize that, first, experiments give you a degree of confidence that you can rarely get from an observational analysis; and, second, that the mapping from any research finding—experimental or observational—is in effect an ongoing conversation among models, data, and analysis.

But that’s just words; Aronow and Samii back up their words with math, which is a good thing. I only have two minor comments on their paper:

1. Table 1 should be a graph. Use coefplot() or something like that. Do we really care that some variable has a mean of “47.58”?

2. I think the title is misleading in that it sets “regression” in opposition to designed experiments or natural experiments. Regression is typically the right tool to use when analyzing experimental or observational data. In either case, we are faced with the usual statistical problem of generalizing from sample to population.

7 Responses to Causal inference, extrapolating from sample to population

  1. Fernando July 4, 2013 at 1:18 pm #

    As I read along my first reaction is that I don’t find counterfactuals “extremely clear”, and find the language of correlation to be incredibly ambiguous for causal discourse. For example, is equation (1) structural, or a data summary? Potential outcomes, as far as I can tell, are assumed to be given and fixed.

    As for correlations, the statement on pg 7-8 that “We allow for the control variables in X_i to be potentially correlated with both D_i and the baseline values given by Y_i (0). Thus, including X_i in our analysis is necessary to obtain a causal effect estimate untainted by omitted variable bias” is ambiguous at best, and possibly wrong, depending on the underlying causal structure giving rise to the correlation. With the caveat that it is not clear (to me) what they are saying, I can think of scenarios where controlling for X will in fact introduce bias.

    • Patrick July 4, 2013 at 2:12 pm #

      I’m not sure what is unclear about counterfactuals. Perhaps you can clarify? It’s clear that equation (1) is “structural” in the sense that every unit in the data and in the population follows that equation. In fact, the equation is just a simple definition of potential outcomes and a “causal effect”. I actually find non-parametric SEM and graphical models to be even more unclear. When one points an arrow toward a variable, what does that mean? Is it simply correlation? Does it represent a manipulation of some sort?

      • Fernando July 4, 2013 at 3:58 pm #

        Equation 1 suggests that if I manipulate (variable) \tau_i I change Y_i(d). Did I just have a causal effect on Y_i(d)? Think about it. (PS there is an interpretive difference between a structural equation and an identity)

        As for the interpretation of arrows, yes, they are causal relations, as is explained in any introduction to DAGs. That is an essential feature.

        • Patrick July 4, 2013 at 10:42 pm #

          \tau_i is not a variable…it’s a constant…so I’m not sure how you can manipulate that. If you manipulated D_i, then you have a causal effect on Y.

        • Patrick July 4, 2013 at 10:56 pm #

          And also, it seems to me that DAGs are simply trying to represent potential outcomes and counterfactuals with pictures rather than mathematical notation. For example, I’ve heard DAGs described from the point of view of switching a circuit on and off, which is essentially manipulation. The difference between on and off is the “effect” and the arrow just points to where the “current” is running. So we have a manipulation, two potential outcomes, and counterfactuals. The difference is that the Rubin model writes it in the form of Y(d) and Y(d-1) whereas DAGs represent it with ->. The debate seems silly and both sides pretty much agree that they represent the same ideas. It’s just a matter of preference and which is clearer. I personally prefer the PO way because it’s all written out notationally, but to each his own.

  2. Fernando July 4, 2013 at 1:52 pm #

    I don’t think non-parametric SEM estimation suffers from this problem (if data are a random sample, so strata can be correctly weighted).

    The problem (or advantage) with regression is that it implicitly brings in a loss function that is unnecessary for the task at hand.

    The question, it seems, is whether you prefer strata weights or covariance weights, or, put differently, whether you want a simple average or a (loss function) weighted one.

    Presumably if you face a squared error loss function you’d prefer regression.

    • Fernando July 4, 2013 at 2:00 pm #

      PS Perhaps this could go into a simulation. You get info from experiments carried out in one sample (“sample A”), then you are asked to place bets on the effects of the same experiment in a different sample (sample “B”). Your payoff is decreasing in the square of the errors.

      How should you learn from sample A to make predictions in sample B? In particular, what estimator should you use to minimize (maximize) your losses (gains)? We frame it thus in