A sociologist writes in:

Samuel Lucas has just published a paper in Quality and Quantity arguing that anything less than a full probability sample of higher levels in HLMs yields biased and unusable results. If I follow him correctly, he is arguing that not only are the SEs too small, but the parameter estimates themselves are biased and we cannot say in advance whether the bias is positive or negative.

Lucas has thrown down a big gauntlet, advising us to throw away our data unless the sample of macro units is right and to ignore the published results that fail this standard. Extreme.

Is there another conclusion to be drawn?

Other advice to be given?

A Bayesian path out of the valley?

The short answer is that I think Lucas is being unnecessarily alarmist. Here’s the abstract to his paper:

The multilevel model has become a staple of social research. I textually and formally explicate sample design features that, I contend, are required for unbiased estimation of macro-level multilevel model parameters and the use of tools for statistical inference, such as standard errors. After detailing the limited and conflicting guidance on sample design in the multilevel model didactic literature, illustrative nationally-representative datasets and published examples that violate the posited requirements are identified. Because the didactic literature is either silent on sample design requirements or in disagreement with the constraints posited here, two Monte Carlo simulations are conducted to clarify the issues. The results indicate that bias follows use of samples that fail to satisfy the requirements outlined; notably, the bias is poorly-behaved, such that estimates provide neither upper nor lower bounds for the population parameter. Further, hypothesis tests are unjustified. Thus, published multilevel model analyses using many workhorse datasets, including NELS, AdHealth, NLSY, GSS, PSID, and SIPP, often unwittingly convey substantive results and theoretical conclusions that lack foundation. Future research using the multilevel model should be limited to cases that satisfy the sample requirements described.

And here’s my reaction:

To me, the appropriate analogy is to regression models. Just as we can fit single-level regressions to data that are not random samples, we can fit multilevel models to data that are not two-stage random samples. Ultimately we are interested in generalizing to a larger population, so if our data are *not* simple random samples, we need to account for this, a concern that I and others address using multilevel modeling and poststratification; see, for example, my recent paper with Yair.

But this is not a problem unique to multilevel models. In any study, there is a concern when generalizing from data to population. Here’s what Lucas writes:

I contend, some datasets on which the MLM has been estimated are non-probability samples for the MLM. If so, estimators are biased and the tools of inferential statistics (e.g., standard errors) are inapplicable, dissolving the foundation for findings from such studies. Further, this circumstance may not be rare; the processes transforming probability samples into problematic samples for the MLM may be inconspicuous but widespread. If this reasoning is correct, many published MLM study findings are likely wrong and, in any case, cannot be evaluated, meaning that, to the extent our knowledge depends on that research, our knowledge is compromised.

You could replace “MLM” with “regression” in that paragraph and nothing would change. So, yes, I think Lucas is correct to be concerned about generalizing from sample to population (what Lucas calls “bias”); it’s a huge issue in psychology and medical studies performed on volunteers or unrepresentative samples. But I don’t see anything specially problematic about multilevel models, especially if the researcher takes the next step and does poststratification (which is, essentially, regression adjustment) to correct for differences between sample and population. If the data are crap, it’ll be hard to trust anything that comes out of your analysis, but multilevel modeling won’t be making things any worse. On the contrary: multilevel analysis is a way to model bias and variation.
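To make the poststratification step concrete, here is a minimal Python sketch with made-up numbers. Everything in it is an assumption for illustration: the two cells ("young", "old"), the sample responses, the population counts, and the helper name `poststratify`. In a real analysis the per-cell estimates would come from a fitted multilevel model; plain cell means stand in for them here so the example stays self-contained.

```python
def poststratify(cell_estimates, population_counts):
    """Average per-cell estimates, weighting each cell by its population count."""
    total = sum(population_counts[c] for c in cell_estimates)
    weighted = sum(cell_estimates[c] * population_counts[c] for c in cell_estimates)
    return weighted / total


# Hypothetical unrepresentative sample: the young are heavily overrepresented.
sample = {
    "young": [1, 1, 0, 1, 1, 1, 0, 1],  # 8 respondents, binary outcome
    "old": [0, 1],                      # 2 respondents
}

# Hypothetical known population counts for the same cells (e.g. from a census).
population_counts = {"young": 300, "old": 700}

# Stand-in for model-based estimates: the raw mean within each cell.
cell_estimates = {cell: sum(ys) / len(ys) for cell, ys in sample.items()}

# Unadjusted sample mean, which inherits the sample's age imbalance.
n = sum(len(ys) for ys in sample.values())
raw_mean = sum(sum(ys) for ys in sample.values()) / n

# Poststratified estimate, reweighted to the population's age composition.
adjusted = poststratify(cell_estimates, population_counts)

print(f"raw sample mean:         {raw_mean:.3f}")
print(f"poststratified estimate: {adjusted:.3f}")
```

With these numbers the raw mean is 0.700 and the poststratified estimate is 0.575, because the older cell, which is 70% of the population but only 20% of the sample, gets its proper weight. The "old" estimate rests on just two respondents, which is exactly where partial pooling from a multilevel model would help stabilize the cell estimates before they are reweighted.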

Multilevel modeling doesn’t solve all problems (see my paper from a few years ago, “Multilevel modeling: what it can and can’t do”), but I think it’s the right way to go when you’re concerned about generalizing to a population. So in that sense I strongly disagree with Lucas, who writes, “Future research using the multilevel model should be limited to cases that satisfy the sample requirements described.” Random samples are great, and I admire Lucas’s thoughtful skepticism, but when we want to analyze data that are *not* random samples, I think it’s better to face up to the statistical difficulties and model them directly rather than running away from the problem.