New Evidence for Gender Bias in Recommendation Letters? A Closer Look

A recent _Journal of Applied Psychology_ article that purports to show the effect of gender bias in recommendation letter-writing on faculty hiring in psychology has been attracting a great deal of attention in the blogosphere over the past few days (see, for example, “here”:http://www.insidehighered.com/news/2010/11/10/letters, “here”:http://www.marginalrevolution.com/marginalrevolution/2010/11/what-kind-of-letters-of-recommendation-are-written-for-women.html, “here”:http://www.psychologytoday.com/blog/what-makes-us-human/201011/would-you-hire-woman, or “here”:http://www.lemondrop.com/2010/11/12/your-colleague-says-youre-supportive-you-wont-get-hired/). “University of Michigan Professor William Clark”:http://polisci.lsa.umich.edu/faculty/wclark.html took a close look at the study and sent along this guest post. [NB: As the rest of the writing in this blog post is Clark’s, I am eschewing the block-quote format.]

******

As the placement director for our graduate program, I read with particular interest an “article summary”:http://www.physorg.com/news/2010-11-letters-women-jobs.html that was recently shared by some of my Facebook friends. The article (“Gender and Letters of Recommendation for Academia: Agentic and Communal Differences,” _Journal of Applied Psychology_ 2009, 94(6):1591-1599, by Juan Madera, Michelle R. Hebl, and Randi C. Martin) purports to show the effect of gender bias in recommendation letter-writing on faculty hiring in psychology.

The authors used a computer to analyze the text of hundreds of letters of recommendation for faculty positions in the psychology department at Rice University. In the first stage of their study, they found that letter writers were more likely to use “communal” words when describing female applicants and “agentic” terms when describing male applicants. In the second stage, they suppressed the gender of the applicants and had a team of professors rate the candidates’ “hireability” after reading their letters. They hypothesized that applicants described in communal terms would be deemed less hireable and applicants described in agentic terms more hireable. If female applicants are more likely to be described in communal than in agentic terms, and applicants are judged less desirable when described in communal rather than agentic terms, then the biased way female candidates are described in letters of recommendation may be producing bias at the hiring stage.

As a social scientist, I am typically skeptical of claims made in the literature, and these claims came from a study that was clear enough to evaluate, so that is what I did. Here is what I found: there is evidence that applicants described in social terms are less hireable, but no evidence that women are more likely than men to be described in these terms. There is evidence that women are less likely than men to be described in agentic terms, but no evidence that this affects their hireability. In short, the bias in the way women are written about does not affect their hireability, and the things said about candidates that do influence their hireability are equally distributed between men and women. Thus, this study provides no evidence of a causal chain from gender bias in the way applicants are described to gender bias in the way applicants are evaluated. Let me be clear: I am not claiming that there is no gender bias in academic hiring or that letters of recommendation play no role in it. I am merely interested in whether the study in question demonstrates the existence of such bias.

The difference between my interpretation of the authors’ results and theirs revolves around the fact that they “for exploratory reasons …included the gender of the letter writer” in interaction with the main variable of interest, the gender of the applicant. They then interpret the coefficients on the “gender of applicant” variable as if it were “a main effect,” that is, an estimate of the unconditional effect of the gender of the applicant on a series of dependent variables of interest. The problem is that when you interact a variable of interest (X) with a potential moderator (Z), the reported coefficient on the variable of interest is the estimated effect of that variable when the moderator equals zero. To compound the problem, they have coded the moderator so that it equals 1 when the letter writer is female and 2 when the letter writer is male, which means the moderator never equals zero. So not only are the coefficients from which they draw inferences not “main effects,” they are not substantively meaningful conditional effects either. To draw inferences from their results, we need to do a little algebra and calculate the estimated effect of the gender of the applicant on the various dependent variables at substantively meaningful values of the moderator (the gender of the letter writer). This is easy to do, and the results are listed in the accompanying table. Without access to the original data I can’t calculate the appropriate standard errors for the quantities in the first two columns of this table, but I can make some educated guesses based on the size of the standard errors that are reported. Column 3 reports the difference between the first two columns; its statistical significance is the same as that of the interaction term reported in the paper, and column 3 reflects this.
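The algebra can be written out explicitly. This is a sketch assuming the linear specification with a product term described above; the coefficient labels β0 through β3 are introduced here for illustration and are not the paper’s notation. Y is a measure of the language in a letter, X is the applicant’s gender (a one-unit change switches the applicant’s gender), and Z is the writer-gender code (1 = female, 2 = male):

```latex
\begin{align*}
E[Y \mid X, Z] &= \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ \\
\frac{\partial E[Y \mid X, Z]}{\partial X} &= \beta_1 + \beta_3 Z \\
\left.\frac{\partial E[Y \mid X, Z]}{\partial X}\right|_{Z=1} &= \beta_1 + \beta_3 \quad \text{(female writer)} \\
\left.\frac{\partial E[Y \mid X, Z]}{\partial X}\right|_{Z=2} &= \beta_1 + 2\beta_3 \quad \text{(male writer)} \\
(\beta_1 + 2\beta_3) - (\beta_1 + \beta_3) &= \beta_3 \\
\operatorname{Var}(\hat\beta_1 + z\hat\beta_3) &= \operatorname{Var}(\hat\beta_1) + z^2 \operatorname{Var}(\hat\beta_3) + 2z \operatorname{Cov}(\hat\beta_1, \hat\beta_3)
\end{align*}
```

The reported coefficient β1 is the effect at Z = 0, a value no letter writer can take. The last line shows why the standard errors of the two conditional effects cannot be computed from published output alone: they depend on the covariance of the two estimates, which is usually not reported. The difference between the two effects, by contrast, is just β3, whose standard error is reported.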

So, the table shows that the gender of the applicant has no effect on the language used to describe the applicant in 5 out of 8 cases. There is no evidence that more communal adjectives are used to describe female applicants, and this is true whether the letter writer is male or female. Similarly, there is no greater propensity to describe female applicants in terms of a social/communal orientation, and again this is true whether the letter writer is male or female. These statements rest on the fact that the estimates in the top two rows of the table are, in all likelihood, statistically indistinguishable from zero. But if we set statistical significance aside for the moment, a very different result emerges. The estimates the authors report suggest that female writers are less likely to describe men than women in communal terms, while male writers are more likely to describe men than women in communal terms. So, if being branded as “communal” disadvantages applicants in the academic job market (as the second part of the study purports to show), this disadvantage is being inflicted on applicants of both genders by letter writers who share their gender. The bottom line is that the authors’ data suggest either that the gender of the applicant has no effect on the propensity to be described in communal terms, or that it has an effect very different from the one the authors propose: women write about women in ways that would hurt their hireability, and men write about men in ways that would hurt theirs.
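One way to make the kind of educated guess described above is sketched below. It illustrates the idea under stated assumptions rather than reproducing Clark’s actual procedure. The effect of applicant gender at writer code z is b1 + b3*z, and its variance is se_b1² + z²·se_b3² + 2z·Cov(b1, b3). When the covariance is unreported, the Cauchy-Schwarz inequality still bounds it by the product of the two standard errors, so for z ≥ 0 the standard error must lie between |se_b1 − z·se_b3| and se_b1 + z·se_b3. The names `InteractionFit`, `conditional_effect`, `se_bounds`, and `significance_verdict`, the 1.96 normal critical value, and all numbers in the example are hypothetical. None of them are estimates from the paper.

```python
from dataclasses import dataclass

Z_FEMALE_WRITER = 1  # moderator coding described in the post
Z_MALE_WRITER = 2
CRITICAL = 1.96      # assumed two-sided 5% normal critical value


@dataclass
class InteractionFit:
    """Hypothetical reported estimates for y = b0 + b1*x + b2*z + b3*x*z.

    Only coefficients and standard errors are assumed to be published;
    the covariance of b1 and b3 is treated as unknown.
    """
    b1: float
    b3: float
    se_b1: float
    se_b3: float


def conditional_effect(fit: InteractionFit, z: float) -> float:
    """Effect of applicant gender (x) when the writer-gender code equals z."""
    return fit.b1 + fit.b3 * z


def se_bounds(fit: InteractionFit, z: float) -> tuple[float, float]:
    """Smallest and largest SE of b1 + z*b3 allowed by |Cov| <= se_b1 * se_b3.

    For z >= 0 the variance ranges from (se_b1 - z*se_b3)**2
    to (se_b1 + z*se_b3)**2, so the SE ranges over their square roots.
    """
    return abs(fit.se_b1 - z * fit.se_b3), fit.se_b1 + z * fit.se_b3


def significance_verdict(fit: InteractionFit, z: float) -> str:
    """Classify a conditional effect without knowing the covariance."""
    effect = abs(conditional_effect(fit, z))
    low, high = se_bounds(fit, z)
    if effect < CRITICAL * low:
        return "insignificant for any covariance"
    if effect > CRITICAL * high:
        return "significant for any covariance"
    return "depends on the unreported covariance"


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; NOT the paper's estimates.
    fit = InteractionFit(b1=0.5, b3=-0.3, se_b1=0.6, se_b3=0.35)
    for label, z in [("female writer", Z_FEMALE_WRITER), ("male writer", Z_MALE_WRITER)]:
        print(label, round(conditional_effect(fit, z), 3), significance_verdict(fit, z))
```

With these made-up inputs, both conditional effects come out insignificant no matter what the covariance is. A verdict of "depends on the unreported covariance" is exactly the case in which the original data would be needed.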

The first-stage results for agentic terms come closer to supporting the authors’ expectations. Both male and female writers are more likely to use agentic adjectives when describing male applicants than when describing female applicants, and there is no statistically significant difference in this propensity between male and female letter writers. In contrast, male writers are more likely to describe male applicants than female applicants in terms of an agentic orientation, but female letter writers show no such propensity. In other words, row three of the table fits a theory that says “both genders are equally prone to this bias,” while row four fits one that says “male letter writers should be more prone to this bias than women.” It is hard to imagine a theory that would embrace both simultaneously, which means that either row three or row four of the table constitutes disconfirming evidence. If the story is that letter writers are more likely to recognize agentic properties in men than in women, even when those properties are equally present in both (it seems to me that something like this has to be the case if we are going to call the difference in language used to describe men and women “bias”), then it is curious that only male letter writers are subject to this bias in terms of agentic orientation, while both male and female letter writers are subject to it in terms of agentic adjectives. In sum, there is some evidence that women are less likely to be described in agentic terms.

But in stage two of their study, the authors find no evidence of a statistically significant relationship between being described in agentic terms and the applicant’s assessed hireability. Rather, there is a statistically significant relationship between being described in social terms and hireability, but the analysis of the first stage above showed that there is no evidence that women are more likely to be described in social terms. Thus, the kind of bias found in the letters is found to have no effect on hireability, and the factors that do influence hireability do not appear to be allocated across the genders in a biased way.
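The logic of this conclusion can be sketched as a joint-significance check of the kind used in mediation analysis. For letters to transmit gender bias into hiring along some dimension of language, applicant gender must shift that language (stage one), and that language must shift hireability ratings (stage two). The sketch encodes only the significance findings as summarized in this post, with no effect sizes from the paper. `Dimension` and `supports_causal_chain` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """Significance findings for one kind of language, as summarized in this post."""
    name: str
    gender_gap_in_letters: bool  # stage 1: are women described differently from men?
    affects_hireability: bool    # stage 2: does this language predict hireability ratings?


def supports_causal_chain(d: Dimension) -> bool:
    """Joint-significance check: a chain from letters to hiring needs both links."""
    return d.gender_gap_in_letters and d.affects_hireability


FINDINGS = [
    # No evidence women get more communal/social language, though it affects hireability.
    Dimension("communal/social", gender_gap_in_letters=False, affects_hireability=True),
    # Some evidence women get less agentic language, but no link to hireability.
    Dimension("agentic", gender_gap_in_letters=True, affects_hireability=False),
]

if __name__ == "__main__":
    chains = [d.name for d in FINDINGS if supports_causal_chain(d)]
    print(chains)  # → []
```

Each dimension fails on a different link. This is why the argument finds no chain even though each stage, taken alone, produces a significant result somewhere.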

Of course, the absence of clear evidence of gender bias in this study does not mean that such bias does not exist. It just suggests that our beliefs about such bias should not be dramatically influenced by this study. Furthermore, the results of the study may not generalize beyond psychology, a field in which women have made greater inroads than in perhaps any other academic field, so the situation may be as bad as we suspect in other fields. Nor is this to say that there are no other sources of gender bias at work in academic hiring. The point is that the results presented in this study are not as clear-cut as they have frequently been interpreted to be. I suppose a further lesson is that trumpeting new scientific results after reading only a layperson’s summary of the original study is problematic.
