In a discussion of workplace segregation, Philip Cohen posts some graphs that led me to a statistical question.

I’ll pose my question below, but first the graphs:

In a world of zero segregation of jobs by sex, the top graph above would have a spike at 50% (or whatever the actual percentage of women in the labor force is) and, in the bottom graph, the pink and blue lines would be in the same place and would look like very steep S curves. The difference between the pink and blue lines represents segregation by job.

One thing I wonder is how these graphs would change if we redefined occupation. (For example, is my occupation “mathematical scientist,” “statistician,” “teacher,” “university professor,” “statistics professor,” or “tenured statistics professor”?) Finer or coarser classification would give different results, and I wonder how this would work.

This is not at all meant as a criticism of Cohen’s claims; it’s just a statistical question. I’m guessing that someone’s looked into this already and that there’s some research literature on the topic.

The ambiguity of how best to define occupational categories is reminiscent of the difficulty in defining markets for purposes of antitrust analysis, which is often a point of contention. But be grateful. Though the gender variable isn’t entirely binary, the complications of intersex individuals and pre-op versus post-op transsexuals are small enough in number not to have a major impact on the results. Would that racial and ethnic classifications were as easy.

Andrew, I don’t know what you mean when you say that the top graph “would have a spike at 50%” in a world without segregation. Two points: 1) this is a graph of the index of dissimilarity, so its value would equal zero in a world without segregation; 2) I’m not sure what you mean when you say that the graph would have a spike at 50% (or zero for dissimilarity).

The bottom graph is a Lorenz curve, so I think that the curve would be a straight line at 45 degrees in a world with no segregation (since the proportion of women in the workforce would equal the proportion of people in the workforce at every level of employment), not an S-curve.

In research on racial residential segregation, measuring at smaller units of aggregation yields higher levels of measured segregation.

Mike:

In a world with no occupational segregation in which A% of the labor force is female, the two curves on the bottom graph would coincide and they would be at y=0% for x less than A and at y=100% for x greater than A. For example, 0% of people (men or women) would work in occupations that are less than 9% female, and 100% of people (men or women) would work in occupations that are less than 75% female. Because if there were no occupational segregation, every occupation would be A% female.
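The step shape described above can be checked with a small sketch. Everything in it is an assumption made for illustration: the function name `share_below`, the occupation counts, and the choice of A = 47. It is not how Cohen's graphs were produced.

```python
def share_below(occupations, x, group):
    """Share of one group working in occupations that are less than x% female.

    occupations: list of (women, men) counts, one pair per occupation.
    group: 0 for women, 1 for men.
    """
    total = sum(occ[group] for occ in occupations)
    below = sum(
        occ[group]
        for occ in occupations
        if 100 * occ[0] / (occ[0] + occ[1]) < x
    )
    return below / total

# Hypothetical labor force with no segregation: every occupation is 47% female.
no_segregation = [(47, 53), (94, 106), (470, 530)]

# Hypothetical segregated labor force with the same number of workers.
segregated = [(90, 10), (40, 160), (481, 519)]

for x in (9, 46, 48, 75):
    women = share_below(no_segregation, x, group=0)
    men = share_below(no_segregation, x, group=1)
    print(x, women, men)
```

With the unsegregated counts, both curves sit at 0 for every x below 47 and at 1 for every x above 47, so they coincide. Passing `segregated` instead gives different values for women and men at the same x, which is the gap between the pink and blue lines.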

Andrew, of course. I misread the graph’s x-axis to be the cumulative percentage of women, not the percentage of women. I now see exactly what you mean. Thank you for clarifying.

I’m still confused about what you meant for a spike at 50% on the top graph.

Mike:

I see what happened with the spike comment. I’d copied out the wrong graph. I’ll fix that.

With both the Gini and the index of dissimilarity, the more categories you use the higher your inequality score, in general. With the Gini this has to do with the calculus of the area under the Lorenz curve, I think. With gender segregation, it may have to do with the math as well, but it also fits the substantive pattern of gender distributions. For example, if you lump nurses and doctors into “professionals,” you see less segregation than if you separate them. And if you further break nurses and doctors down to pediatric nurses, nurse anesthetists, pediatricians and cardiologists, you would get a higher dissimilarity score still. To make comparisons over time or space, you have the further complication that the occupational composition shifts, and that the labor force size changes. No easy answers to how to do it right.
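As a rough illustration of the nurses-and-doctors point, here is a minimal sketch of the index of dissimilarity under a fine and a coarse classification. It uses the standard formula, half the sum over occupations of the absolute difference between each occupation's share of all women and its share of all men. The counts and the function name `dissimilarity` are made up for this example and are not taken from any real data.

```python
def dissimilarity(occupations):
    """Index of dissimilarity for a list of (women, men) counts."""
    total_women = sum(w for w, _ in occupations)
    total_men = sum(m for _, m in occupations)
    return 0.5 * sum(
        abs(w / total_women - m / total_men) for w, m in occupations
    )

# Hypothetical counts (women, men), invented for illustration only.
fine = {"nurses": (90, 10), "doctors": (40, 60), "clerks": (50, 50)}

# Coarser scheme: lump nurses and doctors into "professionals".
professionals = (
    fine["nurses"][0] + fine["doctors"][0],
    fine["nurses"][1] + fine["doctors"][1],
)
coarse = {"professionals": professionals, "clerks": fine["clerks"]}

d_fine = dissimilarity(list(fine.values()))
d_coarse = dissimilarity(list(coarse.values()))
print(round(d_fine, 3), round(d_coarse, 3))
```

In this toy case the finer scheme gives the larger index. For this particular index, part of that is the math: merging two categories replaces two absolute differences with the absolute value of their sum, which by the triangle inequality can never be larger, so lumping occupations together can lower the index but not raise it.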