How Representative Are Amazon Mechanical Turk Workers?

by John Sides on December 19, 2012 · 24 comments

in Methodology

This is a guest post by political scientist Sean Richey and Ben Taylor.

*****

On Election Day, we asked 565 Amazon Mechanical Turk (MTurk) workers to take a brief survey on vote choice, ideology, and demographics. MTurk has three million online workers who do computer tasks for small amounts of money. For example, they might rate the likeability of an advertisement for a nickel. Because MTurk is easy, cheap, and fast, it is becoming a popular subject pool for experimental research. Our survey cost $28, and the responses came back within four hours from workers who live across the United States.

Prior studies have found that MTurk workers respond similarly to representative samples of the United States on many political questions, and react in similar ways to experimental stimuli (see the paper by Berinsky, Huber, and Lenz). These comparisons suggest that MTurk is a good sample for experiments, better than the commonplace college student samples. Although Berinsky, Huber, and Lenz find that MTurk workers are not fully representative, MTurk's ease of use and the similarity between experimental results obtained on MTurk and previous results make it very attractive. More studies are needed, however, to investigate this resource.

We take a different approach here: we compare MTurk workers on Election Day to actual election results and exit polling. The survey paid $0.05 and had seven questions: gender, age, education, income, state of residence, vote choice, and ideology. Overall, 73% of these MTurk workers voted for Obama, 15% for Romney, and 12% for “Other.” This is skewed in expected ways, matching the stereotypical image of online IT workers as liberal, or possibly libertarian, since 12% voted for a third party in 2012 compared to 1.6% of all voters.


Voter turnout was also skewed. Although turnout was around 60%, 86% of MTurk respondents reported voting. Some over-reporting is common in surveys, but rarely as much as here. Whether MTurk workers really do turn out more or simply feel more pressure to say they did, they again differ substantially from the general population.

Ideology was also skewed.   Almost 61% of these MTurk workers identified as liberals, but exit polls showed only 25% of voters on Election Day did.


Finally, the demographics of MTurk workers are also unrepresentative. Most importantly, the MTurk sample is heavily skewed toward younger people: 72% of the sample was 18 to 29 years old, compared to only 17% in the country as a whole. Those 65 and older are only 0.2% of our sample (actually only two respondents), while nationally they were 13%. Women make up 50.8% of the U.S. population, but only 34% of this sample. MTurk workers also have lower incomes, on average. About 57% made less than $50,000 a year (vs. 41% nationally), 31% made $50,000 to $100,000 (vs. 41%), and 11% made more than $100,000 (vs. 21%).

MTurk workers are also more highly educated. Few lack a high school diploma (2% MTurk vs. 14% nationally) or have only a high school diploma (11% vs. 28%). Far more have some college education (48% vs. 29%) or a college degree (29% vs. 18%).
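The skews described above are easier to compare side by side. The short Python sketch below tabulates the percentages reported in this post, MTurk sample against the authors' benchmark, and sorts them by the size of the gap in percentage points. It uses only figures stated in the post; the labels and function names are illustrative and are not part of the authors' analysis.

```python
# Percentages reported in the post: (MTurk sample, benchmark).
# Benchmarks are the authors' figures from election results, exit polls,
# and national demographics; no new data are added here.
COMPARISONS = {
    "third-party vote": (12.0, 1.6),
    "reported turnout": (86.0, 60.0),
    "liberal": (61.0, 25.0),
    "age 18-29": (72.0, 17.0),
    "age 65+": (0.2, 13.0),
    "female": (34.0, 50.8),
    "income < $50,000": (57.0, 41.0),
    "income $50,000-100,000": (31.0, 41.0),
    "income > $100,000": (11.0, 21.0),
    "no high school diploma": (2.0, 14.0),
    "high school only": (11.0, 28.0),
    "some college": (48.0, 29.0),
    "college degree": (29.0, 18.0),
}


def skew(sample_pct, benchmark_pct):
    """Return (gap in percentage points, ratio of sample to benchmark)."""
    return sample_pct - benchmark_pct, sample_pct / benchmark_pct


def skew_table(comparisons):
    """Rows of (label, sample, benchmark, gap, ratio), largest |gap| first."""
    rows = []
    for label, (sample, benchmark) in comparisons.items():
        gap, ratio = skew(sample, benchmark)
        rows.append((label, sample, benchmark, gap, ratio))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


for label, sample, benchmark, gap, ratio in skew_table(COMPARISONS):
    print(f"{label:<24}{sample:6.1f}{benchmark:6.1f}{gap:+8.1f} pp{ratio:7.2f}x")
```

The age gap (55 points) tops the list. The ratio column exaggerates skews with small baselines (a 12% third-party vote is 7.5 times the 1.6% benchmark), so the percentage-point gap is usually the fairer single summary.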

In sum, the MTurk sample is younger, more male, poorer, and more highly educated than Americans generally.  This matches the image of who you might think would be online doing computer tasks for a small amount of money.

The question of whether this sample is more diverse than a student sample depends on the school the students attend.  At my university—Georgia State University in Atlanta, GA—this sample would be less diverse than our student body in many respects (except state of residence, of course).  As the percentage of non-traditional students continues to rise, college student samples are becoming more diverse in terms of age, race and ethnicity, gender, income and so on.

We cannot speak to whether the experimental results derived from a representative sample could be replicated in this MTurk sample.  However, the skew in vote choice, ideology, and demographics should be taken seriously when assessing the external validity of research using MTurk samples. MTurk is an exciting new method to find subjects for research, but its value deserves further study and exploration.

{ 24 comments }

Ryan Enos December 19, 2012 at 12:12 pm

Interesting.

What did the MTurk advertisement say? One might speculate that, on Election Day in an election they were losing, conservative workers lost their appetite for taking a political survey. I’m not sure the same effect would appear in an exit poll, because it might be harder to avoid an in-person request than an anonymous online one.

It would be useful to run the same survey again now. It's only $28, right?

PM December 19, 2012 at 1:09 pm

My response is much the same as Ryan’s, except I had a less charitable and more stereotypical thought: Maybe you CAN pay Democrats a nickel for their thoughts, but Republicans require more. After all, as the Team America guys say, freedom costs a buck oh five, so maybe you need at least $0.25 to get Republicans to respond.

Adam Hughes December 19, 2012 at 2:26 pm

In an MTurk survey experiment I conducted on October 4th, 2011 (N of 495), the partisan breakdown was 14% Republican, 13% independent leaning Republican, 28% independent leaning Democrat, and 45% Democrat. The ideology breakdown (7-pt scale) was, from extreme liberal to extreme conservative: 12%, 26%, 22%, 22%, 8%, 7%, and only 3% extreme conservative. 78% intended to vote. I paid respondents 50 cents each and the data were collected in 8 hours.

PM December 19, 2012 at 3:26 pm

Cool, I stand corrected

Adam Hughes December 19, 2012 at 2:27 pm

*that should say October 4th, 2012

Sean Richey December 20, 2012 at 9:56 am

Hi Adam, Thank you for mentioning this. I have also done MTurk experiments on different days than Election Day, and also found similar bias levels. Sean

Adam Berinsky December 19, 2012 at 3:42 pm

I don’t quite get the point of this study. We know that MT samples are different from the population. But just focusing on the differences is not particularly helpful. When running experiments, we want to think about how the composition of the sample affects our results; we don’t want to focus on the sample in isolation from our experiments.

More specifically, we want to think about how these differences might impact the results of our studies. That is why Greg, Gabe, and I ran several experiments and looked for heterogeneous effects. Focusing on the demographic and political composition of a sample in isolation from a particular experiment doesn’t really tell you much about the external validity of your results. For example if you are looking at a survey-wording effect that does not vary by the characteristics of your subjects, your results might be externally valid, even if your sample is skewed. It all depends on the phenomenon you are studying.

Druckman and Kam make this argument nicely in the recent Handbook of Experimental Political Science, but we also talk about this in our paper:

Concerns about the external validity of research conducted using student samples have been debated extensively (Sears 1986; Druckman and Kam 2011). For experimental research conducted using MTurk, two concerns raised about student samples are pertinent: (1) whether estimated (average) treatment effects are accurate assessments of treatment effects for other samples and (2) whether these estimates are reliable assessments of treatment effects for the same sample outside the MTurk setting. The former concern is most likely to be a threat if treatment effects are heterogeneous and the composition of the MTurk sample is unrepresentative of the target population (see Druckman and Kam 2011). For example, if treatment effects are smaller for younger individuals than older ones, a sample dominated by younger individuals will yield estimated treatment effects smaller than what one would observe with a representative sample. The latter concern arises if people behave differently in the MTurk setting than they do outside of that setting.
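The first concern in the passage above can be made concrete with a small worked example. The Python sketch below combines hypothetical age-specific treatment effects (made-up numbers, not estimates from any study) with the post's age shares (72% aged 18 to 29 in the MTurk sample vs. 17% nationally). It shows that an unrepresentative sample shifts the average effect only when effects differ across groups, and that reweighting subgroup estimates to population shares (post-stratification) can correct for this, assuming the groups capture the relevant heterogeneity and each subgroup effect is estimated well.

```python
import math


def average_effect(effects, shares):
    """Average of subgroup treatment effects, weighted by group shares.

    effects: dict mapping group -> treatment effect within that group
    shares:  dict mapping group -> share of the sample or population
    """
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("shares must sum to 1")
    return sum(effects[group] * share for group, share in shares.items())


# HYPOTHETICAL subgroup effects (say, percentage-point shifts from a
# wording change). These are illustrative numbers, not study estimates.
EFFECTS = {"18-29": 2.0, "30+": 6.0}

# Age shares reported in the post: 72% of the MTurk sample vs. 17% nationally.
MTURK_SHARES = {"18-29": 0.72, "30+": 0.28}
NATIONAL_SHARES = {"18-29": 0.17, "30+": 0.83}

naive = average_effect(EFFECTS, MTURK_SHARES)          # what the MTurk sample shows
reweighted = average_effect(EFFECTS, NATIONAL_SHARES)  # post-stratified to the nation
print(f"MTurk sample average effect:   {naive:.2f}")
print(f"reweighted to national shares: {reweighted:.2f}")

# With a homogeneous effect, sample composition does not change the average.
FLAT = {"18-29": 4.0, "30+": 4.0}
print(average_effect(FLAT, MTURK_SHARES), average_effect(FLAT, NATIONAL_SHARES))
```

With these made-up numbers the MTurk average is 3.12 and the nationally reweighted one is 5.32; with a flat effect the two coincide, which is the point about effects that do not vary with subject characteristics. Reweighting also needs enough respondents in every cell: with only two respondents aged 65 and older, a sample like the one in this post could not support a reliable estimate for that group.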

Sean Richey December 20, 2012 at 10:07 am

Hi Adam, Thank you for these insightful comments, all of which we certainly agree with. One issue is that this is not really our study; we are doing an experiment on ballot initiatives using an MTurk sample. But we thought readers of this blog might be interested in a comparison of our Election Day data with the exit polling. So, we did not have in mind any great point or critique. MTurk offers speed, accessibility and pricing that are extremely attractive, especially to graduate students or those who do not have a lot of institutional support. But it is worth doing multiple studies of who the current workers are, as this can change or vary by how the ad is written or how much is paid and so on. Thanks again! Sean

Adam Berinsky December 20, 2012 at 2:10 pm

But you are making claims that we should question external validity based only on the demographics of your study.

you write:

“We cannot speak to whether the experimental results derived from a representative sample could be replicated in this MTurk sample. However, the skew in vote choice, ideology, and demographics should be taken seriously when assessing the external validity of research using MTurk samples. ”

This is the attitude I take issue with. My point (and that of Jamie and Cindy) was that we shouldn’t fetishize sample properties. We need to think hard about whether our estimates carry beyond our sample, and that thinking should be based on an argument about how effects might be different in our sample relative to other samples. Simply presenting the demographics and stating that we need to worry about external validity is not taking the issues of external validity seriously.

Turking Blog December 19, 2012 at 9:04 pm

Interesting blog. A couple of points: many workers are full-time workers earning upwards of $500-700 per week on MTurk. You can go to turkernation.com or mturkforum.com to see some of these workers. I would imagine that you did not get a representative sample because low-paying tasks are not completed by quality workers. Generally it is part-time and new workers who complete tasks that pay far below a fair wage.

WTaylor December 20, 2012 at 11:56 am

“many workers are full time workers earning upwards of 5-700 per week on Mturk. ”

I take issue with the word “many”. I have been a member of Turkernation for some time now and know many of the “regulars” there. $500-700 weeks are to be had, but certainly not on a regular basis for most, regardless of experience level, especially since many “good” requesters have left and none have replaced them.

I’m now interested in starting a poll at Turkernation to find this out specifically.

DWayne June 27, 2013 at 1:09 pm

I agree. I haven’t seen any data that verifies the number of workers on MTurk by earnings level, just approximations. I believe it would be almost impossible to determine a true “average”, since so many people try MTurk for a while, and then either drop out shortly thereafter (it’s too hard), or don’t do it consistently. That would skew the results toward the low end.

On the other hand, the number of people making a full-time wage is likely to be fairly low as well, and doing so probably requires a significant investment of time (both daily and longitudinally, to earn qualifications). Only certain types of jobs pay exceptionally well; writing and transcription lead the pack.

I’m not just interested in weekly rates; I’d like to learn about hourly rates over time. And not just the time workers are actually performing the HITs, but the time spent on logistics, research, and support: setting up accounts on Amazon and other sites (like the forums and CrowdSource), taking part in forums, looking up and corresponding with Requesters, reviewing the dashboard and checking payments, and scouring the database for good HITs.

Any good contractor factors all those extra hours into their rate. For services like Turkers provide, it would be reasonable for a contractor to charge 2-3X the expected net salary. I have to wonder if those full time Turkers really put in 80 or more hours’ worth of work to earn $500+/week (note that they are spending a lot of time on the Forums!) If so, it reduces that full-time $12.50/hour job to a $6.25 one. But the much larger number of typical Turkers are doing this because they need a flexible home-based job, and are more likely to be putting in a lot of unpaid time to earn $3-4/hour. I also dispute the idea that part-time Turkers are lower quality. Look again at the demographics; it’s an educated bunch. Many just need a supplemental income, and are devoting their work to earn a tiny “reward” (note the euphemism….)

I’d love to see research that accurately depicts all workers, hours (hours actually paid for, and hours required to perform the jobs), and salaries. I’d also welcome a study of dropout rates and reasons. America needs to seriously consider the implications of the formation of an underground workforce.

Mitch December 20, 2012 at 12:14 am

I think Turking Blog is the link between Adam and Sean/Ben. At the risk of sounding unscholarly, my gut tells me pretty strongly that modifying the incentives should modify the composition of the experimental sample. And modifying the economics of the sample guarantees you modify the political characteristics of it.

Maybe these party ID tests need to be run on different price points before this baby gets thrown up on SSRN.

Sean Richey December 20, 2012 at 9:52 am

Hi Mitch,
I agree with you that it will matter, but not at the small price differences that are typical on MTurk. If you paid $10 a task, it would change the composition of the potential workers, but very few requesters pay that much, and most pay a nickel. So I think the commonly used payment levels have left the subject pool with these biases. Also, several other studies have found a similar skew at different price points, such as this one:

http://blog.dansimons.com/2012/12/the-demographics-of-surveys-phone-vs.html

Thanks so much for your comments!
Sean

Ryan Enos December 20, 2012 at 9:27 am

Mitch, I think that’s a good and important point. My gut (and simple assumptions) tell me that any change in the incentives for a survey is going to change the composition of the sample. I don’t know if the online survey houses have experience with this, but it would be interesting to know. On the other hand, MTurk might be different; I’ve always assumed that the large majority of turkers worked quite cheap. A test at a different price point would be worthwhile.

Sean Richey December 20, 2012 at 9:44 am

Hi Ryan, Thank you for your comments. I have done a $0.50 experiment on MTurk (ten times the nickel we paid here) and found a similar skew toward workers who were younger, more liberal or libertarian, more highly educated, etc. So, at the typical prices paid on MTurk, I think these biases would remain. The default payment is a nickel, and I guess that is the standard payment. But I agree with you that pay may affect who signs up to be a worker, because one 30-second task at a nickel works out to roughly minimum wage per hour, so that could bias the pool.
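The nickel-to-minimum-wage comparison is simple arithmetic: 3,600 seconds per hour divided by 30 seconds per task gives 120 tasks, and 120 × $0.05 = $6.00 per hour, somewhat below the $7.25 federal minimum wage in effect in 2012. A minimal Python sketch with hypothetical function names, which also inverts the calculation to find the longest task a given payment can cover at a target hourly rate:

```python
SECONDS_PER_HOUR = 3600


def hourly_rate(pay_per_task, seconds_per_task):
    """Dollars per hour if tasks are done back to back with no gaps."""
    if seconds_per_task <= 0:
        raise ValueError("seconds_per_task must be positive")
    return pay_per_task * SECONDS_PER_HOUR / seconds_per_task


def max_seconds_for_rate(pay_per_task, target_hourly):
    """Longest task (in seconds) that still pays at least target_hourly."""
    if target_hourly <= 0:
        raise ValueError("target_hourly must be positive")
    return pay_per_task * SECONDS_PER_HOUR / target_hourly


print(f"${hourly_rate(0.05, 30):.2f}/hour")                # nickel, 30-second task
print(f"{max_seconds_for_rate(0.05, 7.25):.1f} seconds")   # nickel at $7.25/hour
```

Both functions assume tasks are completed back to back; time spent searching for HITs between tasks, which several commenters stress, lowers the effective rate further.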

I think it is an amazing resource, which is particularly helpful for graduate students or researchers who lack funding. I plan to keep using it, but we thought readers of this blog might be interested in these basic demographics and political attitudes. Thanks again, Sean

Diablery December 20, 2012 at 11:13 am

A nickel is not going to get you anyone who knows the Game. I am probably the least representative turker in that I’ve earned over 50 thousand dollars doing it over the past two years. If you want someone like me, you have to pay a relatively dignified wage. Time your survey and do the math. Use a subject who isn't familiar with the content, as that skews your result. If it comes out to something ridiculous, you are not going to get a good sample. Instead you’ll get the most desperate and least accomplished turkers.

Your high number of respondents who say that they are going to vote is actually easy to explain if you know how some survey requesters respond to people who say they do not intend to vote: they boot them from the survey. So people will say that they intend to vote because they believe that not doing so might cost them their shiny little nickel.

Sean Richey December 20, 2012 at 11:27 am

Hi Diablery, Nice to hear from a real Turker! I have often done nickel based HITS and requested only master workers, and have had no problem getting them done, so I am not sure I agree with you. By the way, quality work is different from the response bias issues that we are really talking about here. But your point about insincere responses due to the low amount of money being paid is well taken. Keep up your great work online! Sean

Turking Blog December 20, 2012 at 12:37 pm

I am going to have to agree with Diablery. I understand that payment was not the focus of your blog post, but it is a relevant issue when it comes to using Mechanical Turk.

Many academics fail to understand the true demographics of Mturk because they rely on flawed research from past experiments and what they see on the site itself.
Let me explain -
Doing a casual search for “surveys” or “research” will show many academics offering less than $0.10 per minute for their studies. The experienced turker will not touch this work, but to the uninformed requester, this appears to be the norm.
When excellent survey requesters like the University of Chicago or Yale CogLab post HITs that pay $0.20 per minute, experienced turkers grab this work, so it does not stay on the site very long.

This work gets completed only because of the huge learning curve associated with MTurk and the constant influx of new workers. New workers are restricted by the number of HITs they can do per day, by the qualifications that individual requesters have set for their HITs, and by the inexperience of the casual requester/worker.
Experienced requesters and workers do not deal with this type of task. The requesters have real work that needs to be completed and the workers have a real need for money.
Do a search for CrowdSource. As you can see, there are no tasks available for workers without a qualification. The more experience you gain with CS, the more they pay you and the more hits open up.
A new worker will not know anything about CrowdSource or the higher paying requesters. Many requesters qualify a group of workers and close their qualification and never accept new workers. They have found and trained a group of workers to complete their tasks. New workers cannot work on those tasks and spend time searching and looking for any hits, (even extremely low paying ones) that they can work on.

My earnings for 2012 are at $15,200 with 85,000 HITs completed. I work part time on it (about 4 hours a day, excluding weekends) and could have earned significantly more working full time. A rough estimate is around $14.50 an hour. I have experience in a variety of fields like social media, writing, categorization, marketing, etc.
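The hourly estimate can be checked from the figures given: $15,200 at about 4 hours a day, 5 days a week. The comment does not say how many weeks were worked, and the year was not yet over, so the Python sketch below assumes a small range of 50 to 52 working weeks; that range and the function name are illustrative assumptions.

```python
def implied_hourly(earnings, hours_per_day, days_per_week, weeks):
    """Return (hours worked, earnings per hour) for a simple schedule."""
    hours = hours_per_day * days_per_week * weeks
    if hours <= 0:
        raise ValueError("schedule must include some working hours")
    return hours, earnings / hours


# Figures from the comment: $15,200 in 2012, about 4 hours a day, weekdays only.
# The number of weeks worked is an assumption, so try a small range.
for weeks in (50, 51, 52):
    hours, rate = implied_hourly(15_200, 4, 5, weeks)
    print(f"{weeks} weeks: {hours} hours, ${rate:.2f}/hour")
```

At 52 weeks (1,040 hours) this gives about $14.62 an hour, and at 50 weeks $15.20, consistent with the commenter's rough figure of $14.50.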

Not trying to minimize your need for relevant and quality results, but there are requesters who own for profit companies that have tasks that have to be completed. These tasks have to be completed correctly and in a timely manner. They pay a fair wage and receive quality results.
The workers who work on this type of hit are not new to Mturk. They don’t even bother with tasks like five cent surveys or requesters like LinkedIn/Oscar Smith.

I hope this helps you understand how and why low paying tasks are completed on Mturk. It is one of the main reasons why I blog for requesters, because many have long term projects that require quality results and experienced workers. The newer the requester, the harder it is to achieve this goal.

blake December 20, 2012 at 1:42 pm

I’m surprised no one has brought up the issue of satisficing strategies among such respondents. I would think MTurkers would be great at rushing through a survey in as short a time as possible, perhaps paying little heed to which button they click. This is not such a problem for “easy” questions like party ID and age, but might be when trying to measure political attitudes and preferences that require a bit of thought. I know of people who have had problems with some professional survey takers who find their way into online panels such as KN’s.

Adam Berinsky December 20, 2012 at 2:05 pm

Most academics I know set a requirement that participants have at least 95% of prior work accepted. I always do this, and I think it does a decent job of weeding out the true click-through folks. I’ve looked at satisficing in this group using “trick” questions (http://web.mit.edu/berinsky/www/berinsky_margolis_sances_screener_nyu.pdf), and the MT 95% passers pass my screener questions at upwards of 90%, indicating that they are paying attention (and lots of requesters do this; I have stopped asking screener questions because the passage rates are so high and the workers say that they looked for the “trick” questions and couldn’t find them).
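The screening procedure described above has two steps: restrict participation to workers whose prior approval rate is at least 95%, then check what share of the remaining respondents pass an attention ("trick") question. On MTurk itself the approval threshold is normally set as a qualification requirement when the HIT is posted, so workers below it cannot accept the task; the Python sketch below instead applies both steps after the fact to hypothetical records, purely to show the logic. The `Response` type and its fields are illustrative assumptions, not an MTurk data format.

```python
from dataclasses import dataclass


@dataclass
class Response:
    worker_id: str
    approval_rate: float   # share of the worker's prior HITs approved (0 to 1)
    passed_screener: bool  # answered the attention-check question correctly


def eligible(responses, min_approval=0.95):
    """Keep respondents at or above the approval-rate threshold."""
    return [r for r in responses if r.approval_rate >= min_approval]


def pass_rate(responses):
    """Share of respondents who passed the screener (NaN if none)."""
    if not responses:
        return float("nan")
    return sum(r.passed_screener for r in responses) / len(responses)


# Hypothetical records for illustration only.
responses = [
    Response("W1", 0.99, True),
    Response("W2", 0.97, True),
    Response("W3", 0.96, False),
    Response("W4", 0.80, False),
    Response("W5", 0.90, True),
]
kept = eligible(responses)
print(len(kept), round(pass_rate(kept), 2), round(pass_rate(responses), 2))
```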

Erik Larson December 21, 2012 at 1:28 am

I’ve been running various MTurk surveys at $0.50 to $1 each for a few months, usually 20-30 hits at a time, and I have noticed that the demographics of Turkers seem to vary significantly depending on the time of day and day of the week. Specifically, I see a higher proportion of young men during the workday, which is then swamped by older women in the evenings and on weekends. From a pure gender perspective, I’ve seen consistent swings from 60/40 male during the day to 67/33 female in the evenings and on weekends. Not sure if others have seen this, or if it is just an idiosyncrasy of the research we are doing.

Sean Richey December 22, 2012 at 10:35 am

Hi Erik, That is very interesting, thank you. Very quick turnarounds on HITs could run into this problem, and it may be worth considering. I have run a larger HIT for seven days, which I believe is the default, and I suppose I could check the breakdown by hour of submission. Sean
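Erik's observation suggests checking sample composition by submission time, as Sean proposes. A minimal Python sketch, assuming each record holds a submission timestamp and a self-reported gender; the workday cutoff (weekdays 9:00 to 16:59) is an arbitrary, hypothetical choice, and because workers live across the United States, timestamps should first be converted to a single time zone. The example records are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime


def time_bucket(ts):
    """Hypothetical split: weekdays 9:00-16:59 vs. evenings and weekends."""
    if ts.weekday() < 5 and 9 <= ts.hour < 17:
        return "workday"
    return "evening/weekend"


def share_by_bucket(submissions, category="female"):
    """Share of submissions in each time bucket that report `category`."""
    counts = defaultdict(Counter)
    for ts, gender in submissions:
        counts[time_bucket(ts)][gender] += 1
    return {bucket: c[category] / sum(c.values()) for bucket, c in counts.items()}


# Made-up records: (submission time in one common time zone, reported gender).
submissions = [
    (datetime(2012, 12, 18, 10, 15), "male"),    # Tuesday morning
    (datetime(2012, 12, 18, 11, 40), "male"),
    (datetime(2012, 12, 18, 14, 5), "female"),
    (datetime(2012, 12, 18, 20, 30), "female"),  # Tuesday evening
    (datetime(2012, 12, 22, 13, 0), "female"),   # Saturday
    (datetime(2012, 12, 22, 15, 45), "male"),
]
print(share_by_bucket(submissions))
```

With these toy records, women are one third of workday submissions and two thirds of evening and weekend submissions. A HIT that closes within a few hours, like the four-hour Election Day survey, samples only part of such a cycle.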

Prabhakaran June 11, 2013 at 1:55 am

Hello friends, when will MTurk open new worker accounts to people from other countries?
