Communication Currents


Using Amazon’s Mechanical Turk for Crowdsourcing Research

July 20, 2017
Instructional Communication

If you’ve conducted a survey or developed a study that involved multiple participants, you’ve used “crowdsourcing” to collect data for your research. In a new Communication Monographs article, Kim Bartel Sheehan provides an overview of Amazon’s crowdsourcing platform, Mechanical Turk, as a means for Communication scholars to conduct academic research.

The Basics

What exactly is Mechanical Turk (hereafter referred to as MTurk)? Amazon created the platform in 2005 to connect “Internet users who are willing to accomplish small tasks for pay with companies and individuals that want to tap into the workforce.” It turns out that one-third of all available work on MTurk is academic, contributing to hundreds of published studies in the social sciences and, more recently, in Communication and sociology.

Who can use MTurk? Potential workers must be at least 18 years old and provide bank account authorization to receive payment. Researchers (known as Requesters) also register on the site and then post their research projects, indicating the number of workers they need and any qualifications those workers must meet, such as location, certain demographic characteristics, or eligibility requirements based on workers’ history on MTurk.

How does the payment system work? Payments for completed tasks are transferred from the Requester’s account to the worker’s account. Amazon charges fees on top of these payments, ranging from 20 to 40 percent of the task payment. Of note: Because academic surveys that request 10 or more respondents in a single batch incur a 40 percent fee, Sheehan recommends breaking larger surveys into “microbatches” of fewer than 10 respondents each. In addition to cutting down on Amazon’s fees, this approach allows researchers to launch surveys throughout the day, reducing bias in data collection.
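For researchers who post tasks programmatically rather than through the website, microbatching is straightforward to script. The snippet below is a minimal sketch using the AWS boto3 MTurk client to post several small batches of an externally hosted survey; the survey URL, reward amount, batch sizes, and wording are hypothetical placeholders rather than details from the article, and the sandbox endpoint is used so nothing is charged while testing.

```python
import boto3

# Connect to the MTurk Requester sandbox (no real payments while testing).
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# ExternalQuestion pointing at a hypothetical survey hosted elsewhere.
external_question = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.edu/my-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

# Post 10 microbatches of 9 assignments each (90 respondents total).
# Keeping each batch under 10 assignments avoids the higher fee tier.
for batch in range(10):
    mturk.create_hit(
        Title="Short academic survey on media habits (about 5 minutes)",
        Description="Answer a brief survey for a university research study.",
        Keywords="survey, research, academic",
        Reward="1.00",                     # payment per respondent, in USD
        MaxAssignments=9,                  # fewer than 10 respondents per batch
        AssignmentDurationInSeconds=1800,  # time allowed per worker
        LifetimeInSeconds=86400,           # batch stays available for one day
        Question=external_question,
    )
```

Spacing the batches out across the day, rather than posting them all at once, is what captures workers who are active at different times.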

Participant Demographics

Amazon has not released data about the demographic composition of the MTurk worker pool, but Sheehan cites studies and web analytics whose estimates of the pool’s size range from as few as 5,050 active workers to as many as half a million. Within the active worker pool, people report working at least two days a week, spending anywhere from fewer than five hours to 21 hours or more per week completing tasks.

Sheehan writes that there is “evidence that the demographic composition of MTurk workers is much more diverse than that of traditional student samples and can be fairly representative of larger populations.” About 80 percent of the MTurk workforce is in the United States, and just over half have college degrees. Three-quarters of workers are Caucasian, and men and women are represented in roughly equal numbers.

While student populations tend to participate in academic research to complete course requirements or earn extra credit, many MTurk workers take on tasks to supplement their incomes, with a full 25 percent of workers reporting that all or most of their income comes from MTurk. Sheehan suggests that because of this, “MTurk workers may take their participation in studies more seriously than student populations.”

Data Quality Matters

The author acknowledges that quality is a concern when collecting data online, but also notes that MTurk includes measures to help ensure high-quality data, such as allowing Requesters to reject work or to block workers who turn in low-quality responses from future tasks.

Additionally, because a range of studies shows that MTurk respondents tend to skew politically liberal, researchers collecting data on political issues can balance their samples by “oversampling from states that skew conservative, or screening for political attitudes upfront and ensuring that a balance of perspectives is achieved.”

Best Practices for Communication Researchers Using MTurk

Sheehan notes that she and her colleagues have used MTurk for a variety of Communication studies on topics such as binge-watching, environmental messaging, and corporate social responsibility. She suggests a few best practices for Communication scholars, drawn from her own experiences as well as from the literature on research methodologies:

  • Experience MTurk as a worker first; complete a few easy academic HITs (Human Intelligence Tasks, MTurk’s term for individual jobs) to get a feel for the site.
  • Know and understand the rules about what types of HITs are acceptable to post on MTurk, and adhere to privacy regulations.
  • Design your study to ensure qualified respondents, either by requesting and paying for specific characteristics in respondents or by implementing a short screening survey (see the sketch after this list).
  • Use attention checks to ensure that respondents are real people (and not robots) and to make sure they are adhering to your survey guidelines and paying attention to the questions.
  • Optimize surveys to slow down respondents and collect more satisfactory and comprehensive responses to open-ended questions.
  • Pretest all surveys yourself to check functionality and estimated completion times, and adjust as necessary.
  • Engage with workers who contact you to answer questions they may have about the task, or to fix an issue they may have encountered with the survey – this will build trust between you and the workers.
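As an illustration of the qualification route mentioned above, MTurk lets Requesters attach qualification requirements to a task so that only eligible workers can accept it. The sketch below shows the data structure boto3 expects; restricting the survey to US-based workers with a 95 percent approval rate is an assumed example, not a threshold recommended in the article.

```python
# Built-in MTurk qualification types: worker locale and lifetime approval rate.
qualification_requirements = [
    {   # Restrict the HIT to workers located in the United States.
        "QualificationTypeId": "00000000000000000071",  # Worker_Locale
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
    },
    {   # Require a 95 percent or better approval rate on past work.
        "QualificationTypeId": "000000000000000000L0",  # PercentAssignmentsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
    },
]

# Pass this list to create_hit(..., QualificationRequirements=qualification_requirements)
# alongside the parameters shown in the earlier microbatching sketch.
```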

This essay was translated from the scholarly article: Sheehan, Kim Bartel (2017). “Crowdsourcing Research: Data Collection with Amazon’s Mechanical Turk.” Communication Monographs. doi: 10.1080/03637751.2017.1342043.

About the Author

Kim Sheehan

University of Oregon

Professor
