How many turkers are there?

With the recent interest in getting multiple turkers into a virtual room at the same time, it would be nice to know how many individual turkers are out there. We don’t have the answer, but we have run lots of experiments, and we can aggregate all of our historical data (much of which is posted on this blog) to get a sense for how many turkers might be out there, and how active they are.

These graphs are based on 4,449 HITs, with a total of 28,168 assignments. Most of these were posted over a 75-day period. Here is the number of assignments completed each day of that span:

Assignments completed each day over a 75-day period.

Most of our assignments were completed in bursts on individual days, but there is some spread in the data.

We had work done by 1,496 individual turkers. Here is the number of assignments completed by each turker:

Assignments completed by each turker.

This is a classic power-law distribution. We even see something close to the 80-20 rule: 80% of assignments are completed by the top 22% of turkers. The least active 25% of turkers completed one assignment each.
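As a rough illustration of how a figure like "80% of assignments are completed by the top 22% of turkers" can be computed from per-turker counts, here is a minimal Python sketch. The counts in `example_counts` are made up for illustration and are not our data, and the function names are hypothetical choices of our own.

```python
from typing import Sequence


def share_of_top(counts: Sequence[int], top_fraction: float) -> float:
    """Share of all assignments done by the most active `top_fraction` of workers."""
    ranked = sorted(counts, reverse=True)          # most active workers first
    k = max(1, round(len(ranked) * top_fraction))  # how many workers are in the top group
    return sum(ranked[:k]) / sum(ranked)


def fraction_needed(counts: Sequence[int], target_share: float) -> float:
    """Smallest fraction of workers (most active first) covering `target_share` of the work."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    running = 0
    for i, count in enumerate(ranked, start=1):
        running += count
        if running >= target_share * total:
            return i / len(ranked)
    return 1.0


# Hypothetical assignment counts for ten workers (not real data).
example_counts = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]

print(share_of_top(example_counts, 0.2))      # share done by the top 20% of workers
print(fraction_needed(example_counts, 0.75))  # fraction of workers doing 75% of the work
```

Sorting once and walking down the ranked list keeps both questions ("how much do the top X% do?" and "how many workers does it take to reach Y%?") consistent with each other.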

So, it looks like there may be relatively few active turkers out there doing most of the work. I have some anecdotal evidence that it is hard to get, say, 500 separate turkers to complete a single HIT over a short time span. Of course, this will change as MTurk grows, and it’s not even clear that this data gives a good picture of everything that happens on MTurk. We have posted a variety of types of HITs, but nothing compared to the variety of HITs that are out there.

It might be nice if someone ran a study soliciting as many individual turkers as possible to complete a single HIT, just to see how many there are at a given time. It could be something simple, like “here’s 50 cents to click a button.”

There may also be some questions we can answer with the data we have, but haven’t thought to ask yet. Please don’t be shy about posting comments and suggestions.

Code:

Here is the Excel file used for the graphs above.


6 Comments

 
  • John says:

    This is something I’ve often wondered about.

    Here are some ideas:

    1) There is some work on the problem of counting difficult-to-count populations like wildlife, the homeless, etc. One method is to “catch” a bunch, mark them, release them, and then catch another group and see how many repeats you have.

    Suppose you catch s_0 fish and paint their tails blue. Let N be the true number of fish, which is unobserved. Then you catch s_1 and find that b of the newly caught fish have blue tails. *If* you assume random sampling in both cases, then in expectation, s_0/N = b/s_1, so you can estimate N. Obviously turkers aren’t like fish and people who are likely to be caught first are likely to be re-caught, but maybe you could adjust for this somehow w/ lots of sampling and some modeling of turking intensity.

    2) With some pretty strong assumptions, you could probably use the turk monitor data that Panos is collecting to get a ballpark on how many workers there have to be to make the work disappear at the observed rate.

    3) I like the “show me you exist 50 cent HIT” but this would be expensive.
    What about creating an impossible HIT that’s likely to get a lot views anyway – something like:
    “Submit a valid proof that P!=NP” with a reward of $1000.
    Now, obviously you will get no submissions, but I’m pretty sure you could capture page views, esp. if it was an external HIT. Not everyone will look, which is a problem, but this combined w/ some other pieces of evidence might give us a ballpark estimate.

    4) Beg Amazon to just tell us? I don’t know why they have such a close hold on this (and all) data. My personal take is that a little more transparency would undermine some of the fears raised by Jonathan Zittrain’s recent article (http://www.newsweek.com/id/225629).

    On that point, some of my own experimental work suggests that wages aren’t nearly as bad as people think, and my sense from casual inspection is that very few HITs are the kind of spammy/dishonest things people worry that AMT is being used for. Incidentally, that would be a good study: scrape all available HIT descriptions, then have turkers rate how many seem devoted to questionable ends.

  • [...] prior post noted that about 20% of turkers do 80% of the work. In this case, 20% of turkers did 63% of the [...]

  • [...] said before that I had difficulty getting 500 people to do a task, but I had forgotten the details of whatever [...]

  • John says:

    Very Interesting, Thanks.

    Check this source.

  • Ewallet says:

    it’s pretty hard to really get accurate data for population..

  • lloyd says:

    this is lloyd! Hi! make it work plz.
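
John’s mark-and-recapture suggestion in the first comment boils down to solving s_0/N = b/s_1 for N, which gives N ≈ s_0 · s_1 / b (often called the Lincoln-Petersen estimator). Here is a minimal Python sketch of that arithmetic. It assumes random sampling in both rounds, as the comment does, which probably does not hold for turkers. The function name and the example numbers are hypothetical and not taken from our data.

```python
def lincoln_petersen(s0: int, s1: int, b: int) -> float:
    """Estimate population size N from a two-round mark-and-recapture sample.

    s0: workers seen ("marked") in the first round
    s1: workers seen in the second round
    b:  workers seen in both rounds (the "blue tails")
    """
    if b <= 0:
        # With no repeats, the estimate is undefined (the population looks unbounded).
        raise ValueError("need at least one recaptured worker")
    if b > min(s0, s1):
        raise ValueError("recaptures cannot exceed either sample size")
    # From s0 / N = b / s1, solve for N.
    return s0 * s1 / b


# Hypothetical example: 200 workers in round one, 150 in round two, 30 in both.
estimate = lincoln_petersen(200, 150, 30)
print(estimate)
```

If the most active turkers are over-represented in both rounds, b will be inflated and this estimate will tend to undercount the population, which is the adjustment problem the comment raises.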