This article is an excellent summary of all the frustrations I've encountered using both oDesk and Mechanical Turk.
I'll add one more issue with MTurk: binary decisions. Some of my tasks can't easily be broken down into accept/reject, or sometimes a worker gets something close but not all the way there. I want to be able to say "hey, you did pretty well, but please make these small changes and we'll accept it."
But that's not possible right now. You get the HIT back and it's either accepted or rejected. The result is that we sometimes have to reject workers who are making an honest effort but didn't fully complete the task, or we accept sub-par answers just to avoid dealing with worker blowback.
Let me suggest one solution for such cases: use iterative tasks, in the spirit of TurKit.
For example, say you want to create a caption for an image. You let one worker create a caption. Then you give that caption to another worker and ask them to improve it. Take the two versions and ask other workers, "Which of the two versions is better?" Iterate until no further improvement is possible.
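A rough sketch of what that loop might look like (post_improve_hit and post_vote_hit are hypothetical stand-ins for whatever HIT-posting and answer-collection code you already have):

    # TurKit-style improve/vote loop. post_improve_hit() and post_vote_hit()
    # are placeholders for your own HIT posting/collection code.
    def iterative_caption(image_url, max_rounds=5, n_voters=3):
        caption = post_improve_hit(image_url, current_caption="")  # first draft
        for _ in range(max_rounds):
            improved = post_improve_hit(image_url, current_caption=caption)
            # Ask a few workers which version is better; majority wins.
            votes = [post_vote_hit(image_url, caption, improved) for _ in range(n_voters)]
            if votes.count("improved") > votes.count("original"):
                caption = improved   # keep the better version and go again
            else:
                break                # no improvement, stop iterating
        return caption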
Not a trivial setup, but it gets around the binary accept/reject problem and generates results of significantly higher quality.
Good idea. I've also thought about this for data entry/analysis type tasks. It is fully expected that a worker at any level won't be perfect, especially on these types of services. An approach I've considered is to have 3 workers do the same task, then compare the results and take the majority answer, which should be close to (if not exactly) 100% accurate. Over time you could even rate workers for accuracy and recruit/pay the better ones more if possible.
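The aggregation part is easy to sketch in plain Python; the worker-accuracy bookkeeping below is just one illustrative way to rate workers over time:

    from collections import Counter

    def majority_answer(answers):
        # answers: list of (worker_id, answer) from workers who did the same task
        counts = Counter(answer for _, answer in answers)
        return counts.most_common(1)[0][0]

    def update_accuracy(stats, answers, winner):
        # stats maps worker_id -> (agreed_with_majority, total); use it to
        # rate workers for accuracy and recruit/pay the good ones more
        for worker_id, answer in answers:
            agreed, total = stats.get(worker_id, (0, 0))
            stats[worker_id] = (agreed + (answer == winner), total + 1)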
In addition to the iterative approach, you can also offer a base rate and accept all the tasks, even if they're not quite perfect, then pay a bonus to the workers who really nailed it.
That way you're not rejecting the workers who are making an honest effort, but the better performers are being paid accordingly.
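If you script this against the MTurk API it only takes a few lines. For example, a sketch using the boto3 MTurk client, assuming you've already decided which workers really nailed it:

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")

    def approve_and_bonus(assignments, top_performers, bonus_amount="0.25"):
        for a in assignments:
            # Accept everything so honest workers don't take a stats hit...
            mturk.approve_assignment(AssignmentId=a["AssignmentId"])
            # ...then pay extra to the ones who really nailed it.
            if a["WorkerId"] in top_performers:
                mturk.send_bonus(
                    WorkerId=a["WorkerId"],
                    AssignmentId=a["AssignmentId"],
                    BonusAmount=bonus_amount,
                    Reason="Excellent work on this HIT, thanks!",
                )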
To get any sort of work done you need to give good instructions to good workers, all within a good process. If any of these aren't in place it is hard to get quality results.
(Disclaimer: I am the founder of CloudFactory, a next generation mTurk working to get the right mix of solutions, process and workers.)
Panos has been using mTurk for years and knows how to break work down and give clear instructions to workers (plus validate results to catch spammers and lazy workers). Most people need to fail a bunch of times while learning how to create tasks well before becoming good "factory owners", as Panos described it. We have a team of crowd solution developers who design workflows, create killer task forms, analyze data and improve everything so others don't have to go through the painful ramp-up.
We have sent many thousands of tasks through mTurk while building up our own workforce from scratch. Anonymous workers who sign up online in 2 minutes with fake info have no accountability or ownership of their work. That is just never going to work: you end up with anarchy in the marketplace, and both workers and requesters get ripped off. CloudFactory is taking a totally different approach to building up our workforce, ensuring that workers have accountability, are matched to the right tasks and, in general, are motivated and enjoy their work. Happy workers are good workers, and we don't see any technology making up for this.
That said, a process is required and this is where technology is important. Quality control techniques are essential to catch mistakes even when the best workers are given clear instructions. We take a factory, mass production approach to this type of work and our platform offers tools to give real-time control and transparency to your work done in the cloud.
So when the screwdriver (mturk) isn't working properly, don't get a hammer (odesk) ... just get a better screwdriver!
If you're having trouble getting your HITs done or ensuring quality work on Turk, MobileWorks (YC S'11) is essentially a better version of Turk that takes care of these things for you. (Disclaimer: I'm an engineer at MobileWorks.) Your experiment with oDesk is interesting, but I can't help but feel that this sort of microtask work is not what oDesk was built for.
We're crowd researchers who spend our time working on methods for routing tasks to qualified workers and QAing work. We aren't a wrapper on top of Turk -- we're our own platform and we employ our own workers. You just push HITs to our API and we take care of quality control and worker management for you.
If you're interested, email me at lionel @ (my company) .com and I can help you get started.
Thanks for the offer. I have long enough experience with MTurk to know how to get things done.
My point is that microtask work is not necessarily the optimal setting for tasks that are expected to last for longer periods of time. It is often beneficial to train people and give them meaningful pieces of work, instead of converting real work into micro-work and assuming that workers are not intelligent enough to get things done properly.
How soon will it be possible to pick off Mechanical Turk tasks using computers, assuming it is not being done already? Is this a real opportunity, or is the big money going to be made in helping businesses design their work so it can be automated this way (assuming the problems don't degenerate into just another parallel programming task)?
That level of linguistic understanding would be worth considerably more than whatever you could get from mechanical turking.
Perhaps easier would be mass-translation into some other language and then "Turk-Pimping" - using people with lower pay expectations to do tasks and taking a cut of whatever they earn.
All my work on Mechanical Turk is to feed the data into a machine learning process that automates the task. Some tasks are easy to automate (any classification task), some others are harder (anything related to content generation, or vision).
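For the easy classification cases the pipeline is pretty standard. A rough sketch with scikit-learn, assuming the Turk labels have already been aggregated (e.g. majority-voted):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    def train_from_turk_labels(texts, labels):
        # texts: the items workers classified; labels: the aggregated Turk answers
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        accuracy = cross_val_score(model, texts, labels, cv=5).mean()
        model.fit(texts, labels)
        return model, accuracy  # once accuracy is good enough, the model takes over the HITs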
I have to say though that there is a pattern: Once you solve and automate a process, people want more, and push you into doing a task that cannot be easily automated. Then you try to automate it, and the cycle continues...
I think he means "complete", as in automate the tasks. "Pick off" is usually used as in: "The sniper was in the bell-tower, picking the enemy soldiers off one by one" (if you're a grammar geek, it's an optionally separable phrasal verb). It can also be used to refer to completing tasks.
I think Crowdflower (http://crowdflower.com/) would be useful in this situation. I've used them before because I'm UK-based and MTurk wouldn't let me create HITs (not sure if that's still true?), but they also add some things on top of MTurk, like a HIT designer (a form designer) and, more interestingly here, an interface to have people complete your HITs outside of MTurk.
What's a typical percentage rejection rate for your tasks?
I find that for some classification tasks my rejection rate is around a third (this includes careless/sloppy answers, so the rejection rate of honest answers is lower).
Even though there is always a right answer, some answers are almost correct, but they get rejected and the worker's stats are affected.