Quality Assurance and Crowd Access Optimization: Why does diversity matter?
Besmira Nushi          nushib@inf.ethz.ch
Adish Singla           adish.singla@inf.ethz.ch
Anja Gruenheid         anja.gruenheid@inf.ethz.ch
Andreas Krause         krausea@inf.ethz.ch
Donald Kossmann        donaldk@inf.ethz.ch
ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland

ICML Workshop on Crowdsourcing and Human Computing, Beijing, China, 2014.
Abstract
Quality assurance is amongst the most important challenges in crowdsourcing. Assigning tasks to several workers to increase quality can be expensive if no target-oriented strategy is applied. Hence, efficient crowd access optimization methods are crucial to the problem. This work argues that optimization needs to be aware of the diversity and correlation of information within groups of individuals. Based on this intuitive idea, we introduce a novel crowd model that leverages the notion of access paths as an alternative way of retrieving information. Moreover, we devise a greedy optimization algorithm that works on this model and finds a good approximate plan to access the crowd.
1. Introduction

Crowdsourcing is applied to integrate humans in collaboratively solving problems that are difficult to handle with machines only. This field of research has attracted the interest of many communities, such as machine learning, database systems, and human-computer interaction. Two crucial challenges in crowdsourcing, irrespective of the field of application, are quality assurance and crowd access optimization. Both are important for building strategies that can proactively plan and ensure the quality of the results deduced from crowdsourced data. In this work, we propose a novel crowd model, named the Access Path Model (APM), which seamlessly tackles both challenges and is applicable in a wide range of use cases.

In current crowdsourcing platforms, redundancy (i.e., assigning the same task to multiple workers) is the most common and straightforward way of confirming results. Simple as it is, redundancy can be expensive if used without any target-oriented approach, especially if the errors of workers are correlated. Asking people whose answers are expected to converge to the same opinion is neither efficient nor insightful. For example, in a sentiment analysis task, one would prefer to consider opinions from different unrelated interest groups before forming a final interpretation. This is the basis of the diversity principle introduced by Surowiecki (Surowiecki, 2005). The principle states that the best answers are achieved from discussion and contradiction rather than agreement and consensus. The Access Path Model (APM) that we describe here explores crowd diversity not at the individual worker level but at the level of the common bias that workers share while performing a task. In this context, an access path is a way of retrieving a piece of information from the crowd. The configuration of access paths may be based on the source of information of the answer (e.g., book, yellow pages, web page), workers' demographics (e.g., profession, interest group, age), or task-specific attributes (e.g., time of completion, task design).
Example 1 Jane is researching the impact of lifestyle on the development of Alzheimer's disease. More specifically, she wants to answer the question: "Can physical exercise prevent Alzheimer's disease?". She can ask three different groups of people:

Access Path          Error rate   Cost
Neurologist          10%          $20
Personal trainer     20%          $15
Alzheimer patient    25%          $10
Each of the groups brings a different perspective to the problem and has an associated error rate and cost. Considering that Jane has a limited budget to spend and that she can ask more than once on the same access path, she is interested in finding the optimal combination that will give her the best answer. Throughout this paper, a combination of access paths is referred to as an access plan; it specifies how many people to ask on each available access path.
Previous work in quality assurance and crowd access optimization estimates the individual performance of each worker and targets those with the best accuracy. This scheme is useful for spam identification and pricing, but unfortunately it does not reveal the coarse-grained diversity of the crowd and risks falling into partial consensus traps. For instance, in the previous example, spending the whole budget on doctors only will not make use of the personal experiences of real Alzheimer patients and training professionals. Moreover, crowd participation is dynamic, which makes it difficult to accurately estimate the errors of individuals. For example, a single worker might not have enough sample answers to evaluate his or her skills. Additionally, in a free and competitive marketplace, the vote of a particular person is never guaranteed. The model that we propose overcomes these difficulties by planning the optimization over groups rather than individuals.
In summary, this work makes the following contributions:

• Modelling the crowd. We design the Access Path Model as an acyclic Bayesian network where each of the access paths is represented as a latent random variable. To the best of our knowledge, APM is the first model able to capture and utilize crowd diversity from a non-individual viewpoint. We show that such a model is present in real crowdsourcing settings and that its results are of higher quality than those obtained by relying only on error estimates of separate workers or simple majority votes.

• Crowd access optimization. We devise a greedy algorithm for the crowd access optimization problem. The algorithm leverages the Access Path Model and produces non-adaptive access plans by using information gain as an objective function for reducing uncertainty.
We compare our model and optimization technique with Naïve Bayes approaches and Majority Vote. Our experiments cover tasks from two different domains: sports event prediction and species classification.
2. Related Work

The reliability of crowdsourcing and the relevant optimization techniques are longstanding issues for human computation platforms. We identify the following directions as the ones closest to our study:

Query optimization. Crowdsourced databases extend the functionality of a conventional database system to support crowd-like information sources. Quality assurance and crowd access optimization are envisioned as part of the query optimizer, which in this special case needs to evaluate query plans not only according to their cost but also to their accuracy and latency. Most of the previous work in this area (Franklin et al., 2011; Marcus et al., 2011; Parameswaran et al., 2012) focuses on building declarative query languages with particular support for processing crowdsourced data. The proposed optimizers take care of (1) defining the order of execution of operators within query plans and (2) mapping the crowdsourcable operators to micro-tasks, while the quality of the results is ensured only by requiring a minimum number of responses for the same micro-task. In our work, we propose a finer-grained approach that first ensures the quality of each single database operator executed by the crowd.
Access path selection. Even though the idea of access paths is one of the basic pillars of query optimization in traditional databases (Selinger et al., 1979), in crowdsourced databases this abstraction is not fully explored. One of the few studies that investigates it is Deco (Parameswaran et al., 2012). Deco uses the concept of a fetch rule to define how data can be obtained either from humans or from other external sources. In this regard, our access path concept is analogous to a fetch rule, with the important distinction that an access path is associated with extra knowledge, such as an error rate and a cost, which the database optimizer can use for quality assurance purposes.
Quality assurance and control. One of the central works in this field is presented by Dawid and Skene (Dawid & Skene, 1979). In an experimental design where observers can make errors, the authors suggest using the Expectation Maximization algorithm (Moon, 1996) to obtain maximum likelihood estimates for the observer variation. This has served as a foundation for several later contributions (Wang & Ipeirotis, 2013; Liu et al., 2012; Whitehill et al., 2009), which put the algorithm of Dawid and Skene in the context of crowdsourcing and enrich it to build performance-sensitive pricing schemes. Zhou et al. (2012) use the minimax entropy principle for label aggregation from crowds. The main subject of these studies is the crowd workers, while in our quality definition the error rates of the workers are also affected by the access path that they follow. A work that follows a similar line and targets tasks not at specific workers but at groups is introduced in (Li et al., 2014). One subtle difference between this method and ours is that our optimization technique does not immediately discard access paths that do not prove to be the best ones. Instead, for the sake of information diversity as well as optimal planning, access plans may contain more than one access path.
Crowd access optimization. The problem of finding the best plan to access the crowd is similar to the problem of expert selection in decision-making. Nevertheless, differently from the expert selection case, in crowd access optimization the assumption that the selected individuals will answer no longer holds, even in paid forms of crowdsourcing. Some previous studies based on this assumption are (Karger et al., 2011; Ho et al., 2013). The proposed techniques are effective for task recommendation, spam detection, and performance evaluation of workers, but they can easily run into situations of low participation and consequently cannot guarantee quality. Instead, the optimization algorithm that we devise chooses workers according to access paths and is less prone to low participation. Relevant works in the management science domain (Lamberson & Page, 2012; Hong & Page, 2004) define the notion of types to refer to forecasters that have similar accuracies and high error correlation.
Crowd access strategies can run either in adaptive or non-adaptive mode. In the adaptive mode (Ho et al., 2013) the optimization is performed after each step of crowdsourcing, and the decisions adapt to the latest retrieved samples. The non-adaptive mode (Chen & Krause, 2013) produces global plans that do not change with new crowd evidence. Although such strategies are static compared to adaptive ones, they allow for a higher degree of parallelization.
3. Problem Statement

In a traditional query optimizer, access paths to a relational table may have different execution times, but they are equivalent in terms of output. Also, the incoming data does not include any kind of uncertainty. In contrast, a crowdsourced database has to deal with uncertain information coming from noisy observations. Thus, an access path Z_i is associated not only with a monetary cost but also with an error rate. Moreover, the observations coming from different access paths have to be aggregated into a single decision, which is not required in a traditional RDBMS.

Being aware of these subtle differences, we define the problems that we aim to solve in this work as follows.
Problem 1 Given a task Y and a set of votes collected from crowd workers, what is a good model that can express diversity and compute high-quality predictions?

The model that we are looking for should be able to abstract the common bias that comes with access path usage. The main assumptions to be represented are (1) the correlation of errors within access paths and (2) the independence of errors across access paths. These assumptions mimic situations where groups of people make similar decisions because they read the same media, follow the same lecture, have a common cultural background, etc. Furthermore, the model should offer a decision function whose predictions are not only accurate but also linked to meaningful confidence levels. For example, Naïve Bayes models may offer reasonable accuracy, but their predictions are always highly confident and consequently difficult to interpret.
Problem 2 Given a task Y that can be solved following N different access paths Z = Z_1, ..., Z_N and a budget constraint B, which is the best access plan P that ensures the highest-quality decision with respect to accuracy and diversity?
An access plan specifies how many people to ask on each of the access paths. In Example 1, the plan P_1 = [1, 2, 3] will ask one neurologist, two personal trainers, and three patients. Similarly to access paths, each plan is also associated with a cost c(P) and a quality q(P). For example,

c(P_1) = \sum_{i=1}^{3} P_1[i] \cdot c_i = 1 \cdot \$20 + 2 \cdot \$15 + 3 \cdot \$10 = \$80

where c_i is the cost of getting one single answer through access path Z_i. The definition of the quality of an access plan is also the objective function to be used in the optimization scheme. As we argue later, the choice of this function is crucial to the solution.
Another interesting optimization problem, which is beyond the scope of this paper, is the following.

Problem 3 Given a task Y that can be solved following N different access paths Z = Z_1, ..., Z_N and a targeted quality Q, which is the least expensive plan P that satisfies the quality constraint?
Besides these two base problems, a related research question concerns the discovery and the design of access paths when no intuitive configuration is available. Possible tools that can help in this regard include structural learning based on conditional independence tests and information gain (De Campos, 2006).
Figure 1: Bayesian network model for access paths (APM): the task variable Y is the root, the latent access path variables Z_1, Z_2, Z_3 form the middle layer, and the worker votes X_11, ..., X_1P[1], X_21, ..., X_3P[3] are the leaves.
4. Access Path Model

The crowd model presented in this section aims to fulfill the requirements specified in the definition of Problem 1; it also enables our method to learn the error rates from historical data and then aggregate worker votes. We design the triple <task, access path, worker> as an acyclic hierarchical Bayesian network. Figure 1 shows an instantiation of the APM model for three access paths. The task Y at the root of the model represents the random variable for the real outcome of the task. The second layer of the network contains the random variables for the access paths Z_1, Z_2, Z_3. Each access path is represented as a latent variable, since its values are not observable. The access paths act as a channeling mechanism for the bias of the workers that follow them. Due to the tail-to-tail network configuration, each pair of access paths is conditionally independent given Y. As we mention in the baseline description, it is possible to have a simpler model that groups the workers according to the access path but does not include the middle latent layer. This variant, although able to make more accurate predictions than the other baselines, cannot make predictions with meaningful confidence. Finally, the last layer contains the random variables for the votes of the workers, grouped by the access path they follow.
APM can handle two types of use cases:

1. Workers solve the whole task. The workers' votes are guesses at the true answer and belong to the same domain as the task. This is the case in Example 1.

2. Workers solve subtasks. Often, complicated tasks are decomposed into smaller ones. Each subtask type can then serve as an access path and bring its own signal into the model.
Figure 2: Parameters θ = (θ_1, θ_2, θ_3) of the Access Path Model: the prior P(Y), and the conditional probability tables P(Z_i = 1 | Y) and P(X_ij = 1 | Z_i).
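To make the structure concrete, the following generative sketch samples from the network of Figures 1 and 2 (our code; the parameter values are hypothetical, not taken from the paper). Votes are correlated within a path because they share the latent bias Z_i:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters in the shape of Figure 2 (not values from the paper).
p_y = np.array([0.5, 0.5])        # p_y[v] = P(Y = v)
theta_z = np.array([              # theta_z[i, v] = P(Z_i = 1 | Y = v)
    [0.10, 0.90],
    [0.20, 0.80],
    [0.25, 0.75],
])
theta_x = np.array([              # theta_x[i, z] = P(X_ij = 1 | Z_i = z)
    [0.15, 0.85],
    [0.20, 0.80],
    [0.30, 0.70],
])

def sample_apm(plan):
    """Draw one task outcome and worker votes; plan[i] = #votes on path i."""
    y = int(rng.random() < p_y[1])
    votes = []
    for i, n in enumerate(plan):
        z = int(rng.random() < theta_z[i, y])            # shared path bias
        votes.append((rng.random(n) < theta_x[i, z]).astype(int))
    return y, votes

y, votes = sample_apm([1, 2, 3])  # e.g. plan P1 from Example 1
```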
4.1. Parameter learning

The main prerequisite for applying the Access Path Model is that the task should be repetitive, so that the model can adjust its own parameters, i.e., the conditional probability of each variable with respect to its parents. We refer to the set of all model parameters as θ. Figure 2 shows an example of θ for a purely binary setting of the network. Given a training dataset D of historical data from the same type of task, the goal of the parameter learning stage is to find the maximum likelihood estimate θ_MLE that maximizes the likelihood of the training set.
Definition 1 θ_MLE is a Maximum Likelihood Estimate for θ if θ_MLE = \arg\max_θ p(D \mid θ).

For a training set containing K samples:

p(D \mid \theta) = \prod_{k=1}^{K} p(s_k \mid \theta)    (1)
If all the variables in the model <Y, Z, X> were observable, then the likelihood of a training sample s_k given θ would be:

p(s_k \mid \theta) = p(y_k \mid \theta) \prod_{i=1}^{N} \Big( p(z_{ik} \mid y_k, \theta) \prod_{j=1}^{P_k[i]} p(x_{ijk} \mid z_{ik}, \theta) \Big)    (2)
where P_k[i] is the number of votes on access path Z_i for the sample. Since maximizing the likelihood is equivalent to minimizing the negative log-likelihood, the problem in Definition 1 can be written as:

\theta_{MLE} = \arg\min_{\theta} \; - \sum_{k=1}^{K} \log p(s_k \mid \theta)    (3)
For this setting, the estimate for θ_{Z_i|Y} can be computed by setting the derivative to zero in order to find the minimum:

\frac{\partial \log p(D \mid \theta)}{\partial \theta_{Z_i|Y}} = \sum_{k=1}^{K} \frac{\partial \log p(z_{ik} \mid y_k)}{\partial \theta_{Z_i|Y}}    (4)
For a fully observable Z_i, the best estimate would be:

\theta_{Z_i=z|Y=y} = \frac{\sum_{k=1}^{K} \mathbb{1}(z_{ik}=z,\; y_k=y)}{\sum_{k=1}^{K} \mathbb{1}(y_k=y)}    (5)
Here, \mathbb{1}(\cdot) is an indicator function that returns 1 if the training example fulfills the conditions of the function, and 0 otherwise. Since in our model Z_i is not observable, counting with the indicator function is not possible. For this purpose, we apply the Expectation Maximization (EM) algorithm (Moon, 1996). Below we show the instantiation of the EM algorithm for our model.
E-step: Calculates the expected value of the latent variables given the current parameters θ'. For a binary Z_i, for each sample this is:

E[z_{ik}] = \frac{p(z_{ik}=1, y_k, x_k \mid \theta')}{p(z_{ik}=1, y_k, x_k \mid \theta') + p(z_{ik}=0, y_k, x_k \mid \theta')}    (6)

M-step: Recomputes θ by maximizing the expected log-likelihood found in the E-step. Differently from what is shown in Equation 5, the counter for the latent variable is replaced by its expected value:

\theta_{Z_i=1|Y=y} = \frac{\sum_{k=1}^{K} \mathbb{1}(y_k=y)\, E[z_{ik}]}{\sum_{k=1}^{K} \mathbb{1}(y_k=y)}    (7)

\theta_{X_{ij}=x|Z_i=1} = \frac{\sum_{k=1}^{K} \mathbb{1}(x_{ijk}=x)\, E[z_{ik}]}{\sum_{k=1}^{K} E[z_{ik}]}    (8)

Notice that Equation 8 models the situation where the votes are always ordered by the id of the workers. This scheme works if the set of workers involved in the task is sufficiently stable to provide enough samples for computing the error rate of each worker (i.e., θ_{X_ij|Z_i}). Since in many crowdsourcing applications (as well as in our experiments and datasets) this is not always the case, we assign to the workers an average value:

\theta_{X_{ij}=x|Z_i=1} = \frac{\sum_{k=1}^{K} \frac{\sum_{j=1}^{P[i]} \mathbb{1}(x_{ijk}=x)}{P[i]}\, E[z_{ik}]}{\sum_{k=1}^{K} E[z_{ik}]}    (9)

This enables us to later apply to the model an optimization scheme that is agnostic with respect to the identity of the workers.
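The EM updates above can be condensed into a short sketch. This is our illustrative implementation, not the authors' code: it assumes binary Y and Z_i, observed outcomes y_k in the training set, and the identity-agnostic M-step of Equation 9, where the votes enter only through their per-path fraction of ones:

```python
import numpy as np

def em_apm(y, votes, n_iter=50):
    """EM for a binary APM with observed task outcomes (our sketch).

    y     -- length-K array of outcomes in {0, 1}
    votes -- votes[k][i]: 0/1 NumPy array of the votes on path i, sample k
    Returns theta_z[i, v] = P(Z_i = 1 | Y = v) and the identity-agnostic
    theta_x[i, z] = P(X_ij = 1 | Z_i = z) of Equation 9.
    """
    y = np.asarray(y)
    K, N = len(y), len(votes[0])
    # Asymmetric initialisation so that Z_i = 1 aligns with vote value 1.
    theta_z = np.tile([0.2, 0.8], (N, 1))
    theta_x = np.tile([0.2, 0.8], (N, 1))
    for _ in range(n_iter):
        # E-step (Equation 6): posterior of each Z_i given y_k and the votes.
        ez = np.zeros((K, N))
        for k in range(K):
            for i in range(N):
                x = votes[k][i]
                lik = [(theta_z[i, y[k]] if z else 1 - theta_z[i, y[k]])
                       * np.prod(np.where(x, theta_x[i, z], 1 - theta_x[i, z]))
                       for z in (0, 1)]
                ez[k, i] = lik[1] / (lik[0] + lik[1])
        # M-step (Equations 7 and 9): indicator counts -> expected counts.
        for i in range(N):
            for v in (0, 1):
                mask = y == v
                theta_z[i, v] = ez[mask, i].sum() / mask.sum()
            frac = np.array([votes[k][i].mean() for k in range(K)])
            theta_x[i, 1] = (frac * ez[:, i]).sum() / ez[:, i].sum()
            theta_x[i, 0] = (frac * (1 - ez[:, i])).sum() / (1 - ez[:, i]).sum()
    return theta_z, theta_x
```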
4.2. Inference

After learning the parameters, the model is used to infer the answer of a task given the available votes on each access path. The inference procedure computes the likelihood of each candidate outcome y_c ∈ Y given the votes in the test sample x_t:

\text{prediction} = \arg\max_{y_c \in Y} p(y_c \mid x_t)    (10)

p(y_c \mid x_t) = \frac{p(y_c, x_t)}{\sum_{y \in Y} p(y, x_t)}    (11)

Since the test samples contain only the values of the variables X, the joint probability of a candidate outcome and the test sample is computed by marginalizing over all possible values of Z_i. Due to the conditional independence of the access paths given Y, it is possible to do this in polynomial time as follows:

p(y, x_t) = p(y) \prod_{i=1}^{N} \Big( \sum_{z_i \in Z_i} p(z_i \mid y) \prod_{j=1}^{P_t[i]} p(x_{ijt} \mid z_i) \Big)    (12)

Besides inferring the most likely outcome, we are also interested in the confidence of the prediction. In other words, we would also like to know how likely it is that the prediction is accurate. For our model (APM), confidence corresponds to p(prediction | x_t), computed as in Equation 11. As we demonstrate in the experimental section, the model that we propose is able to distinguish the confidence of its predictions, in contrast to Naïve Bayes variants whose predictions are always strongly confident. Confidence and accuracy can be used together to define loss functions for evaluating the performance of different models and baselines.
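A compact sketch of this inference step (our code, reusing `theta_z` and `theta_x` as produced by the EM sketch above) implements the marginalization of Equation 12 and the normalization of Equation 11, returning both the prediction and its confidence:

```python
import numpy as np

def predict_apm(votes_t, p_y, theta_z, theta_x):
    """APM inference: Equations 10-12 for a binary task.

    votes_t[i] is the 0/1 vote array observed on access path i.
    Returns (prediction, confidence).
    """
    joint = np.array([p_y[0], p_y[1]], dtype=float)       # p(y) factors
    for i, x in enumerate(votes_t):
        # p(x_i | z) for z = 0, 1 -- the inner product over votes in Eq. 12
        lik = [np.prod(np.where(x, theta_x[i, z], 1 - theta_x[i, z]))
               for z in (0, 1)]
        for y in (0, 1):
            pz1 = theta_z[i, y]                           # P(Z_i = 1 | Y = y)
            joint[y] *= (1 - pz1) * lik[0] + pz1 * lik[1] # marginalise Z_i
    posterior = joint / joint.sum()                       # Equation 11
    return int(posterior.argmax()), float(posterior.max())
```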
5. Crowd Access Optimization
The crowd access optimization problem is crucial for both paid and non-paid forms of crowdsourcing. While in paid platforms it is clear that the goal is to acquire the best quality for a given monetary budget, in non-paid applications the necessity for optimization comes from the fact that highly redundant accesses to the crowd might decrease user satisfaction and increase response latency. In this section, we describe how to estimate the quality of possible plans and how to finally choose the plan with the best expected quality.
5.1. Information Gain as a measure of quality

The crowd access optimization problem is not concerned with accuracy only. The issue in real-world crowdsourcing is that there is no perfect access path, and even the best ones saturate early if the correlation within the access path is strong. As a result, the quality specification should describe a plan not only in terms of accuracy but also in terms of information gain and diversity. Based on this analysis, we use the information gain of the variable Y in our model for a plan P both as a measure of plan quality and as the objective function for our optimization scheme. Formally, this is defined as the joint information gain:

IG(Y; P) = H(Y) - H(Y \mid P)    (13)
P, as an access plan, determines how many X variables to choose from each access path Z_i. Since information gain is based on the conditional entropy H(Y|P), access paths that have a lower accuracy than the best one might still be part of the optimal plan. This can happen in two situations: (1) if better access paths are relatively exhausted and asking one more question on less accurate ones reduces the entropy more than continuing to ask on paths that were previously explored; (2) if the very low accuracy of an access path can improve the quality of a prediction when interpreted in the opposite way. Similar metrics have been widely used in the field of Bayesian experimental design, which aims to optimally design experiments under uncertainty. In targeted crowdsourcing the concept has recently been applied by (Li et al., 2014) and (Ipeirotis & Gabrilovich, 2014).
The computation of the conditional entropy H(Y|P) is a #P-hard problem (Krause & Guestrin, 2012), and the full calculation would require enumerating all possible instantiations of the plan with votes. Thus, we follow the sampling approach presented in (Krause & Guestrin, 2012), which randomly generates samples that satisfy the access plan and follow the parameters of the Bayesian network. The final conditional entropy is then the average of the conditional entropies of the generated samples. The method is proven to provide absolute error guarantees for certain levels of confidence if enough samples are generated. In addition, it runs in polynomial time if sampling and probabilistic inference in the network can also be done in polynomial time. Both conditions are satisfied by our model due to the hierarchical tree-like configuration of the Bayesian network. They also hold for the Naïve Bayes baselines described in Section 6.2, which are simpler tree versions of our model.
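The following sketch illustrates the sampling estimator (our simplified reading of the approach, for a binary task): it simulates vote sets that satisfy the plan, computes the posterior entropy of Y for each via the `predict_apm` sketch above, and averages:

```python
import numpy as np

def info_gain(plan, p_y, theta_z, theta_x, n_samples=2000, seed=0):
    """Monte Carlo estimate of IG(Y; P) = H(Y) - H(Y | P) for an access plan.

    Reuses predict_apm from the inference sketch above; plan[i] is the
    number of votes drawn on access path i.
    """
    rng = np.random.default_rng(seed)
    def h2(p):  # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    h_y = h2(p_y[1])
    h_y_given_p = 0.0
    for _ in range(n_samples):
        yv = int(rng.random() < p_y[1])
        votes = []
        for i, n in enumerate(plan):
            z = int(rng.random() < theta_z[i, yv])        # shared path bias
            votes.append((rng.random(n) < theta_x[i, z]).astype(int))
        _, conf = predict_apm(votes, p_y, theta_z, theta_x)
        h_y_given_p += h2(conf) / n_samples               # posterior entropy
    return h_y - h_y_given_p
```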
5.2. Optimization scheme

Having determined the joint information gain as an appropriate quality measure for a plan, the crowd access optimization problem is to compute:

\arg\max_{P \in \mathcal{P}} IG(Y; P) \quad \text{s.t.} \quad \sum_{i=1}^{N} c_i \cdot P[i] \le B    (14)

where \mathcal{P} is the set of all possible plans that satisfy the budget constraint B. An exhaustive search would consider |\mathcal{P}| = \prod_{i=1}^{N} \frac{B}{c_i} possible plans: the infeasible ones have to be eliminated, and for each feasible plan the information gain has to be computed in order to select the maximum. Nevertheless, efficient approximation schemes can be constructed given the similarity of the problem to analogous maximization problems for submodular functions under budget constraints (Khuller et al., 1999). Based on the non-decreasing property of information gain, we devise a greedy technique, illustrated in Algorithm 1, that incrementally finds a local approximation to the best plan.
Algorithm 1 Greedy Crowd Access Optimization
Input: budget B, bound α, step s
Output: best plan P_best
  b = 0
  while b < B do
    U_best = 0
    for i = 1 to N do
      P_pure = GetPurePlan(s, Z_i)
      if c_i ≤ B − b and IsBound(P_best ∪ P_pure, αB) then
        ΔIG = IG(Y; P_best ∪ P_pure) − IG(Y; P_best)
        if ΔIG / c_i > U_best then
          U_best = ΔIG / c_i
          P_max = P_best ∪ P_pure
        end if
      end if
    end for
    P_best = P_max
    b = cost(P_best)
  end while
  return P_best
In each step, the algorithm evaluates the trade-off U between marginal information gain and cost for all access paths that are feasible to access. The marginal information gain is the improvement in information gain obtained by adding s pure votes from one access path to the current best plan. In our experiments we set s = 1, as it results in a better approximation. Nevertheless, it is possible to spend the budget in larger chunks for faster execution. In cases where the number of available votes on an access path is bounded by design, the algorithm stops asking further questions on that path once the predefined bound αB is reached.
In the worst case, when all access paths have unit cost, the computational complexity of the algorithm is O(αB²N²S), where S is the number of samples generated to compute the information gain. The cost of the sampling process alone is O(BNS), and generating a larger number of samples guarantees a better approximation rate.
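For illustration, a Python transcription of Algorithm 1 (our code; `ig` stands for any estimator of IG(Y; P), e.g. the sampling sketch above, and we read the bound αB as a cap on the budget spent per path):

```python
def greedy_plan(budget, costs, ig, alpha=1.0, step=1):
    """Greedy transcription of Algorithm 1 (a sketch, not the authors' code).

    ig(plan) -- estimated IG(Y; plan); costs[i] -- cost of one vote on path i.
    """
    n = len(costs)
    best, spent = [0] * n, 0
    while spent < budget:
        base = ig(best)
        u_best, p_max = 0.0, None
        for i in range(n):
            if spent + step * costs[i] > budget:              # not affordable
                continue
            if (best[i] + step) * costs[i] > alpha * budget:  # per-path bound
                continue
            cand = best.copy()
            cand[i] += step                                   # add s pure votes
            u = (ig(cand) - base) / costs[i]                  # marginal IG per cost
            if u > u_best:
                u_best, p_max = u, cand
        if p_max is None:              # no affordable step improves the plan
            break
        best = p_max
        spent = sum(b * c for b, c in zip(best, costs))
    return best

# e.g. greedy_plan(30, [2, 3, 4], lambda p: info_gain(p, p_y, theta_z, theta_x))
```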
6. Experimental Evaluation

We experimentally evaluated our work on two real-world datasets, one for each use case described in Section 3. The main goal of the experiments is to validate the proposed model and the optimization technique.

6.1. Dataset description

Both datasets that we describe here consist of real votes gathered from people. For experiments with a restricted budget, we repeat the learning and the prediction several times by randomly selecting from the votes and via k-fold validation.
CUB-200 birds classification. The dataset was built in the context of attribute-based classification of bird images (Welinder et al., 2010). Since this is a difficult task, the crowd workers are not asked to directly identify the category of the bird but whether a certain attribute (for example, yellow beak) is present in the image. Each attribute brings a piece of information to the problem, and we treat the attributes as access paths in our model. The dataset contains 5-10 answers for each of the 288 available attributes.
ProbabilitySports. This dataset is based on a crowdsourced betting competition (Probability Sports) on NFL games. The participants in the competition expressed a degree of belief in answer to the question "Is the home team going to win?" for 250 events within a season. Not all participants voted on all events, and different seasons have different popularity. We designed the access paths based on the accuracy of each worker during the season. Since the workers' accuracy in the dataset follows a normal distribution, we divide this distribution into three intervals, where each interval corresponds to one access path (worse than average, average, better than average). In this configuration, the access paths have a decreasing error rate. Consequently, for experimentation, we assigned them an increasing integer cost (2, 3, 4), although the competition itself was originally not based on money.
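As an illustration of this construction (the paper does not specify the exact interval boundaries; the cut points μ ± σ below are our assumption):

```python
import numpy as np

def accuracy_based_paths(worker_accuracy):
    """Assign each worker to one of three access paths by season accuracy.

    The boundaries mu - sigma and mu + sigma are our illustrative choice;
    the paper only states that the normal accuracy distribution is split
    into three intervals.
    """
    acc = np.asarray(worker_accuracy, dtype=float)
    mu, sigma = acc.mean(), acc.std()
    # 0 = worse than average, 1 = average, 2 = better than average
    return np.digitize(acc, [mu - sigma, mu + sigma])
```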
6.2. Baseline models

For the purposes of our work, we analyze different crowd models with respect to diversity awareness and the level of granularity of diversity. More specifically, we consider Majority Vote (MV), Naïve Bayes Individual (NBI), and Naïve Bayes for Access Paths (NBAP).

Majority Vote. Being the simplest of the models and also the most popular one, majority voting is able to produce fairly good results if the crowdsourcing redundancy is sufficient.
Figure 5: Naïve Bayes Model for Access Paths: the votes X_11, ..., X_3P[3] depend directly on Y and are grouped by access path, with one shared parameter θ_i per path and no latent layer.
Nevertheless, majority voting considers each vote as equal with respect to quality and does not have any sense of diversity.
Naïve Bayes Individual. This model assigns an individual error rate to each worker and uses these rates to weight the incoming votes and form a decision. This means that the results highly depend on the assumption that each worker has solved roughly the same number of tasks and that each task has been solved by the same number of workers. This assumption generally does not hold for open crowdsourcing markets, where "vote not guaranteed" circumstances are commonly faced. As the experimental evaluation also shows, this is harmful not only for estimating the error rates but also for crowd access optimization; the targeted workers might not participate, thereby wasting the budget or increasing the latency. Furthermore, even with fully committed workers, this model does not provide the proper machinery to optimize the budget distribution, since it does not capture the shared bias between workers.
Naïve Bayes for Access Paths. To correct for the effects of the unstable participation of individual workers, we propose yet another baseline (Figure 5), very similar to our original model. The votes of the workers are still grouped according to the access path, but the access paths themselves are not represented through intermediate latent variables. For inference purposes, each vote x_ij is weighted with the average error rate θ_i of the access path it comes from. This means that all the votes belonging to the same access path behave as a single random variable. Note that this generalization is obligatory for this model and only optional for the Access Path Model. Similarly to NBI and to all Naïve Bayes classifiers, this model is not able to make predictions with meaningful confidence.
6.3. Model evaluation

To evaluate the Access Path Model independently of the optimization process, we performed experiments first using all the votes available in the datasets and then equally distributing the budget across all access paths. The comparison is based on two measures: accuracy and negative log-likelihood.
Figure 3: Accuracy with unconstrained budget (left: ProbabilitySports by season, 2000-2005, including the betting odds (ODDS); right: CUB-200 by species id), comparing APM, NBAP, NBI, and MV.
Figure 4: Accuracy and negative log-likelihood for equally distributed budget in ProbabilitySports (year 2002), comparing APM, NBAP, NBI, and MV.
Accuracy corresponds to the percentage of correct predictions. Negative log-likelihood is computed as the sum, over all test samples, of the negative log-likelihood that the prediction corresponds to the real outcome. The closer a prediction is to the real outcome, the lower its negative log-likelihood:

\text{negLogLikelihood} = - \sum_{s_t} \log p(\text{prediction} = y_t \mid x_t)    (15)

Hence, the negative log-likelihood measures not only the correctness of a model but also its ability to output meaningful probabilities for a prediction being correct.
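Both measures are straightforward to compute from a model's predictions and confidences; a minimal sketch for binary tasks (our code):

```python
import numpy as np

def evaluate(predictions, confidences, truth):
    """Accuracy and negative log-likelihood (Equation 15) for binary tasks."""
    pred = np.asarray(predictions)
    conf = np.asarray(confidences, dtype=float)
    correct = pred == np.asarray(truth)
    # probability the model assigned to the true outcome
    p_true = np.where(correct, conf, 1.0 - conf)
    nll = -np.log(np.clip(p_true, 1e-12, None)).sum()
    return correct.mean(), nll
```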
Unconstrained budget. Figure 3 shows the accuracy of all models on both datasets when using all the available votes. The aim of this experiment is to test the robustness of APM in the case of very high redundancy. For ProbabilitySports we also show the accuracy of the odds provided by the betting parties before the matches took place. As expected, in the betting scenario it is challenging to improve over Majority Vote. Nevertheless, we notice a 4%-8% enhancement of APM over majority, while the Naïve Bayes baselines cannot achieve a significant improvement.
For the birds classification dataset (CUB-200), it is not possible to compare APM and NBAP with MV and NBI directly, because the votes of the workers do not solve the final task as in the betting dataset. For this reason, we performed additional one-vs-all experiments on Mechanical Turk based on the same images as the original dataset. Each HIT consisted of a photo to be classified by the worker as well as a good-quality sample photo of the species to be identified (the latter was included to train the workers and simplify their task). Each photo was assigned to 10 different users. In this comparison, it can be observed that access path models generally perform better than individual models and majority. In specific cases, when the bird has a feature that makes it distinguishable from other species (e.g., species id 12, yellow-headed blackbird), there is no difference between the models. Note that the accuracy of NBI is sometimes very close to MV because of the unstable participation of MTurk workers across all the photos of the same species.
Constrained budget. For this experiment we varied the total budget and distributed it equally across all access paths. Figure 4 shows that while the improvement of APM accuracy over NBI and MV is stable, NBAP starts facing the overconfidence problem for high values of available budget. The phenomenon is more visible in the negative log-likelihood graph. Another expected observation is the improvement of majority in terms of negative log-likelihood, but not in terms of accuracy. This reflects the robustness of majority in providing meaningful confidence levels, even for highly noisy data, if enough votes are provided.
Figure 6: Information gain and budget spent across access paths in the best approximate plan (ProbabilitySports, year 2002), comparing the optimal plan (OPT), the greedy approximation (GA), and pure plans on the single access paths AP1 (cost=2), AP2 (cost=3), and AP3 (cost=4).
Figure 7: Greedy optimization results for ProbabilitySports (year 2002) and CUB-200 (species id 118, spotted catbird): accuracy and negative log-likelihood for APM+GA, NBAP+GA, NBI+GA, NBI+RND, and MV+RND.
6.4. Optimization scheme evaluation

In this set of experiments, we evaluate the ability of the proposed greedy approximation scheme to choose high-quality plans that take diversity into account. For a fair comparison, we adapted the same scheme to the simpler baselines NBI and NBAP.

Greedy approximation. Figure 6 depicts the development of information gain with varying budget for the optimal plan (OPT), the approximate plan computed by the greedy algorithm (GA), and three pure plans that take votes only from a single access path. The quality of the GA approximation is very close to that of the optimal plan. The third access path in ProbabilitySports (containing users with an above-average relative score for the season) reaches the highest information gain compared to the others. Nevertheless, the quality of the plan saturates for higher budget values, which encourages the optimization scheme to select votes from other access paths as well. For the same experiment, the NBAP model with the same optimization strategy chooses votes only from the third access path.
Crowd access optimization. Finally, we combine the model and the optimization techniques to evaluate the impact of both. Figure 7 shows results for both datasets. In ProbabilitySports, APM and NBAP improve over MV and NBI with respect to accuracy and negative log-likelihood. Since the plans for NBI target concrete users in the competition, the accuracy for budget values less than 10 is low, because not all the targeted users voted on all the events. Thus, we also present the performance of NBI with random access to votes (NBI+RND). Also, in this dataset and configuration, the mixed plans do not offer a clear improvement in terms of accuracy but only in terms of negative log-likelihood. This happens because the access paths here are inherently designed based on the accuracy of the workers. In contrast, for CUB-200, where the division of access paths is based on attributes, the discrepancy between NBAP and APM is higher.
7. Conclusion

In this work, we introduced a new approach for representing crowd diversity, named the Access Path Model. We showed that this model can be used to seamlessly handle critical problems in crowdsourcing, such as quality assurance and crowd access optimization. Experimental evaluation on real-world datasets demonstrated that leveraging APM along with greedy approximation schemes can improve the quality of the results compared to individual models and majority vote. As future work, we plan to investigate the problem of automatically discovering and filtering access paths when no intuitive configuration is available.
References
Chen, Yuxin and Krause, Andreas. Near-optimal batch mode active learning and adaptive submodular optimization. In Proc. of ICML, 2013.

Dawid, Alexander Philip and Skene, Allan M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pp. 20-28, 1979.

De Campos, Luis M. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. JMLR, 7:2149-2187, 2006.

Franklin, Michael J, Kossmann, Donald, Kraska, Tim, Ramesh, Sukriti, and Xin, Reynold. CrowdDB: answering queries with crowdsourcing. In Proceedings of the 2011 ACM SIGMOD, pp. 61-72. ACM, 2011.

Ho, Chien-Ju, Jabbari, Shahin, and Vaughan, Jennifer W. Adaptive task assignment for crowdsourced classification. In Proc. of ICML, pp. 534-542, 2013.

Hong, Lu and Page, Scott E. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences of the United States of America, 101(46):16385-16389, 2004.

Ipeirotis, Panos and Gabrilovich, Evgeniy. Quizz: Targeted crowdsourcing with a billion (potential) users. In WWW, 2014.

Karger, David R, Oh, Sewoong, and Shah, Devavrat. Budget-optimal crowdsourcing using low-rank matrix approximations. In 49th Annual Allerton Conference, pp. 284-291. IEEE, 2011.

Khuller, Samir, Moss, Anna, and Naor, Joseph Seffi. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39-45, 1999.

Krause, Andreas and Guestrin, Carlos E. Near-optimal nonmyopic value of information in graphical models. arXiv preprint arXiv:1207.1394, 2012.

Lamberson, PJ and Page, Scott E. Optimal forecasting groups. Management Science, 58(4):805-810, 2012.

Li, Hongwei, Zhao, Bo, and Fuxman, Ariel. The wisdom of minority: Discovering and targeting the right group of workers for crowdsourcing. In Proc. of the 23rd WWW, 2014.

Liu, Qiang, Peng, Jian, and Ihler, Alexander T. Variational inference for crowdsourcing. In NIPS, pp. 701-709, 2012.

Marcus, Adam, Wu, Eugene, Karger, David R, Madden, Samuel, and Miller, Robert C. Crowdsourced databases: Query processing with people. In CIDR, 2011.

Moon, Todd K. The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6):47-60, 1996.

Parameswaran, Aditya Ganesh, Park, Hyunjung, Garcia-Molina, Hector, Polyzotis, Neoklis, and Widom, Jennifer. Deco: declarative crowdsourcing. In Proc. of CIKM, pp. 1203-1212, 2012.

Probability Sports. www.probabilitysports.com.

Selinger, P Griffiths, Astrahan, Morton M, Chamberlin, Donald D, Lorie, Raymond A, and Price, Thomas G. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD, pp. 23-34. ACM, 1979.

Surowiecki, James. The Wisdom of Crowds. Random House, 2005.

Wang, Jing and Ipeirotis, Panagiotis. Quality-based pricing for crowdsourced workers. 2013.

Welinder, Peter, Branson, Steve, Mita, Takeshi, Wah, Catherine, Schroff, Florian, Belongie, Serge, and Perona, Pietro. Caltech-UCSD Birds 200. 2010.

Whitehill, Jacob, Ruvolo, Paul, Wu, Tingfan, Bergsma, Jacob, and Movellan, Javier R. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, volume 22, pp. 2035-2043, 2009.

Zhou, Dengyong, Platt, John C, Basu, Sumit, and Mao, Yi. Learning from the wisdom of crowds by minimax entropy. In NIPS, pp. 2204-2212, 2012.