How to allocate review tasks for robust ranking∗
Dorit S. Hochbaum†
Asaf Levin‡
June 21, 2010
Abstract
In the process of reviewing and ranking projects by a group of reviewers, the allocation of the
subset of projects to each reviewer has major impact on the robustness of the outcome ranking. We
address here this problem where each reviewer is assigned, out of the list of all projects, a subset
of up to k projects. Each individual reviewer then ranks and compares all pairs of k projects. The
k-allocation problem is to determine an allocation of up to k projects to each reviewer, that lie
within the expertise set of the reviewer, so that the resulting union of reviewed projects has certain
desirable properties. The k-complete problem is a k-allocation with the property that all pairs
of projects have been compared by at least one reviewer. A k-complete allocation is desirable
as otherwise there may be projects that were not compared by any reviewer, leading to possible
adverse properties in the outcome ranking.
When a k-complete allocation cannot be achieved, one might settle for other properties. One
basic requirement is that each pair of projects is comparable via a ranking path which is a sequence
of pairwise rankings of projects implying a comparison of all pairs on the path. A k-allocation
with a ranking path between each pair is the connectivity-k-aloc. Since the robustness of relative
comparisons deteriorates with increased length of the ranking path, another goal is that between
each pair of projects there will be at least one ranking path that has at most two hops or q hops
for fixed values of q. An alternative means for increasing the robustness of the ranking is to use a
k-allocation with at least p disjoint ranking paths between each pair.
We model all these problems as graph problems. We demonstrate that the CONNECTIVITY-k-ALOC problem is polynomially solvable, using matroid intersection; we prove that the k-complete
problem is NP-hard unless k = 2; and we provide approximation algorithms for a related optimization problem. All other variants are shown to be NP-complete for all values of k ≥ 2.
Keywords:
Approximation algorithms, allocation problem, maximum coverage problem.
1 Introduction
The k-allocation problem arises in the context of a committee of reviewers that is to evaluate and rank
a set of projects. The task of evaluating all the projects is an excessive workload for a single reviewer.
Each reviewer has a maximum workload of, say, k projects that they can evaluate. Additionally, the
reviewers can only evaluate those projects that fall within their area of expertise. The allocation of up
to k projects per reviewer within their expertise set is said to be a k-allocation.
∗ An extended abstract version of this paper appeared with the title “The k-allocation problem and its variants” in Proceedings of the Fourth Workshop on Approximation and Online Algorithms (WAOA 2006).
† Research supported in part by NSF award No. DMI-0620677 and CBET-0736232. Department of Industrial Engineering and Operations Research and Walter A. Haas School of Business, University of California, Berkeley. email: hochbaum@ieor.berkeley.edu
‡ Chaya fellow. Faculty of Industrial Engineering and Management, The Technion, 32000 Haifa, Israel. email: levinas@ie.technion.ac.il
An example of a scenario where such allocation takes place is for a National Science Foundation
panel that is to select the best few among the submitted proposals to fund. The director of the panel
usually focuses on the task of ensuring that each proposal is reviewed by at least a minimum number
of reviewers (say, at least 3) and that the allocation to each reviewer lies within their area of expertise.
One pitfall of the allocation is that in the review process it is possible, and indeed has happened
on occasions, that the two leading ranked projects are not comparable, as no reviewer reviewed both
of them. It is thus conceivable that there is a partition of the projects into two disjoint sets, and while
both these projects are best in their respective sets, one project can actually lie below the worst ranked
project in the set where the other one is best.
One way of addressing this pitfall is to allocate the projects so that each pair is also assigned to at
least one reviewer. This goal however is not always achievable for a given configuration of reviewers
and their expertise sets, as it adds to the total workload. We are then considering here this goal, as well
as alternative and weaker goals that still provide a form of comparison between any pair of projects,
albeit an indirect comparison.
In a general group decision making scenario, the major challenge is to come up with an aggregate
ranking that reflects the opinions of all reviewers and is fair and representative. There is a large
body of literature that addresses this challenge. This literature is reviewed in Kemeny and Snell [23],
Brans and Vincke [4], Bartholdi, Tovey and Trick [3], Keener [22], Fuller and Carlsson [14], and
Fernandez and Olemdo [12], and [19]. As explained above, the aggregate ranking may be affected by
the assignment of projects to individual reviewers. Still, the allocation of projects and its impact on
the quality of the resulting aggregate ranking is an aspect of group decision that is often overlooked.
Cook et al. [8] are the only researchers that have explicitly addressed this issue.
A convenient formalism for the problems and the associated models is a graph representation.
The input to the problem is an undirected complete graph G = (V, E) defined on the set of projects
V and all possible pairs (edges) E, an integer number k and a collection of node subsets representing
the expertise set of each of L reviewers, S1 , . . . , SL ⊆ V . In order to maintain a reasonable and
balanced workload, each reviewer is assigned at most k projects out of the set of possible projects,
Sj . So a feasible allocation consists, for each j ∈ {1, 2, . . . , L}, of a subset Vj ⊆ Sj such that
|Vj | ≤ min{|Sj |, k}. Each reviewer is able to make a direct comparison between each pair of projects
he/she reviews. The set of pairs compared by each reviewer forms a clique (a complete subgraph)
of size |V_j|, C_{V_j} (i.e., C_{V_j} is the edge set of a clique over the node set V_j). For a given feasible
allocation, the set of covered projects is ∪_{j=1}^{L} V_j, and the set of compared project pairs is ∪_{j=1}^{L} C_{V_j}.
The properties of the graph of the edges covered by this union of cliques are closely related to the
quality of the ranking decision that can be achieved.
Let the review graph of covered projects and compared project pairs be G^R = (V^R, E^R) where
V^R = ∪_{j=1}^{L} V_j and E^R = ∪_{j=1}^{L} C_{V_j}. The graph G^R is a multigraph – that is, there could be multiple
edges between some pairs of nodes (because more than one reviewer reviews this pair). A pair of
projects i, j ∈ V R is said to be directly compared if edge [i, j] ∈ E R . For a directly compared
pair there is input from at least one reviewer on the extent of preference of one project to the other.
This relative rank comparison is typically expressed in an additive form or in a multiplicative form.
A detailed discussion on intensity of preferences and the additive versus the multiplicative forms of
preferences is provided in [19]. We will use throughout the additive form in which pij expresses
by how much the rank of i exceeds the rank of j . So pji = −pij and the magnitude of pij is the
intensity of the preference of i to j. Each (undirected) edge [i, j] in the graph GR is formed of a pair
of (directed) arcs (i, j), (j, i) with the associated values pij and pji .
Although some pairs may not be directly compared by a given allocation, we can deduce a relative
ranking of two projects i and j if there exists a sequence of directly compared pairs: [i, i_1], [i_1, i_2], . . . ,
[i_{p−1}, j] ∈ E^R. Such a sequence corresponds to a path in the graph G^R and the implied ranking of this
path is p_{ij} = Σ_{q=0}^{p−1} p_{i_q, i_{q+1}} where i = i_0, j = i_p. We call this path a ranking path of length p. A direct
comparison is then a ranking path of length 1. Since the process of evaluating projects and comparing
them is not accurate, an implied ranking by a long path may be impacted by cumulative errors in
the comparisons. This effect can be mitigated by the presence of multiple ranking paths between
given pairs or by having the ranking paths of bounded length. The presence of multiple ranking paths
between all pairs corresponds to increased edge, or node, connectivity of the review graph.
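The implied ranking along a path can be illustrated with a short sketch. The function name `implied_ranking` and the intensity values below are hypothetical (they are not the values of Figure 1), and each pair is assumed to carry a single intensity, whereas the review graph may in general be a multigraph.

```python
from typing import Dict, List, Tuple


def implied_ranking(p: Dict[Tuple[int, int], float], path: List[int]) -> float:
    """Sum the additive intensities p[i_q, i_{q+1}] along a ranking path."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        if (a, b) in p:
            total += p[(a, b)]
        elif (b, a) in p:
            total -= p[(b, a)]  # antisymmetry: p_ba = -p_ab
        else:
            raise KeyError(f"pair ({a}, {b}) was not directly compared")
    return total


# Hypothetical intensities p_ij stored for i < j.
p = {(1, 2): 1.0, (2, 3): 0.5, (1, 3): 2.0}
print(implied_ranking(p, [1, 2, 3]))  # ranking path of length 2
print(implied_ranking(p, [3, 1]))     # direct comparison read in reverse
```

In this toy data the length-2 path and the direct comparison between projects 1 and 3 disagree, which is the kind of inconsistency that motivates short or multiple ranking paths.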
As an illustration of these concepts consider the graph GR in Figure 1. In this graph we take
k = 2, and therefore each reviewer reviews only a pair of projects. The endpoints of each edge form
the allocation to one reviewer. The intensities of the preferences are given as pij for i < j. In this
graph the pair of projects 1 and 2 is reviewed by two different reviewers. There are four implied
ranking paths between projects 1 and 5 of intensities 2, 4, .5 and −2. Among those the value .5 is the
intensity of a direct comparison.
Figure 1: An illustration review graph GR . The numbers along the edges are the intensity pij for
i < j.
Preliminaries and notations.
For a graph H we denote by nH and mH the number of nodes and edges, respectively, in H. For an
integer ℓ, an ℓ-subset is a subset of ℓ nodes.
A polynomial time algorithm A for a minimization problem (maximization problem) is a ρ-approximation algorithm if it always returns a feasible solution whose objective value is at most (at
least) ρ times the optimum.
List of goals and paper outline.
The problem of allocating evaluation tasks to reviewers can be cast as the H-graph k-clique cover
problem defined as follows. Given a set V , L sets S1 , . . . , SL ⊆ V , and an integer k, find subsets
V_1, . . . , V_L, with V_i ⊆ S_i, and |V_i| ≤ k, so that the review graph G^R = (∪_{i=1}^{L} V_i, ∪_{i=1}^{L} C_{V_i}) has
property H. We are interested in the following properties H:
• GR is connected (or containing a spanning tree) – discussed in Section 2.
• GR is a complete graph – discussed in Section 3.
• GR is highly connected, where we consider several alternative definitions of this term. This
concept and related optimization problems are investigated in Section 4 and Section 5.
• GR has a path consisting of at most q hops between each pair of nodes – considered in Section 6.
Related Research. The Perron-Frobenius Theorem states the algebraic conditions that guarantee a
positive unique solution eigenvector r to the system Ar = λr. This subject is related to aggregate
ranking when the matrix A represents the pairwise comparisons between all pairs of objects. Further,
each column of the matrix can be viewed as the ranking provided by one reviewer. The eigenvector
r, if it exists, is a principal eigenvector – corresponding to the largest eigenvalue. This theorem and the
generated principal eigenvector r have been used for decades to generate an aggregate ranking from
a matrix of rankings that can be viewed as provided by different reviewers. The condition for the
existence of such an eigenvector is that the matrix A is irreducible. (For a statement of the theorem see
e.g. [22].) The irreducibility of the matrix is equivalent to the connectivity of GR .
2 The CONNECTIVITY-k-ALOC problem
A basic requirement is the ability to compare each pair of projects once the review process is done.
To that end we require that every pair of projects is comparable via a ranking path. In terms of the
review graph GR this goal is to find an allocation so that GR is connected. The CONNECTIVITY-k-ALOC problem is to find a feasible allocation so that GR is connected. In this section we consider the
CONNECTIVITY-k-ALOC problem, and show that it is polynomially solvable as we can recast it as an
instance of an intersection of two matroids.
Theorem 2.1 The CONNECTIVITY-k-ALOC problem is solvable in polynomial time O((|V| + L)^3).
If the number of reviewers is fixed then the problem is solvable in O((|V| + L) log(|V| + L)) time.
Proof: We rephrase our problem as follows: Given a bipartite graph B = (P, R, E) with the two sets
of nodes P and R corresponding to the project set and the reviewer set, respectively, and the set of
edges E, with edge [p, r] ∈ E if and only if reviewer r can review project p (i.e., p belongs to the
expertise set of reviewer r). The goal is to find a spanning tree in B such that the degree of each node
in R is at most k (if such a spanning tree exists). We show that the CONNECTIVITY-k-ALOC problem
is equivalent to this last problem. Although in the CONNECTIVITY-k-ALOC problem we are actually
interested in a tree that is not necessarily spanning as not all reviewer nodes have to be included in
the tree, yet this tree must include the entire project set P within a common connected component.
W.l.o.g. we can connect each reviewer node to (at least one) project node and hence convert the tree
into a spanning tree. The degree bound restriction on the nodes r ∈ R is equivalent to the constraint
that r can review at most k projects. Hence, our problem is a minimum spanning tree in a bipartite
graph with degree bounds on the nodes of one side R of the bipartite graph.
This problem is polynomial time solvable by any polynomial time algorithm for intersection of
two matroids. The first matroid is the graphic matroid (and hence our solution will be a spanning
tree), and the second matroid is a partition matroid defined over E where a subset E′ of the edges
is independent in the partition matroid if the degree of each node r ∈ R in the graph (P ∪ R, E′)
is at most k. We note that the independence testing oracle of each of the above matroids can be implemented in linear time (using Breadth-First-Search for the graphic matroid and simple counting for
the partition matroid). Therefore, any polynomial time (unweighted) matroid intersection algorithm
can be applied to solve the CONNECTIVITY-k-ALOC problem. We can use either the algorithm of [5]
or the algorithm of [6] for matroid intersection. The latter algorithm of Bresovac et al. 1986, [6], is
specialized for matroid intersection when one of the matroids is a partition matroid as is the case here.
The running time of that algorithm in our case is O(n_B^3) where n_B = |P| + |R|.
We note that if the number of reviewers is a fixed constant, then the degree bounds apply only to
a fixed number of nodes. Therefore, the more efficient algorithm of Frederickson and Srinivas 1989,
[13], for this matroid intersection problem can be applied. In this case the time complexity of the
resulting algorithm is O(nB log nB ). Hence, the claim follows.
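The following sketch does not implement the matroid intersection algorithms cited in the proof. It only verifies, for a given candidate allocation, the two conditions that define a solution of CONNECTIVITY-k-ALOC: feasibility (V_j ⊆ S_j and |V_j| ≤ k) and connectivity of the review graph over all projects. The function name and the small instance are illustrative assumptions.

```python
from typing import Iterable, List, Set


def is_connected_k_allocation(projects: Iterable[int],
                              expertise: List[Set[int]],
                              allocation: List[Set[int]],
                              k: int) -> bool:
    """Check feasibility of the allocation and connectivity of its review graph."""
    if len(expertise) != len(allocation):
        return False
    parent = {v: v for v in projects}  # union-find forest over the projects

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for S_j, V_j in zip(expertise, allocation):
        if len(V_j) > k or not V_j <= S_j:
            return False  # workload or expertise constraint violated
        members = sorted(V_j)
        for v in members[1:]:
            # A reviewer's clique merges all of its projects into one component.
            parent[find(v)] = find(members[0])
    return len({find(v) for v in parent}) == 1


expertise = [{1, 2, 3}, {2, 3, 4}]
print(is_connected_k_allocation(range(1, 5), expertise, [{1, 2}, {2, 4}], 2))
print(is_connected_k_allocation(range(1, 5), expertise, [{1, 2, 3}, {3, 4}], 3))
```

Merging each reviewer's projects into one component mirrors the bipartite view of the proof, in which a reviewer node connects all the project nodes it is adjacent to.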
3 The k-COMPLETE and the MAX k-COMPLETE COVERAGE problems
If an appropriate allocation exists, then it is desirable that all pairs of projects should be directly
comparable. Cook et al. [8] recently studied the problem of maximizing the number of directly
comparable project pairs. The problem is therefore to determine the subsets V_j so that the size of the union
of the edge sets C_{V_j} of the complete graphs (or cliques) induced on V_j, |∪_{j=1}^{L} C_{V_j}|, is maximum. We
call this problem the MAX k-COMPLETE COVERAGE problem. The k-COMPLETE problem is the
problem of deciding if a given complete graph G has an optimal solution of the MAX k-COMPLETE
COVERAGE problem that equals the number of edges in G. So the MAX k-COMPLETE COVERAGE
problem is a more general problem than the k-COMPLETE problem, whereby the problem is defined
on an undirected graph G = (V, E) which is not necessarily complete. The goal is to select sets
V_j ⊆ S_j of size at most k, that maximize the number of covered edges, |∪_{j=1}^{L} C_{V_j} ∩ E|.
The k-COMPLETE problem and the MAX k-COMPLETE COVERAGE problem were recently studied
by Cook et al. [8] who gave integer programming formulations and a branch-and-bound (exponential
time) algorithm for solving these problems as well as a heuristic algorithm.
In this section we study the MAX k-COMPLETE COVERAGE problem and the k-COMPLETE problem. We first classify the complexity status of the MAX k-COMPLETE COVERAGE problem as a function
of k: for k = 2 it is polynomially solvable, whereas for all k ≥ 3 the k-COMPLETE
problem is NP-complete, which shows that the MAX k-COMPLETE COVERAGE problem is NP-hard for all
fixed values of k such that k ≥ 3. We then turn our attention to approximation algorithms.
3.1 Complexity classification
In this section we prove that the MAX 2-COMPLETE COVERAGE is polynomially solvable whereas the
k-COMPLETE for each fixed value of k (k ≥ 3) is NP-complete.
The k = 2 case:
Proposition 3.1 The MAX 2-COMPLETE COVERAGE problem is solvable in time
O(min{m_G L^{1.5}, L^{2.5} / log(m_G + L)}).
Proof: To solve the MAX 2-COMPLETE COVERAGE problem we construct the following bipartite
graph B′. The left side of the graph has L nodes corresponding to the sets S_j and the right side has
m_G nodes corresponding to the edges of G. There is an edge in B′ between S_j and e ∈ E if and only
if the set S_j contains both endpoints of the edge e. Figure 2 illustrates an example of such graph for
G a complete graph on 4 nodes, where the two reviewers’ expertise sets are {1, 2, 3} and {1, 2, 4}.
In this bipartite graph we find a maximum matching. For each edge [S_j, e] that belongs to the
matching we define the set V_j to be the pair of end-nodes of e. For each un-matched set S_j we define
V_j to be an arbitrary subset of S_j with two nodes.
Any feasible matching in B′ corresponds to a MAX 2-COMPLETE COVERAGE solution with the
same objective value and vice versa. Therefore, the MAX 2-COMPLETE COVERAGE problem is solvable by the above procedure. The complexity
of the most efficient bipartite matching algorithm is
dominated by the expression O(min{m_{B′} √M, M^{2.5} / log(n_{B′})}), [18], where M ≤ L is the size of
the maximum matching in B′. Note that n_{B′} = L + m_G and m_{B′} ≤ L m_G.
Figure 2: The bipartite graph resulting for sets {1, 2, 3} and {1, 2, 4} for the MAX 2-COMPLETE COVERAGE problem on a complete four node graph.
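The matching procedure of Proposition 3.1 can be sketched as follows on the instance of Figure 2. This is a minimal illustration that uses a simple augmenting-path matching (often called Kuhn's algorithm) rather than the faster matching algorithm of [18], so it does not attain the stated running time. The function name is hypothetical, and unmatched reviewers are reported as `None` instead of being given an arbitrary pair.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def max_2_complete_coverage(edges: List[Edge],
                            expertise: List[Set[int]]) -> List[Optional[Edge]]:
    """For each reviewer j return the matched edge of G, or None if unmatched."""
    # Adjacency of the bipartite graph: reviewer j sees every edge inside S_j.
    adj = [[e for e in edges if e[0] in S and e[1] in S] for S in expertise]
    owner: Dict[Edge, int] = {}  # edge -> reviewer currently matched to it

    def augment(j: int, seen: Set[Edge]) -> bool:
        for e in adj[j]:
            if e in seen:
                continue
            seen.add(e)
            # Take a free edge, or re-route the reviewer that currently owns it.
            if e not in owner or augment(owner[e], seen):
                owner[e] = j
                return True
        return False

    for j in range(len(expertise)):
        augment(j, set())
    result: List[Optional[Edge]] = [None] * len(expertise)
    for e, j in owner.items():
        result[j] = e
    return result


edges = list(combinations([1, 2, 3, 4], 2))  # complete graph on 4 nodes
alloc = max_2_complete_coverage(edges, [{1, 2, 3}, {1, 2, 4}])
print(alloc)
```

Each matched edge is the pair V_j assigned to reviewer j, and the number of distinct matched edges is the objective value.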
The k ≥ 3 case: We now show that the k-COMPLETE problem is NP-hard for each fixed value
of k such that k ≥ 3. Our proof is based on a reduction from the following problem: An H-decomposition of a graph G = (V, E) is a partition of E into subgraphs isomorphic to H. Given
a fixed graph H the H-DECOMPOSITION PROBLEM is to determine whether an input graph G admits
an H-decomposition. Holyer [20] proved that H-decomposition is NP-complete even for H that is a
complete graph over at least three nodes. Since then a stronger result was proved by Dor and Tarsi, [9],
demonstrating that if H is connected with at least three edges then H-decomposition is NP-complete.
Our proof of NP-hardness of k-COMPLETE is based on reduction from H-decomposition problem
for H that is a complete graph over k nodes.
Proposition 3.2 k-COMPLETE is NP-hard for all fixed values of k such that k ≥ 3.
Proof: We describe a reduction from H-decomposition where H is the complete graph over k nodes.
Let G = (V, E) be an input graph for the H-decomposition problem. If there is an H-decomposition of
the graph, then the number of H-isomorphic subgraphs in the H-decomposition is |E|/p where p = (k choose 2).
Let Ē be the set of edges missing in G (to make it a complete graph). For each [u, v] ∈ Ē, let
v^1_{[u,v]}, v^2_{[u,v]}, . . . , v^{k−2}_{[u,v]} be a set of k − 2 new nodes (that do not belong to V ) corresponding to [u, v].
Let V′ be the set of new nodes for all the edges in Ē. The instance of the k-COMPLETE is defined as
follows: The graph is the complete graph over V ∪ V′. We define |E|/p sets S_1, S_2, . . . , S_{|E|/p} each of
which equals V . These sets are called type 1 sets. We define another |Ē| sets each corresponding to
an edge in Ē. The set corresponding to [u, v] ∈ Ē is S_{[u,v]} = {u, v, v^1_{[u,v]}, v^2_{[u,v]}, . . . , v^{k−2}_{[u,v]}}. These
are called type 2 sets.
A type 2 set has exactly k elements, and therefore w.l.o.g. there is an optimal solution for the
resulting k-COMPLETE instance with V_{[u,v]} = S_{[u,v]}. These type 2 sets span disjoint subsets of edges,
each of them contains exactly p edges.
By a counting argument the value of an optimal solution of the k-COMPLETE instance is p times
the number of subsets (i.e., p · [|E|/p + |Ē|]) if and only if G has an H-decomposition. Therefore,
k-COMPLETE is NP-hard for each fixed value of k such that k ≥ 3.
Naturally, the proof of the above proposition demonstrates that the MAX k-COMPLETE COVERAGE problem is NP-hard also when G is not restricted to be a complete graph, since this problem
generalizes the problem on complete graphs that is NP-hard.
3.2 Approximation algorithms
Motivated by the NP-hardness of the MAX k-COMPLETE COVERAGE problem we turn our attention
to approximation algorithms. In this section we describe three distinct approximation algorithms. The
first one is a trivial one and provides for even values of k at least 1/(k − 1) times the value of an optimal
solution, and for odd values of k at least 1/k times the value of an optimal solution. The other two
algorithms relate to two auxiliary problems, namely, the MAXIMUM COVERAGE PROBLEM and the
DENSEST k-SUBGRAPH PROBLEM.
The MAXIMUM COVERAGE PROBLEM is defined on a given collection of elements and a collection F of subsets of the element set. The objective is to select up to L members of F that cover
a maximum number of elements. The MAXIMUM COVERAGE WITH CARDINALITY CONSTRAINTS
PROBLEM (MCCC) is a variant of the maximum coverage problem where F is partitioned into L
sub-collections F^1, F^2, . . . , F^L, and the constraint restricting the choice of at most L subsets from F
is replaced by a set of constraints enforcing for each j the choice of a single member of F^j.
The DENSEST k-SUBGRAPH PROBLEM is defined on an undirected graph G = (V, E) and an
integer number k. The goal is to find a subset V′ ⊆ V of at most k nodes so as to maximize the number
of edges in the induced subgraph of G over V′. This problem is known to be NP-hard and the current
best known approximation algorithm for this problem has an approximation ratio of O(n^{−1/3+δ}) for
a positive fixed number δ [11]. Moreover, there is an O( nk )-approximation algorithm for this problem
(see for example [2]).
The second approximation algorithm for non-fixed values of k is based on the greedy algorithm
that was analyzed by Chekuri and Kumar [7]. We show that if there is a ρ-approximation algorithm
for the MAX k-COMPLETE COVERAGE problem then there is a ρ-approximation algorithm for the
densest k-subgraph problem, and if there is a ρ-approximation algorithm for the densest k-subgraph
problem then there is a (1/2)ρ-approximation algorithm for the MAX k-COMPLETE COVERAGE problem.
This latter result shows that the two problems for non-fixed values of k are almost equivalent as
far as approximation algorithms are concerned. The third algorithm that we show uses Ageev
and Sviridenko’s [1] approximation algorithm for MCCC to derive a (1 − (1 − 1/r)^r)-approximation
algorithm for r being the maximum number of k-subsets that contain a given edge (the maximum
is over all edges). So r can be as large as O(n^{k−2}). This algorithm is based on solving a linear
programming relaxation of the problem with the number of variables as large as the number of all
possible k-subsets V_j. We note that for fixed value of k, using the last result in order to approximate the
densest k-subgraph problem is superfluous, because using a similar time complexity we can enumerate
all the k-subsets and pick the densest k-subgraph.
3.2.1 The trivial algorithm
The so-called trivial algorithm is a generalization of the matching procedure used to solve the MAX 2-COMPLETE COVERAGE problem. Given an instance of MAX k-COMPLETE COVERAGE we construct
a bipartite graph B = (A1 ; A2 , EB ) as before. For each subset Sj there is a corresponding node in A1
denoted by vSj , and for each edge e of G there is a corresponding node in A2 denoted by ue . There is
an edge (vSj , ue ) ∈ EB if both endpoints of e belong to Sj .
A b-matching in the bipartite graph B is a set of edges M that has up to b_i edges adjacent to node
i. For each node i in A_1, b_i = ⌊k/2⌋, and for each node j in A_2, b_j = 1. An optimal b-matching has
a maximum number of edges among all b-matchings. From the optimal b-matching, M , we generate
a feasible solution to the MAX k-COMPLETE COVERAGE by setting, for each S_j, the subset V_j to be the
one consisting of the endpoints of its matched edges {e_q : (v_{S_j}, u_{e_q}) ∈ M} in the b-matching.
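A minimal sketch of the trivial algorithm follows. It assumes that the b-matching is computed by replicating each node of A_1 into ⌊k/2⌋ identical "slots" and running a simple augmenting-path matching, which is a standard reduction but not the Gabow and Tarjan algorithm referred to in the analysis. The function name and the test instance are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def trivial_allocation(edges: List[Edge], expertise: List[Set[int]],
                       k: int) -> List[Set[int]]:
    """Allocate to each reviewer the endpoints of at most floor(k/2) matched edges."""
    b = k // 2
    # slots[s] = reviewer owning slot s; each reviewer gets b unit-capacity slots.
    slots = [j for j in range(len(expertise)) for _ in range(b)]
    adj = [[e for e in edges if e[0] in expertise[j] and e[1] in expertise[j]]
           for j in slots]
    owner: Dict[Edge, int] = {}  # edge of G -> slot it is matched to (b_j = 1)

    def augment(s: int, seen: Set[Edge]) -> bool:
        for e in adj[s]:
            if e in seen:
                continue
            seen.add(e)
            if e not in owner or augment(owner[e], seen):
                owner[e] = s
                return True
        return False

    for s in range(len(slots)):
        augment(s, set())
    allocation: List[Set[int]] = [set() for _ in expertise]
    for e, s in owner.items():
        allocation[slots[s]].update(e)  # V_j collects endpoints of matched edges
    return allocation


edges = list(combinations(range(1, 6), 2))  # complete graph on 5 nodes
print(trivial_allocation(edges, [{1, 2, 3, 4, 5}, {1, 2, 3}], k=4))
```

Since each V_j is built from at most ⌊k/2⌋ edges it has at most k nodes, and the clique on V_j covers at least the matched edges.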
Theorem 3.1 The trivial approximation algorithm is a (1/(k − 1))-approximation algorithm for even values of k and a (1/k)-approximation algorithm for odd values of k. The complexity of the algorithm is
O(nmL log n).
Proof: The solution delivered by the algorithm is feasible: Each V_j consists of the endpoints of at
most ⌊k/2⌋ edges and thus has at most k nodes and is contained in S_j. The algorithm is clearly a
polynomial-time algorithm since b-matching is a polynomial problem. Using the algorithm for b-matching in bipartite graphs of Gabow and Tarjan [15, 16] the running time is O(nmL log n) (for
more information on this problem and related results see Chapter 21 in Schrijver [26]). It remains to
prove the approximation ratio of the algorithm.
Consider an optimal solution for the MAX k-COMPLETE COVERAGE instance. For each S_j the
optimal solution has a subset V_j^* of at most k nodes. We construct an equivalent disjoint collection
of sets E_j^* by assigning each edge e of G with both endpoints in a common V_j^* to the least index
set V_j^* that contains both endpoints of e. For each E_j^*, if there are at least ⌊k/2⌋ assigned edges, then
an optimal b-matching has a set of ⌊k/2⌋ assigned edges to S_j. For sets E_j^* with fewer edges than ⌊k/2⌋,
the b-matching can assign the entire set of edges E_j^*. Since |E_j^*| ≤ (k choose 2), the approximation ratio of
the trivial approximation algorithm is at least ⌊k/2⌋ / (k choose 2). For even values of k this is 1/(k − 1), whereas for odd
values of k this is 1/k.
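The closing arithmetic of the proof can be checked mechanically. The short sketch below evaluates ⌊k/2⌋ / (k choose 2) with exact rational arithmetic for a few small values of k; the helper name `trivial_ratio` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def trivial_ratio(k: int) -> Fraction:
    """Lower bound floor(k/2) / C(k, 2) on the ratio of the trivial algorithm."""
    return Fraction(k // 2, comb(k, 2))


for k in range(2, 9):
    # Even k gives 1/(k-1); odd k gives 1/k.
    expected = Fraction(1, k - 1) if k % 2 == 0 else Fraction(1, k)
    print(k, trivial_ratio(k), trivial_ratio(k) == expected)
```

For k = 2 the ratio equals 1, in agreement with the exact solvability of the MAX 2-COMPLETE COVERAGE problem by matching.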
3.2.2 Applying the greedy algorithm when k is not fixed
We denote by ρ the approximation factor of an approximation algorithm for the densest k-subgraph
problem (so ρ = O(n−1/3+δ ) < 1, or ρ = O( nk )).
The greedy algorithm iteratively picks subsets that cover, each in turn, the maximum number of
uncovered elements. In each step the subset picked is only a subset of Sj such that the algorithm
did not select earlier another subset of S_j. Chekuri and Kumar [7] proved that this algorithm is a
1/2-approximation algorithm. If in each step of the greedy algorithm, instead of picking the subset that
covers the maximum number of uncovered elements (among the subsets of S_j such that the algorithm
did not select earlier another subset of S_j), the algorithm picks a subset that covers at least β times
the maximum number of uncovered elements (note that β ≤ 1), then Chekuri and Kumar showed that
the resulting approximation ratio is (1/2)β.
Theorem 3.2 When k is not a fixed constant, if there is a ρ-approximation algorithm for the densest k-subgraph problem, then there is a (1/2)ρ-approximation algorithm for the MAX k-COMPLETE COVERAGE
problem. Hence there is an O(min{n−1/3+δ , nk }) < 1 approximation algorithm for the MAX kCOMPLETE COVERAGE problem.
Proof: The proof follows because if the algorithm picks a subset that covers at least β times the
maximum number of uncovered elements (note that β ≤ 1), then Chekuri and Kumar showed that the
resulting approximation ratio is (1/2)β. We next show that when k is not fixed, then we can design such
a greedy algorithm that in each step picks at least β = ρ times the maximum number of uncovered
elements. To see this, we apply the following procedure for each j such that the algorithm did not
select earlier another subset of Sj : We construct an auxiliary graph Gj = (Sj , Ej ) where Ej is the
set of edges from G that connects two nodes of Sj that the algorithm did not cover earlier. In this
auxiliary graph we find an approximated densest k-subgraph using the algorithm of [11], and denote
its node set by Vj . We pick the subset Vj that covers most elements among the different values of j.
Since in each step we use a ρ-approximation algorithm for computing Vj , we conclude that the subset
that we return, covers at least ρ times the maximum number of elements that can be covered using one
subset at this step.
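The greedy scheme of the proof can be sketched as below. As an explicit simplification, the densest k-subgraph step is solved here by brute-force enumeration of k-subsets (so β = 1 and the running time is exponential in k), instead of the approximation algorithm of [11]. The function name and the instance are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def greedy_coverage(edges: List[Edge], expertise: List[Set[int]],
                    k: int) -> Dict[int, Set[int]]:
    """Greedy: repeatedly pick the reviewer and subset covering most uncovered edges."""
    uncovered = {frozenset(e) for e in edges}
    remaining = set(range(len(expertise)))
    allocation: Dict[int, Set[int]] = {}
    while remaining:
        best = None  # (gain, reviewer index, subset)
        for j in sorted(remaining):
            nodes = sorted(expertise[j])
            size = min(k, len(nodes))
            # Brute-force densest subset of S_j in the graph of uncovered edges.
            for subset in combinations(nodes, size):
                gain = sum(1 for pair in combinations(subset, 2)
                           if frozenset(pair) in uncovered)
                if best is None or gain > best[0]:
                    best = (gain, j, subset)
        _, j, subset = best
        allocation[j] = set(subset)
        uncovered -= {frozenset(pair) for pair in combinations(subset, 2)}
        remaining.remove(j)
    return allocation


edges = list(combinations([1, 2, 3, 4], 2))
print(greedy_coverage(edges, [{1, 2, 3}, {1, 2, 4}], k=3))
```

Restricting the search to subsets of size exactly min(k, |S_j|) loses nothing, because adding nodes to a subset never uncovers an edge.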
Proposition 3.3 If there is a ρ′-approximation algorithm for the MAX k-COMPLETE COVERAGE
problem, then there is also a ρ′-approximation algorithm for the densest k-subgraph problem.
Proof: Note that one can set one reviewer with S = V , and the resulting MAX k-COMPLETE COVERAGE instance is exactly the instance of the densest k-subgraph problem on the same graph. Therefore,
we conclude the claim.
3.2.3 Transforming the MAX k-COMPLETE COVERAGE problem into MCCC when k is fixed
For each S_j we write down the list of all subsets of S_j that have exactly k elements. Denote by F^j
the resulting family of k-subsets of S_j, and if |S_j| < k we let F^j = {S_j}. Denote F = ∪_j F^j. The
MAX k-COMPLETE COVERAGE problem is to choose one set from each F^j such that the number of
covered edges with endpoints in a common set is maximized. The resulting problem is an instance of
the MAXIMUM COVERAGE WITH CARDINALITY CONSTRAINTS PROBLEM. The size of this instance
is polynomial if we assume that k is fixed.
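The construction of the families F^j is a direct enumeration, sketched below with `itertools.combinations`. The function name `candidate_families` and the example sets are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def candidate_families(expertise: List[Set[int]], k: int) -> List[List[FrozenSet[int]]]:
    """Build F^j: all k-subsets of S_j, or {S_j} itself when |S_j| < k."""
    families = []
    for S in expertise:
        nodes = sorted(S)
        if len(nodes) < k:
            families.append([frozenset(nodes)])
        else:
            families.append([frozenset(c) for c in combinations(nodes, k)])
    return families


families = candidate_families([{1, 2, 3, 4}, {1, 2}], k=3)
print([len(F) for F in families])
```

Each F^j has at most (|S_j| choose k) members, which is polynomial in n only because k is treated as a constant.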
Theorem 3.3 When k is a fixed constant, there is a polynomial time (1 − 1/e)-approximation algorithm
for the MAX k-COMPLETE COVERAGE problem.
Proof: The algorithm of Ageev and Sviridenko [1] is based on solving a linear programming relaxation of the problem, and then rounding the resulting solution. The formulation solved is the linear
programming relaxation of the following integer program:
max   Σ_j z_j
s.t.  Σ_{i ∈ J_j} x_i ≥ z_j      ∀j
      Σ_{i ∈ I_q} x_i = 1        ∀q = 1, 2, . . . , L
      x_i, z_j ∈ {0, 1}          ∀i, j
In this integer program x_i is an indicator variable for each k-subset that is contained in some S_q.
Note that the same k-subset, if contained in more than one S_q, will have several variables, one for
each subset it is contained in. This variable is set to 1 if the corresponding k-subset contained in some
particular subset S_q is selected for the solution. There is a binary variable z_j for each edge e_j, which
takes the value 1 if the edge is covered by the solution. The index set J_j is the set of all k-subsets
that contain the edge e_j. The index set I_q is the set of all k-subsets corresponding to S_q. So the
constraints say that if an edge is covered then at least one k-subset that contains both its endpoints is
picked, and that we choose exactly one k-subset for each Sq .
The approximation ratio of the algorithm is 1 − (1 − 1/r)^r where r is the largest number of 1’s in a
row in the above mathematical programming formulation. That is, r is the maximum size of a family
of k-subsets each containing a given edge (the maximum is over all edges) such that each such edge
is contained in some S_j. Clearly, this number can be approximately as large as n^{k−2}. Therefore, the
bound is approximately 1 − 1/e, and we establish the claim.
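The behaviour of the bound 1 − (1 − 1/r)^r can be seen numerically. The sketch below, with the hypothetical helper `mccc_ratio`, evaluates it for a few values of r and compares it with 1 − 1/e; it uses floating point arithmetic, so the values are approximate.

```python
import math


def mccc_ratio(r: int) -> float:
    """Approximation guarantee 1 - (1 - 1/r)^r for maximum row size r."""
    return 1.0 - (1.0 - 1.0 / r) ** r


limit = 1.0 - 1.0 / math.e
for r in (1, 2, 5, 10, 100, 1000):
    # The guarantee decreases with r but stays above the limit 1 - 1/e.
    print(r, round(mccc_ratio(r), 6), mccc_ratio(r) > limit)
```

The guarantee is best for small r and approaches 1 − 1/e from above as r grows, which is why 1 − 1/e serves as the bound for every fixed k.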
4 The (k, p)-RCon, the (k, p)-NCon and the (k, p)-ECon problems
The standard definitions of edge-connectivity and node-connectivity are as follows: A connected
graph H = (U, F ) is p-edge connected if the removal of up to p−1 edges from F results in a connected
graph. A connected graph H = (U, F ) is p-node connected if the removal of up to p − 1 nodes from
U results in a connected graph. We also define a new connectivity measure of the review graph that
we call reviewer-connectivity defined as follows: A review graph GR is p-reviewer connected if the
removal of all the edges corresponding to at most p − 1 reviewers from GR results in a connected
graph. We say that a pair of projects has p reviewer disjoint ranking paths between them, if the
removal of at most p − 1 reviewers from the review graph keeps the two projects in the same connected
component of the resulting graph.
In order to increase the reliability of the implied rankings, it is desirable to have more than
a single ranking path between each pair. To that end we require that there are at least p edge-disjoint
ranking paths in GR between each pair of projects. That is, the removal of at most p − 1 pairwise
comparisons by a given reviewer results in a connected review graph. In the example in Figure 1 there
are four ranking paths between 1 and 5 with only three of them edge-disjoint. In this graph nodes 3
and 4 have only two edge-disjoint paths between them. So this allocation is a solution to the p = 2
edge-disjoint requirement. The second review graph shown in Figure 3 is 3-edge connected. The
associated problem is to maximize the number of pairs of projects that have at least p edge-disjoint
ranking paths between them. This associated problem is called the (k, p)-EDGE CONNECTIVITY
ALLOCATION problem, denoted as (k, p)-ECon.
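The objective of (k, p)-ECon can be evaluated for a given review graph by counting, for each pair of projects, the maximum number of edge-disjoint paths. The sketch below uses the standard unit-capacity maximum-flow argument (Menger's theorem) rather than anything specific to this paper; the function names and the edge-list input format are our own assumptions, and the graph is assumed to be simple.

```python
from collections import deque
from itertools import combinations


def edge_disjoint_paths(edges, s, t):
    """Maximum number of edge-disjoint s-t paths in an undirected simple graph."""
    if s == t:
        raise ValueError("s and t must be distinct")
    cap, adj = {}, {}
    for u, v in edges:
        # Each undirected edge becomes two arcs of capacity 1.
        cap[(u, v)] = 1
        cap[(v, u)] = 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def pairs_with_p_edge_disjoint_paths(nodes, edges, p):
    """Number of node pairs joined by at least p edge-disjoint paths."""
    return sum(
        1 for s, t in combinations(nodes, 2)
        if edge_disjoint_paths(edges, s, t) >= p
    )
```

For example, on a 4-cycle every pair of nodes has exactly two edge-disjoint paths, so all six pairs count for p = 2 and none for p = 3.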
Similarly to the p-edge connectivity problem, in order to increase the reliability of the implied
rankings, we ask that the review graph be p-reviewer connected. That is, the removal of at
most p − 1 reviewers results in a connected review graph. The motivation for studying this problem
is the fact that the implied rankings of some pairs depend on very few reviewers and this situation
can skew the results. The associated problem is to maximize the number of pairs of projects that
have at least p reviewer-disjoint ranking paths between them. This associated problem is called the
(k, p)-REVIEWER CONNECTIVITY ALLOCATION problem, denoted as (k, p)-RCon. We note that
when k = 2 the (2, p)-ECon and the (2, p)-RCon problems are equivalent problems. However, for
larger values of k, the notion of reviewer connectivity is different from that of edge connectivity.
Similarly to the p-edge connectivity problem and the (k, p)-reviewer connectivity allocation problem, in order to increase the reliability of the implied rankings, we ask that there are at least p node
disjoint ranking paths in GR between each pair of projects. That is, the removal of at most p − 1
projects results in a connected review graph. The motivation for this is that low node connectivity indicates that the implied rankings of some pairs depend on very few projects and can skew the results.
In the example in Figure 1 the review graph is only 1 node connected, as there are no two node disjoint
paths between node 1 and node 6. Therefore, the implied ranking of nodes 1 and 6 depends only on
the relative strength of project 5. When project 5 is particularly strong, then the implied ranking of
projects 1 and 6 may not be meaningful as the extent of the differentiation between them is dominated
by the strength of project 5. This example can be magnified in a star review graph such as the one
shown in Figure 3. The review graph in this example is 1-node connected. The associated problem
is to maximize the number of pairs of projects that have at least p node-disjoint ranking paths between
them. This associated problem is called the (k, p)-NODE CONNECTIVITY ALLOCATION problem,
denoted as (k, p)-NCon.

Figure 3: A review graph GR with star topology.
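The star example can be checked directly against the definition of p-node connectivity given above (removal of up to p − 1 nodes leaves a connected graph). The following is a brute-force sketch intended only for small illustrative graphs, since it enumerates all node subsets of size at most p − 1; the function names are ours.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Depth-first connectivity test on the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def is_p_node_connected(nodes, edges, p):
    """True if removing any set of at most p - 1 nodes leaves a connected graph."""
    nodes = list(nodes)
    for size in range(p):
        for removed in combinations(nodes, size):
            rest = [v for v in nodes if v not in removed]
            if not is_connected(rest, edges):
                return False
    return True
```

For a star with center 0 and leaves 1, ..., 4, `is_p_node_connected` holds for p = 1 but fails for p = 2, because removing the center isolates every leaf.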
Theorem 4.1 The (k, 2)-RCon problem is NP-complete for all fixed values of k such that k ≥ 2.
Proof: To show that the (k, 2)-RCon problem is in NP we demonstrate that, given a candidate
solution to the problem, its feasibility can be tested efficiently as follows. We test that each
reviewer is allocated at most k projects from his/her expertise set, and to check the 2-reviewer connectivity, we construct the review graph GR and test whether it is 2-reviewer connected. This can be done efficiently by
testing for each reviewer whether the removal of the edge set that corresponds to his/her direct comparisons
results in a connected graph.
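The verification step just described can be sketched as follows. The data layout (dictionaries mapping a reviewer to a set of projects) and the function names are our own assumptions. We also assume that a comparison made by two different reviewers survives the removal of either one of them, which is one reading of the definition of reviewer connectivity.

```python
from itertools import combinations


def connected(nodes, edges):
    """Depth-first connectivity test over the given node set."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {v: set() for v in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def review_edges(allocation, removed=()):
    """All pairwise comparisons made by the reviewers not in `removed`."""
    edges = set()
    for reviewer, assigned in allocation.items():
        if reviewer in removed:
            continue
        edges.update(combinations(list(assigned), 2))
    return edges


def is_feasible_2rcon(projects, expertise, allocation, k):
    """Check the k-allocation constraints and 2-reviewer connectivity."""
    for reviewer, assigned in allocation.items():
        if len(assigned) > k or not set(assigned) <= set(expertise[reviewer]):
            return False
    if not connected(projects, review_edges(allocation)):
        return False
    # Removing the comparisons of any single reviewer must keep GR connected.
    return all(
        connected(projects, review_edges(allocation, removed=(reviewer,)))
        for reviewer in allocation
    )
```

With three projects and three reviewers who each compare one pair of a triangle, the check succeeds for k = 2; dropping one of the three reviewers makes it fail.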
To show that the problem is NP-complete, we present a reduction from Hamiltonian cycle in
bipartite graphs, which is NP-complete (see problem [GT37] in [17]). Given a bipartite graph B =
(C, D, E) with sides C and D, an instance of the Hamiltonian cycle problem, we construct an instance of the
(k, 2)-RCon problem as follows.
The reviewer set is $\{r, r' : r \in D\}$. I.e., for each node r in D we have a pair of reviewers
associated with this node. For each r ∈ D, we construct a set of k − 1 new projects
$P_r = \{p_r^j \mid 1 \le j \le k-1\}$. Our project set is $C \cup \bigcup_{r \in D} P_r$. Reviewers r and $r'$ can review the projects in the set
$P_r \cup \{i \in C \mid (i, r) \in E\}$. So in order to obtain a 2-reviewer connected solution, for each project in $P_r$
there must be (at least) two reviewers that review the project. Therefore, each of the reviewers r and
$r'$ must review all the projects in $P_r$, and another project from the set C (that is adjacent to r in B).
We now claim that B is Hamiltonian if and only if the (k, 2)-RCon instance is feasible.
Assume that B is Hamiltonian. Then, if [i1, r] and [i2, r] are edges in the Hamiltonian cycle, we
assign reviewer r the project i1 and assign reviewer $r'$ the project i2. We also assign both r and
$r'$ the project set $P_r$. Then, the resulting graph of direct comparisons among projects is 2-reviewer
connected (as there is a cycle over the nodes in C and the other nodes are connected to this cycle via
a pair of edges). Therefore, in this case the (k, 2)-RCon instance is feasible.
Assume that the (k, 2)-RCon instance is feasible. Then, as stated above, reviewer r and also
reviewer $r'$ must review the project set $P_r$ and another project from C. We consider the induced subgraph
of GR over the node set resulting from V R by deleting, for all r ∈ D, all nodes in $P_r$
except one representative; i.e., the subgraph induced by $C \cup \bigcup_{r \in D} \{p_r^1\}$. In this induced subgraph
each node has degree two. Moreover, since GR is 2-reviewer connected, it is also 2-edge connected,
and therefore this induced subgraph is also 2-edge connected. By a counting argument this induced
subgraph is a Hamiltonian cycle. Now for each pair of nodes u, v ∈ C such that along this
Hamiltonian cycle there is a path between u and v consisting of two edges, there is a node r ∈ D such
that along this Hamiltonian cycle $p_r^1$ is adjacent to both u and v. Therefore, we can place r between
these two project nodes to obtain a Hamiltonian cycle of the bipartite graph B. Therefore, B is
Hamiltonian.
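The construction used in this reduction, and the allocation derived from a Hamiltonian cycle in the forward direction, can be sketched as below. The encoding is our own: the reviewers r and r′ are represented as the tuples `(r, 0)` and `(r, 1)`, the new projects $p_r^j$ as tuples `("p", r, j)`, and E as a set of pairs `(i, r)` with i in C and r in D.

```python
def build_rcon_instance(C, D, E, k):
    """Return (projects, expertise) of the (k, 2)-RCon instance built from B = (C, D, E)."""
    new_projects = {r: {("p", r, j) for j in range(1, k)} for r in D}
    projects = set(C)
    for r in D:
        projects |= new_projects[r]
    expertise = {}
    for r in D:
        neighbours = {i for i in C if (i, r) in E}
        for copy in (0, 1):  # copy 0 plays the role of r, copy 1 of r'
            expertise[(r, copy)] = new_projects[r] | neighbours
    return projects, expertise


def allocation_from_cycle(cycle, D, k):
    """Allocation induced by a Hamiltonian cycle given as a cyclic node list.

    The list is assumed to alternate between nodes of C and nodes of D.
    """
    allocation = {}
    n = len(cycle)
    for idx, node in enumerate(cycle):
        if node in D:
            i1 = cycle[idx - 1]          # predecessor of r on the cycle
            i2 = cycle[(idx + 1) % n]    # successor of r on the cycle
            p_r = {("p", node, j) for j in range(1, k)}
            allocation[(node, 0)] = p_r | {i1}
            allocation[(node, 1)] = p_r | {i2}
    return allocation
```

Each reviewer receives exactly k projects, namely the k − 1 projects of $P_r$ and one neighbour of r on the cycle, which matches the structure forced by the argument above.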
Proposition 4.1 The (k, 2)-NCon problem is NP-complete for all fixed values of k such that k ≥ 2.
Proof: We argue that the same proof of Theorem 4.1 shows the NP-completeness of the (k, 2)-NCon
problem as well, where we look for an allocation so the resulting review graph is 2-node connected.
To see this first note that the (k, 2)-NCon problem is in NP because we can guess the solution, construct the review graph and check that the review graph is 2-node connected. Next, we use the same
reduction from the Hamiltonian cycle problem in bipartite graphs as devised in the proof of Theorem
4.1. In this reduction, if GR is 2-node connected it is also 2-reviewer connected, and therefore the
bipartite graph is Hamiltonian. On the other hand if the bipartite graph B is Hamiltonian, then the
resulting solution as stated for the (k, 2)-RCon problem is also feasible to the (k, 2)-NCon problem.
Theorem 4.2 The (k, p)-NCon problem is NP-complete for all fixed values of p and k such that p ≥ 2
and k ≥ 2.
Proof: The (k, p)-NCon problem is in NP because given an instance for this problem we can guess
the resulting solution, then we can compute the review graph and verify that it is a p-node connected
graph by computing the global min-cut in a node-weighted graph (where the cut has to separate two
non-empty node sets). In order to show that the problem is NP-complete, we will show how to modify
the reduction from Hamiltonian cycle in bipartite graphs described in the proof of Theorem 4.1.
Let the bipartite graph be B = (C, D, E) as before. The reviewer set is
$\{r_1, r_2, \ldots, r_p : r \in D\} \cup A$. I.e., for each node r in D we have a set of p reviewers associated with this node. We
have an additional set of auxiliary reviewers denoted as A. For each r ∈ D, we construct a set of
k − 1 new projects $P_r = \{p_r^j \mid 1 \le j \le k-1\}$. We have another p − 2 projects, called auxiliary
projects, denoted as $P_A = \{a_1, \ldots, a_{p-2}\}$. Our project set is then $C \cup P_A \cup \bigcup_{r \in D} P_r$. Reviewers $r_1$
and $r_2$ can review the projects in the set $P_r \cup \{i \in C \mid (i, r) \in E\}$, and reviewer $r_{i+2}$ can review the
projects $P_r \cup \{a_i\}$. So in order to obtain a p-node connected solution, for each project in $P_r$ there
must be (at least) p reviewers that review the project. Therefore, each of the reviewers $r_1$ and $r_2$ must
review all the projects in $P_r$, and another project from the set C (that is adjacent to r in B). Moreover,
each reviewer $r_{i+2}$ (for i ≥ 1) must review all projects in $P_r \cup \{a_i\}$. For each project $q \in C \cup P_A$
and for each i = 1, 2, . . . , p − 2 such that $q \ne a_i$, we have a reviewer in A that can review only the
pair of projects $\{a_i, q\}$.
Similarly to the proof of Theorem 4.1, we now claim that B is Hamiltonian if and only if the
(k, p)-NCon instance is feasible.
Assume that B is Hamiltonian. Then, for [i1, r], [i2, r] edges in the Hamiltonian cycle, we assign
reviewer $r_1$ the project i1 and assign reviewer $r_2$ the project i2. We now argue that the resulting graph
of direct comparisons among projects is p-node connected. To see this, note that between any pair of
projects $q, q'$ there are p node-disjoint paths: p − 2 of these paths are through a single node from
$P_A$ and the other 2 node-disjoint paths use the Hamiltonian cycle. Therefore, in this case the
(k, p)-NCon instance is feasible.
Assume that the (k, p)-NCon instance is feasible. Then, as stated above, reviewer $r_1$ and also
reviewer $r_2$ must review the project set $P_r$ and another project from C. We consider the subgraph of
GR induced by $C \cup \bigcup_{r \in D} \{p_r^1\}$. In this induced subgraph each node has degree two. Moreover, since
GR is p-node connected, this induced subgraph is 2-node connected. This is so because we remove
$P_A$, which has p − 2 nodes, and hence decrease the size of any node cut by at most p − 2, and this results in a 2-node connected
subgraph. The removal of the other nodes does not reduce the node connectivity any further, similarly
to the proof of Theorem 4.1.
By a counting argument this induced subgraph is a Hamiltonian cycle. Now for each pair of nodes
u, v ∈ C such that along this Hamiltonian cycle there is a path between u and v consisting of
two edges, there is a node r ∈ D such that along this Hamiltonian cycle $p_r^1$ is adjacent to both u and v.
Therefore, we can place r between these two project nodes to obtain a Hamiltonian cycle of the
bipartite graph B. Therefore, B is Hamiltonian.
It remains to consider the complexity status of the (k, p)-ECon problem. We first note that if
each reviewer has an expertise set that contains at least k projects and p ≤ k − 1, then the review
graph is p-edge connected if and only if it is 1-edge connected (i.e., if it is a feasible solution to
the CONNECTIVITY-k-ALOC problem). This is so because w.l.o.g. each set Vj in the solution to the
CONNECTIVITY-k-ALOC problem has exactly k nodes (and not fewer than k nodes). Therefore, the
removal of at most p − 1 edges from the review graph keeps each node set of a given reviewer in the
same connected component and hence it remains a connected graph. We next consider the case where
p ≥ k, and we prove that for 2k − 2 ≥ p ≥ k the problem is NP-complete.
Proposition 4.2 The (k, p)-ECon problem is NP-complete for all values of k ≥ 2 such that 2k − 2 ≥
p ≥ k.
Proof: The (k, p)-ECon problem is in NP since we can guess the solution, construct the corresponding
review graph, and then check that the review graph is p-edge connected (by computing the global min-cut
in the review graph). In order to show that it is NP-complete, we modify the reduction in the proof
of Theorem 4.1. Given an input bipartite graph B for the Hamiltonian cycle problem, we construct an
input to the (k, p)-ECon problem that is the (k, 2)-RCon instance constructed in the proof of Theorem
4.1.
If the (k, 2)-RCon instance is a YES instance, then by removing at most p − 1 edges from the
review graph we can disconnect the node set that corresponds to at most one reviewer (note that the
expertise set of each reviewer in the reduction has at least k projects), and therefore after removing at
most p − 1 edges the review graph remains connected. Therefore, in this case the (k, p)-ECon instance
is a YES instance as well. By the reduction of Theorem 4.1, it means that if B is Hamiltonian, then
the (k, p)-ECon instance is a YES instance.
If the (k, p)-ECon instance is a YES instance, then note that in this solution (that shows
that it is a YES instance), each project is reviewed by at least two reviewers (as otherwise we can
disconnect it from the rest of the projects by removing k − 1 edges). Therefore, by a counting argument,
each project is reviewed by exactly two reviewers. Then, we can apply the same proof as in the proof
of Theorem 4.1, to show that in this case B is Hamiltonian.
5 The 2- REVIEWER CONNECTIVITY AUGMENTATION problem
The 2-REVIEWER CONNECTIVITY AUGMENTATION problem is the associated optimization problem
defined as follows. The input is a feasible solution to the CONNECTIVITY-k-ALOC problem, and
integer numbers q and k ≥ 2. The goal is to augment the solution to the CONNECTIVITY-k-ALOC
problem using at most q additional reviewers, where each of these has an expertise set equal to
the set of all projects and we can assign at most k projects to each additional reviewer. The objective is to
maximize the number of pairs of projects such that the resulting review graph (constructed by adding
the q additional reviewers to the solution of the CONNECTIVITY-k-ALOC problem) has two reviewer-disjoint
paths between them. The motivation for studying this problem is the fact that in some cases
there are a few reviewers that can review the whole set of projects (these reviewers might be the panel
members) though their expertise level in each subject is lower than that of the regular reviewers.
Therefore, we would like to have a feasible solution to the CONNECTIVITY-k-ALOC problem using
a high-level expert in each of the projects, and to use the panel members to increase the
robustness of the resulting ranking, by providing two reviewer-disjoint paths between some pairs of
projects.
The problem of finding a minimum set of edges that must be added to a given subgraph so that the
resulting graph is 2-edge connected has been studied previously. Eswaran and Tarjan [10] provided
necessary and sufficient conditions, and a linear time algorithm to construct an optimal solution is
given in [21, 25]. Therefore, if k = 2 and the number of edges that can be added to GR is large
enough so that it is possible to make the whole review graph a 2-edge connected graph (or 2-reviewer
connected graph), then such a feasible solution can be found in linear time.
In this section we show how to solve in polynomial time the 2-REVIEWER CONNECTIVITY AUGMENTATION
problem. We define a block to be a non-trivial (i.e., with at least two nodes) maximal
node set such that between each pair of nodes in this set there are at least two reviewer-disjoint paths.
The next lemma simplifies the structure of an optimal solution to the 2-REVIEWER CONNECTIVITY
AUGMENTATION problem.
Lemma 5.1 Given an optimal solution such that there is a pair of additional reviewers $r, r'$ where r
reviews projects p1 and p2, and $r'$ reviews p3 and p4 (among perhaps other projects), then p1, p2, p3,
and p4 belong to a common block.
Proof: First note that p1 and p2 belong to a common block (and similarly p3 and p4 belong to a
common block). This is so as between p1 and p2 there is one path that uses the comparisons of the
solution to the CONNECTIVITY-k-ALOC problem, and another path using the additional reviewer r
that reviews both of them.
To conclude the claim, assume that p1 and p3 do not belong to a common block. We next consider the
ranking path between p1 and p3, and the path between p2 and p4, in the solution to the
CONNECTIVITY-k-ALOC problem (see Figure 4). These two paths share a common reviewer $\tilde{r}$, as otherwise p1 and p3
are in the same block. This is so since p1 and p2 are in the same block, and therefore we can extend
the path from p2 to p4 by the two additional reviewers to obtain a path from p1 to p3 that is
reviewer-disjoint to the path between these projects in the solution to the CONNECTIVITY-k-ALOC problem.
We next replace the allocated sets of projects of r and $r'$, by assigning project p1 to $r'$ and not to
r and assigning the project p3 to r instead of $r'$. Then, the project set assigned to $\tilde{r}$ belongs to the
block of p1 and p3 in the resulting new solution. Moreover, the same project set is also in the block of
p2 and p4. Therefore, the set of projects assigned to either r or $r'$ belongs to a common block. Since
other blocks are not separated by the above change (in the solution), the resulting solution is no worse
than the initial one. Moreover, since we merge at least two blocks into one block, the number of pairs
in a common block increases, and therefore this contradicts the assumption that the initial solution is
an optimal one. Therefore, in an optimal solution the claim holds.

Figure 4: The solution to the CONNECTIVITY-k-ALOC problem in the proof of Lemma 5.1
By Lemma 5.1, the entire set of projects that is reviewed by at least one additional reviewer is
in one block, which we denote as the main block. Assume that in the main block there are $\ell$ projects
and there are t pairs of projects from this block that have been 2-reviewer connected already in the
CONNECTIVITY-k-ALOC solution. Then, by using the q additional reviewers we 2-reviewer connect
another $\binom{\ell}{2} - t$ pairs, and we would like to maximize this amount.
Next, we describe Algorithm Augment DP for solving the 2-REVIEWER CONNECTIVITY AUGMENTATION
problem. Our algorithm is summarized in Figure 5.
Given a feasible solution to the CONNECTIVITY-k-ALOC problem, we construct a tree over the
bipartite graph B = (P, R, E) whose sides are the project set P and the reviewer set R and each edge
in E connects a reviewer to a project that lies within his/her expertise set. In this bipartite graph we
contract each 2-reviewer connected component (of the review graph) with any reviewer such that his/her
entire set of projects lies in a common 2-reviewer connected component of GR. The resulting graph is
still not necessarily a tree, but we consider a BFS tree constructed from an arbitrary node. Note that in
this tree we have to select a set of kq leaves that will be reviewed by the additional reviewers. These
leaves and the paths in the tree between pairs of these leaves will be the main block in the solution.
So the goal is to find a set of kq leaves that maximizes the amount $\binom{\ell}{2} - t$ defined above. Our
algorithm is based on a dynamic programming procedure, and we prove the following theorem:
Theorem 5.1 Algorithm Augment DP solves the 2-REVIEWER CONNECTIVITY AUGMENTATION
problem in $O(n^5 k^2 q^2)$ time.
Proof: In the resulting tree we assign a non-negative integer weight w(v) to each node v. If v ∈ R
then w(v) = 0. Otherwise, if v ∈ P and no node was contracted into v then w(v) = 1, and if v is
a result of contracting nodes into it then w(v) is the total number of projects that were contracted to
this node. We root the resulting tree at an arbitrary node denoted as root.
Next, we apply a preprocessing step on the tree so that each node in the tree has at most two
children. To achieve this goal, we replace a node v with ∆ children by a binary tree with ∆ leaves,
and the root of this binary tree has weight w(v), each internal node of the binary tree has zero weight,
and the ∆ leaves correspond to the ∆ children of v before the change occurs. Hence, we can assume
that the input graph is a binary tree and each node has an integral weight smaller than n.
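The preprocessing step can be sketched as follows. The text only requires some binary tree with ∆ leaves; this sketch uses a chain-shaped one, which is our choice. The tree is assumed to be given as a dictionary `children` mapping a node to the list of its children, together with a dictionary `weight`; the inserted zero-weight nodes are named `("dummy", i)`, a hypothetical naming scheme that assumes no original node uses such a name.

```python
def binarize(children, weight):
    """Return (children, weight) of a tree in which every node has at most two children.

    A node v with more than two children keeps its weight w(v) and becomes the
    root of a binary tree whose internal nodes are new zero-weight nodes and
    whose leaves are the original children of v.
    """
    new_children = {}
    new_weight = dict(weight)
    counter = 0
    for v, kids in children.items():
        kids = list(kids)
        current = v
        while len(kids) > 2:
            dummy = ("dummy", counter)
            counter += 1
            new_weight[dummy] = 0
            # Keep one original child here and push the rest one level down.
            new_children[current] = [kids[0], dummy]
            kids = kids[1:]
            current = dummy
        new_children[current] = kids
    return new_children, new_weight
```

Because the new internal nodes carry zero weight, the total weight covered by any set of root-to-leaf paths is the same before and after the transformation.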
Given the binary tree T rooted at root where each node v has a weight w(v) ∈ {0, 1, . . . , n}, we
apply the following dynamic programming algorithm. We first assume that the root will be a member
of the main block. In this case at least one leaf of the selected kq leaves will be a descendant of the
right child of root and at least one selected leaf will be a descendant of the left child of root. We
denote by $F_{i,j,t}$ the maximum total weight of nodes that can be covered using at most j paths, each
of them starting at i and going down the tree until reaching a leaf, and such that these j paths cover
exactly t pairs of projects that are already 2-reviewer connected in the CONNECTIVITY-k-ALOC initial
solution. We would like to compute $\max_{t = 0, 1, \ldots, \binom{n}{2}} F_{root, kq, t} - t$. The computation is carried out
bottom-up. During the computation we set the value of F to be −∞ for infeasible values of the triple i, j, t.
For a leaf node v,
$$F_{v,j,t} = \begin{cases} w(v) & \text{if } j \ge 1,\ t = \binom{w(v)}{2} \\ 0 & \text{if } j = 0,\ t = 0 \\ -\infty & \text{otherwise.} \end{cases}$$
Next, consider the case where the current node v has only one child denoted as u. Then,
$$F_{v,j,t} = \begin{cases} w(v) + F_{u,\,j,\,t-\binom{w(v)}{2}} & \text{if } j \ge 1 \\ 0 & \text{otherwise,} \end{cases}$$
where we assume that $F_{u,j,t} = -\infty$ for all negative values of t. Finally, we assume that v has two
children denoted as u and $u'$. If j ≥ 1 then
$$F_{v,j,t} = w(v) + \max_{\substack{j' = 0, 1, \ldots, j \\ t' = 0, 1, \ldots, t-\binom{w(v)}{2}}} \left\{ F_{u,\,j',\,t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right\},$$
and otherwise $F_{v,0,t} = 0$, where we set $F_{v,j,t}$ to equal −∞ if either j or t is negative. Since the
weight of a node in the tree is at most n, the total time complexity to solve the entire problem is
$O(n^5 k^2 q^2)$, as there are $O(n^3 kq)$ values to compute and each of them takes $O(n^2 kq)$ time. This dynamic
program solves the 2-reviewer connectivity augmentation problem assuming that root belongs to
the main block. We can check all possibilities of selecting the root node and obtain a polynomial
time algorithm that solves the problem (by paying an extra factor of n in the time complexity).
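The recurrences for the case where root belongs to the main block can be sketched with memoized recursion, as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the tree is assumed to be already binary and given as a `children` dictionary with a `weight` dictionary, `num_paths` stands for kq, and the function name is ours. One deviation is labeled in the code: for j = 0 we treat only t = 0 as feasible at every node (as in the leaf case), which does not change the maximum of F − t.

```python
from functools import lru_cache
from math import comb

NEG_INF = float("-inf")


def augment_dp_root_in_block(children, weight, root, num_paths):
    """Return {t: F(root, num_paths, t)} for all t with a finite value."""

    @lru_cache(maxsize=None)
    def F(v, j, t):
        if j < 0 or t < 0:
            return NEG_INF
        own = comb(weight[v], 2)  # pairs inside v that are already 2-reviewer connected
        kids = children.get(v, [])
        if j == 0:
            # Assumption: with no paths, only t = 0 is feasible (as for a leaf).
            return 0 if t == 0 else NEG_INF
        if not kids:  # leaf
            return weight[v] if t == own else NEG_INF
        if len(kids) == 1:
            return weight[v] + F(kids[0], j, t - own)
        assert len(kids) == 2, "the tree must be binary"
        u, u2 = kids
        best = NEG_INF
        for j1 in range(j + 1):
            for t1 in range(t - own + 1):
                best = max(best, F(u, j1, t1) + F(u2, j - j1, t - t1 - own))
        return weight[v] + best

    t_max = comb(sum(weight.values()), 2)
    table = {t: F(root, num_paths, t) for t in range(t_max + 1)}
    return {t: value for t, value in table.items() if value != NEG_INF}
```

For a zero-weight root with two leaf children of weights 2 and 1 and two paths, the table contains F = 1 for t = 0 (only the weight-1 leaf is covered) and F = 3 for t = 1 (both leaves are covered, and the weight-2 leaf contributes one already-connected pair). The value stated in the text, the maximum of F − t over t, can then be read off the returned dictionary.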
However, we next show how to modify the dynamic programming so that we do not assume that
root belongs to the main block. This assumption is used in that we search for a collection of
kq paths going down from root to leaves of the tree and we assume that these paths also cover the
root. The next modification removes this assumption. In the case where the current node v has only
one child, denoted as u,
$$F_{v,j,t} = \begin{cases} w(v) + F_{u,\,j,\,t-\binom{w(v)}{2}} & \text{if } kq > j \ge 1 \\ F_{u,j,t} & \text{if } j = kq \\ 0 & \text{otherwise,} \end{cases}$$
where we assume that $F_{u,j,t} = -\infty$ for all negative values of t. In the case where v has two children,
denoted as u and $u'$,
$$F_{v,j,t} = \begin{cases}
w(v) + \max\limits_{\substack{j' = 0, 1, \ldots, j \\ t' = 0, 1, \ldots, t-\binom{w(v)}{2}}} \left\{ F_{u,\,j',\,t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right\} & \text{if } kq > j \ge 1 \\
\max\left\{ F_{u,j,t},\ F_{u',j,t},\ w(v) + \max\limits_{\substack{j' = 1, 2, \ldots, j-1 \\ t' = 0, 1, \ldots, t-\binom{w(v)}{2}}} \left\{ F_{u,\,j',\,t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right\} \right\} & \text{if } j = kq \\
0 & \text{if } j = 0,
\end{cases}$$
where we set $F_{v,j,t}$ to equal −∞ if either j or t is negative. We denote the
resulting algorithm by Augment DP, and we conclude the claim.
6 The (k, q)- HOP problem
As the length of the ranking path increases, the robustness and reliability of the implied ranking
decreases. It is therefore desirable to limit the length of the ranking paths. For each pair of projects
we require the existence of at least one ranking path that has length of at most q hops. That means
that the ranking path has at most q edges. When all projects are directly comparable the allocation
provides a 1-hop review graph. The graph in Figure 1 is a 3-hop review graph and the graph in Figure
3 is a 2-hop review graph. The associated problem, called the (k, q)-HOP problem, is to maximize the
number of pairs of projects that have at least one q-hop ranking path between them.
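For a fixed allocation, the (k, q)-HOP objective can be evaluated with a breadth-first search truncated at depth q from every project. The sketch below assumes the allocation is a dictionary mapping each reviewer to the collection of projects he/she reviews; the function names are ours.

```python
from collections import deque
from itertools import combinations


def review_graph(allocation):
    """Adjacency sets of the review graph: a reviewer compares all pairs of his/her projects."""
    adj = {}
    for assigned in allocation.values():
        assigned = list(assigned)
        for project in assigned:
            adj.setdefault(project, set())
        for a, b in combinations(assigned, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def pairs_within_q_hops(adj, q):
    """Number of unordered project pairs joined by a ranking path of at most q edges."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == q:
                continue  # do not expand beyond q hops
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += len(dist) - 1
    return total // 2  # every pair was counted from both endpoints
```

On a star with three leaves, three pairs are within one hop and all six pairs are within two hops, in line with the remark that a star review graph is a 2-hop review graph.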
The (k, q)-HOP problem was addressed recently by Park and Newman [24] in ranking college
football teams. They include in a graph a directed arc from i to j if (football) team i wins against
team j. They conclude an implied win if there is a directed path with q hops, where the weight of this
implied win decreases exponentially with q. The algorithm they developed is based on the diminished
importance of the implied paths as a function of the number of hops. This model is different from
ours in that the graph is determined as an outcome of the evaluation process - the process of playing
the games, whereas in the intensity of preferences models the graph topology is determined by the
k-allocation and only the intensities are determined by the evaluations.
In this section we show that the (k, q)-HOP problem is NP-complete for all fixed values of q and
k such that q ≥ 2 and k ≥ 2. We note that the case of q = 1 is exactly the MAX k-COMPLETE
COVERAGE problem that is polynomially solvable for k = 2 and NP-complete for all values of k such
that k ≥ 3. In order to show that the (k, q)-HOP problem is NP-complete for all values of q we first
show this claim for q = 2, and then show how to extend it to other values of q.
Theorem 6.1 The (k, 2)-HOP problem is NP-complete for all fixed values of k such that k ≥ 2.
Proof: The problem is clearly in NP, as given a solution it is easy to test that each reviewer reviews at
most k projects, and that for each pair of projects there exists a path of at most two hops that compares
this pair (by testing all such paths).
Now consider the following reduction from SAT (see problem [LO1] in [17]): Assume that we
are given an instance of the SAT problem consisting of clauses $C_1, C_2, \ldots, C_m$ over the variables
$v_1, v_2, \ldots, v_n$. W.l.o.g. we assume that for each i we have a clause consisting of the pair of literals $v_i$
and $\bar{v}_i$ (where $\bar{v}_i$ is the negation of $v_i$). We construct an instance of the (k, 2)-HOP problem as follows:
Our project set is defined in the following way. For each variable $v_i$, we have k projects: the first of
these is associated with the variable, the second with its negation (these first two projects are called
literal projects), and the remaining k − 2 projects are denoted as $Q_i^1, \ldots, Q_i^{k-2}$. For each clause $C_j$, we
have a set of k − 1 projects $P_i^j$ for i = 1, 2, . . . , k − 1. To conclude our list of projects we have another
truth project T and an auxiliary project A. Our reviewer set is defined as follows: For each pair of
literals $l_i, l_{i'}$ we have a reviewer with project set $\{l_i, l_{i'}\}$ (and hence w.l.o.g. this set is selected for
this reviewer). For each project p such that $p \ne T, A$ we have a reviewer with project set
$\{p, A\}$ (and hence w.l.o.g. this set is selected for this reviewer). For each clause $C_j$ we have a
clause reviewer with project set $C_j \cup \{P_i^j : i = 1, 2, \ldots, k-1\}$, and for each variable $v_i$
we have a truth-assignment reviewer with project set $\{T, Q_i^1, \ldots, Q_i^{k-2}, v_i, \bar{v}_i\}$.
We first note that each pair of literal nodes is compared using the corresponding reviewer (the
one with project set $\{l_i, l_{i'}\}$). We also note that in order for the solution to be connected each
clause reviewer with project set $C_j \cup \{P_i^j : i = 1, 2, \ldots, k-1\}$ reviews exactly one project from $C_j$
(and the other k − 1 projects are $P_i^j$ for all i), and each truth-assignment reviewer that corresponds to
$v_i$ must review the projects $Q_i^1, \ldots, Q_i^{k-2}$ and two other projects from $T, v_i, \bar{v}_i$ (as otherwise, if he/she
does not review $Q_i^j$, then there is no 2-hop path from T to $Q_i^j$).
To conclude the proof we will show the following claim.
Claim 6.1 The following three conditions are equivalent.
1. The (k, 2)-HOP instance is feasible.
2. There is a 2-hop path from T to all the projects.
3. There is a truth assignment that satisfies the SAT formula.
Proof: Assume that the (k, 2)-HOP instance is feasible. Then clearly there is a 2-hop path from T
to all the projects. For each variable $v_i$ such that the i-th truth-assignment reviewer reviews T, we note that this
reviewer reviews exactly one of the projects $v_i$ and $\bar{v}_i$, and we let the literal of the project that he/she
reviews have value TRUE whereas its negation is FALSE. We let the truth assignment of the other
(yet undefined) variables be arbitrary. We next claim that each clause $C_j$ has a TRUE literal. To
see this claim, note that there is a 2-hop path from $P_1^j$ to T. This 2-hop path must traverse a middle
project that is one of the literals that belongs to $C_j$. Since there is a reviewer that reviews both this
literal and T, we conclude that this reviewer is a truth-assignment reviewer, and therefore we assigned
this literal a TRUE value. Therefore, the SAT formula is satisfied.
Assume that the SAT formula is satisfied by a truth assignment φ. We let the j-th clause reviewer
review the set $P_i^j$ for i = 1, 2, . . . , k − 1 and also review one of the literals that belongs to $C_j$ and is
assigned a TRUE value in φ. We let the truth-assignment reviewer review the set $T, Q_i^1, \ldots, Q_i^{k-2}$
and one extra project that is the TRUE literal among $v_i$ and $\bar{v}_i$. Then, clearly each project is adjacent
to one of the literal projects and therefore has a 2-hop path to all the literal projects. Moreover, each
project $P_i^j$ is compared to A, and therefore there is a 2-hop path between each pair of such projects.
We next note that there is a 2-hop path from A to T. This is so because, assuming that clause $C_j$ is
satisfied by literal l (and during the solution we picked l as the TRUE literal of $C_j$), both T and
A are compared to l (T is compared by the truth-assignment reviewer and A by the reviewer with
project set {l, A}). It remains to show that there is a 2-hop path from T to $P_i^j$. This 2-hop path is
established because there is an l ∈ $C_j$ that is a TRUE literal such that the clause reviewer compares
l and $P_i^j$ (l is the unique literal project that this clause reviewer reviews), and the truth-assignment
reviewer compares l and T. This provides a 2-hop path from T to $P_i^j$.
This concludes the proof of Theorem 6.1.
Theorem 6.2 The (k, q)-HOP problem is NP-complete for all fixed values of q and k such that q ≥ 2
and k ≥ 2.
Proof: To extend the proof of Theorem 6.1 to larger values of q, we change the reduction so that we
have another q − 2 new projects denoted as $a_1, a_2, \ldots, a_{q-2}$ and q − 2 new reviewers, where the
i-th new reviewer is able to review only $a_i$ and $a_{i+1}$ (where $a_{q-1} = T$ is the truth project). We also
remove the project A from the project set and remove all the reviewers that contain it in their expertise
set. The resulting instance of the (k, q)-HOP problem is feasible if and only if there is a 2-hop path from T to
all the projects besides $a_1, a_2, \ldots, a_{q-2}$. This is so because for each pair of projects that are associated
with either a variable or a clause, there is a 3-hop path connecting these projects. But as shown in
Claim 6.1, there is a 2-hop path from T to all the projects besides $a_1, a_2, \ldots, a_{q-2}$ if and only if there
is a truth assignment that satisfies all clauses. Therefore, we conclude the claim.
7 Concluding remarks
In this paper we study optimization and decision problems relating to the allocation of reviewing tasks
during the process of evaluating a large number of projects. Whereas several problems are shown to be
polynomially solvable, most studied problems are shown to be NP-complete. We believe that finding
polynomially solvable special cases that are of relevance to some applications is an interesting and
important open question that is left for future research.
References
[1] A. A. Ageev and M. I. Sviridenko, "Pipage rounding: a new method of constructing algorithms with proven performance guarantee," Journal of Combinatorial Optimization, 8, 307–328, 2004.
[2] Y. Asahiro, K. Iwama, H. Tamaki and T. Tokuyama, "Greedily finding a dense subgraph," J. Algorithms, 34, 203–221, 2000.
[3] J. J. Bartholdi, C. A. Tovey and M. A. Trick, "The computational difficulty of manipulating an election," Social Choice and Welfare, 6, 227–241, 1989.
[4] J. P. Brans and Ph. Vincke, "A preference ranking organization method," Management Science, 31, 647–656, 1985.
[5] C. Brezovec, G. Cornuejols and F. Glover, "Two algorithms for weighted matroid intersection," Mathematical Programming, 36, 39–53, 1986.
[6] C. Brezovec, G. Cornuejols and F. Glover, "A matroid algorithm and its application to the efficient solution of two optimization problems on graphs," Mathematical Programming, 42, 471–487, 1988.
[7] C. Chekuri and A. Kumar, "Maximum coverage problem with group budget constraints and applications," in Proceedings of APPROX 2004, 72–83, 2004.
[8] W. D. Cook, B. Golany, M. Kress, M. Penn and T. Raviv, "Optimal allocation of proposals to reviewers to facilitate effective ranking," Management Science, 51, 655–661, 2005.
[9] D. Dor and M. Tarsi, "Graph decomposition is NP-complete: a complete proof of Holyer's conjecture," SIAM Journal on Computing, 26, 1166–1187, 1997.
[10] K. P. Eswaran and R. E. Tarjan, "Augmentation problems," SIAM Journal on Computing, 5, 653–665, 1976.
[11] U. Feige, G. Kortsarz and D. Peleg, "The Dense k-Subgraph Problem," Algorithmica, 29, 410–421, 2001.
[12] E. Fernandez and R. Olemdo, "An agent model based on ideas of concordance and discordance for group ranking problems," Decision Support Systems, 39, 429–443, 2005.
[13] G. N. Frederickson and M. A. Srinivas, "Algorithms and data structures for an expanded family of matroid intersection problems," SIAM Journal on Computing, 18, 112–138, 1989.
[14] R. Fuller and Ch. Carlsson, "Fuzzy multiple criteria decision making: recent developments," Fuzzy Sets and Systems, 78, 139–153, 1996.
[15] H. N. Gabow and R. E. Tarjan, "Almost optimum speed-ups of algorithms for bipartite matching and related problems," in Proceedings of STOC 1988, 514–527, 1988.
[16] H. N. Gabow and R. E. Tarjan, "Faster scaling algorithms for network problems," SIAM Journal on Computing, 18, 1013–1036, 1989.
[17] M. R. Garey and D. S. Johnson, Computers and Intractability, W.H. Freeman and Co., New York, 1979.
[18] D. S. Hochbaum and B. Chandran, "Further below the flow decomposition barrier of maximum flow for bipartite matching and maximum closure," Manuscript, UC Berkeley, April 2004.
[19] D. S. Hochbaum and A. Levin, "Methodologies for the group rankings decision," Management Science, 52, 1394–1408, 2006.
[20] I. Holyer, "The NP-completeness of some edge-partition problems," SIAM Journal on Computing, 10, 713–717, 1981.
[21] T. Hsu and V. Ramachandran, "On finding smallest augmentation to biconnect a graph," SIAM Journal on Computing, 22, 889–912, 1993.
[22] J. P. Keener, "The Perron-Frobenius theorem and the rating of football teams," SIAM Review, 35, 80–93, 1993.
[23] J. G. Kemeny and J. L. Snell, "Preference ranking: An axiomatic approach," in Mathematical Models in the Social Sciences, Ginn, Boston, 9–23, 1962.
[24] J. Park and M. E. J. Newman, "A network-based ranking system for US college football," Journal of Statistical Mechanics: Theory and Experiment, (Oct. 31, 2005). Abstract available at http://www.iop.org/EJ/abstract/1742-5468/2005/10/P10014.
[25] A. Rosenthal and A. Goldner, "Smallest augmentations to biconnect a graph," SIAM Journal on Computing, 6, 55–66, 1977.
[26] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer-Verlag, Berlin, 2003.
Input. A feasible solution to the CONNECTIVITY-k-ALOC problem.
Preprocessing phase.
1. Construct the bipartite graph B = (P, R, E), and contract each 2-reviewer connected component (of G_R) together with any reviewer whose entire set of projects lies in a common 2-reviewer
connected component of G_R.
2. Consider a BFS tree constructed from an arbitrary node, and root the tree at an arbitrary node
root.
3. Assign weights to nodes: If v ∈ R then w(v) = 0. Otherwise, if v ∈ P and no node was
contracted into v then w(v) = 1, and if v is a result of contracting nodes into it then w(v) is the
total number of projects that were contracted to this node.
4. Replace the tree with a binary tree T , by replacing a node v with ∆ > 2 children by a binary
tree with ∆ leaves where all internal nodes have zero weight.
The dynamic programming procedure. F_{i,j,t} is the maximum total weight of nodes that can be
covered using at most j paths, each of them starting at i and going down the tree until reaching a leaf,
and such that these j paths cover exactly t pairs of projects that are already 2-reviewer connected in
the CONNECTIVITY-k-ALOC initial solution.
Goal of the procedure. To compute $\max\{F_{\mathrm{root},kq,t} - t\}$ such that $0 \le t \le \binom{n}{2}$.
Apply the following computation in a bottom-up fashion. (Where we set $F_{v,j,t}$ to equal $-\infty$ if either
j or t is negative.)
1. If v is a leaf of the tree, then we set $F_{v,j,t} = w(v)$ if $kq \ge j \ge 1$ and $t = 0$, and otherwise
$F_{v,j,t} = 0$.
2. If v has only one child, denoted as u, then we set
\[
F_{v,j,t} =
\begin{cases}
w(v) + F_{u,\,j,\,t-\binom{w(v)}{2}} & \text{if } kq > j \ge 1, \\
F_{u,j,t} & \text{if } j = kq, \\
0 & \text{otherwise.}
\end{cases}
\]
3. If v has two children, denoted as u and u':
(a) If $kq > j \ge 1$, then
\[
F_{v,j,t} = w(v) + \max_{\substack{j' = 0,1,\ldots,j, \\ t' = 0,1,\ldots,t-\binom{w(v)}{2}}}
\left( F_{u,j',t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right).
\]
(b) If $j = kq$, then
\[
F_{v,j,t} = \max\left\{ F_{u,j,t},\; F_{u',j,t},\;
w(v) + \max_{\substack{j' = 1,2,\ldots,j-1, \\ t' = 0,1,\ldots,t-\binom{w(v)}{2}}}
\left( F_{u,j',t'} + F_{u',\,j-j',\,t-t'-\binom{w(v)}{2}} \right) \right\}.
\]
(c) If $j = 0$, then $F_{v,j,t} = 0$.
Figure 5: Algorithm Augment DP
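For readers who wish to experiment with the procedure of Figure 5, the following Python sketch transcribes step 4 of the preprocessing phase and the recurrence for F literally. It is a sketch under stated assumptions, not a verified implementation: the rooted tree is assumed to be given as a dictionary mapping each node to its list of children together with a dictionary of node weights, the parameter kq is passed as a single integer, and the helper names binarize and augment_dp and the auxiliary node labels are hypothetical. The caterpillar shape used to split high-degree nodes is one of several possible choices.

```python
from functools import lru_cache
from math import comb


def binarize(children, weight, root):
    """Step 4: split every node with more than two children using
    zero-weight auxiliary nodes, so each node keeps at most two children."""
    new_children, new_weight = {}, dict(weight)
    aux_count = 0
    stack = [root]
    while stack:
        v = stack.pop()
        kids = list(children.get(v, []))
        stack.extend(kids)
        node = v
        while len(kids) > 2:
            aux_count += 1
            aux = ("aux", aux_count)  # hypothetical label of a new internal node
            new_weight[aux] = 0
            new_children[node] = [kids.pop(0), aux]
            node = aux
        new_children[node] = kids
    return new_children, new_weight


def augment_dp(children, weight, root, kq, n):
    """Evaluate max over 0 <= t <= C(n, 2) of F[root, kq, t] - t."""
    neg_inf = float("-inf")

    @lru_cache(maxsize=None)
    def F(v, j, t):
        if j < 0 or t < 0:
            return neg_inf
        kids = children.get(v, [])
        w = weight[v]
        pairs = comb(w, 2)  # the binomial term C(w(v), 2) of the recurrence
        if not kids:  # case 1: v is a leaf
            return w if 1 <= j <= kq and t == 0 else 0
        if len(kids) == 1:  # case 2: v has one child
            if 1 <= j < kq:
                return w + F(kids[0], j, t - pairs)
            return F(kids[0], j, t) if j == kq else 0
        u, u2 = kids  # case 3: v has two children
        if j == 0:  # case 3(c)
            return 0
        lo, hi = (0, j) if j < kq else (1, j - 1)
        best = max(
            (F(u, j1, t1) + F(u2, j - j1, t - t1 - pairs)
             for j1 in range(lo, hi + 1)
             for t1 in range(t - pairs + 1)),
            default=neg_inf,
        )
        if j < kq:  # case 3(a)
            return w + best
        return max(F(u, j, t), F(u2, j, t), w + best)  # case 3(b)

    return max(F(root, kq, t) - t for t in range(comb(n, 2) + 1))
```

Memoised recursion is used here in place of an explicit bottom-up table; both evaluate the same recurrence, and the recursion depth is bounded by the height of the binarized tree.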