Travelling through Theoretical Computer Science
Zihan Tan
Abstract
Through four years at the Institute for Interdisciplinary Information Sciences, I
received a thorough education in several branches of Theoretical Computer Science (TCS).
While enjoying a variety of excellent courses here, I also worked on many
research problems, which sparked my interest in the design and analysis of
algorithms, computational complexity, combinatorial optimization, the theory
of machine learning, network theory, etc. By doing research, attending
seminars and taking courses, I developed good taste and a broad view
of TCS. This manuscript collects my main research notes in various fields,
depicting my four-year journey through TCS.
• Objects have projections. What about the converse?
• Is a given network suitable for computing a given function?
• How hard is the card game bridge?
• How long does it take for two random walks to meet?
• Do better on Influence Maximization!
• Hypothesis testing is impossible without the knowledge that the optimum
is unique!
• Do better on PageRank!
• How should you make online decisions?
• A linear propagation model for compressive sensing.
• Evolution theory is related to game theory.
• Mutual exclusion is impossible with only one stack.
• Information-theoretic differential privacy is pessimistic. What about computational variants?
• Random Serial Dictatorship is bad for facility location.
• Conway’s Game of Life.
On the Inequalities of Projected Volumes and the Constructible
Region
Zihan Tan ; Liwei Zeng ; Jian Li
Abstract
We study the following geometry problem: given a $(2^n - 1)$-dimensional vector $\pi = \{\pi_S\}_{S \subseteq [n], S \neq \emptyset}$,
is there an object $T \subseteq \mathbb{R}^n$ such that $\log(\mathrm{vol}(T_S)) = \pi_S$ for all $S \subseteq [n]$, where $T_S$ is the projection
of $T$ to the subspace spanned by the axes in $S$? If $\pi$ does correspond to an object in $\mathbb{R}^n$, we
say that $\pi$ is constructible. We use $\Psi_n$ to denote the constructible region, i.e., the set of all
constructible vectors in $\mathbb{R}^{2^n - 1}$. In 1995, Bollobás and Thomason showed that $\Psi_n$ is contained in a
polyhedral cone defined by a class of so-called uniform-cover inequalities. We propose a new set of
natural inequalities, called nonuniform-cover inequalities, which generalize the BT inequalities.
We show that any linear inequality that all points in $\Psi_n$ satisfy must be a nonuniform-cover
inequality. Based on this result and an example by Bollobás and Thomason, we show that the
constructible region $\Psi_n$ is not even convex, and thus cannot be fully characterized by linear
inequalities. We further show, by various combinatorial constructions, that some subclasses of the
nonuniform-cover inequalities are not correct, which refutes a previous conjecture about $\Psi_n$.
Finally, we conclude with an interesting conjecture regarding the convex hull of $\Psi_n$.
1 Introduction
We use notations introduced in [3].
Let $T$ be an object in $\mathbb{R}^n_+$ and let $\{v_1, \dots, v_n\}$ be the standard basis of $\mathbb{R}^n$. By an object,
we mean a compact (closed and bounded) subset of $\mathbb{R}^n_+$. We let $\mathrm{Span}(S)$ denote the subspace spanned by
$\{v_i \mid i \in S\}$. Given an index set $S \subseteq [n] = \{1, 2, \dots, n\}$ with $|S| = d$, we denote by $T_S$ the
orthogonal projection of $T$ onto $\mathrm{Span}(S)$, and by $|T_S|$ its $d$-dimensional volume. We use $|T|$ to
denote the $n$-dimensional volume of $T$. Given an $n$-dimensional object $T$, define $\pi(T)$ to be the
log-projection vector of $T$, which is a $(2^n - 1)$-dimensional vector with entries indexed by the nonempty subsets of
$[n]$ and $\pi(T)_S = \log |T_S|$ for all nonempty $S \subseteq [n]$ (we use the convention that $\log 0 = -\infty$). Whenever we
refer to a $(2^n - 1)$-dimensional vector $\pi$, we assume that the entries are indexed by the nonempty subsets of $[n]$
(i.e., $\pi_S$ is the entry indexed by $S \subseteq [n]$). We say that a $(2^n - 1)$-dimensional vector $\pi$ is constructible
if $\pi$ is the log-projection vector of some object $T$ in $\mathbb{R}^n$. Let us define the constructible region $\Psi_n$,
the central subject studied in this paper, to be the set of all constructible vectors:
$$\Psi_n = \{\pi \in \mathbb{R}^{2^n - 1} \mid \pi \text{ is constructible}\}.$$
Having the above definitions, it is natural to ask the following questions:
1. Given a 2n − 1 dimensional vector π, is there an algorithm to decide whether π is in Ψn ?
2. What does Ψn look like? What property does Ψn have?
In 1995, Bollobás and Thomason [3] proposed a class of inequalities relating the projected
volumes. Their result reads as follows. Let $\mathcal{A}$ be a family of subsets of $[n]$. We say $\mathcal{A}$ is a $k$-cover
(or $k$-uniform cover) of $[n]$ if each element of $[n]$ appears exactly $k$ times in the multiset induced by $\mathcal{A}$. For example,
$\{\{1, 2\}, \{2, 3\}, \{1, 3\}\}$ is a 2-cover of $\{1, 2, 3\}$.
Theorem 1. (Bollobás-Thomason (BT) uniform-cover inequalities) Suppose $T$ is an object in $\mathbb{R}^n$
and $\mathcal{A}$ is a $k$-cover of $[n]$. Then, we have that $|T|^k \le \prod_{A \in \mathcal{A}} |T_A|$.
With the above notations, we define the polyhedral cone
$$\mathrm{BT}_n = \Big\{\pi \in \mathbb{R}^{2^n - 1} \;\Big|\; k\pi_S \le \sum_{A \in \mathcal{A}} \pi_A \text{ for all } k \text{ and } \mathcal{A} \text{ that } k\text{-covers } S,\ S \subseteq [n]\Big\}.$$
BT inequalities essentially assert that every constructible vector is in $\mathrm{BT}_n$, or equivalently $\Psi_n \subseteq \mathrm{BT}_n$.
In the very same paper [3], they also presented a non-constructible point in $\mathrm{BT}_4$, which
immediately implies that $\Psi_n \subsetneq \mathrm{BT}_n$. However, the above result does not rule out the possibility
that $\Psi_n$ is convex, or even that it can be characterized by a finite set of linear inequalities.
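To make Theorem 1 concrete, the following minimal sketch checks a BT inequality on an object built from unit cubes, where projected volumes reduce to counting distinct projected cubes. The function names and the L-shaped test object are our own illustrative choices and do not come from [3].

```python
from math import log

def is_k_cover(family, n, k):
    """True if every element of {1..n} lies in exactly k sets of the family."""
    return all(sum(e in A for A in family) == k for e in range(1, n + 1))

def projection_size(cubes, S):
    """Volume of the projection of a union of unit cubes onto the axes in S.

    Each cube is identified by its integer corner (t_1, ..., t_n); axes are 1-based.
    """
    return len({tuple(c[i - 1] for i in sorted(S)) for c in cubes})

def bt_holds(cubes, family, n, k):
    """Check |T|^k <= prod_A |T_A| in logarithmic form for a k-cover `family`."""
    if not is_k_cover(family, n, k):
        raise ValueError("family is not a k-cover of [n]")
    lhs = k * log(len(set(cubes)))
    rhs = sum(log(projection_size(cubes, A)) for A in family)
    return lhs <= rhs + 1e-12  # small tolerance for floating-point error

# Hypothetical test object: an L-shaped solid made of four unit cubes in R^3.
L_SHAPE = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
COVER = [{1, 2}, {2, 3}, {1, 3}]  # the 2-cover of {1, 2, 3} from the text
```

For this object each of the three planar projections has area 3 and the volume is 4, so the inequality reads 4^2 = 16 <= 27.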
1.1 Our Results
Apart from the results mentioned above, very little is known about $\Psi_n$, and the main goal of this paper
is to deepen our understanding of its structure. First, we propose a new class of natural inequalities, called nonuniform-cover inequalities, which generalize the BT uniform-cover inequalities.
We need some notation first.
Let $\mathcal{A} = \{A_i\}_{i=1}^k$ and $\mathcal{B} = \{B_j\}_{j=1}^m$ be two families of subsets of $[n]$ (see footnote 1), where the $A_i$ and $B_j$ are subsets
of $[n]$. We say $\mathcal{A}$ covers $\mathcal{B}$ if the following properties hold:
1. The disjoint union of $\{A_i\}_{i=1}^k$ is the same as the disjoint union of $\{B_j\}_{j=1}^m$. In other words,
for every element $e \in [n]$, $|\{i \mid e \in A_i\}| = |\{j \mid e \in B_j\}|$.
2. Let $\Sigma = \{(A_i, t) \mid t \in A_i\}$ and $\Gamma = \{(B_j, s) \mid s \in B_j\}$. There is a one-to-one mapping $f$
between $\Sigma$ and $\Gamma$ such that for any $(A_i, t) \in \Sigma$ with $(B_j, s) = f(A_i, t)$, we have $t = s$ and $A_i \subseteq B_j$.
Definition 1. (Nonuniform-Cover (NC) inequalities) Let $x$ be a $(2^n - 1)$-dimensional vector and suppose
$\mathcal{A}$ covers $\mathcal{B}$. A nonuniform-cover inequality is of the following form:
$$\prod_{A_i \in \mathcal{A}} x_{A_i} \ge \prod_{B_j \in \mathcal{B}} x_{B_j}.$$
Example 1. Let $\mathcal{A} = \{\{1, 2\}, \{2, 3\}, \{3, 4\}\}$ and $\mathcal{B} = \{\{1, 2, 3\}, \{2, 3, 4\}\}$. We can see that $\mathcal{A}$ covers
$\mathcal{B}$. The corresponding NC inequality is $x_{\{1,2\}} \cdot x_{\{2,3\}} \cdot x_{\{3,4\}} \ge x_{\{1,2,3\}} \cdot x_{\{2,3,4\}}$. Here is another
example: $x_{\{1\}} \cdot x_{\{1,2\}} \cdot x_{\{2,3\}} \cdot x_{\{3,4\}} \cdot x_{\{2,4\}} \ge x_{\{1,2,3\}} \cdot x_{\{2,3,4\}} \cdot x_{\{1,2,4\}}$.
When the context is clear, we refer to a linear inequality of the form $\sum_{B_j \in \mathcal{B}} \pi_{B_j} \le \sum_{A_i \in \mathcal{A}} \pi_{A_i}$ as an NC
inequality as well. It is easy to see that every BT inequality is an NC inequality, but the
converse may not be true. For example, $x_{\{1,2\}} \cdot x_{\{2,3\}} \cdot x_{\{3,4\}} \ge x_{\{1,2,3\}} \cdot x_{\{2,3,4\}}$ is an NC inequality but not a BT inequality. (We alert the reader
that we do not claim such inequalities are always true. We will discuss this in detail in Section 4.)
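The covering relation can be tested mechanically. Because the mapping $f$ only pairs $(A_i, t)$ with $(B_j, t)$ for the same element $t$, the check splits into one bipartite perfect-matching problem per element. The sketch below is our own illustration of that observation (the function name `covers` and the augmenting-path matching are not taken from the paper); families are given as Python lists of sets so that repeated sets are allowed.

```python
def covers(A, B):
    """Return True if the family A covers the family B (lists of sets, repeats allowed)."""
    elements = set().union(*A, *B)
    for e in elements:
        left = [i for i, a in enumerate(A) if e in a]
        right = [j for j, b in enumerate(B) if e in b]
        if len(left) != len(right):  # property 1 fails at element e
            return False
        match = {}  # index of a B-set -> index of the A-set matched to it

        def augment(i, seen):
            # Standard augmenting-path step: try to place A[i] into some B[j] containing it.
            for j in right:
                if A[i] <= B[j] and j not in seen:
                    seen.add(j)
                    if j not in match or augment(match[j], seen):
                        match[j] = i
                        return True
            return False

        if not all(augment(i, set()) for i in left):  # property 2 fails at element e
            return False
    return True
```

Both families of Example 1 pass this check, while a pair such as A = [{1, 2}, {3, 4}], B = [{1, 3}, {2, 4}] satisfies property 1 but fails property 2.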
Footnote 1: A subset of $[n]$ may appear more than once in $\mathcal{A}$ or $\mathcal{B}$.
Similar to $\mathrm{BT}_n$, we define $\mathrm{NC}_n$ to be the set of all points that satisfy all NC inequalities.
Formally, it is the following polyhedral cone:
$$\mathrm{NC}_n = \Big\{\pi \in \mathbb{R}^{2^n - 1} \;\Big|\; \sum_{B_j \in \mathcal{B}} \pi_{B_j} \le \sum_{A_i \in \mathcal{A}} \pi_{A_i} \text{ for all } \mathcal{A}, \mathcal{B} \text{ such that } \mathcal{A} \text{ covers } \mathcal{B}\Big\}.$$
Our first result states that all correct linear inequalities should be in this class.
Theorem 2. If all points in $\Psi_n$ satisfy a certain linear inequality $\sum_{S \subseteq [n]} \alpha_S \pi_S \le 0$, the linear
inequality must be an NC inequality, or a positive combination of NC inequalities.
In order to prove the above theorem, we introduce a class of objects called rectangular flowers.
We let $\mathrm{RF}_n$ denote the set of all log-projection vectors that can be generated by rectangular
flowers (see the definition in Section 2). We show that for any linear inequality that is not an NC
inequality, we can construct a rectangular flower which violates the inequality. It is simple to show
that the log-projection vector of a rectangular flower in $\mathbb{R}^n$ satisfies all nonuniform-cover inequalities
(i.e., it is in $\mathrm{NC}_n$). Moreover, we show that for every point $\pi \in \mathrm{NC}_n$, there is a rectangular flower
in $\mathbb{R}^n$ whose log-projection vector is $\pi$. Therefore, we can prove the following theorem.
Theorem 3. For all $n \ge 1$, $\mathrm{NC}_n = \mathrm{RF}_n \subseteq \Psi_n$.
Given Theorem 3, it is natural to ask whether $\mathrm{NC}_n = \Psi_n$. If the answer were yes, $\Psi_n$ would have
a compact description and deciding whether a point is in $\Psi_n$ could be done using linear programming
(see Section 2 for the details). However, the answer is not that simple. In fact, using Theorem 2,
we can show our next result, which states that $\Psi_n$ is not even convex for $n \ge 4$. We note that for
$n = 1, 2, 3$, $\Psi_n = \mathrm{BT}_n$, and thus $\Psi_n$ is convex. For completeness, we provide a proof in Appendix A.
Theorem 4. (Non-convexity of $\Psi_n$) For $n \ge 4$, $\Psi_n$ is not convex.
Theorem 4 implies that there exists a constructible vector in $\mathbb{R}^{2^n - 1}$ which violates some
NC constraint. In other words, $\mathrm{NC}_n \subsetneq \Psi_n$. Thus it would be interesting to know which NC
inequalities are true and which are false (we already know BT inequalities are true). In Section 4, we
provide several methods for constructing counterexamples for different subclasses of NC inequalities.
However, we have not been able to disprove all NC inequalities that are not BT inequalities, nor
prove any of them. This leads us to conjecture the following.
Conjecture 1. If all points in $\Psi_n$ satisfy a certain linear inequality $\sum_j \beta_j \pi_{B_j} \le \sum_i \alpha_i \pi_{A_i}$, the
linear inequality must be a BT inequality or a positive combination of several BT inequalities.
Moreover, $\mathrm{BT}_n = \mathrm{Conv}(\Psi_n)$, the convex hull of $\Psi_n$.
At the end of the introduction, we summarize our results in the following chain:
$$\mathrm{RF}_n = \mathrm{NC}_n \subsetneq \Psi_n \subsetneq \mathrm{Conv}(\Psi_n) \subseteq \mathrm{BT}_n,$$
and we conjecture that $\mathrm{Conv}(\Psi_n) = \mathrm{BT}_n$.
1.2 A Motivating Problem from Databases
Our problem is closely related to the data generation problem [1] studied in the area of databases,
which is in fact our initial motivation for studying the problem. Generating synthetic relations under
various constraints is a key problem for testing data management systems. A relation $R(A_1, \dots, A_n)$
is essentially a table, where each row is one record about some entity, and each column $A_i$ is an
attribute. One of the most important operations in relational databases is the projection operation
onto a subset of attributes. One can think of the projection onto a subset $S$ of attributes, denoted
$\pi_S(R)$, as the table $R$ first restricted to the columns in $S$, and then with duplicates removed. To see
the connection between the database problem and geometry, we can think of a relation $R(A_1, \dots, A_n)$
with $n$ attributes as an $n$-dimensional object $T$ in $\mathbb{R}^n$: a tuple (i.e., a row) $(t_1, t_2, \dots, t_n)$ can be
thought of as a unit cube $[t_1 - 1, t_1] \times \dots \times [t_n - 1, t_n]$. Then $T_S$, the projection of $T$ to $\mathrm{Span}(S)$,
corresponds exactly to the projected relation $\pi_S(R)$.
Example 2. The following table shows course registration information. The 5 rows of the table
correspond to unit squares in the coordinate system. In this way, a table is represented by an object
in Euclidean space.
Rank  Name   Course
1     Alice  Math
2     Alice  Physics
3     Alice  Biology
4     Bob    Math
5     Bob    Physics
[Figure: the five records drawn as unit squares in the plane, with axes Students (Alice, Bob) and Courses (Math, Physics, Biology).]
In the data generation problem with projection constraints, we are given the cardinalities $|\pi_S(R)|$
for a collection of subsets $S \subseteq [n]$. The goal is to construct a relation $R$ that is consistent with the given
cardinalities. We can see that it is a discrete version of our geometry problem. Moreover, if the given
cardinalities (after taking logarithms) are not in $\Psi_n$, or violate any projection inequality, there is
obviously no solution to the data generation problem. Therefore, a good understanding of our
geometry problem is central to solving the data generation problem.
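As an illustration of this correspondence, the sketch below computes the projection cardinalities of the registration table from Example 2 and their logarithms. The helper names are ours and the column indices are 0-based; this is only a small worked instance, not the data generation algorithm of [1].

```python
from itertools import combinations
from math import log

# The registration table of Example 2 as a set of (name, course) tuples.
R = {("Alice", "Math"), ("Alice", "Physics"), ("Alice", "Biology"),
     ("Bob", "Math"), ("Bob", "Physics")}

def project(relation, S):
    """pi_S(R): keep the columns in S (0-based indices) and drop duplicate rows."""
    return {tuple(row[i] for i in S) for row in relation}

def log_cardinalities(relation, n):
    """Map every nonempty subset S of the n columns to log |pi_S(R)|."""
    return {S: log(len(project(relation, S)))
            for d in range(1, n + 1)
            for S in combinations(range(n), d)}

pi = log_cardinalities(R, 2)
```

Here the projections onto Name and Course have 2 and 3 rows, and the full table has 5 rows, so the BT inequality for the 1-cover {{Name}, {Course}} reads 5 <= 2 * 3.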
1.3 Other Related Work
Loomis and Whitney proved a class of projection inequalities in [7], allowing one to upper bound
the volume of a $d$-dimensional object by the volumes of its $(d - 1)$-dimensional projections.
Their inequalities are special cases of BT inequalities. BT inequalities and their generalizations
also play an essential role in the worst-case optimal join problem in databases (one can upper
bound the size of the relation $R$ knowing the cardinalities of its projections). See, e.g., [8] for
some recent results on this problem.
There is a large body of literature on the constructible region $\Gamma_n$ for the joint entropy function over
$n$ random variables $X_1, \dots, X_n$. More specifically, for each joint distribution over $X_1, \dots, X_n$, there
is a point in $\Gamma_n$, which is a $(2^n - 1)$-dimensional vector, with the entry indexed by $S \subseteq [n]$ being
$H(\{X_i\}_{i \in S})$. Characterizing $\Gamma_n$ is one major problem in information theory and has been studied
extensively. Many entropy inequalities are known, including Shannon-type inequalities and several
non-Shannon-type inequalities. For a comprehensive treatment of this topic, we refer interested
readers to the book [9]. There are close connections between entropy inequalities and projection
inequalities [2–5]. In particular, BT inequalities can be easily derived from the well-known Shearer’s
entropy inequalities [4] (many even regard them as the same).
2 Proof of Theorem 2 and Theorem 3
In this section, we prove Theorem 2 and Theorem 3. We need to introduce a class of special
geometric objects, which are crucial to our proofs. We say an $n$-dimensional object $F \subseteq \mathbb{R}^n_+$ is
cornered if $x \in F$ implies $y \in F$ for all $y \le x$ (i.e., $y_i \le x_i$ for all $i \in [n]$). An object $R \subseteq \mathbb{R}^n_+$
is said to be an open rectangle if $R = (0, a_1] \times (0, a_2] \times \dots \times (0, a_n]$, or a closed rectangle if
$R = [0, a_1] \times [0, a_2] \times \dots \times [0, a_n]$.
Definition 2. We say $F \subseteq \mathbb{R}^n_+$ is a rectangular flower if
1. $F$ is cornered,
2. $F \cap (0, \infty)^S$ is an open rectangle in $(0, \infty)^S$ for any $S \subseteq [n]$.
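[Figure 1 not reproduced; panel (ii) shows the source $s$, the sink $t$, and the node sets $\Sigma_S$, $\Sigma_T$, $\Lambda_S$, $\Lambda_T$.]
Figure 1: (i) A 3-dimensional rectangular flower. (ii) The network flow $N(\mathcal{A}, \mathcal{B})$. The dashed line
represents the minimum s-t cut.
See Figure 1 for an example.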
It is easy to see that a rectangular flower $F \subset \mathbb{R}^n_+$ is a union of
$2^n - 1$ closed rectangles $\bigcup_{S \subseteq [n], S \neq \emptyset} F_S$, each $F_S$ being a closed rectangle in $\mathrm{Span}(S)$. Moreover, if
$S \subset S'$, then for any $i \in S$, the edge length of $F_S$ along axis $i$ is no shorter than that of $F_{S'}$ (since $F$ is
cornered).
We also need to introduce a new class of inequalities, called fractional nonuniform-cover inequalities, which can be seen as the fractional generalization of NC inequalities. We need some notation
first. Let $\mathcal{A} = \{(A_i, \alpha_i)\}_{i=1}^k$ and $\mathcal{B} = \{(B_j, \beta_j)\}_{j=1}^m$ be two families of weighted subsets of $[n]$, where the $A_i$
and $B_j$ are subsets of $[n]$ and $\alpha_i > 0$ ($\beta_j > 0$, resp.) is the positive weight associated with $A_i$ ($B_j$,
resp.). Construct a network flow instance $N(\mathcal{A}, \mathcal{B})$ as follows. Let $\Sigma = \{(A_i, x) \mid x \in A_i, A_i \in \mathcal{A}\}$
and $\Lambda = \{(B_j, y) \mid y \in B_j, B_j \in \mathcal{B}\}$ be sets of nodes. Let node $s$ be the source and node $t$ be the
sink. There is an arc from $s$ to each node $(A_i, x) \in \Sigma$ with capacity $\alpha_i$. There is an arc from each
node $(B_j, y) \in \Lambda$ to $t$ with capacity $\beta_j$. For each pair of $(A_i, x)$ and $(B_j, y)$, there is an arc with
capacity $+\infty$ from $(A_i, x)$ to $(B_j, y)$ if $A_i \subseteq B_j$ and $x = y$. We say $\mathcal{A}$ saturates $\mathcal{B}$ if the following
properties hold:
C1. For any $x \in [n]$, $\sum_{i=1}^k \alpha_i \mathbf{1}(x \in A_i) = \sum_{j=1}^m \beta_j \mathbf{1}(x \in B_j)$.
C2. The maximum s-t flow (or equivalently, the minimum s-t cut) of $N(\mathcal{A}, \mathcal{B})$ is $\sum_{(B_j, y) \in \Lambda} \beta_j = \sum_j \beta_j |B_j|$, i.e., every arc into $t$ is saturated.
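The saturation test can be prototyped with any maximum-flow routine. The sketch below uses a plain Edmonds-Karp search with exact `Fraction` arithmetic. The function names are ours, the infinite capacities are replaced by a finite value larger than the total sink capacity (which cannot change the maximum flow), and C2 is read as requiring every arc into the sink to be saturated; all of these are assumptions of this illustration.

```python
from collections import deque
from fractions import Fraction

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts residual graph `cap` (modified in place)."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push

def saturates(A, alpha, B, beta):
    """Check C1 and C2 for weighted families given as lists of sets and weights."""
    alpha = [Fraction(a) for a in alpha]
    beta = [Fraction(b) for b in beta]
    for x in set().union(*A, *B):  # C1: weighted multiplicities agree
        if (sum(a for S, a in zip(A, alpha) if x in S)
                != sum(b for S, b in zip(B, beta) if x in S)):
            return False
    target = sum(b * len(S) for S, b in zip(B, beta))  # total sink capacity
    big = target + 1  # finite stand-in for the +infinity capacities
    cap = {"s": {}, "t": {}}

    def arc(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0)  # residual reverse arc

    for i, S in enumerate(A):
        for x in S:
            arc("s", ("A", i, x), alpha[i])
            for j, T in enumerate(B):
                if x in T and S <= T:
                    arc(("A", i, x), ("B", j, x), big)
    for j, T in enumerate(B):
        for y in T:
            arc(("B", j, y), "t", beta[j])
    return max_flow(cap, "s", "t") == target  # C2
```

With unit weights this reduces to the covering relation of Section 1.1; with weights 1/2 on {1,2}, {2,3}, {1,3} and weight 1 on {1,2,3} it recognizes the normalized form of a BT inequality.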
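Definition 3. (Fractional-Nonuniform-Cover (FNC) inequalities) Suppose $T$ is an object in $\mathbb{R}^n$
and $\mathcal{A}$ saturates $\mathcal{B}$. A fractional-nonuniform-cover inequality is of the following form:
$$\prod_{(A_i, \alpha_i) \in \mathcal{A}} |T_{A_i}|^{\alpha_i} \ge \prod_{(B_j, \beta_j) \in \mathcal{B}} |T_{B_j}|^{\beta_j}.$$
When the context is clear, we also refer to linear inequalities of the form
$\sum_{A_i \in \mathcal{A}} \alpha_i \pi_{A_i} \ge \sum_{B_j \in \mathcal{B}} \beta_j \pi_{B_j}$ as FNC inequalities.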
Lemma 1. The set of FNC inequalities (in the linear form) is exactly the set of all nonnegative linear
combinations of NC inequalities.
Proof. It is trivial to see that a nonnegative linear combination of NC inequalities is an FNC
inequality. Now, we show the other direction. Fix the dimension to be $n$. Consider an arbitrary
FNC inequality $cx \le 0$. If all entries in $c$ are rational numbers, the FNC inequality itself is an NC
inequality after scaling all coefficients by some integer factor (this is because if all capacities of the
network are integral, there is an integral maximum flow). So, we only need to handle the case
where some entries of $c$ are not rational. Now, we show that every point in $\mathrm{NC}_n$ satisfies
$cx \le 0$. Suppose to the contrary that there is a point $y \in \mathrm{NC}_n$ with $cy = \epsilon > 0$. However, we claim that
there is a sequence of FNC inequalities with rational coefficients $\{c^{(i)} x \le 0\}_i$ such that $\lim_i c^{(i)} = c$.
Hence, we have that, for some sufficiently large $i$, $c^{(i)} y \ge \epsilon/2 > 0$, which is a contradiction.
Now, we briefly argue why the claimed sequence exists. It is not hard to see that the set of
coefficient vectors $c$ corresponding to FNC inequalities is a rational polyhedral cone, defined by the
linear constraints C1 and the flow constraint C2, which again can be captured by linear constraints
(using auxiliary flow variables). So there is a set $V$ of rational generating vectors and $c$ can be
written as a nonnegative combination of these vectors. Suppose $c = V\gamma$, $\gamma \ge 0$ (each column of
$V$ is a generating vector). Pick an arbitrary sequence of rational nonnegative vectors $\{\gamma^{(j)}\}_j$ that
approaches $\gamma$; then $\{V\gamma^{(j)}\}_j$ is the desired sequence.
The rest can be seen from Farkas' Lemma: Let $Ax \le 0$ be a feasible system of inequalities and
$cx \le 0$ be an inequality satisfied by all $x$ with $Ax \le 0$. Then, by Farkas' Lemma, $cx \le 0$ is a
nonnegative linear combination of the inequalities in $Ax \le 0$ (see, e.g., [6]).
Proof of Theorem 2. We only need to show that all non-FNC inequalities are wrong. Suppose $F$ is
an object. Consider an arbitrary non-FNC inequality:
$$\prod_{A_i \in \mathcal{A}} |F_{A_i}|^{\alpha_i} \ge \prod_{B_j \in \mathcal{B}} |F_{B_j}|^{\beta_j}, \qquad (1)$$
where $\mathcal{A}$ does not saturate $\mathcal{B}$. We show that we can construct a rectangular flower $F$ for which this
inequality does not hold.
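Consider the network flow instance $N(\mathcal{A}, \mathcal{B})$. Suppose C1 does
not hold: for some $x \in [n]$, $\sum_{i=1}^k \alpha_i \mathbf{1}(x \in A_i) \neq \sum_{j=1}^m \beta_j \mathbf{1}(x \in B_j)$.
First, if $\sum_{i=1}^k \alpha_i |A_i| \neq \sum_{j=1}^m \beta_j |B_j|$, we can easily see
that (1) is false by considering $F = [0, 2]^n$ (with logarithms taken to base 2, $\log \mathrm{LHS} = \sum_{i=1}^k \alpha_i |A_i|$ and $\log \mathrm{RHS} = \sum_{j=1}^m \beta_j |B_j|$),
or $F = [0, 1/2]^n$, whichever makes the left-hand side smaller.
Now, suppose $\sum_{i=1}^k \alpha_i |A_i| = \sum_{j=1}^m \beta_j |B_j|$ but $\sum_{i=1}^k \alpha_i \mathbf{1}(x \in A_i) \neq \sum_{j=1}^m \beta_j \mathbf{1}(x \in B_j)$ for some $x$.
W.l.o.g., assume $x = 1$. Let $F = [0, 2] \times [0, 1] \times \dots \times [0, 1]$. Again, it is easy to see that (1) is false since
$\log \mathrm{LHS} = \sum_{i=1}^k \alpha_i \mathbf{1}(x \in A_i)$ and $\log \mathrm{RHS} = \sum_{j=1}^m \beta_j \mathbf{1}(x \in B_j)$
(again replacing the edge length 2 by 1/2 if necessary).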
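Now, suppose C2 is false, that is, the value of the minimum s-t cut of $N(\mathcal{A}, \mathcal{B})$ is less than
$\sum_{(B_j, y) \in \Lambda} \beta_j$. Suppose the minimum s-t cut defines the partition $(S, T)$ of vertices such that $s \in S$ and
$t \in T$. Let $\Sigma$ and $\Lambda$ be defined as above, and $\Sigma_S = \Sigma \cap S$, $\Sigma_T = \Sigma \cap T$, $\Lambda_S = \Lambda \cap S$, $\Lambda_T = \Lambda \cap T$.
Since the min-cut is less than $\sum_{(B_j, y) \in \Lambda} \beta_j$, none of the above four sets is empty. Clearly, there is no
edge from $\Sigma_S$ to $\Lambda_T$ since otherwise the value of the cut is infinite. In other words, $\Lambda_S$ absorbs all
outgoing edges from $\Sigma_S$ (see Figure 1(ii)).
Moreover, we can see the value of the min-cut is $\sum_{(A_i, x) \in \Sigma_S} \alpha_i + \sum_{(B_j, y) \in \Lambda_T} \beta_j$. Since this
value is less than $\sum_{(B_j, y) \in \Lambda} \beta_j$, we have that $\sum_{(A_i, x) \in \Sigma_S} \alpha_i < \sum_{(B_j, y) \in \Lambda_S} \beta_j$ due to C1. Now, we
construct the rectangular flower $F$. Suppose $F = \bigcup_{S \subseteq [n], S \neq \emptyset} F_S$ and we use $F_{S,x}$ to denote the edge
length of the closed rectangle $F_S$ along axis $x \in S$. For a parameter $t > 1$ (a real number, not to be confused with the sink), we only need to specify all $F_{S,x}$ as follows:
$$F_{S,x} = \begin{cases} t & \text{if } S \subseteq B_j \text{ for some } (B_j, y) \in \Lambda_S, \\ 1 & \text{otherwise.} \end{cases}$$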
Now, we verify that the above rectangular flower $F$ violates the given non-FNC inequality. In
fact, we can easily see that for any node $(A_i, x) \in \Sigma_S$, there is a node $(B_j, x) \in \Lambda_S$ and we have
that $F_{A_i, x} = t$. Hence,
$$\log \prod_{A_i \in \mathcal{A}} |F_{A_i}|^{\alpha_i} = \log t \sum_{(A_i, x) \in \Sigma_S} \alpha_i.$$
On the other hand, we have that
$$\log \prod_{B_j \in \mathcal{B}} |F_{B_j}|^{\beta_j} \ge \log t \sum_{(B_j, y) \in \Lambda_S} \beta_j,$$
which implies that the given inequality is false. This proves Theorem 2.
We denote the set of log-projection vectors generated by rectangular flowers by
$$\mathrm{RF}_n = \{\pi \in \mathbb{R}^{2^n - 1} \mid \pi \text{ is the log-projection vector of some rectangular flower } F\}.$$
Now, we prove Theorem 3.
Proof of Theorem 3. Clearly, $\mathrm{RF}_n \subseteq \Psi_n$. We only need to show that $\mathrm{RF}_n = \mathrm{NC}_n$.
We can see that a given vector $\pi$ is the log-projection vector of some rectangular flower in $\mathbb{R}^n$
if and only if the following linear program, denoted $\mathrm{LP}(\pi)$, is feasible (treating the $f_{S,i}$, which stand for the logarithms of the edge lengths $F_{S,i}$, as variables):
$$\sum_{i \in S} f_{S,i} = \pi_S \quad \text{for all } S \subseteq [n],$$
$$f_{S,i} \ge f_{S',i} \quad \text{for all } S \subset S' \subseteq [n],\ i \in S.$$
Hence, $\mathrm{RF}_n = \{\pi \in \mathbb{R}^{2^n - 1} \mid \mathrm{LP}(\pi) \text{ is feasible}\}$. It is easy to check that $\mathrm{RF}_n$ is a convex cone
(i.e., if $\pi_1, \pi_2 \in \mathrm{RF}_n$, then $a\pi_1 + b\pi_2 \in \mathrm{RF}_n$ for any $a, b > 0$). In fact, from basic linear programming
facts, $\mathrm{RF}_n$ is a polyhedral cone. This can be seen as follows: we can write $\mathrm{LP}(\pi)$
in the standard matrix form $\{Ax = (\pi, 0), x \ge 0\}$. Obviously, $\{Ax = (y_1, y_2) \mid x \ge 0\}$ is a finitely
generated cone (generated by the columns of $A$), thus a polyhedral cone. $\mathrm{RF}_n$ is the intersection of
the above cone with the subspace $\{(y_1, y_2) \mid y_2 = 0\}$, which is again a polyhedral cone.
It is straightforward to verify that each point in $\mathrm{RF}_n$ satisfies all NC inequalities (we leave
the verification to the reader). So $\mathrm{RF}_n \subseteq \mathrm{NC}_n$. Suppose for contradiction that there is a point
$\pi \in \mathrm{NC}_n$ but $\pi \notin \mathrm{RF}_n$. So there is a hyperplane $\sum_{S \subseteq [n]} \alpha_S x_S = 0$ separating $\mathrm{RF}_n$ and $\pi$
(with $\sum_{S \subseteq [n]} \alpha_S \pi_S > 0$). So $\sum_{S \subseteq [n]} \alpha_S x_S \le 0$ is not an FNC inequality (since $\pi \in \mathrm{NC}_n$ should
satisfy all FNC inequalities). In the proof of Theorem 2, we have shown that for any non-FNC
inequality, we can construct a rectangular flower that violates the inequality. This contradicts the fact
that $\sum_{S \subseteq [n]} \alpha_S x_S \le 0$ for all $x \in \mathrm{RF}_n$. Hence, $\mathrm{NC}_n \subseteq \mathrm{RF}_n$. This concludes the proof of the
theorem.
At the end of this section, we briefly mention projection inequalities with nonzero constant
terms ($\sum_S \alpha_S x_S \le \beta$, for $\beta \neq 0$). If $\beta < 0$, no such inequality is true, as one sees by just considering the
unit hypercube. Obviously, if $\sum_S \alpha_S x_S \le 0$ is true for all $x \in \Psi_n$, then $\sum_S \alpha_S x_S \le \beta$ is also true for all $\beta > 0$.
Moreover, if $\sum_S \alpha_S x_S \le 0$ is not an FNC inequality, $\sum_S \alpha_S x_S \le \beta$ cannot be true for any $\beta > 0$,
since we can make $t$ large enough in the proof of Theorem 2. Conversely, if $\sum_S \alpha_S x_S \le \beta$ for some
$\beta > 0$ is true for all $x \in \Psi_n$, it must hold that $\sum_S \alpha_S x_S \le 0$ for all $x \in \Psi_n$. This is because if
$x \in \Psi_n$, then $ax \in \Psi_n$ for any $a > 0$. Therefore, it suffices to consider only those inequalities with zero
constant term.
3 Proof of Theorem 4: Non-Convexity of $\Psi_n$
In this section, we prove Theorem 4: the non-convexity of the constructible region $\Psi_n$ for $n \ge 4$.
Suppose to the contrary that $\Psi_n$ is convex. First, we can see that if $\Psi_n$ is convex, it must be a
convex cone (this is because if $x \in \Psi_n$, then $\alpha x \in \Psi_n$ for $\alpha > 0$). Hence, each supporting hyperplane of
$\Psi_n$ must correspond to an FNC inequality.
Consider
$$\Pi' = \{(\pi(T)_{\{1,2\}}, \pi(T)_{\{1,3\}}, \pi(T)_{\{2,3\}}, \pi(T)_{\{2,4\}}, \pi(T)_{\{3,4\}}, \pi(T)_{\{1,2,3\}}, \pi(T)_{\{2,3,4\}}) \mid T \text{ is an object in } \mathbb{R}^n_+\},$$
which is the projection of $\Psi_n$ onto the subspace spanned by
$$\{v_{\{1,2\}}, v_{\{1,3\}}, v_{\{2,3\}}, v_{\{2,4\}}, v_{\{3,4\}}, v_{\{1,2,3\}}, v_{\{2,3,4\}}\},$$
where $v_S$ is the axis indexed by $S \subseteq [n]$. Since $\Psi_n$ is a convex cone, $\Pi'$ must also be a convex cone.
Hence, each linear inequality that defines $\Pi'$ must be some FNC inequality with terms
$$T_{\{1,2\}}, T_{\{1,3\}}, T_{\{2,3\}}, T_{\{2,4\}}, T_{\{3,4\}}, T_{\{1,2,3\}}, T_{\{2,3,4\}}.$$
Now, we prove that any FNC inequality involving only the above terms is a nonnegative linear
combination of the following two BT inequalities:
$$|T_{\{1,2\}}| \cdot |T_{\{1,3\}}| \cdot |T_{\{2,3\}}| \ge |T_{\{1,2,3\}}|^2, \qquad (2)$$
$$|T_{\{2,3\}}| \cdot |T_{\{2,4\}}| \cdot |T_{\{3,4\}}| \ge |T_{\{2,3,4\}}|^2. \qquad (3)$$
By Lemma 1, it suffices to consider only NC inequalities. In fact, according to the definition
of NC, the right-hand side can only contain the terms $T_{\{1,2,3\}}$ and $T_{\{2,3,4\}}$. Applying Corollary 2 of the
Exact Single Cover Theorem (which is discussed in the next section) to this inequality, we can see that it
must be a combination of (2) and (3). In other words, $\Pi'$ is defined by (2) and (3).
Now, we consider the vector $\phi'^{(t)}$, $t > 0$, $t \neq 1$:
$$\phi'^{(t)} = (0, 2\ln t, 0, 2\ln t, 0, \ln t, \ln t).$$
The example is essentially adapted from the example in [3]. It is easy to see that $\phi'^{(t)}$ satisfies (2)
and (3). Now, we briefly show that $\phi'^{(t)} \notin \Pi'$. Suppose there exists an object $T$ with log-projection
vector consistent with $\phi'^{(t)}$. In other words, $|T_{\{1,2\}}| = |T_{\{2,3\}}| = |T_{\{3,4\}}| = 1$, $|T_{\{1,3\}}| = |T_{\{2,4\}}| = t^2$,
$|T_{\{1,2,3\}}| = |T_{\{2,3,4\}}| = t$. Note that $|T_{\{1,2\}}| \cdot |T_{\{1,3\}}| \cdot |T_{\{2,3\}}| = |T_{\{1,2,3\}}|^2$.
From Theorem 4 in [3], we know that the projection $T_{\{2,3\}}$ must be a rectangle $B(\frac{1}{t}, t)$.
However, since $|T_{\{2,3\}}| \cdot |T_{\{2,4\}}| \cdot |T_{\{3,4\}}| = |T_{\{2,3,4\}}|^2$, the projection $T_{\{2,3\}}$ must be a rectangle
$B(t, \frac{1}{t})$. Since $t \neq 1$, the two boxes are not the same and we arrive at a contradiction. This shows
that $\Psi_n$ is not convex and thus completes the proof of Theorem 4.
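The claim that the vector above satisfies (2) and (3), in fact with equality, is a one-line computation. The following sketch spells it out in logarithmic form; the dictionary layout and function names are our own.

```python
from math import isclose, log

# Coordinates of Pi' in the order used in the text.
ORDER = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 2, 3), (2, 3, 4)]

def phi(t):
    """The vector phi'^{(t)} from the text, keyed by index set."""
    values = [0.0, 2 * log(t), 0.0, 2 * log(t), 0.0, log(t), log(t)]
    return dict(zip(ORDER, values))

def slack_2(p):
    """Left side minus right side of inequality (2) in logarithmic form."""
    return p[(1, 2)] + p[(1, 3)] + p[(2, 3)] - 2 * p[(1, 2, 3)]

def slack_3(p):
    """Left side minus right side of inequality (3) in logarithmic form."""
    return p[(2, 3)] + p[(2, 4)] + p[(3, 4)] - 2 * p[(2, 3, 4)]
```

Both slacks vanish for every t > 0, so the vector lies on the boundary of the cone cut out by (2) and (3); it is this tightness that the proof exploits.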
4 Counterexample Construction for NC\BT Inequalities
We have shown that the constructible region cannot be fully characterized by a set of linear inequalities as it is not convex. However, it is still interesting to ask which linear inequalities are correct.
Equivalently, we want to figure out the set of linear inequalities that define $\mathrm{Conv}(\Psi_n)$, the convex
hull of $\Psi_n$.
In this section, we construct counterexamples for several NC but non-BT (denoted as NC\BT)
inequalities. Since a compact object can be approximated by a union of small cubes, our
counterexamples are also unions of cubes.
4.1 Skeleton
In this subsection, we use an $n$-tuple $(t_1, t_2, \dots, t_n)$, where the $t_i$ are non-negative integers, to represent
the $n$-dimensional unit hypercube $\{(x_1, \dots, x_n) \mid \forall i: t_i \le x_i \le t_i + 1\}$, i.e., $\prod_{i=1}^n [t_i, t_i + 1]$. We denote
the sum of two sets by their Minkowski sum, namely $A + B = \{a + b \mid a \in A, b \in B\}$. We need
the notion of a skeleton, which is important for our construction.
Definition 4. (Connection Graph) In $\mathbb{R}^n$, consider an FNC inequality $\prod_{i=1}^k |T_{A_i}|^{\alpha_i} \ge \prod_{j=1}^m |T_{B_j}|^{\beta_j}$
($\alpha_i, \beta_j > 0$). The connection graph $G_C$ for the above inequality is an undirected graph $G_C = (V, E)$,
where $V = \{v_1, \dots, v_n\}$ represents the $n$ dimensions. The edge $(v_i, v_j) \in E$ if and only if some $B_\ell$
contains both $i$ and $j$ but no $A_{\ell'}$ contains both.
Definition 5. Let $C_1, C_2, \dots, C_s$ be all cliques (complete subgraphs) in $G_C$, and let $M$ be a large positive
integer. For every $C_r$, we define
$$SK_{C_r}(M) = \{t \mid 0 \le t_i \le M - 1, \forall i \in C_r;\ t_i = 0, \forall i \notin C_r\}.$$
The skeleton for the given NC inequality is defined as
$$SK_{G_C}(M) = \bigcup_{r=1}^s SK_{C_r}(M).$$
See Figure 2 for an example.
[Figure 2 not reproduced.]
Figure 2: (i) Skeleton. (ii) Connection Graph.
In the figure above, the connection graph is on the right and the corresponding skeleton is the
object on the left.
For $S \subseteq [n]$, let $\Delta(S)$ be the size of a maximum clique in $G_C[S]$, the subgraph induced by the vertices
in $S$. For sufficiently large $M$, we have the following asymptotic estimates:
$$\prod_{i=1}^k |T_{A_i}|^{\alpha_i} \approx M^{\sum_{i=1}^k \alpha_i},$$
$$\prod_{j=1}^m |T_{B_j}|^{\beta_j} \approx M^{\sum_{j=1}^m \beta_j \Delta(B_j)}.$$
The following lemma is a direct consequence of the above estimates.
Lemma 2. If the NC inequality $\prod_{i=1}^k |T_{A_i}|^{\alpha_i} \ge \prod_{j=1}^m |T_{B_j}|^{\beta_j}$ satisfies
$$\sum_{i=1}^k \alpha_i < \sum_{j=1}^m \beta_j \Delta(B_j),$$
then it is incorrect, i.e., there exists a counterexample for it.
Example 3. Consider the NC inequality $|T_{12}| \cdot |T_{23}| \cdot |T_{34}| \ge |T_{123}| \cdot |T_{234}|$. The connection graph
$G_C$ contains two edges $(1, 3)$ and $(2, 4)$. We have $\sum_i \alpha_i = 3$ and $\sum_j \beta_j \Delta(B_j) = 4$. Hence, the
inequality is not true in general.
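The skeleton construction is easy to reproduce for small instances. The following sketch builds the connection graph, evaluates $\Delta(\cdot)$ by brute force, and enumerates the unit cubes of the skeleton so that both sides of the inequality can be counted directly. All function names are ours, and the brute-force clique search is only meant for the small $n$ used here.

```python
from itertools import combinations, product

def connection_graph(A, B):
    """Edges {i, j} contained in some set of B but in no set of A."""
    in_b = {frozenset(p) for b in B for p in combinations(sorted(b), 2)}
    in_a = {frozenset(p) for a in A for p in combinations(sorted(a), 2)}
    return in_b - in_a

def is_clique(vertices, edges):
    return all(frozenset(p) in edges for p in combinations(vertices, 2))

def max_clique(S, edges):
    """Delta(S): size of the largest clique inside S (brute force, small n only)."""
    S = sorted(S)
    return max(d for d in range(1, len(S) + 1)
               if any(is_clique(c, edges) for c in combinations(S, d)))

def skeleton(n, edges, M):
    """Integer corners of all unit cubes in the skeleton SK_{G_C}(M)."""
    cubes = set()
    for d in range(1, n + 1):
        for clique in combinations(range(1, n + 1), d):
            if is_clique(clique, edges):
                for vals in product(range(M), repeat=d):
                    t = [0] * n
                    for axis, v in zip(clique, vals):
                        t[axis - 1] = v
                    cubes.add(tuple(t))
    return cubes

def projection_size(cubes, S):
    """Number of distinct unit cubes in the projection onto the axes in S."""
    return len({tuple(c[i - 1] for i in sorted(S)) for c in cubes})
```

For Example 3 with M = 10, each of the three projections on the left has size 19 and each of the two on the right has size 109, so the left side is 19^3 = 6859 while the right side is 109^2 = 11881.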
4.2 Union of Boxes
By a box we mean a hyperrectangle $B(b) = \{x \mid 0 \le x_i \le b_i\}$ or a translation of it, i.e., $B + v$ for
some positive vector $v$, where the sum is the Minkowski sum. The examples in this subsection are
disjoint unions of two boxes $B_1$ and $B_2$. Here we require not only that $B_1$ and $B_2$ are disjoint in $\mathbb{R}^n_+$,
but also that their projections onto any subspace $\mathbb{R}^S$ are disjoint for any $S \subseteq [n]$. In particular, we
use the following two boxes:
$$B_1 = B(\mathbf{1}); \quad B_2 = B(M^{t_1}, M^{t_2}, \dots, M^{t_n}) + \mathbf{1}.$$
As before, for $M$ sufficiently large and the $t_i$ to be determined later, the following asymptotic
equations hold:
$$\prod_{i=1}^k |T_{A_i}|^{\alpha_i} \approx M^{\sum_{i=1}^k \alpha_i \max\{0, \sum_{s \in A_i} t_s\}},$$
$$\prod_{j=1}^m |T_{B_j}|^{\beta_j} \approx M^{\sum_{j=1}^m \beta_j \max\{0, \sum_{s \in B_j} t_s\}}.$$
Since $\max\{0, a\} = \frac{1}{2}(a + |a|)$, we can replace the maximum by an absolute value
and obtain the following lemma.
Lemma 3. If there exists $t$ such that the following inequality is true:
$$\sum_{i=1}^k \alpha_i \Big|\sum_{s \in A_i} t_s\Big| < \sum_{j=1}^m \beta_j \Big|\sum_{s \in B_j} t_s\Big|,$$
then the corresponding NC inequality is incorrect.
Proof. Our counterexample is the union of two boxes $B_1 = B(\mathbf{1})$, $B_2 = B(M^{t_1}, M^{t_2}, \dots, M^{t_n}) + \mathbf{1}$,
where $t$ is the vector satisfying the absolute value inequality. By the above asymptotics and
$\sum_{i=1}^k \sum_{s \in A_i} \alpha_i t_s = \sum_{j=1}^m \sum_{s \in B_j} \beta_j t_s$ (which follows from C1), we conclude that
$$\sum_{i=1}^k \alpha_i \max\Big\{0, \sum_{s \in A_i} t_s\Big\} < \sum_{j=1}^m \beta_j \max\Big\{0, \sum_{s \in B_j} t_s\Big\}.$$
Hence, the object is a counterexample.
Example 4. Again, consider the NC inequality in Example 3. Let t = (1, −1, 1, −1). We can see
the condition of Lemma 3 is met and the inequality is incorrect.
Example 5. Consider the NC inequality |T13 | · |T23 | · |T124 | ≥ |T123 | · |T1234 | and t = (−1, −1, 1, 2).
So, this inequality is incorrect.
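Examples 4 and 5 can be checked numerically. The sketch below evaluates the condition of Lemma 3 and also the exact projected volumes of the two-box object, using the fact that the two projected boxes overlap only in a set of measure zero, so that $|T_S| = 1 + M^{\sum_{s \in S} t_s}$. The function names and the choice M = 10 are ours.

```python
from math import prod

def lemma3_condition(A, alpha, B, beta, t):
    """True if sum_i alpha_i |sum_{s in A_i} t_s| < sum_j beta_j |sum_{s in B_j} t_s|."""
    def weight(S):
        return abs(sum(t[s - 1] for s in S))
    lhs = sum(a * weight(S) for S, a in zip(A, alpha))
    rhs = sum(b * weight(S) for S, b in zip(B, beta))
    return lhs < rhs

def two_box_volume(S, t, M):
    """|T_S| for T = B(1) union (B(M^t) + 1); axes are 1-based."""
    return 1.0 + float(M) ** sum(t[s - 1] for s in S)

def violates(A, alpha, B, beta, t, M):
    """True if the two-box object makes the left product smaller than the right one."""
    lhs = prod(two_box_volume(S, t, M) ** a for S, a in zip(A, alpha))
    rhs = prod(two_box_volume(S, t, M) ** b for S, b in zip(B, beta))
    return lhs < rhs
```

For Example 4 with M = 10 the left product is 2 * 2 * 2 = 8, while the right product is 11 * 1.1 = 12.1; Example 5 gives the same two values.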
4.3 Exact Single Cover Theorem
Using the union-of-boxes method, we can also obtain the following theorem, which gives a necessary
condition for an inequality to be true. Let $a_i$ be the 0/1 indicator vector of the set $A_i$ and $b_j$ that of $B_j$,
i.e., $a_{ij} = 1$ if and only if $j \in A_i$.
Theorem 5. (Exact Single Cover Theorem) If the FNC inequality $\sum_{i=1}^k \alpha_i x_{A_i} \ge \sum_{j=1}^m \beta_j x_{B_j}$ holds
for every $x \in \Psi_n$, then for all $j \in [m]$, there exist nonnegative $c_1, c_2, \dots, c_k$ such that $c_i \le \alpha_i$ for all
$i$ and
$$\sum_{i=1}^k c_i a_i = \beta_j b_j.$$
Proof. Let $K = \{x \mid x = \sum_{i=1}^k c_i a_i,\ 0 \le c_i \le \alpha_i,\ i = 1, 2, \dots, k\}$. It is immediate that $K$ is a
convex subset of $\mathbb{R}^n$. If $K$ does not include $\beta_j b_j$, then by the separating hyperplane theorem, there exist a
vector $t = (t_1, t_2, \dots, t_n)$ and a real number $a$ such that
$$t \cdot x < a, \ \forall x \in K, \quad \text{but} \quad \beta_j\, t \cdot b_j > a.$$
We still use a union of two boxes as the counterexample:
$$B_1 = B(\mathbf{1}); \quad B_2 = B(M^{t_1}, M^{t_2}, \dots, M^{t_n}) + \mathbf{1}.$$
It can be seen that
$$\prod_{\ell=1}^m |T_{B_\ell}|^{\beta_\ell} \ge M^{\beta_j\, t \cdot b_j} > M^a.$$
Now, it suffices to show that $\prod_{i=1}^k |T_{A_i}|^{\alpha_i} < M^a$. By the asymptotics shown before, we have that
$$\prod_{i=1}^k |T_{A_i}|^{\alpha_i} \le M^{\sum_{i=1}^k \alpha_i \max\{0, \sum_{s \in A_i} t_s\}} \le M^{\sum_{i:\, a_i \cdot t \ge 0} \alpha_i\, a_i \cdot t} < M^a.$$
The last inequality holds since $\sum_{i:\, a_i \cdot t \ge 0} \alpha_i a_i$ is in $K$. This completes the proof.
Now, we show two simple corollaries.
Corollary 1. Suppose the FNC inequality $\sum_{i=1}^k \alpha_i x_{A_i} \ge \sum_{j=1}^m \beta_j x_{B_j}$ holds for all $x \in \Psi_n$,
and the set indicator vectors $a_i$ are linearly independent. Then this inequality can be written as a
nonnegative combination of $m$ BT inequalities.
Proof. Let $A$ ($B$, resp.) be the matrix with $a_i$ being the $i$th column ($b_j$ the $j$th column, resp.). Let
$\alpha = (\alpha_1, \dots, \alpha_k)^T$ and $\beta = (\beta_1, \dots, \beta_m)^T$. By the definition of FNC, we know that $A\alpha = B\beta$.
For each $j$, we know $\beta_j b_j = \sum_{i=1}^k c_{ji} a_i$ for some $0 \le c_{ji} \le \alpha_i$. So $A\alpha = A(\sum_j c_j)$, where
$c_j = (c_{j1}, \dots, c_{jk})$. Since $A$ has full column rank, it must be the case that $\alpha = \sum_j c_j$.
Corollary 2. Suppose the FNC inequality $\sum_{i=1}^k \alpha_i x_{A_i} \ge \sum_{j=1}^m \beta_j x_{B_j}$ holds for all $x \in \Psi_n$,
and $m = 1$ or $2$. Then this inequality can be written as a nonnegative combination of $m$ BT
inequalities.
Proof. We only need to consider the case $m = 2$. From Theorem 5, $\beta_1 b_1 = \sum_{i=1}^k c_i a_i$ for some
$0 \le c_i \le \alpha_i$. Since $\sum_{i=1}^k \alpha_i a_i = \beta_1 b_1 + \beta_2 b_2$, we have that $\beta_2 b_2 = \sum_{i=1}^k (\alpha_i - c_i) a_i$.
Example 6. Consider the NC inequality $|T_{12}| \cdot |T_{23}| \cdot |T_{34}| \ge |T_{123}| \cdot |T_{234}|$ in Example 3. By
either of the above corollaries, if it is true, it can be decomposed into two BT inequalities. However,
it is clear that such a decomposition does not exist. So it is not true in general. Similarly, we can also
see that the inequality in Example 5 is not true.
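When the indicator vectors $a_i$ are linearly independent, as in Corollary 1, the condition of Theorem 5 can be checked by solving one linear system per $B_j$ and testing the bounds $0 \le c_i \le \alpha_i$. The sketch below does this with exact Gauss-Jordan elimination over `Fraction`; it assumes linear independence and raises an error otherwise (the general case needs a linear-programming feasibility test, which is not shown). The function names are ours.

```python
from fractions import Fraction

def solve_unique(columns, rhs):
    """Solve sum_i c_i * columns[i] = rhs exactly; return None if inconsistent.

    Assumes the columns are linearly independent (raises ValueError otherwise).
    """
    n, k = len(rhs), len(columns)
    # One augmented row per coordinate: [a_1[r], ..., a_k[r] | rhs[r]].
    rows = [[Fraction(columns[i][r]) for i in range(k)] + [Fraction(rhs[r])]
            for r in range(n)]
    pivot_row = 0
    for col in range(k):
        pr = next((r for r in range(pivot_row, n) if rows[r][col] != 0), None)
        if pr is None:
            raise ValueError("columns are not linearly independent")
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [v / pivot for v in rows[pivot_row]]
        for r in range(n):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # Rows below the pivots now read 0 = constant; a nonzero constant means no solution.
    if any(rows[r][k] != 0 for r in range(pivot_row, n)):
        return None
    return [rows[i][k] for i in range(k)]

def single_cover_condition(A, alpha, B, beta, n):
    """Check the conclusion of Theorem 5 for every B_j (independent a_i only)."""
    cols = [[1 if e in a else 0 for e in range(1, n + 1)] for a in A]
    for b, bj in zip(B, beta):
        rhs = [bj if e in b else 0 for e in range(1, n + 1)]
        c = solve_unique(cols, rhs)
        if c is None or any(ci < 0 or ci > ai for ci, ai in zip(c, alpha)):
            return False
    return True
```

For Example 6 the system for $B_1 = \{1,2,3\}$ forces $c_1 = 1$, $c_2 = 0$, $c_3 = 1$ from the first three coordinates and $c_3 = 0$ from the fourth, so the condition fails.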
4.4 A Hybrid Approach
In fact, neither of the above methods is sufficient to disprove all NC\BT inequalities. In this
subsection, we demonstrate an application of a combination of the above approaches.
Example 7. One interesting example is the following NC inequality:
|T1 | · |T12 | · |T23 | · |T34 | · |T24 | ≥ |T123 | · |T234 | · |T124 |.
The example satisfies the statement of Theorem 5. However, we can show that it is also not correct.
Our counterexample utilizes a combination of skeleton and union-box methods. We observe that
the given inequality is a combination of
|T12 | · |T23 | · |T34 | ≥ |T123 | · |T234 |,
and |T1 | · |T24 | ≥ |T124 |.
We already have a skeleton counterexample for the former. Our idea is to take the union of the
skeleton and a disjoint box B so that the values of |T12 |, |T23 |, |T34 |, |T123 |, |T234 | remain (approximately) the same, but |T1 | · |T24 | ≈ |T124 |. Since the skeleton construction allows the left hand side
to be arbitrarily larger than the right hand side, we can see that Example 7 is also incorrect.
We can let $B = B(R^{3}, R^{-4}, R^{-6}, R^{5})$ with R > 0 large enough (larger than the constant M
in the skeleton construction). Hence, $|T_1| \cdot |T_{24}| \approx |T_{124}| \approx R^{4}$ but $|T_{12}| \approx M + R^{-1}$, $|T_{23}| \approx M + R^{-10}$,
$|T_{34}| \approx M + R^{-1}$, $|T_{123}| \approx M^{2} + R^{-7}$, $|T_{234}| \approx M^{2} + R^{-5}$.
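The exponents above can be checked mechanically. The following is a minimal sketch, assuming that B(l1, l2, l3, l4) denotes the axis-parallel box with side lengths l1, ..., l4, so that the projection of B onto a set of axes has volume equal to the product of the corresponding side lengths. The helper names are hypothetical and only cover the box B, not the skeleton.

```python
# Side lengths of B = B(R^3, R^-4, R^-6, R^5), stored as exponents of R.
# Assumption: axis i of the box has side length R ** EXPONENTS[i].
EXPONENTS = {1: 3, 2: -4, 3: -6, 4: 5}


def projection_exponent(axes):
    """Return e such that the projection of B onto `axes` has volume R**e."""
    return sum(EXPONENTS[a] for a in axes)


def projection_volume(axes, R):
    """Volume of the projection of B onto `axes` for a concrete R > 0."""
    return float(R) ** projection_exponent(axes)


if __name__ == "__main__":
    for axes in [(1,), (2, 4), (1, 2, 4), (1, 2), (2, 3), (3, 4), (1, 2, 3), (2, 3, 4)]:
        print(axes, "-> R^%d" % projection_exponent(axes))
```

The projections onto {1} and {2,4} multiply to exactly the volume of the projection onto {1,2,4}, while every projection that appears in the skeleton part of the inequality has a negative exponent and therefore vanishes as R grows.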
We have shown that some NC\BT inequalities are not correct. It remains to ask whether there
is an NC\BT inequality that is correct. We have been unable to discover such an inequality. We have
checked (in an exhaustive manner) all inequalities in R3 and R4 , and found that all NC\BT
inequalities are not true. Hence, we propose Conjecture 1, mentioned in the introduction.
5 Final Remarks and Acknowledgements
All of our counterexamples in Section 4 are essentially combinatorial, and the constructions allow
one side of the inequality to be arbitrarily larger than the other side. We suspect that all incorrect
projection inequalities can be refuted in a similar fashion. In other words, we may not need to
construct very delicate, twisted geometric objects, but instead just a union of a small number of
boxes (the number related to n), to refute any incorrect linear projection inequality.
We have developed a few other techniques to disprove some NC inequalities. For example,
the fitting boxes model is a combination of the two models we introduced. It consists of many boxes,
each constructed according to the connection graph. The fitting boxes model can be used to handle all
4-dimensional inequalities. However, it is hard to analyze and to generalize to higher dimensions, so
we have decided not to introduce it here.
In 2010, the third author JL proposed the notion of rectangular flowers and suspected that
$RF_n = \Psi_n$, which, if true, would be a natural extension of the box theorem² in [3]. In fact, JL “verified”
the above claim empirically using hundreds of thousands of datasets (synthetically generated from
different distributions with different dimensions and parameters). Now, we know that $RF_n \subsetneq \Psi_n$.
But it is still an interesting fact that all NC inequalities are true for many “random-like” data and
there may be good mathematical reasons for it. Moreover, our counter-examples, which appear to
be quite simple in retrospect, may not be totally obvious without realizing the equivalence between
rectangular flowers and the NC inequalities.
² Let K be a body in $R^m$. The box theorem states that there is a rectangle B with vol(B) = vol(K) and
$\mathrm{vol}(\pi_S(B)) \le \mathrm{vol}(\pi_S(K))$ for every S ⊆ [m].
We would like to thank Yuval Peres for introducing BT and Shearer’s inequalities to us, and Elad
Verbin and Raymond Yeung for discussing non-Shannon-type inequalities. In particular, we would
like to thank Jeff Kahn for several discussions, and for casting doubt at the very beginning on whether
$RF_n = \Psi_n$, and even on the convexity of $\Psi_n$ for n ≥ 4, despite the “empirical evidence” we showed
to him. We also thank Dan Suciu, Uri Zwick, Gil Kalai, Ely Porat, Zizhuo Wang, Chunwei Song,
Yuan Yao, Andrew Thomason and Jacob Fox for useful discussions.
References
[1] Arvind Arasu, Raghav Kaushik, and Jian Li. Data generation using declarative constraints.
In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data,
pages 685–696. ACM, 2011.
[2] Paul Balister and Béla Bollobás. Projections, entropy and sumsets. Combinatorica, 32(2):125–
141, 2012.
[3] Béla Bollobás and Andrew Thomason. Projections of bodies and hereditary properties of hypergraphs. Bulletin of the London Mathematical Society, 27(5):417–424, 1995.
[4] Fan RK Chung, Ronald L Graham, Peter Frankl, and James B Shearer. Some intersection
theorems for ordered sets and graphs. Journal of Combinatorial Theory, Series A, 43(1):23–37,
1986.
[5] Ehud Friedgut. Hypergraphs, entropy, and inequalities. The American Mathematical Monthly,
111(9):749–760, 2004.
[6] B. Korte and J. Vygen. Combinatorial optimization: theory and algorithms, volume 21.
Springer, 2012.
[7] LH Loomis and H Whitney. An inequality related to the isoperimetric inequality. 1949.
[8] Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-case optimal join algorithms: [extended abstract]. In Proceedings of the 31st symposium on Principles of Database Systems,
pages 37–48. ACM, 2012.
[9] Raymond W Yeung. Information theory and network coding. Springer, 2008.
A Appendix 1 ($BT_n = \Psi_n$ for n ≤ 3)
In this section, we prove $BT_3 = \Psi_3$ in $R^3$. This appears to be a folklore result, and we provide
a proof for completeness. Since BT inequalities are correct for every projection vector, it suffices
to prove that for any vector π, there exists an object T such that π(T ) = π if π satisfies all BT
inequalities in 3 dimensions. In fact, we show $BT_n = NC_n$ for n = 3. Since $\Psi_n$ is sandwiched
between them, all three of them are the same. Hence, it suffices to show that any NC inequality in
$R^3$ is a combination of some BT inequalities.
Suppose an NC inequality in $R^3$ has the following form:
$$\prod_{S \subseteq [3]} |T_S|^{\alpha_S} \ge 1,$$
where $\alpha_S \in \mathbb{Z}$. According to the definition of NC inequalities, it is not hard to verify that $\alpha_S \ge 0$
for all |S| = 1. If $\alpha_S \ge 0$ for all |S| = 2, the inequality is indeed a BT inequality.
Now, suppose $\alpha_S < 0$ for some |S| = 2. Without loss of generality, we assume that $\alpha_{\{1,2\}} < 0$.
By the definition of NC, we obtain the following inequalities:
$$\alpha_{\{1\}} \ge -\alpha_{\{1,2\}}; \qquad \alpha_{\{2\}} \ge -\alpha_{\{1,2\}}$$
and
$$\prod_{S \subseteq [3]} |T_S|^{\alpha_S} \cdot \left( \frac{|T_{\{1\}}| \cdot |T_{\{2\}}|}{|T_{\{1,2\}}|} \right)^{\alpha_{\{1,2\}}} \ge 1,$$
which is still an NC inequality, now without the term $|T_{\{1,2\}}|^{\alpha_{\{1,2\}}}$.
Rewrite it as $\prod_{S \subseteq [3]} |T_S|^{\alpha'_S} \ge 1$ with $\alpha'_{\{1,2\}} = 0$. If the above inequality is still not
a BT inequality, then there exists some $S'$ with $|S'| = 2$ such that $\alpha'_{S'} < 0$. Without loss of generality,
we may assume that $\alpha'_{\{1,3\}} < 0$. Repeating the operation used above for $\alpha_{\{1,2\}}$, we can eliminate the
term $|T_{\{1,3\}}|^{\alpha'_{\{1,3\}}}$ as well as $|T_{\{2,3\}}|^{\alpha'_{\{2,3\}}}$ (if necessary) in the same way. The remaining part must
be a BT inequality since the only negative $\alpha_S$ is $\alpha_{\{1,2,3\}}$. Thus, we prove that $BT_3 = NC_3 = \Psi_3$.
Upper Bound on Function Computation in
Directed Acyclic Networks
Cupjin Huang, Zihan Tan and Shenghao Yang
Institute for Interdisciplinary Information Sciences
Tsinghua University, Beijing, China
Abstract—Function computation in directed acyclic networks
is considered, where a sink node wants to compute a target
function with the inputs generated at multiple source nodes.
The network links are error-free but capacity-limited, and the
intermediate network nodes perform network coding. The target
function is required to be computed with zero error. The
computing rate of a network code is measured by the average
number of times that the target function can be computed for one
use of the network. We propose a cut-set bound on the computing
rate using an equivalence relation associated with the inputs of the
target function. Our bound holds for general target functions and
network topologies. We also show that our bound is tight for some
special cases where the computing capacity can be characterized.
I. INTRODUCTION
We consider function computation in a directed acyclic
network, where a target function f is intended to be calculated
at a sink node, and the input symbols of the target function
are generated at multiple source nodes. As a special case,
network communication is just the computation of the identity
function.1 Network function computation naturally arises in
sensor networks [1] and Internet of Things, and may find
applications in big data processing.
Various models and special cases of this problem have
been studied in the literature (see the summaries in [2]–[4]).
We are interested in the following network coding model
for function computation. Specifically, we assume that the
network links have limited (unit) capacity and are error-free.
Each source node generates multiple input symbols, and the
network codes perform vector network coding by using the
network multiple times.2 An intermediate network node can
transmit the output of a certain fixed function of the symbols it
receives. Here all the intermediate nodes are assumed to have
unbounded computing ability. The target function is required to
be computed correctly for all possible inputs. We are interested
in the computing rate of a network code that computes the
target function, i.e., the average number of times that the target
function can be computed for one use of the network. The
maximum achievable computing rate is called the computing
capacity.
When computing the identity function, the problem becomes the extensively studied network coding problem [5], [6], and it
is known that in general linear network codes are sufficient
to achieve the multicast capacity [6], [7]. For linear target
functions over a finite field, a complete characterization of
the computing capacity is not available for networks with one
sink node. Certain necessary and sufficient conditions have
been obtained such that linear network codes are sufficient to
calculate a linear target function [4], [8]. But in general, linear
network codes are not sufficient to achieve the computing
capacity of linear target functions [9].
1 A function f : A → A is the identity if f (x) = x for all x ∈ A.
2 One use of a network means the use of each link in the network at most once.
Networks with a single sink node are discussed in this
paper, while both the target function and the network code
can be non-linear. In this scenario, the computing capacity is
known when the network is a multi-edge tree [2] or when the
target function is the identity function. For the general case,
various bounds on the computing capacity based on cut sets
have been studied [2], [3].
We find, however, that the upper bounds claimed in the
previous works do not generally hold. For an example that
we will evaluate, the computing capacity is strictly larger than
the two upper bounds claimed in [2], [3] respectively, where
the issue is related to the classification of the inputs of the
target function that are in a certain sense equivalent for each
cut. Towards a general upper bound, we define an equivalence
relation associated with the inputs of the target function (which
does not depend on the network topology) and propose a cut-set bound on the computing capacity using this equivalence
relation. Our bound holds for general target functions and
general network topologies in the network coding model. We
also show that our bound is tight when the network is a multi-edge tree or when the target function is the identity function.
In the remainder of this paper, Section II formally introduces the network computing model. The upper bound of
the computing rate is given in Theorem 3, and is proved in
Section IV. Section III compares with the previous results and
discusses the tightness of our upper bound. Omitted proofs in
this paper can be found in [10].
II. MAIN RESULTS
In this section, we will first introduce the network computing model. Then we will define cut sets and discuss some
special cases of the function computation problem. Last we
head to the main theorem about the cut-set bound for function
computation.
A. Function-Computing Network Codes
Let G = (V, E) be a directed acyclic graph (DAG) with
a finite vertex set V and an edge set E, where multi-edges
between a certain pair of nodes are allowed. A network over G
is denoted as N = (G, S, ρ), where S ⊂ V is called the set of source
nodes and ρ ∈ V \ S is called the sink node. Let s = |S|,
and without loss of generality (WLOG), let S = {1, 2, . . . , s}.
For an edge e = (u, v), we call u the tail of e (denoted by
tail(e)) and v the head of e (denoted by head(e)). Moreover,
for each node u ∈ V, let $E_i(u) = \{e \in E : \mathrm{head}(e) = u\}$
and $E_o(u) = \{e \in E : \mathrm{tail}(e) = u\}$ be the set of incoming
edges and the set of outgoing edges of u, respectively. Fix
a topological order of the vertex set V. This order naturally
induces an order of the edge set E, where edges $e > e'$ if
either $\mathrm{tail}(e) > \mathrm{tail}(e')$, or $\mathrm{tail}(e) = \mathrm{tail}(e')$ and $\mathrm{head}(e) > \mathrm{head}(e')$.
WLOG, we assume that $E_i(j) = \emptyset$ for all source
nodes j ∈ S, and $E_o(\rho) = \emptyset$. We will illustrate in Section III-C
how to apply our results on a network with $E_i(j) \ne \emptyset$ for
certain j ∈ S.
The network defined above is used to compute a function,
where multiple inputs are generated at the source nodes and
the output of the function is demanded by the sink node.
The computation units with unbounded computing ability are
allocated at all the network nodes. However, the computing
capability of the network will be bounded by the network
transmission capability. Denote by B a finite alphabet. We
assume that each edge can transmit a symbol in B reliably
for each use.
Denote by A and O two finite alphabets. Let $f : A^s \to O$
be the target function, which is the function to be computed
via the network and whose ith input is generated at the ith
source node. We may use the network to compute the function
multiple times. Suppose that the jth source node consecutively
generates k symbols in A denoted by $x_{1j}, x_{2j}, \ldots, x_{kj}$, and
the symbols generated by all the source nodes can be given as
a matrix $x = (x_{ij})_{k \times s}$. We denote by $x_j$ the jth column of x,
and denote by $x^i$ the ith row of x. In other words, $x_j$ is the
vector of the symbols generated at the jth source node, and
$x^i$ is the input vector of the ith computation of the function
f. Define for $x \in A^{k \times s}$
$$f^{(k)}(x) = \left(f(x^1), f(x^2), \ldots, f(x^k)\right)^{\top}.$$
For convenience, we denote by $x_J$ the submatrix of x formed
by the columns indexed by J ⊂ S, and denote by $x^I$ the submatrix of x formed by the rows indexed by I ⊂ {1, 2, . . . , k}.
We equate $A^{1 \times s}$ with $A^s$ in this paper.
For two positive integers n and k, an (n, k) (function-computing) network code over network N with target function
f is defined as follows. Let $x \in A^{k \times s}$ be the matrix formed
by symbols generated at the source nodes. The purpose of the
code is to compute $f^{(k)}(x)$ by transmitting at most n symbols
in B on each edge in E. Denote the symbols transmitted on
edge e by $g^e(x) \in B^n$. For a set of edges $E' \subseteq E$ we define
$$g^{E'}(x) = \left(g^e(x) \mid e \in E'\right),$$
where $g^{e_1}(x)$ comes before $g^{e_2}(x)$ whenever $e_1 < e_2$. The
(n, k) network code contains an encoding function for each
edge e, defined as
$$h_e : \begin{cases} A^k \to B^n, & \text{if } \mathrm{tail}(e) \in S; \\ \prod_{e' \in E_i(\mathrm{tail}(e))} B^n \to B^n, & \text{otherwise.} \end{cases}$$
The functions $h_e$, e ∈ E, determine the symbols transmitted on the
edges. Specifically, if e is an outgoing edge of the ith source
node, then
$$g^e(x) = h_e(x_i);$$
if e is an outgoing edge of $u \in V \setminus (S \cup \{\rho\})$, then
$$g^e(x) = h_e\left(g^{E_i(u)}(x)\right).$$
The (n, k) network code also contains a decoding function
$$\varphi : \prod_{e' \in E_i(\rho)} B^n \to O^k.$$
Define
$$\psi(x) = \varphi\left(g^{E_i(\rho)}(x)\right).$$
If the network code computes f, i.e., $\psi(x) = f^{(k)}(x)$ for all
$x \in A^{k \times s}$, we then call $\frac{k}{n} \log_{|B|} |A|$ an achievable computing
rate, where we multiply $\frac{k}{n}$ by $\log_{|B|} |A|$ in order to normalize
the computing rate for target functions with different input
alphabets. The computing capacity of network N with respect
to a target function f is defined as
$$C(N, f) = \sup \left\{ \frac{k}{n} \log_{|B|} |A| \;:\; \frac{k}{n} \log_{|B|} |A| \text{ is achievable} \right\}.$$
B. Cut Sets and Special Cases
For two nodes u and v in V, denote the relation u → v
if there exists a directed path from u to v in G. If there is
no directed path from u to v, we say u is separated from v.
Given a set of edges C ⊆ E, $I_C$ is defined to be the set of
source nodes which are separated from the sink node ρ if C
is deleted from E. Set C is called a cut set if $I_C \ne \emptyset$, and
the family of all cut sets in network N is denoted as Λ(N ).
Additionally, we define the set $K_C$ as
$$K_C = \{ i \in S \mid \exists v, t \in V,\ i \to v,\ (v, t) \in C \}.$$
It is reasonable to assume that u → ρ for all u ∈ V. Then
one can easily see that $K_C$ is the set of source nodes from
which there exists a path to the sink node through C. Define
$J_C = K_C \setminus I_C$.
The problem also becomes simple when s = 1.
Proposition 1. For a network N with a single source node
and any target function f : A → O,
$$C(N, f) = \min_{C \in \Lambda(N)} \frac{|C|}{\log_{|A|} |f[A]|},$$
where f [A] is the image of f in O.
C. Upper Bounds
In this paper, we are interested in a general upper bound
on C(N , f ). The first upper bound is induced by Proposition 1.
Proposition 2. For a network N with target function f ,
$$C(N, f) \le \min_{C \in \Lambda(N) : I_C = S} \frac{|C|}{\log_{|A|} |f[A^s]|}.$$
Proof: Build a network $N'$ by joining all the source nodes
of N into a single “super” source node. Since a code for
network N can be naturally converted to a code for network
$N'$ (where the super source node performs the operations of
all the source nodes in N ), we have
$$C(N, f) \le C(N', f).$$
The proof is completed by applying Proposition 1 on $N'$ and
$\Lambda(N') = \{C \in \Lambda(N) : I_C = S\}$.
The above upper bound only uses the image of function
f. We propose an enhanced upper bound by investigating an
equivalence relation on the input vectors of f. We will compare
this equivalence relation with similar definitions proposed in
[2], [3] in the next section.
Definition 1 (Equivalence Class). For any function $f : A^s \to O$,
any two disjoint index sets I, J ⊆ S, and any $a, b \in A^{|I|}$,
$c \in A^{|J|}$, we say $a \overset{(c)}{\equiv} b\,|_{I,J}$ if for every $x, y \in A^{1 \times s}$, we
have f (x) = f (y) whenever $x_I = a$, $y_I = b$, $x_J = y_J = c$ and
$x_{S \setminus (I \cup J)} = y_{S \setminus (I \cup J)}$. Two vectors a and b satisfying $a \overset{(c)}{\equiv} b\,|_{I,J}$
are said to be (I, J, c)-equivalent. When J = ∅ in the above
definition, we use the convention that c is an empty matrix.
Note that the definition of equivalence does not require a
previously given network. However, it will soon be clear that
with a network, the division of equivalence classes naturally
leads to an upper bound of the network function-computing
capacity based on cut sets.
For every f, I, J and $c \in A^{|J|}$, let $W^{(c)}_{I,J,f}$ denote the
total number of equivalence classes induced by $\overset{(c)}{\equiv}|_{I,J}$. Given a
network N and a cut set C, let $W_{C,f} = \max_{c \in A^{|J_C|}} W^{(c)}_{I_C,J_C,f}$.
Our main result is stated as follows. The proof of the
theorem is presented in Section IV.
Theorem 3. If N is a network and f is a target function, then
$$C(N, f) \le \min_{C \in \Lambda(N)} \frac{|C|}{\log_{|A|} W_{C,f}} := \text{min-cut}(N, f).$$
III. DISCUSSION OF UPPER BOUND
In this section, we first give an example to illustrate the
upper bound. We compare our result with the existing ones,
and proceed by a discussion about the tightness of the bound.
A. An Illustration of the Bound
First we give an example to illustrate our result. Consider the network $N_1$ in Fig. 1 with the target function
$f(x_1, x_2, x_3) = x_1 x_2 + x_3$, where A = B = O = {0, 1}.
Fig. 1. Network $N_1$ has three source nodes, 1, 2 and 3, and one sink node
ρ that computes the nonlinear function $f(x_1, x_2, x_3) = x_1 x_2 + x_3$, where
A = B = O = {0, 1}.
Let us first compare the upper bounds in Theorem 3 and
Proposition 2. Let $C_0 = \{e_6, e_7\}$. Here we have
• $|C_0| = 2$, $I_{C_0} = \{3\}$, $J_{C_0} = \{1, 2\}$; and
• For any given inputs of nodes 1 and 2, different inputs
from node 3 generate different outputs of f. Therefore
$W^{(c)}_{I_{C_0},J_{C_0},f} = 2$ for any $c \in A^2$ and hence $W_{C_0,f} = 2$.
By Theorem 3, we have
$$C(N_1, f) \le \text{min-cut}(N_1, f) \le \frac{|C_0|}{\log_{|A|} W_{C_0,f}} = 2,$$
while Proposition 2 induces that
$$C(N_1, f) \le \min_{C \in \Lambda(N_1) : I_C = S} \frac{|C|}{\log_{|A|} |f[A^s]|} = \min_{C \in \Lambda(N_1) : I_C = S} |C| = 4,$$
where the first equality follows from $f[A^s] = \{0, 1\}$, and the
second equality follows from
$$\min_{C \in \Lambda(N_1) : I_C = S} |C| = |\{e_4, e_5, e_6, e_7\}| = 4.$$
Therefore, Theorem 3 gives a strictly better upper bound than
Proposition 2.
The upper bound in Theorem 3 is actually tight. We claim
that there exists a (1, 2) network code that computes f in N1 .
Consider an input matrix x = (xij )2×3 . Node i sends x1i to
node v and sends x2i to node ρ for i = 1, 2, 3 respectively,
i.e., for i = 1, 2, 3
g ei = x1i ,
g ei+3 = x2i .
Node v then computes f (x1 ) = x11 x12 + x13 and sends it
to node ρ via edge e7 . Node ρ receives f (x1 ) from e7 and
computes f (x2 ) = x21 x22 + x23 using the symbols received
from edges e4 , e5 and e6 .
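The (1, 2) code described above can be checked exhaustively. The sketch below makes two assumptions that are inferred from the text rather than read from Fig. 1: the edges e1, e2, e3 run from sources 1, 2, 3 to node v, the edges e4, e5, e6 run from sources 1, 2, 3 to ρ, and e7 runs from v to ρ; and the sum in f is taken modulo 2. The function names are hypothetical.

```python
from itertools import product


def f(x1, x2, x3):
    """Target function x1*x2 + x3, assumed to be evaluated over GF(2)."""
    return (x1 * x2 + x3) % 2


# Assumed topology of N1: edge name -> (tail, head).
EDGES = {
    "e1": (1, "v"), "e2": (2, "v"), "e3": (3, "v"),
    "e4": (1, "rho"), "e5": (2, "rho"), "e6": (3, "rho"),
    "e7": ("v", "rho"),
}
SOURCES = (1, 2, 3)


def reaches_sink(node, removed):
    """True if `node` still has a directed path to rho once `removed` is deleted."""
    if node == "rho":
        return True
    return any(
        reaches_sink(head, removed)
        for name, (tail, head) in EDGES.items()
        if tail == node and name not in removed
    )


def separated_sources(cut):
    """I_C: the sources separated from the sink when the edges in `cut` are deleted."""
    return {i for i in SOURCES if not reaches_sink(i, set(cut))}


def run_code(x):
    """Simulate the (n, k) = (1, 2) code on a 2x3 binary input matrix x."""
    g = {}
    for i in range(3):
        g["e%d" % (i + 1)] = x[0][i]   # first row goes to node v
        g["e%d" % (i + 4)] = x[1][i]   # second row goes directly to rho
    g["e7"] = f(g["e1"], g["e2"], g["e3"])          # v forwards f(x^1)
    return (g["e7"], f(g["e4"], g["e5"], g["e6"]))  # rho outputs (f(x^1), f(x^2))


def code_is_correct():
    rows = list(product((0, 1), repeat=3))
    return all(run_code((r1, r2)) == (f(*r1), f(*r2)) for r1 in rows for r2 in rows)


if __name__ == "__main__":
    print("code computes f twice per network use:", code_is_correct())
    print("I_C for {e6, e7}:", separated_sources({"e6", "e7"}))
```

Each edge carries one binary symbol and two function values are delivered per use of the network, so the rate k/n equals 2 under these assumptions.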
B. Comparison with Previous Works
Upper bounds on the computing capacity have been studied
in [2], [3] based on a special case of the equivalence class
defined in Definition 1. However, we will demonstrate that the
bounds therein do not hold for the example we studied in the
last subsection.
In Definition 1, when J = ∅, we will say a ≡ b|I , or a and
b are I-equivalent. That is a ≡ b|I if for every x, y ∈ A1×s
with xI = a, yI = b and xS\I = yS\I , we have f (x) = f (y).
For target function f and I ⊂ S, denote by RI,f the total
number of equivalence classes induced by ≡ |I . For a cut
C ∈ Λ(N ), let RC,f = RIC ,f . Then we have the following
lemma:
Lemma 1. Fix network N and function f. Then, i) for any
C ∈ Λ(N ), we have $R_{C,f} \ge W_{C,f}$; ii) for any $C, C' \in \Lambda(N)$
with $C' \subset C$ and $I_{C'} = I_C$, we have $W_{C',f} \ge W_{C,f}$.
Define
$$\text{min-cut}_A(N, f) = \min_{C \in \Lambda(N)} \frac{|C|}{\log_{|A|} R_{C,f}}.$$
By Lemma 1, we have min-cut(N , f ) ≥ min-cutA (N , f ). It
is claimed in [2, Theorem II.1] that min-cutA (N , f ) is an
upper bound on C(N , f ). We find, however, min-cutA (N , f )
is not universally an upper bound for the computing capacity.
Consider the example in Fig. 1. For cut set C1 = {e4 , e6 , e7 },
we have IC1 = {1, 3}. On the other hand, it can be proved
that RC1 ,f = 4 since i) f is an affine function of x2 given
that x1 and x3 are fixed, and ii) it takes 2 bits to represent this
affine function over the binary field. Hence
$$\text{min-cut}_A(N_1, f) \le \frac{|C_1|}{\log_{|A|} R_{C_1,f}} = \frac{3}{2} < 2 = C(N_1, f).$$
For a network N as defined in Section II-A, we say a
subset of nodes U ⊂ V is a cut if |U ∩ S| > 0 and ρ ∉ U. For
a cut U, denote by E(U ) the cut set determined by U, i.e.,
$$E(U) = \{e \in E : \mathrm{tail}(e) \in U,\ \mathrm{head}(e) \in V \setminus U\}.$$
Let
$$\Lambda^*(N) = \{E(U) : U \text{ is a cut in } N\}.$$
Define
$$\text{min-cut}_K(N, f) = \min_{C \in \Lambda^*(N)} \frac{|C|}{\log_{|A|} R_{C,f}}.$$
Since Λ∗ (N ) ⊂ Λ(N ), min-cutK (N , f ) ≥ min-cutA (N , f ).
It is implied by [3, Lemma 3] that min-cutK (N , f ) is an
upper bound on C(N , f ). However, min-cutK (N , f ) is also
not universally an upper bound for the computing capacity.
Consider the example in Fig. 1. For the cut $U_1 = \{1, 3, v\}$,
the corresponding cut set is $E(U_1) = C_1 = \{e_4, e_6, e_7\}$. Hence,
$$\text{min-cut}_K(N_1, f) \le \frac{|C_1|}{\log_{|A|} R_{C_1,f}} = \frac{3}{2} < 2 = C(N_1, f).$$
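The class counts used in this comparison can be reproduced by brute force. The sketch below is only an illustration under stated assumptions: the sum in f is taken modulo 2, sources 1, 2, 3 are stored as indices 0, 1, 2, and two vectors a, b are treated as equivalent exactly when they give the same output for every completion of the remaining inputs, which is how Definition 1 is read here. The function names are hypothetical.

```python
import math
from itertools import product


def f(x1, x2, x3):
    """Target function x1*x2 + x3, assumed to be evaluated over GF(2)."""
    return (x1 * x2 + x3) % 2


def count_classes(func, s, alphabet, I, J, c):
    """Number of (I, J, c)-equivalence classes of vectors a in alphabet^|I|."""
    rest = [i for i in range(s) if i not in I and i not in J]
    signatures = set()
    for a in product(alphabet, repeat=len(I)):
        outputs = []
        for d in product(alphabet, repeat=len(rest)):
            x = [None] * s
            for idx, val in zip(I, a):
                x[idx] = val
            for idx, val in zip(J, c):
                x[idx] = val
            for idx, val in zip(rest, d):
                x[idx] = val
            outputs.append(func(*x))
        # a and b are equivalent iff they induce the same output table
        signatures.add(tuple(outputs))
    return len(signatures)


def W(func, s, alphabet, I, J):
    """Maximum class count over all c; with J empty this is the count R_{I,f}."""
    return max(
        count_classes(func, s, alphabet, I, J, c)
        for c in product(alphabet, repeat=len(J))
    )


if __name__ == "__main__":
    A = (0, 1)
    w_c0 = W(f, 3, A, I=(2,), J=(0, 1))   # cut set C0 = {e6, e7}
    r_c1 = W(f, 3, A, I=(0, 2), J=())     # cut set C1 = {e4, e6, e7}, J ignored
    w_c1 = W(f, 3, A, I=(0, 2), J=(1,))   # same cut set with J_C1 = {2}
    print("W_{C0,f} =", w_c0, " bound:", 2 / math.log(w_c0, len(A)))
    print("R_{C1,f} =", r_c1, " ratio:", 3 / math.log(r_c1, len(A)))
    print("W_{C1,f} =", w_c1, " ratio:", 3 / math.log(w_c1, len(A)))
```

Under these assumptions the ratio built from $R_{C_1,f}$ drops to 3/2, below the rate 2 achieved in the previous subsection, whereas the ratio built from $W_{C_1,f}$ stays at 3.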
C. Tightness
The upper bound in Theorem 3 is tight when the network
is a multi-edge tree.
Theorem 4. If G is a multi-edge tree, for network N =
(G, S, ρ) and any target function f ,
C(N , f ) = min-cut(N , f ).
The upper bound in Theorem 3 is not tight for certain
cases. Consider the network $N_2$ in Fig. 2(a) provided in [2].
Note that in $N_2$, source nodes 1 and 2 have incoming edges.
To match our model described in Section II-A, we can modify
$N_2$ to $N_2'$ shown in Fig. 2(b), where the number of edges from
node i to node $i'$ is infinity, i = 1, 2. Every network code in
$N_2'$ naturally induces a network code in $N_2$ and vice versa.
Hence, we have
$$C(N_2, f) = C(N_2', f).$$
Fig. 2. (a) Network $N_2$; (b) Network $N_2'$. Networks $N_2$ and $N_2'$ have three binary sources, {1, 2, 3} and
one sink ρ that computes the arithmetic sum of the source messages, where
A = B = {0, 1}. In $N_2'$, the number of edges from node i to node $i'$ is
infinity, i = 1, 2.
We then evaluate min-cut($N_2'$, f ). Note that
$$\frac{|C|}{\log_{|A|} W_{C,f}} < \infty$$
holds only if |C| < ∞, and we can thus consider only the
finite cut sets. For a finite cut set C, we denote $C' = C \cap \{e_1, \ldots, e_4\}$.
We have $|C'| \le |C|$ and $J_{C'} \subseteq J_C$, and we
claim $I_{C'} = I_C$. Note that $I_{C'} \subseteq I_C$. Suppose that there exists
$i \in I_C \setminus I_{C'}$; then there exists a path from i to ρ which is
disjoint with $C'$, but shares a subset D of edges with C. Then
$D \subset E_o(i)$ and hence |D| = 1. We simply replace the edge
in D by an arbitrary edge in $E_o(i) \setminus C$ and form a new path
from i to ρ. This is always possible, since $C \cap E_o(i)$ is finite
while $E_o(i)$ is not. The newly formed path is disjoint with C,
and then we have $i \notin I_C$, a contradiction.
According to Lemma 1, we have $W_{C',f} \ge W_{C,f}$ and hence
$$\frac{|C'|}{\log_{|A|} W_{C',f}} \le \frac{|C|}{\log_{|A|} W_{C,f}}.$$
Therefore we can consider only cut
sets $C' \subseteq \{e_1, e_2, e_3, e_4\}$. We then have min-cut($N_2'$, f ) = 1,
where the minimum is obtained by the cut set $\{e_2, e_4\}$. For
network $N_2$, however, it has been proved in [2] that $C(N_2, f) = \log_6 4 < 1$.
Hence min-cut($N_2'$, f ) = 1 > C($N_2'$, f ).
IV. PROOF OF MAIN THEOREM
To prove Theorem 3, we first give the definition of F-extension and two lemmas.
Definition 2 (F-Extension). Given a network N and a cut set
C ∈ Λ(N ), define D(C) ⊆ E as
$$D(C) = \bigcup_{i \notin I_C} E_o(i).$$
Then the F-extension of C is defined as
$$F(C) = C \cup D(C).$$
Lemma 2. For every cut set C, F (C) is a global cut set, i.e.,
$\forall C \in \Lambda(N)$, $I_{F(C)} = S$.
Proof: Clearly, $I_C \subseteq I_{F(C)}$, so it suffices to show that
for all $i \notin I_C$, we have $i \in I_{F(C)}$. This is true, since $E_o(i) \subseteq F(C)$
and $i \in I_{E_o(i)}$ imply $i \in I_{F(C)}$.
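Lemma 2 can be sanity-checked on a small instance. The sketch below enumerates every cut set of the network $N_1$ from Section III-A and confirms that its F-extension separates all sources. The edge layout is an assumption inferred from the coding scheme in Section III-A (e1, e2, e3 from sources 1, 2, 3 to v; e4, e5, e6 from sources 1, 2, 3 to ρ; e7 from v to ρ), the union in D(C) is taken over source nodes not in $I_C$, and the helper names are hypothetical.

```python
from itertools import combinations

# Assumed topology of N1: edge name -> (tail, head).
EDGES = {
    "e1": (1, "v"), "e2": (2, "v"), "e3": (3, "v"),
    "e4": (1, "rho"), "e5": (2, "rho"), "e6": (3, "rho"),
    "e7": ("v", "rho"),
}
SOURCES = (1, 2, 3)


def out_edges(node):
    """E_o(node): names of the edges whose tail is `node`."""
    return {name for name, (tail, _) in EDGES.items() if tail == node}


def reaches_sink(node, removed):
    """True if `node` still has a directed path to rho once `removed` is deleted."""
    if node == "rho":
        return True
    return any(
        reaches_sink(head, removed)
        for name, (tail, head) in EDGES.items()
        if tail == node and name not in removed
    )


def separated_sources(cut):
    """I_C: the sources separated from the sink when the edges in `cut` are deleted."""
    return {i for i in SOURCES if not reaches_sink(i, set(cut))}


def f_extension(cut):
    """F(C) = C united with the outgoing edges of every source not in I_C."""
    cut = set(cut)
    i_c = separated_sources(cut)
    d_c = set().union(*[out_edges(i) for i in SOURCES if i not in i_c])
    return cut | d_c


def lemma2_holds():
    names = sorted(EDGES)
    for r in range(len(names) + 1):
        for cut in combinations(names, r):
            if separated_sources(cut):  # only cut sets, i.e. I_C is nonempty
                if separated_sources(f_extension(cut)) != set(SOURCES):
                    return False
    return True


if __name__ == "__main__":
    print("F({e6, e7}) =", sorted(f_extension({"e6", "e7"})))
    print("Lemma 2 holds on N1:", lemma2_holds())
```

This is a check on one seven-edge network only and is not a substitute for the proof.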
Lemma 3. Consider an (n, k) network code in N = (G, S, ρ).
For any global cut set C, ψ(x) is a function of $g^C(x)$, i.e.,
$\psi(x) = \psi^C(g^C(x))$ for a certain function $\psi^C$.
Proof: Let C be a global cut set of N. Let $G_C$ be the
subgraph of G formed by the (largest) connected component
of G including ρ after removing C from E. Let $S_C$ be the set
of nodes in $G_C$ that do not have incoming edges. Since $G_C$ is
also a DAG, $S_C$ is not empty. For each node $u \in S_C$, we have
i) u is not a source node in N since otherwise C would not
be a global cut set, and ii) all the incoming edges of u in G
are in C since otherwise $G_C$ could be larger. For each node u in
$G_C$ but not in $S_C$, the incoming edges of u are either in $G_C$
or in C, since otherwise the cut set C would not be global. If
we can show that for any edge e in $G_C$, $g^e(x)$ is a function
of $g^C(x)$, then $\psi(x) = \varphi(g^{E_i(\rho)}(x))$ is a function of $g^C(x)$.
Suppose that $G_C$ has K nodes. Consider the topological
order on the set of nodes in $G_C$, and number these nodes as
$u_1 < \ldots < u_K$, where $u_K = \rho$. Denote by $E_o(u|G_C)$ the set
of outgoing edges of u in $G_C$. We claim that $g^{E_o(u_i|G_C)}(x)$ is
a function of $g^C(x)$ for i = 1, . . . , K, which implies that for
any edge e in $G_C$, $g^e(x)$ is a function of $g^C(x)$. We prove this
inductively. First, $g^{E_o(u_1|G_C)}(x)$ is a function of $g^C(x)$ since
$u_1 \in S_C$ and hence all the incoming edges of $u_1$ in G are in
C. Assume that the claim holds for the first k nodes in $G_C$,
k ≥ 1. For $u_{k+1}$, we have two cases: If $u_{k+1} \in S_C$, the claim
holds since all the incoming edges of $u_{k+1}$ in G are in C. If
$u_{k+1} \notin S_C$, we know that $E_i(u_{k+1}) \subset \bigcup_{i=1}^{k} E_o(u_i|G_C) \cup C$.
By the induction hypothesis, we have that $g^{E_o(u_{k+1}|G_C)}(x)$ is a
function of $g^C(x)$. The proof is completed.
In the following proof of Theorem 3, it will be handy to
extend the equivalence relation for a block of function inputs.
For disjoint sets I, J ⊆ S and $c \in A^{1 \times |J|}$ we say $a, b \in A^{k \times |I|}$
are (I, J, c)-equivalent if for any $x, y \in A^{k \times s}$ with
$x_I = a$, $y_I = b$, $x_J = y_J = (c^{\top}, c^{\top}, \ldots, c^{\top})^{\top}$ and $x_{S \setminus (I \cup J)} = y_{S \setminus (I \cup J)}$,
we have $f^{(k)}(x) = f^{(k)}(y)$. Then for the set $A^{k \times |I|}$,
the number of equivalence classes induced by the equivalence
relation is $\left(W^{(c)}_{I,J,f}\right)^{k}$.
Proof of Theorem 3: Suppose that we have an (n, k) code
with
$$\frac{k}{n} \log_{|B|} |A| > \text{min-cut}(N, f). \quad (1)$$
We show that this code cannot compute $f^{(k)}(x)$ correctly for all
$x \in A^{k \times s}$. Denote
$$C^* = \arg\min_{C \in \Lambda(N)} \frac{|C|}{\log_{|A|} W_{C,f}} \quad (2)$$
and
$$c^* = \arg\max_{c \in A^{|J_{C^*}|}} W^{(c)}_{I_{C^*},J_{C^*},f}. \quad (3)$$
By (1)-(3), we have
$$\frac{k}{n} \log_{|B|} |A| > \frac{|C^*|}{\log_{|A|} W^{(c^*)}_{I_{C^*},J_{C^*},f}},$$
which, since $\log_{|B|} |A| \cdot \log_{|A|} W = \log_{|B|} W$ for any W > 0, leads to
$$|B|^{|C^*| n} < \left(W^{(c^*)}_{I_{C^*},J_{C^*},f}\right)^{k}. \quad (4)$$
Note that $g^{C^*}(x)$ only depends on $x_{K_{C^*}}$. By (4) and the
pigeonhole principle, there exist $a, b \in A^{k \times |I_{C^*}|}$ such that i)
a and b are not $(I_{C^*}, J_{C^*}, c^*)$-equivalent and ii) $g^{C^*}(x) = g^{C^*}(y)$
for any $x, y \in A^{k \times s}$ with
$$\begin{cases} x_{I_{C^*}} = a,\ y_{I_{C^*}} = b, \\ x_{J_{C^*}} = y_{J_{C^*}} = (c^{*\top}, c^{*\top}, \ldots, c^{*\top})^{\top}, \\ x_{S \setminus K_{C^*}} = y_{S \setminus K_{C^*}}. \end{cases} \quad (5)$$
Fix $x, y \in A^{k \times s}$ satisfying (5) and $f^{(k)}(x) \ne f^{(k)}(y)$. The
existence of such x and y is due to i). Since $C^*$ and $D(C^*)$
are disjoint (see Definition 2) and for any $i \notin I_{C^*}$, $x_i = y_i$,
together with ii), we have
$$g^{F(C^*)}(x) = g^{F(C^*)}(y).$$
Thus, applying Lemma 3 we have ψ(x) = ψ(y). Therefore,
the code cannot compute both $f^{(k)}(x)$ and $f^{(k)}(y)$ correctly.
The proof is completed.
V. CONCLUDING REMARKS
We propose a new definition of equivalence classes associated with the inputs of a function, which enables us to obtain
a general upper bound on the network function computing
capacity. We show that the upper bound is tight when the
network is a multi-edge tree.
REFERENCES
[1] A. Giridhar and P. Kumar, “Computing and communicating functions
over sensor networks,” Selected Areas in Communications, IEEE Journal on, vol. 23, no. 4, pp. 755–764, April 2005.
[2] R. Appuswamy, M. Franceschetti, N. Karamchandani, and K. Zeger,
“Network coding for computing: Cut-set bounds,” Information Theory,
IEEE Transactions on, vol. 57, no. 2, pp. 1015–1030, Feb 2011.
[3] H. Kowshik and P. Kumar, “Optimal function computation in directed
and undirected graphs,” Information Theory, IEEE Transactions on,
vol. 58, no. 6, pp. 3407–3418, June 2012.
[4] A. Ramamoorthy and M. Langberg, “Communicating the sum of
sources over a network,” Selected Areas in Communications, IEEE
Journal on, vol. 31, no. 4, pp. 655–665, April 2013.
[5] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network
information flow,” IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204–
1216, Jul. 2000.
[6] S.-Y. R. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE
Trans. Inform. Theory, vol. 49, no. 2, pp. 371–381, Feb. 2003.
[7] R. Koetter and M. Medard, “An algebraic approach to network coding,”
IEEE/ACM Trans. Networking, vol. 11, no. 5, pp. 782–795, Oct. 2003.
[8] R. Appuswamy and M. Franceschetti, “Computing linear functions by
linear coding over networks,” Information Theory, IEEE Transactions
on, vol. 60, no. 1, pp. 422–431, Jan 2014.
[9] B. Rai and B. Dey, “On network coding for sum-networks,” Information
Theory, IEEE Transactions on, vol. 58, no. 1, pp. 50–63, Jan 2012.
[10] C. Huang, Z. Tan, and S. Yang, “Upper bound on function computation
in directed acyclic networks,” 2014, submitted to ITW ’15. [Online].
Available: http://iiis.tsinghua.edu.cn/~shenghao/pub/huang14.pdf
J Comb Optim
DOI 10.1007/s10878-014-9725-1
On the computational complexity of bridgecard
Zihan Tan
© Springer Science+Business Media New York 2014
Abstract Bridgecard is a classical trick-taking game utilizing a standard 52-card deck,
in which four players in two competing partnerships attempt to “win” each round,
i.e. trick. Existing theories and analysis have already attempted to show correlations
between system designs and other technical issues with parts of the game, specifically
the “Bidding” phase, but this paper will be the first to attempt to initiate a theoretical
study on this game by formulating it into an optimization problem. This paper will
provide both an analysis of the computational complexity of the problem, and propose
exact, as well as, approximation algorithms.
Keywords Bridgecard · Computational complexity · Approximation algorithm
1 Introduction
Bridgecard is a game in which the two partnerships progress through two main phases:
Bidding and Playing. In the Bidding phase, each partnership works jointly in order to
secure a ‘contract’, that is, to determine a specified goal in the number of tricks in the
declared denomination. Once the Bidding is completed, the game enters the Playing
phase.
In this phase, one player from the partnership which established the contract, known
as the ‘declarer’, throws down a card from his hand. His partner, known as the ‘dummy’,
attempts to support him, and the other partnership, known as the ‘defenders’, must
deter them from completing the contract. In each round, each player throws down one
card. The player with the highest denomination card wins the round, i.e. takes the
trick. Should the declarer and dummy partnership fulfill their contract, i.e. wins the
number of tricks agreed upon, they win this game.
Z. Tan (B)
Tsinghua University, Beijing, China
e-mail: blueleaves09@126.com
The intricacies of this game lie in the fact that the players cannot coordinate with
their partners even during the Bidding phase. Each must make his move based on
different partial information while still attempting to fulfill their individual roles.
1.1 Relevant results
Previous research on bridgecard concerns mainly technical aspects, such as how to design
a reasonable bidding system and how to use signals correctly in defence. The book
“Killing Defence” (Kelsey 1994) developed many techniques for defending
against a contract, while another book, “The Expert Game” (Reese 1994), introduced
a wide range of methods for a declarer.
The analysis of strategy and computational complexity for similar games also
appeals to computer scientists and mathematicians. Such research covers Chess,
Poker, Minesweeper, and so on, and it appears to form a new direction
of theoretical development. Demaine has written a book “Algorithmic Combinatorial
Game Theory” [3] on these problems, bringing the analysis of games closer to
theoretical computer science.
Another famous card game, UNO, was investigated from mathematical and computational
perspectives by Demaine (2010). He formulated the game UNO mathematically
and analyzed several versions of it. A single-player version was proved to be NP-hard
and the uncooperative 2-player version was proved to be PSPACE-complete.
Similar research has been done on other games, including Tetris, Amazons and Othello,
which also helped establish a new direction in theoretical
research that stems from classical game theory but includes techniques from algorithms
and the theory of computation.
1.2 Motivation
Previous technical research considered the real bridgecard game with
a total of 52 cards, which severely restricts our understanding of the complexity
of the game. Bridgecard seems to be more difficult than other card games; however, this
has not been proved mathematically. Thus, a proper
generalization of the game is necessary. Only if we drop the limit on the number of cards
and make it a parameter of the game can we understand the complexity of the bridgecard
game accurately. On the other hand, the popularity of this game, and the fact that some
similar card games have been well studied theoretically, make the bridgecard
game deserve further analysis from a theoretical perspective, which
motivates our study. In this paper we investigate the bridgecard game by formulating it as
optimization problems, giving exact as well as approximation algorithms
for them, and analyzing their computational complexity.
2 Preliminaries
Games are usually formulated as different mathematical problems depending on
their settings. Perfect information or imperfect information? Single-player
or multi-player? In a bridgecard game we have 4 players and 4 suits of cards
(spades, hearts, diamonds and clubs). The number of players and the number of
suits are the defining features of bridgecard. For simplicity we consider the case in
which the game is played with only one suit of cards. The 2-player case and the
4-player case will be discussed separately.
2.1 Game setting
In a real bridgecard game, the players North (N), South (S), East (E) and West (W) are
each dealt 13 cards, where each card has a suit and a number.
The game proceeds clockwise. North and South are partners, and East and West are
partners.
The auction that follows chooses one of the four players to be the declarer. Every bid
in the auction specifies a number (representing how many tricks the bidder aims to win)
and a trump suit (normally the longest suit in his hand). The pair that sets the
contract (i.e. makes the last bid in the auction) will try to win at least the specified
number of tricks, with the specified suit as trumps.
Every bid actually represents the number of tricks in excess of six which the
partnership undertakes to win. For example, a bid of “two hearts” represents a contract to
win at least 8 tricks (8 = 6 + 2) with hearts as trumps.
In the Playing phase, the game is played in rounds. In every round each player plays a
card, and the player who wins the trick plays first in the next round. The player
clockwise next to the declarer leads the first card in the first round. For example,
if North is the declarer, then East leads, and the suit of the card that he
leads becomes the suit of this round. Immediately after this opening lead, the cards of
declarer’s partner are exposed to all other players.
Play proceeds clockwise. Each player must (if possible) play a card of the suit of
the round. A player without cards of that suit may play any other card. We say a player
wins the trick if he plays the largest number out of the four, and he plays first
in the following round.
The player who exposes his hand is called the dummy. He takes
no active part in the play of the hand. Whenever it is dummy’s turn to play, the
declarer plays one of dummy’s cards. Dummy is not permitted to offer any advice or
comment.
2.2 Definition and models
First, we formalize an important problem in bridgecard as an optimization
problem: the suit play problem. A single-suit play problem is both essential and difficult,
because it asks the player to design an order of playing the cards in one suit so as to
maximize the probability of taking a certain number of tricks.
In terms of bridgecard, a general single-suit play problem may take the following form:
Does there exist a strategy such that we can get at least 4 tricks by using it?
While in a real game there are 4 suits, for simplicity’s sake our formulation only
considers the case when there is one suit, as this does not reduce the difficulty of
the problem.
2.2.1 Formulation into two general games
Game 1 We have 4 players: North, East, South, and West (N, E, S, W for
short). 4n different cards labeled 1, 2, …, 4n are dealt to them so that
each of them has n cards. One player is set to lead the first round. Play follows
the rules below.
(1) Every round has a leader who plays first; then each player plays one of his cards,
starting from the leader and proceeding clockwise. A card that has been played
cannot be taken back.
(2) In one round, the player who plays the largest number out of 4 wins the trick (wins
in this round).
(3) In the next round, the player who won the previous trick plays first.
Goal of the game: N and S want to maximize the number of tricks won by either N or
S, while E and W want to maximize the number of tricks won by either E or W.
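The rules of Game 1 can be sketched in a few lines of Python. This is only an illustration under our own conventions: the names SEATS, play_order, trick_winner and count_ns_tricks are hypothetical and do not come from the paper, and a round is represented as a dict mapping each seat to the card it played.

```python
# Illustrative sketch of the rules of Game 1; all names here are hypothetical.
SEATS = ["N", "E", "S", "W"]  # clockwise order of play


def play_order(leader):
    """Seats in the order they play when `leader` starts the round (rule 1)."""
    i = SEATS.index(leader)
    return SEATS[i:] + SEATS[:i]


def trick_winner(round_cards):
    """Rule 2: the seat that played the largest card wins the trick."""
    return max(round_cards, key=round_cards.get)


def count_ns_tricks(rounds):
    """Number of tricks won by N or S over a sequence of rounds."""
    return sum(1 for r in rounds if trick_winner(r) in ("N", "S"))


rounds = [
    {"N": 8, "E": 3, "S": 5, "W": 1},  # N wins and leads the next round (rule 3)
    {"N": 2, "E": 7, "S": 4, "W": 6},  # E wins
]
print(play_order(trick_winner(rounds[0])))  # order of play in the second round
print(count_ns_tricks(rounds))
```

The sketch only scores a play that has already happened; choosing the cards is the optimization problem studied in the rest of the paper.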
This formulation somewhat loses generality for a single-suit play problem, because
in real bridgecard every player holds 13 cards in total but the players do not hold the
same number of cards in one suit. So another version, a 2-suit case, is given below. It is
mentioned only to illustrate the generality that such small changes of setting can
provide.
Game 2 We have 4 players as N, E, S, W. There are n different cards in “trump” suit
(red) labeled by different integers 1 to n respectively and 3n different cards in another
suit (black) labeled by different integers 1 to 3n respectively. 4n cards are dealt to
players with every one holding n of them. Play follows the rules below.
(1) In every round each player plays one card from the beginner and in clockwise
direction. A card that has been played is not allowed to be taken back.
(2) In one round, each player must play a card in the suit of the leader’s card unless he
does not have one.
(3) In one round, if the 4 cards played are in the same suit, then the player who plays
the largest number out of the 4 wins the trick (wins this round); if the 4 cards are
not all in the same suit, then the player who plays the largest number in the “trump”
suit wins the trick (wins this round).
(4) In the next round, the player who won the previous trick plays first.
Goal of the game: N and S want to maximize the number of tricks won by either N or
S, while E and W want to maximize the number of tricks won by either E or W.
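Rules (2) and (3) of Game 2 can be illustrated as follows. This is a minimal sketch with hypothetical names (TRUMP, legal_cards, trick_winner); a card is modelled as a (suit, number) pair with suit "red" (trump) or "black", which is our own encoding rather than the paper's.

```python
# Illustrative sketch of rules (2) and (3) of Game 2; names are hypothetical.
TRUMP = "red"


def legal_cards(hand, led_suit):
    """Rule 2: follow the led suit if possible, otherwise any card may be played."""
    same_suit = [c for c in hand if c[0] == led_suit]
    return same_suit if same_suit else list(hand)


def trick_winner(plays):
    """Rule 3. `plays` is a list of (seat, (suit, number)) in playing order."""
    suits = {card[0] for _, card in plays}
    if len(suits) > 1:
        # Mixed suits: with only two suits, at least one trump was played,
        # and only trump cards can win the trick.
        plays = [p for p in plays if p[1][0] == TRUMP]
    return max(plays, key=lambda p: p[1][1])[0]


ruffed = [("N", ("black", 9)), ("E", ("red", 1)),
          ("S", ("black", 12)), ("W", ("black", 3))]
print(trick_winner(ruffed))  # E wins with the only trump
```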
Note that there are 13 cards in every suit in bridgecard, and we generalize this number
to a variable n, because we are going to formulate the game as an optimization problem and
analyze it from the perspective of computational complexity. Thus, asymptotic
questions can be asked as n grows: is the complexity polynomial in n or something larger?
This cannot be done for a real bridgecard game, because the number 13 is fixed and
all computation is of constant size (the constant is very large but does not
grow, so we cannot see the exact form of the complexity). Consequently, a variable n
rather than 13 better represents the generality of the game. Our formulation
is therefore an extended version of a real bridgecard game.
2.2.2 Optimization problem of game 1
In the following discussion, we use the term “distribution of cards” to represent the full
information about the cards of every player, namely which cards each player holds.
On one hand, from a pure full-information perspective, when the
distribution of cards is known to all players (i.e. each player knows his own cards as well
as those of the others), a natural question is how many tricks N and S can win if
all four players play optimally. In formal matches, such computation
is done with computers to help players and coaches better analyze their
performance.
Problem 1 Input: The distribution of cards φ and an integer k (the target number of
tricks)
Output: Whether there is a strategy (actually, an algorithm) for N and S to get at
least k tricks under all possible defensive plays from E and W . If the answer is yes,
output the strategy.
On the other hand, a real game is played with partial information. The cards are
played sequentially. Before the first card is played, every player can only see the cards in
his own hand. Once the first card has been played by W, N lays down his hand
for all three other players to see and follows the instructions of S. Therefore, declarer S
can only see two hands when making his plan for the play. So here comes another question:
what is the “best” play N can make when having only partial information?
Because N cannot see E’s and W’s hands, there is a lot of information N does not know.
What he can do is try to improve his plan in order to increase the probability of
getting enough tricks. The following definitions are introduced to make this precise.
Definition 1 (Distribution) A distribution φ of cards is a set of four hands N , E, S, W ,
with a hand to be a set of n cards, denoted by
φ = {N , E, S, W }
and N , E, S, and W are sets of n distinct numbers, representing cards held by N , E,
S, and W respectively.
Definition 2 (Whole Play) For a given distribution φ = {N, E, S, W}, we call a
specific whole process of the game a “whole play”, meaning the sequence of cards
played under the rules of Game 1, denoted by a family $WP_\phi$ of 4-element sets:
$WP_\phi = \{R_1, R_2, \dots, R_n\}$
$R_i = \{n_i, e_i, s_i, w_i\}$
where $n_i$, $e_i$, $s_i$ and $w_i$ represent the cards played by N, E, S, W respectively in the ith
round.
Definition 3 (Partial Distribution) For a given distribution φ = {N, E, S, W} of a
game, after several rounds of play the distribution changes into $\phi' = \{N', E', S', W'\}$,
which is called a “partial distribution”, where $N', E', S', W'$ are subsets of N, E, S, W
respectively. The game corresponding to this partial distribution is called a “partial
game”.
We call a specific whole process of the partial game a “partial play”, meaning
the sequence of cards played starting from the partial distribution $\phi'$, denoted by
$PP_{\phi'} = \{R_i, \dots, R_n\}$, where $R_j$ is defined as before and $n - i + 1$ is the
number of cards held by every player.
Definition 4 (Trick Function) Given a partial play $PP_{\phi'}$, it is easy to verify which
player wins the jth trick just by looking at $R_j$. We then use $T(PP_{\phi'})$ to represent the
number of tricks N or S gets in the partial play $PP_{\phi'}$.
Definition 5 (Maximal Trick Function) Given the distribution $\phi'$ of a partial game, N and
S want to maximize the number of tricks they get, while E and W want to maximize
the number of tricks they get. Thus, different strategies are used by both sides. For
every strategy α of N and S (in fact a strategy α represents a decision tree, which will
be discussed later), N and S can always get a certain number of tricks no matter what
strategy β E and W use. The number of tricks that N and S get is determined by the two
strategies α and β and is denoted by $T_{\phi'}(\alpha, \beta)$. Its minimum over all β is
called the “guaranteed trick” of strategy α, denoted by $GT(\phi', \alpha)$:
$GT(\phi', \alpha) = \min_{\beta} T_{\phi'}(\alpha, \beta)$
Consider all strategies of N and S and find the strategy with maximal $GT(\phi', \alpha)$.
This value is called the maximal guaranteed trick of $\phi'$ for N and S, denoted by $MT(\phi')$,
meaning that the best N and S can do is to guarantee $MT(\phi')$ tricks, under the condition
that E and W always find the toughest play against N and S’s strategy.
$MT(\phi') = \max_{\alpha} GT(\phi', \alpha)$
Among this series of definitions, the maximal trick function is the most important. To better
understand the definition, a “recurrence” version of it is introduced. Consider a
partial distribution φ = {N, E, S, W}, let n, e, s, w represent the cards played by
N, E, S, W in the first round respectively, let $N' = N - \{n\}$, and $E'$, $S'$ and $W'$ similarly, and let $\phi' = \{N', E', S', W'\}$. Supposing the first card is played by N,
$MT(\phi) = \max_{n \in N} \min_{e \in E} \max_{s \in S} \min_{w \in W} \big( MT(\phi') + t(n, e, s, w) \big)$
where $t(n, e, s, w)$ equals 1 if N or S wins the round and 0 otherwise.
Our first main problem is to find the value of MT(φ). A decision version of
this question is: given φ and a positive integer k, is MT(φ) ≥ k?
It is necessary to make another definition after the MT function: a card that N plays,
following a strategy that guarantees $MT(\phi_i)$ tricks, is called a “best choice” for the partial
game. Note that the “best choice” for $\phi_i$ can be more than one card, because N may face
the case that any of several cards gives him $MT(\phi_i)$ tricks.
Our second problem asks what is the best N and S can do when they are given only
partial information. As we only plan to analyze the first problem in detail, we give
a description rather than a formal definition of the second problem.
Now N can only see 2 hands (N’s and S’s). For him, all distributions of the remaining
cards between E and W are equally likely. The best he can do is to choose a strategy that
maximizes the expected number of tricks he can guarantee, or a strategy that maximizes the
probability of guaranteeing k tricks for a given number k.
Definition 6 (Probability for MT function) Seeing only two hands, the probability to
guarantee k tricks is defined to be
$\Pr_{\phi}[MT(\phi) \ge k]$
where φ is chosen uniformly at random from all distributions consistent with the visible
hands of N and S.
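For very small n, the probability in Definition 6 can be computed by brute force: enumerate every split of the unseen cards between E and W and evaluate MT on each full distribution. The sketch below is only illustrative. The function names are hypothetical, the full-information solver is a plain exhaustive search with leader tracking (not an algorithm from the paper), and the choice of who leads is left as a parameter because the definition does not fix it.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

SEATS = "NESW"  # clockwise order


def mt(hands, leader):
    """Maximal guaranteed N/S tricks with all four hands visible (brute force)."""

    @lru_cache(maxsize=None)
    def solve(state, lead, played):
        if len(played) == 4:
            winner = (lead + played.index(max(played))) % 4
            gain = 1 if SEATS[winner] in "NS" else 0
            return gain + (solve(state, winner, ()) if any(state) else 0)
        seat = (lead + len(played)) % 4
        values = [
            solve(tuple(h - {c} if i == seat else h for i, h in enumerate(state)),
                  lead, played + (c,))
            for c in state[seat]
        ]
        return max(values) if SEATS[seat] in "NS" else min(values)

    start = tuple(frozenset(hands[s]) for s in SEATS)
    return solve(start, SEATS.index(leader), ())


def prob_mt_at_least(north, south, k, leader="N"):
    """Pr[MT >= k] over uniformly random splits of the unseen cards between E and W."""
    n = len(north)
    unseen = set(range(1, 4 * n + 1)) - set(north) - set(south)
    splits = list(combinations(sorted(unseen), n))
    good = 0
    for east in splits:
        hands = {"N": north, "E": east, "S": south, "W": unseen - set(east)}
        good += mt(hands, leader) >= k
    return Fraction(good, len(splits))


print(prob_mt_at_least({8, 1}, {2, 3}, 1), prob_mt_at_least({8, 1}, {2, 3}, 2))
```

The enumeration grows as C(2n, n) splits, each solved exhaustively, so this is practical only for toy hands.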
Problem 2 is closer to a real game, because the declarer can only see two
hands instead of four. However, in this case what concerns us more is the
strategy that the declarer chooses, namely the card he is going to play given only partial
information. We roughly define the problem as follows.
Problem 2 When declarer can only see his hand and the dummy’s hand,
(1) Which card should he play in order to maximize the probability to get k tricks for
a given integer k ?
(2) Which card should he play in order to maximize the expected number of tricks
that he can get?
In fact, Problem 2 asks for a “best probabilistic play”. It is necessary to point out
that in Problem 2 neither the Guaranteed Trick nor the Maximal Trick is that important,
because we are in a completely probabilistic setting, where formulas with
max and min notions are less appropriate.
3 Analysis for optimization problem
Both problems are solvable, but solving them takes a large amount of time. A deeper
understanding from the following aspects is needed:
• Dynamic programming perspective (DP)
• Intuition and conjecture of their difficulty
• Special properties of them.
The first aspect gives us an algorithm that solves the problem but takes an exponential
amount of time. The second aspect analyzes its computational complexity. The
third aspect helps us better understand the optimization problems and gives us intuition
for the design of approximation algorithms.
3.1 Computing the MT function
3.1.1 Exact algorithm (Dynamic programming) for computing MT function
Recall that MT(φ) = MT({N, E, S, W}) is the maximal guaranteed number of
tricks of the partial game φ, where N, E, S, W represent the sets of cards of players
N, E, S, W respectively. Once we have computed MT(φ), the best card
for N to play first can also be found. We call this card n ∈ N the “best choice” of the
partial game φ.
For convenience in writing, the following definitions are introduced.
Definition 7 (Trick Judging Function) Let $t(R_i) = t(n_i, e_i, s_i, w_i)$ be the
trick-judging function for round i, where $n_i, e_i, s_i, w_i$ are the cards played by N,
E, S, and W respectively. $t(R_i) = 1$ if N/S gets this trick, and $t(R_i) = 0$ if E/W gets
this trick.
Definition 8 (Leader Judging Function) The player who won the last trick leads
the next trick. Let $L_i = L(R_{i-1})$ represent the first player of round $R_i$. It is
determined by $R_{i-1}$ and its range is {N, E, S, W}.
Let $N', E', S', W'$ represent the remaining card sets of N, E, S and W respectively after
one round. The following recurrence should be satisfied.
$MT(\{N, E, S, W\}) = \max_{n \in N} \min_{e \in E} \max_{s \in S} \min_{w \in W} \big( MT(\{N', E', S', W'\}) + t(n, e, s, w) \big)$
where
$n = \arg\max_{n \in N} \{ \min_{e \in E} \max_{s \in S} \min_{w \in W} \big( MT(\{N', E', S', W'\}) + t(n, e, s, w) \big) \}$
$N' = N - \{n\}$
and similar definitions for e, s, w and $E', S', W'$.
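The recurrence can be turned into a small memoized search. The following is a minimal sketch, not the paper's implementation: the names mt, after_play and best_choice are hypothetical, hands are frozensets, and, as an assumption beyond the displayed formula, the search tracks who leads each round (the winner of the previous trick, as in Definition 8) so that the max and min steps follow the actual playing order.

```python
from functools import lru_cache

SEATS = "NESW"  # clockwise order of play


@lru_cache(maxsize=None)
def mt(state, lead, played=()):
    """Maximal guaranteed N/S tricks.

    state  -- tuple of four frozensets (remaining hands of N, E, S, W)
    lead   -- index in SEATS of the player who led the current round
    played -- cards already played in the current round, in playing order
    """
    if len(played) == 4:
        winner = (lead + played.index(max(played))) % 4
        gain = 1 if SEATS[winner] in "NS" else 0  # t(n, e, s, w)
        return gain + (mt(state, winner) if any(state) else 0)
    seat = (lead + len(played)) % 4
    values = [mt(after_play(state, seat, c), lead, played + (c,))
              for c in state[seat]]
    # N and S maximize, E and W minimize.
    return max(values) if SEATS[seat] in "NS" else min(values)


def after_play(state, seat, card):
    """Hands after the player at index `seat` has played `card`."""
    return tuple(h - {card} if i == seat else h for i, h in enumerate(state))


def best_choice(state, lead):
    """Cards of the leader that achieve the optimal value for his side."""
    target = mt(state, lead)
    return sorted(c for c in state[lead]
                  if mt(after_play(state, lead, c), lead, (c,)) == target)


state = tuple(frozenset(h) for h in ({8, 1}, {7, 2}, {6, 3}, {5, 4}))
print(mt(state, 0), best_choice(state, 0))
```

Memoizing on the full state keeps the code short; the exponential number of states is exactly what the running-time estimate in the text accounts for.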
Using dynamic programming techniques, we compute $MT(\phi')$ for all cases (considering
the different partial games after different numbers of rounds).
$\sum_{i=1}^{n} \binom{n}{i}^4 = O(2^{4n} n^{-1})$ cases are computed (estimated by Stirling’s
formula). For every case, following the expression with min and max notions above,
$O(n^4)$ steps are needed. The total running time of this algorithm is $O(16^n n^3)$.
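The size of the table can be checked numerically for small n. The helper name num_partial_games below is hypothetical; the sketch simply evaluates the sum of $\binom{n}{i}^4$ and prints it next to the bound $16^n / n$, together with the product with the $n^4$ steps per case and the bound $16^n n^3$.

```python
from math import comb


def num_partial_games(n):
    """Number of table entries: sum over i of C(n, i)^4 (i cards left per hand)."""
    return sum(comb(n, i) ** 4 for i in range(1, n + 1))


for n in range(1, 7):
    states = num_partial_games(n)
    # Columns: n, number of cases, 16^n / n, cases * n^4, 16^n * n^3
    print(n, states, 16 ** n // n, states * n ** 4, 16 ** n * n ** 3)
```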
3.1.2 Properties for the optimization problem
We make some definitions, claims and show proofs of them, which can help us better
understand the properties of the game.
Definition 9 (Complete Better) A hand $A = \{a_1 < a_2 < \dots < a_n\}$ is “completely
better” than another hand $B = \{b_1 < b_2 < \dots < b_n\}$ if $a_i \ge b_i$ holds for every
$i \in \{1, 2, \dots, n\}$; “completely worse” is defined similarly.
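Definition 9 amounts to a position-by-position comparison of the two sorted hands. A minimal sketch (the function name completely_better is our own):

```python
def completely_better(a, b):
    """True if hand `a` is completely better than hand `b` (Definition 9)."""
    if len(a) != len(b):
        raise ValueError("hands must have the same number of cards")
    # Compare the i-th smallest card of each hand for every i.
    return all(x >= y for x, y in zip(sorted(a), sorted(b)))


print(completely_better({2, 5, 9}, {1, 5, 7}))  # every position is at least as large
print(completely_better({1, 9}, {2, 3}))        # fails at the smallest card
```

Note that the relation is only a partial order: neither of {1, 9} and {2, 3} is completely better than the other.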
If a player wants to increase the number of tricks he gets in a 2-player game, he
would like to have a completely better hand than his current hand. For the 4-player case,
Claim 3 below specifies the completely better property: a completely
better partial game is obtained by exchanging some cards between two hands, not
by simply comparing numbers between hands, for the latter is not well defined in
the 4-player case.
We extend our definition of the maximal trick function here. The number of tricks
that N can guarantee depends on who leads the first round. N may benefit if he is
the last to play in the first round because he can decide after E and W have played.
Thus, we use the notation $\phi_N$ to represent a partial game started by N, and let $MT(\phi_N)$
be the maximal guaranteed number of tricks for N in a partial game φ started by N. Similarly,
$MT(\phi_E)$, $MT(\phi_S)$ and $MT(\phi_W)$ represent the maximal number of tricks that N can
guarantee in a partial game φ started by E, S, W respectively.
Claim 1 (1) $MT(\phi_E) \ge MT(\phi_N)$
(2) $MT(\phi_W) \ge MT(\phi_N)$
(3) $MT(\phi_E) \ge MT(\phi_S)$
(4) $MT(\phi_W) \ge MT(\phi_S)$
Claim 2 (1) $MT(\phi_E) \le MT(\phi_N) + 1$
(2) $MT(\phi_W) \le MT(\phi_N) + 1$
(3) $MT(\phi_E) \le MT(\phi_S) + 1$
(4) $MT(\phi_W) \le MT(\phi_S) + 1$
Claim 3 (Completely Better Property) If N is replaced by a “completely worse”
hand $N'$, and W is replaced by a “completely better” hand $W'$ accordingly, i.e.
N and W exchange subsets $N_e$ and $W_e$ of their cards such that $|N_e| = |W_e|$ and $W_e$
is completely better than $N_e$, and there is no change to the other two hands or to the
player who leads the first round, then the MT function does not increase:
$MT(\{N, E, S, W\}_i) \ge MT(\{N', E, S, W'\}_i)$
where $i \in \{N, E, S, W\}$.
Claim 4 (Best Choice) If N is the last to play in the first round, his decision for best
choice must follow the property: If he plans to win this trick, he will play the minimum
card that is able to win; if he does not plan to win, he will play the minimum card of
his hand.
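Claim 4 reduces the last player's decision to at most two candidate cards. As a small illustration (the function name and the return convention are hypothetical), the candidates can be computed as:

```python
def last_player_candidates(hand, highest_played):
    """Return (cheapest winning card or None, smallest card) for the last player.

    By Claim 4, a best choice is always one of these two cards.
    """
    lowest = min(hand)
    winners = [c for c in hand if c > highest_played]
    cheapest_win = min(winners) if winners else None
    return cheapest_win, lowest


print(last_player_candidates({3, 6, 9}, 5))   # can win with 6 or duck with 3
print(last_player_candidates({3, 6, 9}, 10))  # cannot win, so only ducking remains
```

Which of the two candidates is actually best still depends on the value of the remaining partial game.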
Proof We prove the four claims together by induction on n (the number of cards in
every player’s hand).
When n = 1 it is immediate that the conclusions hold.
Supposing in the case n ≤ k the conclusions hold, we consider the case n = k + 1.
For Claim 1, we just need to prove the following fact:
If W is the first to play, after one round, S has a strategy that can give him a better
remaining game (partial game) than the case that S is the first to play.
Consider the case that S is the first to play in the 1st round (we call it case 1). His best
strategy could be to play $s_1$ and, according to the card played by W, decide which card
to play from N’s hand. To be specific, if W plays $w_1$ then he will play $n_1$, if W plays $w_2$
then he will play $n_2$, etc.
Now consider the case that W is the first to play in the 1st round (we call it case 2). We
design a strategy for S as follows. The card played by W determines what he
will play from N’s hand according to the best strategy in case 1 (the strategy given in
the previous paragraph), i.e. if W plays $w_1$ then he will play $n_1$, if W plays $w_2$ then he will
play $n_2$, etc. Then no matter what E plays, S will play $s_1$ from his own hand.
It is immediate that this strategy brings S to a partial game at least as good as the
partial game in case 1.
Moreover, this strategy is one of the strategies available to S in case 2.
According to the definition of MT, the strategy that
S actually uses guarantees him a result at least as good. In other words, the number of
tricks that he can guarantee in case 2 is larger than or equal to that of case 1.
For Claim 4, we argue similarly. For a whole play $WP^1_{\phi_N}$ with
$s_l, w_i, n_j, e_k$ played in the first round, we consider the whole play $WP^2_{\phi_E}$ in
which $w_i, s_l, e_k$ have been played and it is now N’s turn to play.
Suppose he managed to win this round with $n_j$, and this time he hopes to win with $n_t$,
where $n_j > n_t$. Comparing this partial game ψ with $\phi'$, according to the hypothesis of
Claim 3 in the n = k case, N can guarantee at least as many tricks by playing
$n_t$. With a symmetric analysis for the case that he does not plan to win this trick, Claim 4 is
proved.
For Claim 2, it suffices to consider the case that N loses the first trick in $WP^1_{\phi_N}$ and
plans to win it with $n_t$ instead of $n_j$. It is immediate that $n_t > n_j$. Therefore, in this
new partial game N has a completely worse hand than in $\phi'$. According to the hypothesis
of Claim 3 he cannot guarantee more tricks. So the difference between the two maximal numbers
of guaranteed tricks can be at most 1.
For Claim 3, we make a correspondence between 2 partial plays: a hand $\{n_1, \dots, n_{k+1}\}$ (in
increasing order) which is completely worse than another hand $\{n'_1, \dots, n'_{k+1}\}$ (in
increasing order).
If in the first round N plays $n_x$ to win the trick, then we consider the case that N
plays $n'_x$ in the new case. It is immediate that $n'_x \ge n_x$ and he can still win this trick.
For the partial game $\phi'$, the hand $\{n'_1, \dots, n'_{x-1}, n'_{x+1}, \dots, n'_{k+1}\}$ is still completely
better than $\{n_1, \dots, n_{x-1}, n_{x+1}, \dots, n_{k+1}\}$. According to the induction hypothesis for
Claim 3, the conclusion for the case n = k + 1 still holds.
If in the first round N plays $n_x$ but loses the trick, then we consider N playing $n'_x$
in the new case. It is immediate that $n'_x \ge n_x$. If he still loses this trick, then for
the partial game the hand $\{n'_1, \dots, n'_{x-1}, n'_{x+1}, \dots, n'_{k+1}\}$ is still completely better than
$\{n_1, \dots, n_{x-1}, n_{x+1}, \dots, n_{k+1}\}$. According to the induction hypothesis for Claim 3, the
conclusion for the case n = k + 1 still holds. But if he wins this trick, then according to
Claim 2 and Claim 1 in the n = k case, Claim 3 is true here.
We can see that Claim 4 is essential to understand the “best choice”. It is also an
intuitive property: when we do not plan to win this trick, then there is no need to waste
a big number, if we plan to win it, then the minimum number will do. We will see in
the following that this property also helps us to design approximation algorithms to
this optimization problem.
3.1.3 Computational complexity
The dynamic programming algorithm solves the problem, but takes exponential time. So
what is the computational complexity of this problem? The answer is not readily
apparent, but we analyze different aspects of the problem and find some
evidence that it is PSPACE-complete. We use “MT” to denote the decision version
of Problem 1.
First we have the following theorem:
Theorem 1 MT is in PSPACE.
Proof We prove the theorem by giving an algorithm that uses polynomial space. In fact, the
dynamic programming (DP) algorithm is the algorithm we want: although it runs in exponential
time, it uses only polynomial space. We can regard the DP algorithm as traversing a
tree, where deciding the value of every node needs a polynomial amount of information from
its child nodes. If the DP tree is evaluated in depth-first order, it can
be seen that evaluating every node needs polynomial space. Thus, MT is in
PSPACE.
This theorem gives us an “upper bound” of the computational complexity of the
optimization problem. In order to get a lower bound, we consider a simpler problem
as the following:
Game 3 We have 2 players, North and South. There are 2n different cards labeled
1 to 2n, with every player holding n of them. One player is set to lead the
first round. Play follows the rules below.
(1) Every round has a leader who plays first; then the other player plays one of his
cards. A card that has been played cannot
be taken back.
(2) In one round, the player who plays the largest number out of 2 wins the trick (wins
in this round).
(3) In the next round, the player who won the previous trick plays first.
Goal of the game: N wants to maximize the number of tricks won by N, while S
wants to maximize the number of tricks won by S.
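For small hands, the number of tricks N can guarantee in Game 3 can be found by exhaustive search. The sketch below is only illustrative; the names mt2 and after are hypothetical and the search is a plain memoized minimax rather than an algorithm from the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def mt2(north, south, north_leads):
    """Maximal number of tricks N can guarantee in Game 3.

    north, south -- frozensets of distinct card numbers of equal size
    north_leads  -- True if N leads the current round
    """
    if not north:
        return 0
    if north_leads:
        # N moves first (max), then S answers (min).
        return max(min(after(north, south, a, b) for b in south) for a in north)
    # S moves first (min), then N answers (max).
    return min(max(after(north, south, a, b) for a in north) for b in south)


def after(north, south, a, b):
    """Value after N plays a and S plays b: this trick plus the remaining game."""
    n_wins = a > b
    # The winner of the trick leads the next round (rule 3).
    return int(n_wins) + mt2(north - {a}, south - {b}, n_wins)


north, south = frozenset({2, 4}), frozenset({1, 3})
print(mt2(north, south, True), mt2(north, south, False))
```

In this toy instance N guarantees one more trick when S has to lead, which matches the intuition that playing second is an advantage.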
The only difference between Game 3 and Game 1 is the number of players (2 instead of
4). Thus, our earlier definitions can be inherited
here, and we make the following claim.
Claim 5 The decision version of the MT problem of Game 3 can be polynomially reduced
to the decision version of the MT problem of Game 1.
Proof Consider a partial game of Game 3 with $N = \{n_1 < n_2 < \dots < n_k\}$ and
$S = \{s_1 < s_2 < \dots < s_k\}$. We construct a corresponding partial game of Game 1 in which
the card sets of S and E consist of 2k cards taken from 1 to 2n (their exact distribution
does not matter), $N = \{n_1 + 2n < n_2 + 2n < \dots < n_k + 2n\}$ and
$W = \{w_1 = s_1 + 2n < w_2 = s_2 + 2n < \dots < w_k = s_k + 2n\}$.
Every card of S and E is then smaller than every card of N and W, so S and E cannot win
any trick. The partial game of Game 1 is therefore just a competition between N and W, and it
is exactly the same as the partial game of Game 3. Hence, if we can solve the partial game
of Game 1, we can also solve the partial game of Game 3.
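The construction in the proof can be sketched as a small function. The name game3_to_game1 is hypothetical, and the shift of 2n is our assumption: it is chosen so that the filler cards 1 to 2n given to S and E are all smaller than every card of N and W.

```python
def game3_to_game1(north, south):
    """Build a Game 1 distribution from a Game 3 instance (N vs. S, cards 1..2n)."""
    n = len(north)
    shift = 2 * n                            # assumption: makes all fillers smaller
    fillers = list(range(1, 2 * n + 1))
    return {
        "N": {c + shift for c in north},
        "W": {c + shift for c in south},     # W takes the role of the Game 3 opponent
        "S": set(fillers[:n]),               # how the fillers are split does not matter
        "E": set(fillers[n:]),
    }


phi = game3_to_game1({1, 3}, {2, 4})
print(phi)
```

The construction is linear in the number of cards, so it is a polynomial-time reduction.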
This claim also means that we can apply the properties (Claims 1-4) of the four-player
game to the two-player game.
But are the optimization problems for Game 1 and Game 3 of the same computational
complexity? We tend to believe that the answer is no, because there is an essential difference
between the two games. In the problem for Game 1, one trick is decided by the following
interacting decisions: one of N/S first decides what to play, then one of E/W
makes his decision, next the other one of N/S makes his choice, and finally the remaining
player of E/W chooses his card for this round. There is an interaction of decisions in every
round. This is not the case in Game 3, where N and S each decide just once per trick, so it
is not easy to find a polynomial reduction from Game 1 to Game 3.
Now we consider the same optimization problem of Game 3, with distribution given
to both N and S.
Problem 3 Is there a strategy for N to guarantee at least k tricks (under all possible
defensive plays by S)?
When k is a fixed constant, the problem is in P. Let MT-k denote the problem
in the case that k is a constant and not part of the input. Our conclusion is the following
theorem.
Theorem 2 MT-k is in P.
Proof We prove the theorem by induction on k; for every k, we give a recursive algorithm for
solving problem MT-k. When k = 1, we need to consider two cases: the leader of the
first round is N or S.
If it is S, then from Claim 4 we know that the problem can be solved as follows.
Whatever S plays from his hand in every round, if at some point N is able to win the current
trick, then he just wins it; if N cannot, he just plays the smallest number in his hand.
Thus, it suffices to check whether every card of S is larger than every card of N; if this
is not the case, then we claim that N is able to win one trick in this game, following the
algorithm directed by Claim 4.
If it is N, he needs to see how many cards he has that are larger than n. If he does not
have any, then he is not able to win any trick; if he has more than one, he just needs
to play the largest card in his hand and wait for the second largest card to win one
trick. The case is a little more complex when he has exactly one. If it is not n + 1, then
he needs to play the smallest number in his hand and wait for the largest card in his
hand to win one trick; if it is exactly n + 1 and he also has n, he needs to play n + 1
and n will be one trick for him; if it is exactly n + 1 and he does not have n, then he
cannot get any trick.
Consequently, it takes O(1) time to decide whether N is able to win one trick.
We claim that it takes $O(n^{2k-2})$ time to decide whether N is able to win k tricks,
no matter who is the leader of the first round.
Supposing the claim holds for k − 1, for the case that N needs to win k tricks
we again divide the problem into two parts.
If the leader of the first round is S, no matter what he plays from his hand, from
Claim 4 we know that N has at most 2 sensible choices: to win the trick with the smallest possible
card, or to play the smallest card in his hand. (If he cannot win, or can only win, then
there is only one choice.)
For every possible leading card from S (at most n choices), if N is able to win it,
he computes MT(φ_N, k − 1) in O(n^{2k−4}) time for the remaining partial game
φ after both N and S have played one card. Consider the case that there exists some
s ∈ S such that if S plays s in the first round and N covers it, then MT(φ_N, k −
1) = 0, i.e. N cannot guarantee k − 1 tricks in the partial game, and thus he cannot
guarantee k by covering s. Such an s need not be unique, and we choose the smallest
of them according to Claim 3. In this case N should not cover s but play small.
The problem becomes computing MT(φ_S, k), where only n − 1 rounds remain
in the partial game φ_S. This transformation reduces the number of rounds,
and thus can happen at most n times. So, after at most n transformations, we
reduce the target number of tricks from k to k − 1. Since each transformation takes
at most n · O(n^{2k−4}) time, the optimization problem can be solved in O(n^{2k−2})
time.
If the leader of the first round is N, then he reasons as before, but from
S’s perspective: is there a card n ∈ N such that when he plays n, S cannot prevent him
from getting at least k tricks in the remaining partial game? A similar transformation
and argument solve the problem in O(n^{2k−2}) time.
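As a point of comparison for small instances, the decision "can N guarantee k tricks?" can be checked by exhaustive minimax search. The sketch below is not the O(n^{2k−2}) procedure described above; it is a plain brute force under assumed rules (the leader plays any card, the other player answers with any card, the higher card wins the trick and leads next). The function names `max_guaranteed_tricks` and `can_win` are illustrative and do not come from the paper.

```python
from functools import lru_cache


def max_guaranteed_tricks(north, south, leader="N"):
    """Largest number of tricks N can guarantee when S defends optimally.

    Assumed rules (2-player game): the leader plays any card, the other
    player answers with any card, the higher card wins the trick, and the
    winner leads the next round.
    """

    @lru_cache(maxsize=None)
    def solve(n_hand, s_hand, lead):
        if not n_hand:
            return 0

        def outcome(a, b):
            # Value for N after N plays a and S plays b in this round.
            rest_n = tuple(c for c in n_hand if c != a)
            rest_s = tuple(c for c in s_hand if c != b)
            if a > b:
                return 1 + solve(rest_n, rest_s, "N")
            return solve(rest_n, rest_s, "S")

        if lead == "N":
            # N commits first, then S picks the reply that hurts N most.
            return max(min(outcome(a, b) for b in s_hand) for a in n_hand)
        # S commits first, then N picks the best reply.
        return min(max(outcome(a, b) for a in n_hand) for b in s_hand)

    return solve(tuple(sorted(north)), tuple(sorted(south)), leader)


def can_win(north, south, k, leader="N"):
    """Decision version: can N guarantee at least k tricks?"""
    return max_guaranteed_tricks(north, south, leader) >= k
```

The search is exponential in n, so it is only useful for sanity-checking claims on tiny hands.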
When k is part of the input, further analysis of the properties of the game is needed. Several
definitions are made for this purpose.
Definition 10 (Consecutive Cards) Two or more cards are called consecutive cards if
they are consecutive integers.
Definition 11 (0-1 Representation) We call a 0/1 string a = a_1 a_2 · · · a_{2n} with exactly
n 1s and n 0s the representation of N’s hand, where a_i = 1 if and only if N holds the card
2n + 1 − i.
Definition 12 (Completely Better in Representations) We say that a representation a =
a_1 a_2 · · · a_{2n} is “completely better” than another representation b = b_1 b_2 · · · b_{2n} if for
all i ∈ [2n], the number of 1s among the first i digits of a is not less than that
of b.
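The two definitions above translate directly into a prefix-count comparison. The following is a minimal sketch, assuming cards are numbered 1 to 2n; the helper names `representation` and `completely_better` are illustrative only.

```python
def representation(hand, n):
    """0/1 list a_1..a_2n with a_i = 1 iff N holds card 2n + 1 - i."""
    held = set(hand)
    return [1 if (2 * n + 1 - i) in held else 0 for i in range(1, 2 * n + 1)]


def completely_better(a, b):
    """True if every prefix of a contains at least as many 1s as the
    prefix of b of the same length."""
    ones_a = ones_b = 0
    for x, y in zip(a, b):
        ones_a += x
        ones_b += y
        if ones_a < ones_b:
            return False
    return True
```

For example, with n = 2 the hand {4, 2} has representation 1010 and the hand {3, 1} has representation 0101; the first is completely better than the second, but not conversely.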
Definition 13 (Completely Better in Choices) Suppose N holds a = a_1 a_2 · · · a_{2n} and
he is going to lead the first round. We say that a choice a_i is completely better than a_j if,
no matter whether his opponent chooses to win or not, the remaining partial game after
playing a_i is completely better than that after playing a_j.
Claim 6 If the leader for current round holds no consecutive cards, then his best
choice is to play the smallest number in his hand.
Proof We prove the claim by analyzing the representation of the partial game. Suppose the current
representation is a^{(i)}, where a_i = a_j = 1, i < j and a_{i−1} = a_{i+1} = a_{j−1} = a_{j+1} = 0.
If N chooses to play card a_i and S chooses to win it, then the remaining representation is a′ = a_1 a_2 · · · a_{i−2} a_{i+1} · · · a_{2n}; if N chooses to play card a_j and S
chooses to win it, the remaining representation is a″ = a_1 a_2 · · · a_{j−2} a_{j+1} · · · a_{2n}. It is easy
to see that a″ is completely better than a′ when there are no consecutive cards.
If N chooses to play card a_i and S chooses not to win it, the remaining representation a′ is
obtained by deleting a_i and the last 0 from the original representation a^{(i)}. If N chooses to play card a_j
and S chooses not to win it, the remaining representation a″ is obtained by deleting a_j and the last 0
from the original representation, and it is easy to see that a″ is completely better than a′ when
there are no consecutive cards.
From Claim 1, Claim 2 and their proofs we obtain the fact that the better the representation we get, the more tricks we can win. Thus, if he does not have consecutive
cards, his best choice is to play small.
This claim gives us the intuition that if the hand is “scattered” (cards are
not consecutive), then it is easy to deal with (the best choice is to play small). If the
hand is “compact”, meaning that it contains many consecutive cards, the case
is still easy to deal with: among consecutive cards the choices are
equivalent (it makes no difference whether you play 5 or 6 when you hold both),
and fewer choices mean less difficulty. Therefore, the main difficulty in the optimization
problem of Game 3 is the case where the hand strikes a balance between “scattered”
and “compact”. This case is analyzed below.
We consider an instance of Game 3. To claim that N is
able to win at least k tricks, we must give a strategy for him that guarantees
that number of tricks. The form of a strategy can be formalized as a tree, as
follows:
Definition 14 (Strategy) We denote the strategy for N in distribution φ to get k tricks
by S(φ, k), and the size of this strategy is the number of nodes in S(φ, k).
This model is similar to a game tree, where every non-leaf node is labeled by the logic
operation “and” or “or”, and every leaf node is labeled by 0 or 1. But the size of the
tree is larger than that of a game tree with the same input n, and it is therefore difficult to verify.
Now we give an example that reaches the balance between “compact” and “scattered”,
and consider the difficulty of the whole problem.
Let n = 6k (if n is not a multiple of 6 then we consider ⌊n/6⌋ and the complexity
would be almost the same), let N’s hand be N = {6k, 6k − 1, 6k − 2, 6(k − 1), 6(k −
1) − 1, 6(k − 1) − 2, · · · , 6, 5, 4}, and let the target number of tricks be k_0.
Claim 7 The size of S(φ, k_0) is O((cn)^{dn}), where c and d are constants.
Proof First, the number of layers is 6k.
Notice that for a partial game, if some choice leads to a worse remaining
partial game than another choice, then this choice will not lead a branch, i.e. it should
not appear in the strategy tree.
When will the strategy tree split into many branches in some layer? The only case
for this is that N has many consecutive cards, and they lead to different partial
games. Among these partial games, none is completely better than another. Here we
say that a choice is completely better than another if it leads to a
completely better partial game than the other.
We claim that there always exists some layer in which S splits into k − 1 branches,
i.e. he has k − 1 choices and none is completely better than another.
This is immediate, because if N plays the top 3 honors first, then no matter what
he plays next, S can win it and lead the next round, so for S there are k − 1 different
choices, and we must cover all of them in the strategy tree. This is why the strategy tree
will definitely split into k − 1 branches in some layer.
It is true that if one’s hand has k sets of consecutive cards, and it is his turn to
play first in this round, then among these k choices, none is completely better than
another. Thus, the number of child nodes is the number of sets of consecutive cards.
We claim next that after the first split, every branch will split into at least k − 3 sub-branches in some layer below. This can be proved by similar reasoning.
Consequently, we obtain our conclusion: the size of S(φ, k_0) is at least (k − 1)!!,
which can be estimated using Stirling’s formula. The size is (cn)^{dn}.
This size is really large for a verifier to check. Consider the definition of
PCP(r(n), q(n)): it is the set of decision problems that have probabilistically checkable proofs which can be verified in polynomial time using at most r(n) random bits
and by reading at most q(n) bits of the proof. We also have the important conclusion
that when there is a lower bound on the number of queries for some problem from the PCP
perspective, this problem is likely to be PSPACE-complete.
Conclusion 1 PCP[poly(n),O(1)]=PCP[poly(n),poly(n)]=NEXP (MIP = NEXP)
(Babai et al. 1991).
The complexity class NEXP contains PSPACE, and the containment is widely believed to be strict.
Also, it is generally believed that for a proof of size 2^{cn}, where
c is a constant, we are probably able to use little randomness and few queries
to make the verifier’s probability of correctness a constant. But now the size is
(cn)^{dn}, so this becomes less likely, probably impossible. Although we are not able
to reduce a PSPACE-complete problem to this bridgecard optimization problem,
considering both the difficulty of proving the equivalence between the 2-player case and the 4-player case and the difficulty, from the PCP perspective, of proving that the problem belongs
to a lower complexity class, it is reasonable for us to believe that this problem is
PSPACE-complete.
Conjecture 1 MT is PSPACE-complete.
3.2 Algorithm for best probabilistic play problem
3.2.1 Exact algorithm (DP) for computing best probabilistic play
First, we found a local property of the problem.
When one round is finished, there are 4(n − 1) cards left. We can re-rank
those cards: each player can clearly see the rank of each of his cards among
the remaining 4(n − 1) cards. To be precise, he can find a new 0-1 representation
with respect to his remaining hand. Then he can regard the partial game as a new
whole game containing a total of 4(n − 1) cards. This “partial game transform”
procedure can be done in O(n) time.
To find the “best probabilistic play”, a declarer always looks for
the strategy with the highest probability of getting a certain number of tricks, according to
the increasing information he has, while his opponents E/W can see the card
distribution. They watch the declarer make his choices, and
use the “oracle” for computing the “best choice” for the partial game (explained
in problem 1) to decide what to play, and then wait for the declarer’s next choice.
Consequently, when the distribution is given to E/W but not to N, if N always finds
the best probabilistic play for the partial game, then the total number of tricks cannot
be larger than the “maximum number of guaranteed tricks” in the initial whole game,
because he is not always making the best choice for the given distribution,
while his opponents are always doing their best to prevent N from getting more tricks.
On the other hand, we can define and consider the “best probabilistic defensive play”
of a defender in another setting: a defender E can see only his own hand and
the dummy’s hand, while the declarer can see the whole distribution and always makes
the best choice for the partial game. Over all possible distributions, the choice that
most often turns out to be the best choice is the “best probabilistic play” for a defender.
We design an algorithm using dynamic programming to solve the Best Probabilistic
Play problem. Before that, some definitions are made to formalize the setting.
Definition 15 (Partial Game with Incomplete Information) Let φ_j^i = {N, E, S, W}
be the partial game with incomplete information for declarer N. In this partial game
the first round is started by j ∈ {N, S}, and all the declarer can see are the two hands
N = {n_1, · · · , n_n} and S = {s_1, · · · , s_n}. He makes choices based on these
two hands and the cards sequentially revealed by E and W.
Definition 16 (Strategy) Let α be a strategy of declarer N; it is a decision tree of the same
form as the strategy tree of Definition 14.
Definition 17 (Best Defence) Defenders E and W can see the whole distribution of
the 4 hands. When E/W is about to play a card, he runs the Best Play algorithm on the
remaining partial game to find the card that he should play to minimize the number of
tricks that N can guarantee. This algorithm for the defenders is called Best Defence.
Definition 18 (Probability Function for a Strategy) Let P(φ_j^i, α, k) be the probability
for the declarer N to win k tricks using the strategy α under the best defence from E
and W. The probability is taken over a uniformly chosen distribution of the remaining two
hands.
Definition 19 (Probability Function for a Partial Game with Incomplete Information)
Let P(φ_j^i, k) = max_α P(φ_j^i, α, k) be the probability for the declarer N to win k tricks
in the partial game, namely the largest probability of getting k tricks over all strategies. And
let α(φ_j^i, k) denote the strategy that provides N the largest probability of getting k tricks,
which is also called the Best Strategy.
Let R denote the remaining 2n cards held by E and W, namely R = [4n] − N − S, and let
the trick judging function t(R_i) and the leader judging function L(R_i) be defined as
in Definition 7 and Definition 8. The following recurrence can be found:
\[
P(\phi_N^{i}, k) = \sum_{E,W} \Pr[E, W] \; \max_{n \in N} \, \min_{e \in E} \, \max_{s \in S} \, \min_{w \in W} \; P\bigl(\phi_{L(R_1)}^{i},\; k - t(R_1)\bigr)
\]
where the probability is taken uniformly over all possible distributions of the card sets E
and W, Pr[E, W] = 1/\binom{2n}{n} = n!n!/(2n)!, R_1 = {n, e, s, w}, and φ^i represents the
remaining partial game after the first round.
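The outer sum in the recurrence ranges over every way of splitting the 2n unseen cards between E and W, each split having probability 1/\binom{2n}{n}. A minimal sketch of that enumeration is given below; the generator name `ew_distributions` and the sample card values are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations
from math import comb


def ew_distributions(remaining):
    """Yield every split (E, W) of the 2n unseen cards into two n-card hands."""
    remaining = sorted(remaining)
    n = len(remaining) // 2
    for east in combinations(remaining, n):
        west = tuple(c for c in remaining if c not in east)
        yield east, west


# Illustrative unseen cards for n = 2 (values chosen arbitrarily).
remaining = [1, 2, 5, 8]
splits = list(ew_distributions(remaining))
# Under the uniform assumption each split has the same probability.
probability_of_each_split = 1 / len(splits)
```

A full implementation of the recurrence would evaluate the inner max-min expression once per split and average the results with this weight.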
Similar recurrences hold for P(φ_S^i, k), P(φ_E^i, k) and P(φ_W^i, k), so a dynamic
programming algorithm is set up. We need (2n)!/(n!n!) = O(2^n) computations in
every case, and Σ_{i=1}^{n−1} \binom{n}{i}^2 = O(n · 2^{2n}) cases are going to be computed. Thus, our
algorithm runs in O(n · 2^{3n}) time.
3.2.2 Computational complexity and properties of main problems
We have not determined the computational complexity of this problem, but we
compared the complexity of two problems: “Best Play” (best choice) and
“Best Probabilistic Play”.
From the DP perspective, we can see that with exponentially many calls to an
oracle solving the “best play” problem we can solve the “best probabilistic play” problem,
but we do not know whether we can solve “best probabilistic play” with polynomially many calls to “best
play”. If we can, that means “best probabilistic play” is no harder than “best play”.
But since the definition of “best probabilistic play” contains the elements of the definition
of “best play”, we tend to believe that “best probabilistic play” is not easier than
“best play”. On the other hand, we know that if a declarer follows the strategy given
by “best probabilistic play”, he still cannot get more tricks than with the “best play” strategy;
in other words, “best play” does a better job than “best probabilistic play”. This
suggests that “best play” is harder, because in most cases we tend to think that
getting a better solution to a problem needs more computation.
Analysis from two aspects guides us to make the following conjecture.
Conjecture 2 Problems Best Play and Best Probabilistic Play are of the same computational complexity:
Best Play ≤_p Best Probabilistic Play and Best Probabilistic Play ≤_p Best Play.
This conjecture tells us that, when a declarer wants to find the best strategy to make
his contract, regardless of mistakes by his opponents, this is as difficult as finding
the best play for a given distribution. However, the latter problem is so complex that it tends
to be left to a computer. The same holds for a defender: when a defender
wants to find the best strategy to beat the declarer’s contract, regardless of mistakes
by the declarer, this is as difficult as finding the best play for a given distribution, and the
latter problem again tends to be left to computers.
In short, it is almost impossible for a bridgecard player to make no mistakes in even
one board. And this is exactly the case even in the top matches all over the world.
On the other hand, from the description of “best probabilistic play” we know
that if we have an oracle for this problem, then for any partial game, when it is our turn
to play, we can use this oracle to compute the probability of winning for every option,
always choose the option with the highest probability, find the corresponding strategy and
play by following it.
Another interesting fact is that, from a large number of experiments comparing
the “best play” and the “best probabilistic play”, we find that in most cases the “best
probabilistic play” gives the strategy for “best play”, and the “deviation” occurs in
the following cases.
It shows that sometimes mistakes are unavoidable, because computation tells us
nothing about what the best play is: there are two completely
different strategies with exactly the same chance. We call cases like this
“undirected cases”.
With plenty of evidence, we believe the statement that “the best probabilistic play deviates
from the best play only when there is more than one choice with exactly the same probability
of winning”. This guides us to make the following conjecture, which is reasonable
and likely to be right.
Conjecture 3 Best probabilistic play will give us best play if we choose correctly in every
“undirected case”.
Anyway, this conjecture shows that sometimes we should not be blamed for those
mistakes, but are just unlucky.
3.3 Approximation algorithm for problem 1
We design an approximation algorithm for the problem “best play” in the 2-player case,
and give our proof of the approximation ratio. We assume that the opponents
always do the best they can (they choose the card that best limits the number
of tricks of the declarer).
Let N be the set of cards of North, and S that of South. Let o(t) be the
number of cards in N that are among the t largest cards. The following notion
is essential in the proof.
Definition 20 (Potential Tricks) Let PT(φ_j^i) be the number of potential tricks of the partial game with incomplete information φ_j^i, formally defined to be PT(φ_j^i) =
max_{1≤t≤2n} {2o(t) − t}.
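The quantity in Definition 20 can be computed in one pass over the cards from largest to smallest. The sketch below assumes the cards are numbered 1 to 2n; the function name `potential_tricks` is an illustrative choice.

```python
def potential_tricks(north, total_cards):
    """max over t of 2*o(t) - t, where o(t) counts N's cards among the
    t largest cards (cards are assumed to be numbered 1..total_cards)."""
    held = set(north)
    count = 0
    best = None
    for t in range(1, total_cards + 1):
        card = total_cards + 1 - t  # the t-th largest card
        if card in held:
            count += 1
        value = 2 * count - t
        best = value if best is None else max(best, value)
    return best
```

With 4 cards in play, a hand holding {4, 3} has 2 potential tricks, while a hand holding {3, 1} has 0.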
Approximation Algorithm 1 (1) When you are the 1st to play in this round, just play
the largest card in your hand.
(2) When you are the 2nd to play in this round, check whether you can cover the
first card played by E; if you can, play the smallest card that covers it; if you
cannot, play the smallest card in your hand.
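The two rules above fit in a few lines. This is a minimal sketch of the card-selection step only; the function name `approx_play` and the convention that `led_card=None` means "you lead" are assumptions made for illustration.

```python
def approx_play(hand, led_card=None):
    """Card chosen by the approximation algorithm.

    led_card is None when we lead the round; otherwise it is the card
    already played by the opponent.
    """
    cards = sorted(hand)
    if led_card is None:
        # Rule (1): when leading, play the largest card.
        return cards[-1]
    # Rule (2): cover as cheaply as possible, otherwise discard the smallest.
    covering = [c for c in cards if c > led_card]
    return covering[0] if covering else cards[0]
```

Calling the function once per round, and removing the returned card from the hand, reproduces the play of the algorithm over a whole game.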
Definition 21 (Upper Region) We call the range [n + 1, 2n] the upper region, when
there are a total of 2n cards left in the partial game.
Theorem 3 The Approximation Algorithm is a 4-approximation.
Lemma 1 If N has p cards in the upper region of the original game, then by using the approximation algorithm he will get at least ⌊p/2⌋ tricks when it is his turn to play first in the first
round, and he will get at least ⌈p/2⌉ tricks when it is his opponent’s turn to play first in the
first round.
Lemma 2 If N has p (p ≤ n/2) cards in the upper region of the original game, then if his
opponent uses the Approximation Algorithm, N will get at most 2p tricks.
Proof First we prove Lemma 1. We use the representation of the game defined
above, and argue by induction on n.
For n = 1 and n = 2 the lemma is easy to check. Supposing the lemma holds when
n ≤ k − 1, consider the case n = k.
If your opponent plays the largest card in his hand and it cannot be covered by you,
you just play the smallest card in your hand. There is no change in the number of
your cards in the upper region, and we get a k − 1 case with your opponent as leader.
If your opponent plays a card and it is covered by you, the number of your cards
in the upper region decreases by 1 and you win a trick; then we get a k − 1 case with
yourself as leader.
If it is your turn to play and your largest card is covered by your opponent, the number
of your cards in the upper region decreases by 1 and you do not win a trick; then we get a k − 1
case with your opponent as leader.
If it is your turn to play and your largest card is not covered by your opponent, the
number of your cards in the upper region decreases by 1 and you win a trick; then we
get a k − 1 case with yourself as leader.
Applying the induction hypothesis in each of the 4 cases, we see that the lemma holds in
the case n = k.
Then we prove Lemma 2. Since N has p cards in the upper region, his opponent has
n − p cards in the upper region. One of S’s cards does not win a trick for S if and only
if it is covered by one of N’s cards in the upper region. Thus, S has at least n − p − p = n − 2p
cards in the upper region that will guarantee a trick for him. Consequently, N can get
at most 2p tricks.
We come back to the proof of the theorem.
On one hand, if the number of cards in the upper region in N’s hand is larger than
n/2, then the theorem holds, because he can win at most n tricks,
and our approximation algorithm can guarantee n/4 tricks for him.
On the other hand, if the number p of cards in the upper region in N’s hand is less
than n/2, then the theorem holds, because by Lemma 2 he can win at
most 2p tricks, and our approximation algorithm can guarantee p/2 tricks for him.
However, this approximation algorithm can still be improved, because only a little
bridgecard logic is built into the algorithm, which is far from enough to reach the limit,
although our 4-approximation algorithm seems to be a good result. Consider the following
instance in a 4-player case.
This case (n = 2) stands for a class of techniques in real bridgecard games
called the “finesse”. Here S needs to play small to the 6: if the 7 is
in W’s hand then S can get both tricks, but if the 7 is in E’s hand, then S can only get
1 trick, and in fact there is no way for S to get 2 tricks. Implementing
the finesse technique in an algorithm seems really difficult, and may require some
knowledge of “pattern recognition”. Moreover, for a real bridgecard artificial intelligence
system, the single technique “finesse” is far from enough. There are still cases where
we should not play honors in the first round, but our approximation algorithm does so.
(n = 5) In this case S should play 4 to 1 to guarantee 3 tricks: no matter which
opponent wins this trick with a larger card, he will be “endplayed” and must lead in the
next round. In the next round, the declarer can play small in the second position;
when the third player plays a card larger than 16 you just win with the honor, and
if the third player plays a card smaller than 15 you can win with 15 or 16.
4 Conclusions and future development
Our research initiates the analysis of the bridgecard game, formulating the game as two general optimization problems: best play and best probabilistic play. The 2-player versions of these problems are concrete and simple, and thus
suitable for analyzing the computational complexity. In the formulation we change the
number of cards held by each player from 13 to a variable n, which allows us to
make asymptotic arguments from the theoretical computer science perspective. We conjectured
that best play is PSPACE-complete, based on analysis from the PCP perspective. Then we conjectured
that best probabilistic play has the same computational complexity as best play.
Dynamic programming is employed to give algorithms for both problems. Drawing on some basic skills in real bridgecard games, we then designed an approximation
algorithm and proved a constant approximation ratio.
However, the approximation algorithm and its analysis are preliminary, and certain
flaws were pointed out in the last section. Therefore, future research on better approximation algorithms and deeper analysis is expected.
Acknowledgments First and foremost, I would like to express my deepest gratitude to our first professor,
Andrew Chichih Yao, who has provided me with good resources for learning and discussion. Without his help
I could not have had the opportunity to go deeply into the theory of computer science. I extend my thanks to
Prof Amy Wang for her guidance and encouragement to do deeper analysis on this project. I would
also like to thank all my teachers who have helped me to develop fundamental and essential academic
competence: Prof Jian Li, Prof Iddo Tzameret and older student Hongyu Liang. Without their help I could not
have gained a deeper understanding of the problem. I would also like to thank Donna Dong for helping me with the language.
She managed to read the whole paper even though she was not familiar with the background knowledge, and
I learned a lot about standard English usage and analytical writing under her guidance. Last but not least,
I would like to thank all my friends, especially my three lovely roommates, for their encouragement and
support.
References
Kelsey (1994) Killing defence.
Reese (1994) The expert game.
Demaine ED (2001) Algorithmic combinatorial game theory.
Demaine ED (2010) The complexity of UNO. http://arxiv.org/pdf/1003.2851.pdf
Babai L, Fortnow L, Lund C (1991) Non-deterministic exponential time has two-prover interactive protocols. Comput Complex 1(1):3–40
On the Meeting Time for Two Random Walks on a Regular Graph
Yizhen Zhang†, Zihan Tan† and Bhaskar Krishnamachari‡
† Tsinghua University, Beijing, China
‡ University of Southern California, Los Angeles, USA
{yz-zhang11, zh-tan11}@mails.tsinghua.edu.cn, bkrishna@usc.edu
October 31, 2014
Abstract
We provide an analysis of the expected meeting time of two independent random walks on a regular graph. For 1-D circle and 2-D torus graphs, we show that
the expected meeting time can be expressed as the sum of the inverse of non-zero
eigenvalues of a suitably defined Laplacian matrix. We also conjecture based on
empirical evidence that this result holds more generally for simple random walks
on arbitrary regular graphs. Further, we show that the expected meeting time for
the 1-D circle of size N is Θ(N^2), and for a 2-D N × N torus it is Θ(N^2 log N).
1 Introduction
Consider a system of discrete-time random walks on a graph G(V, E) with two walkers.
At each time step, each walker independently moves to a neighbouring vertex or stays still with given
probabilities. Denote the transition matrix of a single walker by P, where P(i, j) is the
probability that one walker moves from v_i to v_j in a time slot. This process is assumed
to start at steady state (i.e. uniform distribution) for each walker, and terminates when
they meet at the same vertex. We denote this meeting time by τ, which is a random
variable with the expectation E[τ]. Our objective is to analyze this quantity on d-regular
graphs.
Figure 1: 4 walkers on a 3-regular graph
It is instructive to consider the problem on the one-dimensional circle first. We
study a circle with N nodes, denoted by V = {0, 1, 2, · · · , N − 1}. The two walkers start
from arbitrary position according to the initial distribution. Every step, the walker on i
chooses to move to {i − 1, i + 1} (for simplicity of notation, assume that if i = N − 1,
then i + 1 = 0 and similarly that if i = 0 then i − 1 = N − 1 ) or stay still at i with
probability {p1 , p2 , p3 } respectively.
Figure 2: 1-D circle
Since we are only concerned about the meeting time, the relative position of the
two walkers is enough to describe that random variable. So we fix one walker at ’0’.
Then in this new equivalent model, the transition matrix of the other walker before the
encounter is M = P · P^T.
A similar equivalent model can be defined for an N × N torus. Let V = {(x, y) | x, y =
1, 2, ..., N}. Every step, the walker on (x, y) moves to (x ± 1, y) or (x, y ± 1), or stays still at (x, y),
with given probabilities. Defining the index of (x, y) to be Ind(x, y) = (x − 1)N + y, we
get a matrix P of order N^2. Let i, j denote the indices of two vertices (x_i, y_i), (x_j, y_j). Then
P(i, j) denotes the probability that one walker moves from (x_i, y_i) to (x_j, y_j) in each step.
P is a “block-circulant matrix” as defined in Section 2.1.2.
Similar to the 1-D case, we fix one walker at the lower-right cell; the transition
matrix of the other walker before the encounter is again given by M = P · P^T, which
is symmetric.
Figure 3: 2-D Torus
Our main result is as follows: by suitably defining a Laplacian matrix L, the expected meeting time of the two walkers E[τ ] (i.e., the expectation of the first time that
they meet on the same cell starting from the steady state uniform distribution) on a ring
or torus could be explicitly expressed as the sum of the reciprocals of non-zero eigenvalues of L. We further conjecture based on empirical evidence that the result holds
more generally for simple random walks (i.e., with equal transition probabilities) on
arbitrary regular graphs.
2 Method and Key Results
2.1 Preliminary
Recall the standard definition of a Circulant Matrix:
Definition 1 (Circulant Matrix) A circulant matrix is a matrix where each row vector
is rotated one element to the right relative to the preceding row vector. A circulant
matrix A is fully specified by one vector, a, which appears as the first row of A.
2.1.1 Properties of Circulant Matrix
For an arbitrary real circulant matrix A of order n generated by {a_0, a_1, · · · , a_{n−1}}, we
can find its eigenvalues in a general way following the approach indicated in [1]. First
define the vector ξ_i whose jth component is
\[
\xi_i(j) = \frac{1}{\sqrt{n}}\, w^{ij}, \qquad \text{where } w = e^{2\pi\sqrt{-1}/n} \text{ is an $n$th root of unity.} \tag{1}
\]
We can prove the following properties:
(a) ⟨ξ_i, ξ_j⟩ = δ_{ij}
(b) Aξ_i = λ_i ξ_i, i = 1, 2, ..., n − 1, 0
Property (a) shows that {ξ_i | i = 1, 2, · · · , n} are orthogonal eigenvectors of A. λ_i is the
eigenvalue of A, which can be calculated by
\[
(A\xi_i)(j) = \sum_{k=1}^{n} A(j,k)\,\xi_i(k) \tag{2}
\]
\[
= \frac{a_0}{\sqrt{n}}\, w^{ij} + \frac{a_1}{\sqrt{n}}\, w^{i(j+1)} + \cdots + \frac{a_{n-1}}{\sqrt{n}}\, w^{i(j+n-1)} \tag{3}
\]
\[
= \xi_i(j) \Bigl( \sum_{k=0}^{n-1} a_k w^{ik} \Bigr) \tag{4}
\]
Let λ_i = Σ_{k=0}^{n−1} a_k w^{ik}; then we have property (b).
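Properties (a) and (b) can be checked numerically for a small example. The sketch below uses NumPy with 0-based indices and an arbitrary illustrative generator vector `a`; the helper `circulant` follows Definition 1 (each row is the previous row rotated one position to the right).

```python
import numpy as np


def circulant(a):
    """Circulant matrix with first row a; row j is a rotated right j times."""
    n = len(a)
    return np.array([[a[(k - j) % n] for k in range(n)] for j in range(n)])


# Arbitrary real generator, chosen only for illustration.
a = np.array([2.0, -1.0, 0.5, 0.0, 0.5, -1.0])
n = len(a)
A = circulant(a)
w = np.exp(2j * np.pi / n)
idx = np.arange(n)

# Rows of Xi are the vectors xi_i with components w^(i*j) / sqrt(n).
Xi = np.array([w ** (i * idx) / np.sqrt(n) for i in range(n)])
# Candidate eigenvalues lambda_i = sum_k a_k w^(i*k).
lam = np.array([np.sum(a * w ** (i * idx)) for i in range(n)])

# Property (b): A xi_i = lambda_i xi_i for every i.
max_residual = max(np.linalg.norm(A @ Xi[i] - lam[i] * Xi[i]) for i in range(n))
# Property (a): the xi_i are orthonormal under the complex inner product.
gram = Xi @ Xi.conj().T
```

Both checks hold up to floating-point error: `max_residual` is negligible and `gram` is numerically the identity matrix.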
Definition 2 (Block-Circulant Matrix) If A is an n^2-order partitioned circulant matrix
generated by A_0, A_1, · · · , A_{n−1}, where each A_i is an n-order circulant matrix generated by {a_{i,0}, a_{i,1}, · · · , a_{i,n−1}} (see the illustration below for a 9-order block-circulant matrix), then A is called a block-circulant matrix.
\[
A = \begin{pmatrix} A_0 & A_1 & A_2 \\ A_2 & A_0 & A_1 \\ A_1 & A_2 & A_0 \end{pmatrix},
\quad \text{where } A_i = \begin{pmatrix} a_{i,0} & a_{i,1} & a_{i,2} \\ a_{i,2} & a_{i,0} & a_{i,1} \\ a_{i,1} & a_{i,2} & a_{i,0} \end{pmatrix} \text{ for } i = 0, 1, 2.
\]
2.1.2 Properties of Block-Circulant Matrix
Given index i, the coordinates of i are x_i = quotient(i − 1, n), y_i = remainder(i − 1, n). Then
we need to modify the definition of ξ_i to
\[
\xi_i(j) = \frac{1}{n}\, w^{x_i x_j + y_i y_j}, \qquad \text{where } w = e^{2\pi\sqrt{-1}/n} \text{ is an $n$th root of unity.} \tag{5}
\]
The properties given above in Section 2.1.1 still hold, and
λ_i = Σ_{l=0}^{n−1} Σ_{k=0}^{n−1} a_{l,k} w^{x_i l + y_i (k+1)} is the ith eigenvalue of A.
2.2 Results on Circle
2.2.1 The Expected Meeting Time
Let us first discuss the problem on the simplest graph, a 1-D circle.
Theorem 1 If two particles make independent random walks on a circle with a uniform initial distribution, then the expected meeting time is Σ_{λ_i ≠ 0} λ_i^{−1}, where λ_i is the ith
eigenvalue of L = I − PP^T, and P is the transition matrix for a single walker.
Put the transition probabilities in M as the weights of edges. Then we get the Laplacian matrix
\[
L = I - M = I - PP^{T} \tag{6}
\]
which is a circulant matrix generated by {1 − q_0, −q_1, −q_2, 0, · · · , 0, −q_2, −q_1}, where
q_0 = p_1^2 + p_2^2 + p_3^2, q_1 = p_3(p_1 + p_2), q_2 = p_1 p_2.
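The generator of L can be confirmed by building P explicitly for a small circle. The following sketch uses illustrative values N = 8 and (p_1, p_2, p_3) = (0.2, 0.3, 0.5); the function name `circle_transition` is an assumption made for the example.

```python
import numpy as np


def circle_transition(N, p1, p2, p3):
    """Single-walker transition matrix on a circle of N nodes:
    move to i-1 w.p. p1, to i+1 w.p. p2, stay w.p. p3."""
    P = np.zeros((N, N))
    for i in range(N):
        P[i, (i - 1) % N] += p1
        P[i, (i + 1) % N] += p2
        P[i, i] += p3
    return P


N, p1, p2, p3 = 8, 0.2, 0.3, 0.5
P = circle_transition(N, p1, p2, p3)
L = np.eye(N) - P @ P.T

q0 = p1**2 + p2**2 + p3**2
q1 = p3 * (p1 + p2)
q2 = p1 * p2

# Expected first row of L: {1 - q0, -q1, -q2, 0, ..., 0, -q2, -q1}.
expected_first_row = np.zeros(N)
expected_first_row[0] = 1 - q0
expected_first_row[[1, -1]] = -q1
expected_first_row[[2, -2]] = -q2
```

Comparing `L[0]` with `expected_first_row`, and each later row with a right rotation of the previous one, confirms the circulant structure for this instance.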
Let T_i, the ith component of the vector T, denote the expected meeting time
with starting vertex i. Obviously, T_0 = 0. The initial distribution is π. Then E[τ] =
π^T T. We can obtain a set of equations by recurrence:
\[
T_i = q_2 T_{i-2} + q_1 T_{i-1} + q_0 T_i + q_1 T_{i+1} + q_2 T_{i+2} + 1, \qquad i \neq 0 \tag{7}
\]
Notice that the coefficients satisfy q_0 + 2q_1 + 2q_2 = p_1^2 + p_2^2 + p_3^2 + 2p_3(p_1 + p_2) + 2p_1 p_2 =
(p_1 + p_2 + p_3)^2 = 1. By summing up the above equations, we have:
\[
T_0 = q_2 T_{N-2} + q_1 T_{N-1} + q_0 T_0 + q_1 T_1 + q_2 T_2 - (N-1) \tag{8}
\]
Thus, the Laplacian matrix L is the coefficient matrix of (7), (8):
\[
LT = \Delta t, \qquad \text{where } \Delta t = (1, 1, 1, \cdots, 1, -(N-1))^{T} \tag{9}
\]
Since L is a real circulant matrix, we can use the conclusion of Section 2.1.1.
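Taking the inner product of (9) with ξ_i on both sides, from the symmetry of L we have
\[
\langle LT, \xi_i \rangle = \langle T, L\xi_i \rangle = \langle T, \lambda_i \xi_i \rangle = \lambda_i \Bigl( \sum_{k=1}^{N-1} T_k \frac{w^{ik}}{\sqrt{N}} + T_0 \Bigr) \tag{10}
\]
\[
\langle \Delta t, \xi_i \rangle = \frac{1}{\sqrt{N}} \Bigl( \sum_{k=1}^{N-1} w^{ik} - (N-1) \Bigr) =
\begin{cases} -\sqrt{N} & i \neq 0 \\ 0 & i = 0 \end{cases} \tag{11}
\]
Notice that Σ_{k=1}^{N−1} w^{ik} = −1 for i ≠ 0. Combined with (9), for i ≠ 0,
\[
\sum_{k=1}^{N-1} \frac{T_k}{\sqrt{N}}\, w^{ik} = -\sqrt{N}\, (\lambda_i)^{-1} \tag{12}
\]
Summing up by i, we have:
\[
\sum_{i=1}^{N-1} \sum_{k=1}^{N-1} \frac{T_k}{\sqrt{N}}\, w^{ik} = -\sqrt{N} \sum_{i=1}^{N-1} \lambda_i^{-1} \tag{13}
\]
\[
\sum_{i=1}^{N-1} \sum_{k=1}^{N-1} T_k\, w^{ik} = -N \sum_{i=1}^{N-1} \lambda_i^{-1} \tag{14}
\]
Changing the order of summation,
\[
\sum_{k=1}^{N-1} T_k \sum_{i=1}^{N-1} w^{ik} = -N \sum_{i=1}^{N-1} \lambda_i^{-1} \tag{15}
\]
\[
\frac{1}{N} \sum_{k=1}^{N-1} T_k = \sum_{i=1}^{N-1} \lambda_i^{-1} \tag{16}
\]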
We assume the steady state distribution is the initial distribution. For any
regular graph, this is the uniform distribution. The expected meeting time is then given
as:
\[
E[\tau] = \pi^{T} T = \frac{1}{N} \sum_{k=1}^{N-1} T_k = \sum_{i=1}^{N-1} \lambda_i^{-1} \tag{17}
\]
Note that this is the sum of the reciprocals of non-zero eigenvalues of L.
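The identity in (17) can be checked numerically by computing E[τ] in two ways: by solving the recurrence (7) with T_0 = 0 and averaging, and by summing the reciprocals of the non-zero eigenvalues of L. The sketch below uses illustrative parameters (N = 9, probabilities 0.2, 0.3, 0.5); all function names are assumptions made for the example.

```python
import numpy as np


def circle_laplacian(N, p1, p2, p3):
    """L = I - P P^T for the walk on a circle of N nodes."""
    P = np.zeros((N, N))
    for i in range(N):
        P[i, (i - 1) % N] += p1
        P[i, (i + 1) % N] += p2
        P[i, i] += p3
    return np.eye(N) - P @ P.T


def meeting_time_by_recurrence(L):
    """Solve (L T)_i = 1 for i != 0 with T_0 = 0, then average T under
    the uniform initial distribution."""
    N = L.shape[0]
    T = np.zeros(N)
    T[1:] = np.linalg.solve(L[1:, 1:], np.ones(N - 1))
    return float(T.mean())


def meeting_time_by_spectrum(L):
    """Sum of reciprocals of the non-zero eigenvalues of the symmetric L."""
    eigenvalues = np.linalg.eigvalsh(L)  # ascending; the first one is ~0
    return float(np.sum(1.0 / eigenvalues[1:]))


L = circle_laplacian(9, 0.2, 0.3, 0.5)
direct = meeting_time_by_recurrence(L)
spectral = meeting_time_by_spectrum(L)
```

The two values agree up to floating-point error for this instance, as Theorem 1 predicts.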
2.2.2 The Order Estimation of E[τ]
For simplicity, we estimate the order of $E[\tau]$ for the simple random walk (i.e., $p_1 = p_2 = p_3 = \frac{1}{3}$):
\[ E[\tau] = \sum_{i \neq 0} \frac{1}{\lambda_i} = 3 \sum_{i=1}^{N} \Big( 2 - \frac{4}{3}\cos\frac{\pi i}{N} - \frac{2}{3}\cos\frac{2\pi i}{N} \Big)^{-1} = \frac{9}{4} \sum_{i=1}^{N} \Big( 2 - \cos\frac{\pi i}{N} - \big(\cos\frac{\pi i}{N}\big)^2 \Big)^{-1} = \frac{9}{4} \sum_{i=1}^{N} \frac{1}{(2 + t_i)(1 - t_i)} \tag{18} \]
where $t_i = \cos\frac{\pi i}{N}$. Thus $(2 + t_i)^{-1} \in [1/3, 1]$, which is bounded by constants. From [2], we have that the summation $\sum_{i=1}^{N} (1 - t_i)^{-1}$ is $O(N^2)$. Thus $E[\tau]$ is $O(N^2)$. On the other side, for $i = 1$, applying Taylor's theorem we have
\[ \frac{1}{1 - t_1} = \frac{1}{1 - \cos\frac{\pi}{N}} = \frac{1}{\Theta(1/N^2)} = \Theta(N^2) \tag{19} \]
Thus $E[\tau]$ is also $\Omega(N^2)$, so for the 1-D circle $E[\tau]$ grows with the size of the graph as $\Theta(N^2)$.
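The quadratic growth can be observed numerically. The following sketch assumes the lazy simple walk with p1 = p2 = p3 = 1/3 and uses the circulant eigenvalues of P at the angles 2πi/N (an assumption about the indexing, made for this illustration); the function name is hypothetical.

```python
import math

def expected_meeting_time_circle(N):
    """Sum of reciprocals of the non-zero eigenvalues of L = I - P P^T on the N-cycle."""
    total = 0.0
    for i in range(1, N):
        c = math.cos(2.0 * math.pi * i / N)
        lam = 1.0 - ((1.0 + 2.0 * c) / 3.0) ** 2   # 1 - (eigenvalue of P)^2
        total += 1.0 / lam
    return total

for N in (50, 100, 200, 400):
    # The ratio E[tau] / N^2 should settle near a constant if E[tau] is Theta(N^2).
    print(N, expected_meeting_time_circle(N) / N ** 2)
```

Doubling N should multiply the value by roughly four, in line with the Θ(N²) estimate.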
2.3
Results on Torus
2.3.1
The Expected Meeting Time
Theorem 2 If two particles make independent random walks on a torus with a uniform initial distribution, then the expected meeting time is $\sum_{\lambda_i \neq 0} \lambda_i^{-1}$, where $\lambda_i$ is the $i$-th eigenvalue of $L = I - PP^T$, and $P$ is the transition matrix for a single walker.
Similarly, we take the transition probabilities in $M$ as the edge weights and obtain the Laplacian matrix
\[ L = I - M = I - PP^T \tag{20} \]
Let $T_i$ denote the expected encounter time when the starting point has index $i$; this is the $i$-th component of the vector $T$. Obviously, $T_{N^2} = 0$. If the initial distribution is $\pi$, then $E[\tau] = \pi^T T$. We can get a set of equations by recurrence (for a more readable notation we write $T_{x,y} = T_{\mathrm{Ind}(x,y)}$).
For ease of exposition, we illustrate this recurrence below for a simple random walk, in which the walker in the original model moves to each of its neighbours or stays still with the same probability $\frac{1}{5}$:
\[ T_{x,y} = \frac{1}{25} T_{x \pm 2, y} + \frac{1}{25} T_{x, y \pm 2} + \frac{2}{25} T_{x \pm 1, y} + \frac{2}{25} T_{x, y \pm 1} + \frac{2}{25} T_{x \pm 1, y \pm 1} + \frac{1}{5} T_{x,y} + 1, \qquad i \neq 0 \tag{21} \]
Here a term with $\pm$ stands for the sum over all indicated sign choices.
Note that such a recurrence equation for T x,y could also be written for any random
walk that moves to neighboring nodes with different probabilities.
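The coefficients in such a recurrence are the step probabilities of the offset between the two walkers. As a small illustration (the function name and the dictionary encoding of steps are assumptions made for this sketch), they can be obtained by combining the step laws of the two walkers:

```python
from collections import defaultdict
from fractions import Fraction

def relative_step_distribution(single_steps):
    """Law of (step of walker a) - (step of walker b) for two independent walkers."""
    rel = defaultdict(Fraction)
    for (ax, ay), pa in single_steps.items():
        for (bx, by), pb in single_steps.items():
            rel[(ax - bx, ay - by)] += pa * pb
    return dict(rel)

fifth = Fraction(1, 5)
lazy_walk = {(0, 0): fifth, (1, 0): fifth, (-1, 0): fifth, (0, 1): fifth, (0, -1): fifth}
rel = relative_step_distribution(lazy_walk)
for offset in sorted(rel):
    print(offset, rel[offset])
```

For the lazy walk with probability 1/5 per move this reproduces the weights 1/25, 2/25 and 1/5; passing a different `single_steps` dictionary gives the coefficients for walks with unequal move probabilities.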
We also have:
\[ L T = \Delta t, \qquad \text{where } \Delta t = (1, 1, 1, \cdots, 1, -(N^2 - 1))^T \tag{22} \]
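With the same approach as in 3.2, we have
\[ \langle LT, \xi_i \rangle = \langle T, L\xi_i \rangle = \langle T, \lambda_i \xi_i \rangle = \lambda_i \Big( \sum_{k=1}^{N} \sum_{l=1}^{N} \frac{T_{k,l}}{N} w^{x_i k + y_i l} \Big) \tag{23} \]
\[ \langle \Delta t, \xi_i \rangle = \frac{1}{N} \Big( \sum_{(k,l) \neq (N,N)} w^{x_i k + y_i l} - (N^2 - 1) \Big) = \begin{cases} -N, & i \neq 0 \\ 0, & i = 0 \end{cases} \tag{24} \]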
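Combined with (20) and $T_0 = 0$, summing up by $i$ for $i \neq 0$, we have
\[ \sum_{i=1}^{N^2-1} \sum_{(k,l) \neq (N,N)} \frac{T_{k,l}}{N} w^{x_i k + y_i l} = -N \sum_{i=1}^{N^2-1} \lambda_i^{-1} \tag{25} \]
\[ \sum_{(x,y) \neq (N,N)} \sum_{(k,l) \neq (N,N)} T_{k,l}\, w^{x k + y l} = -N^2 \sum_{i=1}^{N^2-1} \lambda_i^{-1} \tag{26} \]
Changing the order of summation, we finally have
\[ \frac{1}{N^2} \sum_{(k,l) \neq (N,N)} T_{k,l} = \sum_{i=1}^{N^2-1} \lambda_i^{-1} \tag{27} \]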
Note that we get actually the same expression as 1-D circle. Given the uniform
initial distribution, the expected time E[τ] is the sum of the reciprocals of non-zero
eigenvalues of L.
2.3.2
The Order Estimation of E[τ]
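Applying (6) to (25), we have
\[ E[\tau] = \sum_{\substack{i,j=0 \\ (i,j) \neq (0,0)}}^{N-1} \Big( \frac{1}{25} \Big( 20 - 2\big(\cos\frac{4\pi i}{N} + \cos\frac{4\pi j}{N}\big) - 4\big(\cos\frac{2\pi i}{N} + \cos\frac{2\pi j}{N}\big) - 8 \cos\frac{2\pi i}{N} \cos\frac{2\pi j}{N} \Big) \Big)^{-1} \]
which can be rewritten as
\[ E[\tau] \equiv \sum_{\substack{i,j=0 \\ (i,j) \neq (0,0)}}^{N-1} \frac{1}{2 t_{ij} s_{ij} + 3} \cdot \frac{1}{1 - t_{ij} s_{ij}} \tag{28} \]
where $t_{ij} = \cos\frac{\pi(i+j)}{N}$ and $s_{ij} = \cos\frac{\pi(i-j)}{N}$. By applying the following lemma (proved in Appendix A):
Lemma 1 If $\theta_1, \theta_2 \in [0, \frac{\pi}{4}]$, then
\[ \frac{1}{1 - \cos\theta_1 \cos\theta_2} \le \frac{4}{1 - \cos 2\theta_1 \cos 2\theta_2} \]
we can separate the summation into $\Theta(\log N)$ parts, and prove that each part is $\Theta(N^2)$. Thus finally we obtain that
\[ E[\tau] \text{ is } \Theta(N^2 \log N) \tag{29} \]
The complete proof is given in Appendix A.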
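The growth rate on the torus can be explored numerically from the eigenvalue expression for the lazy walk with move probability 1/5. This is a rough sketch with hypothetical function names; it only evaluates the spectral sum and does not replace the proof in Appendix A.

```python
import math

def torus_eigenvalue(i, j, N):
    """Eigenvalue of L = I - P P^T on the N x N torus for the lazy walk (probability 1/5 per move)."""
    a = 2.0 * math.pi * i / N
    b = 2.0 * math.pi * j / N
    return (20.0 - 2.0 * (math.cos(2 * a) + math.cos(2 * b))
            - 4.0 * (math.cos(a) + math.cos(b))
            - 8.0 * math.cos(a) * math.cos(b)) / 25.0

def expected_meeting_time_torus(N):
    """Sum of reciprocals of all non-zero eigenvalues."""
    return sum(1.0 / torus_eigenvalue(i, j, N)
               for i in range(N) for j in range(N) if (i, j) != (0, 0))

for N in (8, 16, 32, 64):
    E = expected_meeting_time_torus(N)
    # Normalising by N^2 log N; the ratio should vary only slowly with N.
    print(N, E / (N * N * math.log(N)))
```

Each eigenvalue here equals 1 minus the square of (1 + 2cos a + 2cos b)/5, the corresponding eigenvalue of P, which is a quick consistency check on the formula.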
3
Discussion
We have proved that on the circle and the torus, the sum of the reciprocals of the non-zero eigenvalues of $L = I - PP^T$ is the expected meeting time of two walkers. In fact, if the graph has strong symmetry properties which guarantee that $M = PP^T$ and that $L$ is (block-)circulant, then the proof still holds. The simulation results shown in figure 4 match the conclusion in section 3.
Figure 4: Simulation Results on 2-D Torus
Moreover, we find empirically that the expression even works for simple random
walks on arbitrary regular graphs. This is not a trivial observation, since the symmetry
of vertices does not hold for arbitrary regular graphs; see the examples of 4-regular
graphs in figure 5. In this case, the equivalent model approach of fixing one of the
walkers at a particular location and defining the transition matrix of the other walker
does not work.
Figure 5: Special Cases for 4-regular Graph
Conjecture 1 (Expected Meeting Time on Regular Graph) If two particles make independent simple random walks on a connected d-regular graph, and the initial distribution is uniform, then the expected meeting time $E[\tau]$ is $\sum_{\lambda_i \neq 0} \lambda_i^{-1}$, where $\lambda_i$ is the $i$-th eigenvalue of $L = I - PP^T$, and $P$ is the transition matrix for a single walker.
Our conjecture is supported by empirical evidence which we present here. Figure 6
shows simulation results as well as relevant numerical calculations for simple random
walks over arbitrary regular graphs. The left figure shows the results on 10-regular
graphs, while the right one on graphs with 30 vertices. For each horizontal point, a
single random graph is generated and fixed for averaging over multiple random initial
conditions drawn from a uniform distribution. Each blue mark indicates the average
meeting time over 500 independent runs of the experiment, and each green mark the
average over 10000 runs. The red mark indicates the conjectured value of the expected meeting
time (i.e. the sum of the reciprocals of non-zero eigenvalues of L). The black mark
indicates the exact value of E[τ], which can be calculated from the definition of expectation once the transition probabilities are given (see Appendix B). In each case the
conjectured value agrees with the exact one.
Figure 6: Simulation Results on General Regular Graphs
One way to prove the conjecture may be to use the method in section 3; but for this
approach we would need an additional conjecture.
Conjecture 2 If A is the adjacency matrix of a connected d-regular graph G with n vertices, then A has a set of orthogonal eigenvectors $\{\xi_1, \xi_2, \cdots, \xi_n\}$ satisfying
(a). $\xi_n = (1, 1, \cdots, 1)^T$;
(b). $\xi_i(n) = 1$, for all $i$;
(c). $\sum_{j=1}^{n} \xi_i(j) = 0$, for all $i$;
(d). $\sum_{i=1}^{n} \xi_i(j) = 0$, for all $j$;
(e). $\langle \xi_i, \xi_j \rangle = n\delta_{ij}$.
Proposition 1 Conjecture 2 is a sufficient condition for Conjecture 1.
Proof. Suppose $\mu_1, \ldots, \mu_n$ are the eigenvalues of $A$. We define a matrix $\tilde{L}$ as follows:
\[ \tilde{L} = I - P \otimes P \tag{30} \]
where $P \otimes P$ is the Kronecker product of $P$ with itself. Then from $P = (I + A)/(d + 1)$, the eigenvalues of $P$ are $\beta_i = (\mu_i + 1)/(d + 1)$. Thus, from the properties of the Kronecker product, the eigenvalues and eigenvectors of $\tilde{L}$ are $\lambda_{i,j} = 1 - \beta_i \beta_j$ and $\xi_{i,j} = \xi_i \otimes \xi_j$.
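We can similarly construct a recurrence for $T_{i,j}$, the expected meeting time with the walkers starting on vertices $i$ and $j$. Obviously, $T_{i,i} = 0$. We can prove that $\tilde{L} T = \Delta t$, where $\Delta t_{i,j} = 1$ if $i \neq j$, and $\Delta t_{i,i} = -(n - 1)$. Then
\[ \langle \tilde{L}T, \xi_{i,j} \rangle = \langle T, \tilde{L}\xi_{i,j} \rangle = \langle T, \lambda_{i,j}\xi_{i,j} \rangle = \lambda_{i,j} \sum_{(k,l)=(1,1)}^{(n,n)} T_{k,l}\, \xi_i(k)\xi_j(l) \tag{31} \]
Combined with (c) and (e) in Conjecture 2, we have
\[ \begin{aligned} \langle \Delta t, \xi_{i,j} \rangle &= \sum_{k=1}^{n} \Big( -(n-1)\xi_i(k)\xi_j(k) + \xi_i(k) \sum_{l \neq k} \xi_j(l) \Big) \\ &= \sum_{k=1}^{n} \Big( -(n-1)\xi_i(k)\xi_j(k) - \xi_i(k)\xi_j(k) \Big) \\ &= -n \langle \xi_i, \xi_j \rangle = -n^2 \delta_{ij} \end{aligned} \tag{32} \]
Thus we have
\[ \sum_{(k,l)=(1,1)}^{(n,n)} T_{k,l}\, \xi_i(k)\xi_j(l) = \frac{1}{\lambda_{i,j}} \langle \Delta t, \xi_{i,j} \rangle = \frac{-n^2 \delta_{ij}}{\lambda_{i,j}} \tag{33} \]
Summing over $(i, j) \neq (n, n)$ and applying (d), we finally get the expression
\[ E[\tau] = \frac{1}{n^2} \sum_{(i,j)=(1,1)}^{(n,n)} T_{i,j} = \sum_{(i,j)=(1,1)}^{(n,n)} \frac{\delta_{ij}}{\lambda_{i,j}} = \sum_{i}^{n} \frac{1}{\lambda_{i,i}} \tag{34} \]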
Notice that λi,i is the same eigenvalue of L = I − PPT in our original definition of
L. Thus we have proved that if Conjecture 2 holds then the Conjecture 1 would be
true.
Remark 1 If we let ξn be the eigenvector with eigenvalue µ = d, then (a) holds.
Remark 2 Since $\sum_{j=1}^{n} \xi_i(j) = (1, 1, \cdots, 1)\, \xi_i$, multiplying $L\xi_i = \lambda_i \xi_i$ on the left by $(1, 1, \cdots, 1)$ gives, for $\lambda_i \neq 0$,
\[ \sum_{j=1}^{n} \xi_i(j) = \frac{1}{\lambda_i} (1, 1, \cdots, 1)\, L\, \xi_i = 0 \tag{35} \]
Notice that the row sums of $L$ are equal to 0. Thus we have (c).
References
[1] Robert Kleinberg, Lecture notes for Computer Science 6822 Advanced Topics in Theory of Computing: Flows, Cuts, and Sparsifiers, Fall 2011, online at
www.cs.cornell.edu/courses/CS6822/2011fa/scribenotes/lec 2.pdf
[2] Elliott W. Montroll, “Random walks on lattices III: Calculation of first-passage
times with application to exciton trapping on photosynthetic units”, J-MATHPHYS, 10 (4), p.753-p.765, April 1969
Appendix A: The Proof for E[τ] = Θ(N 2 logN) on 2-D Torus
Recall Lemma 1: If $\theta_1, \theta_2 \in [0, \frac{\pi}{4}]$, then
\[ \frac{1}{1 - \cos\theta_1 \cos\theta_2} \le \frac{4}{1 - \cos 2\theta_1 \cos 2\theta_2} \]
Proof. Let $s = \cos\theta_1$ and $t = \cos\theta_2$; then $\cos 2\theta_1 = 2s^2 - 1$ and $\cos 2\theta_2 = 2t^2 - 1$. The inequality in the lemma is equivalent to
\[ 4 - 4st \ge 1 - (2s^2 - 1)(2t^2 - 1), \tag{36} \]
that is,
\[ 4ts - 4t^2 s^2 + 2t^2 + 2s^2 \le 4. \]
Let $f(t, s) = 4ts - 4t^2 s^2 + 2t^2 + 2s^2$. If $ts = c \le 1$ is fixed, $f$ attains its maximum at $t = 1$, $s = c$. Thus it remains to show $f(1, s) \le 4$, which is $-s^2 + 2s - 1 \le 0$, i.e. $-(s - 1)^2 \le 0$.
This inequality holds, which completes the proof.
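As a quick sanity check of Lemma 1, the cross-multiplied form 1 − cos 2θ1 cos 2θ2 ≤ 4(1 − cos θ1 cos θ2) can be evaluated on a grid. This is only a numerical illustration with an arbitrarily chosen grid size, not a substitute for the proof; the cross-multiplied form is used to avoid dividing by zero at θ1 = θ2 = 0.

```python
import math

def lemma_gap(theta1, theta2):
    """4 (1 - cos t1 cos t2) - (1 - cos 2 t1 cos 2 t2); Lemma 1 says this is non-negative."""
    lhs = 1.0 - math.cos(2 * theta1) * math.cos(2 * theta2)
    rhs = 4.0 * (1.0 - math.cos(theta1) * math.cos(theta2))
    return rhs - lhs

grid = [k * (math.pi / 4) / 50 for k in range(51)]   # 51 points covering [0, pi/4]
worst = min(lemma_gap(a, b) for a in grid for b in grid)
print(worst)
```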
Recall the equation (26), which can be obtained by some trigonometric identities:
\[ E[\tau] = \sum_{\substack{i,j=0 \\ (i,j) \neq (0,0)}}^{N-1} \frac{1}{2 t_{ij} s_{ij} + 3} \cdot \frac{1}{1 - t_{ij} s_{ij}} \]
Since $1 \le 2ts + 3 \le 5$ for all $i, j$, we have $\frac{1}{5} \le \frac{1}{2ts+3} \le 1$, which is bounded. Then we only need to estimate
\[ \sum_{\substack{i,j=0 \\ (i,j) \neq (0,0)}}^{N-1} \Big( 1 - \cos\frac{\pi(i+j)}{N} \cos\frac{\pi(i-j)}{N} \Big)^{-1} \tag{37} \]
The pairs $(i, j)$ are uniformly distributed within the grid $[0, N] \times [0, N]$ (except the origin), so the pairs $(i + j, i - j)$ are uniformly distributed in a diamond area in $[0, 2N] \times [-N, N]$. By the symmetry of the cosine function, and omitting a constant coefficient, it is equivalent to estimate
\[ \sum_{\substack{p,q=0 \\ (p,q) \neq (0,0)}}^{N-1} \Big( 1 - \cos\frac{p\pi}{2N} \cos\frac{q\pi}{2N} \Big)^{-1} \tag{38} \]
When we set $p = 0$ (or $q = 0$), the summation is
\[ \sum_{q=1}^{N-1} \Big( 1 - \cos\frac{q\pi}{2N} \Big)^{-1} \tag{39} \]
From [2], this summation is $O(N^2)$.
Thus, it remains to prove that the following summation is $\Theta(N^2 \log N)$:
\[ \sum_{p,q=1}^{N-1} \Big( 1 - \cos\frac{p\pi}{2N} \cos\frac{q\pi}{2N} \Big)^{-1} \tag{40} \]
Now let us partition the region into $\Theta(\log N)$ parts, denoted by
\[ A_k = D_k \setminus D_{k-1}, \qquad \text{where } D_k = \{(p, q) \mid 1 \le p, q \le 2^k\}, \quad k = 0, 1, 2, \cdots, \log N \tag{41} \]
(with $D_{-1} = \emptyset$). For all $k \ge 1$, $|A_{k+1}| = 4|A_k|$, and every term $(p, q)$ in $A_k$ corresponds to $(2p, 2q), (2p - 1, 2q), (2p, 2q - 1), (2p - 1, 2q - 1)$ in $A_{k+1}$. Then, applying Lemma 1 and the fact that the cosine function is non-negative and monotone decreasing on $[0, \pi/2]$, we can prove that
\[ S_k = \sum_{(p,q) \in A_k} \Big( 1 - \cos\frac{p\pi}{2N} \cos\frac{q\pi}{2N} \Big)^{-1} \le \sum_{(p,q) \in A_{k+1}} \Big( 1 - \cos\frac{p\pi}{2N} \cos\frac{q\pi}{2N} \Big)^{-1} = S_{k+1} \tag{42} \]
For $k = 0$, $S_0 \le S_1$ also holds by a simple calculation.
Notice that since $1 - \cos\frac{\pi}{2N} = \Theta(\frac{1}{N^2})$, $S_0 = (1 - \cos\frac{\pi}{2N} \cos\frac{\pi}{2N})^{-1}$ is $\Theta(N^2)$. Each term in $S_{\log N}$ is bounded above by the constant $(1 - \cos\frac{\pi}{4})^{-1}$ and below by 0.5, and $A_{\log N}$ contains $\Theta(N^2)$ terms, so $S_{\log N}$ is also $\Theta(N^2)$.
Thus, we have
\[ E[\tau] = \sum_{k=0}^{\log N} S_k \text{ is } \Theta(N^2 \log N) \tag{43} \]
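The dyadic decomposition can be illustrated numerically. The sketch below assumes that N is a power of two and that the k-th part is the shell of index pairs whose larger coordinate lies in (2^(k-1), 2^k]; both are assumptions made for this illustration, and the function name is hypothetical. It lets the indices run up to N rather than N − 1, which only adds terms of constant size.

```python
import math

def shell_sums(N):
    """Return [S_0, ..., S_log2(N)], the sums of (1 - cos(p pi/2N) cos(q pi/2N))^-1 over dyadic shells."""
    m = N.bit_length() - 1                      # log2(N) for N a power of two
    def term(p, q):
        return 1.0 / (1.0 - math.cos(p * math.pi / (2 * N)) * math.cos(q * math.pi / (2 * N)))
    sums = []
    for k in range(m + 1):
        hi = 2 ** k
        lo = 2 ** (k - 1) if k > 0 else 0
        s = 0.0
        for p in range(1, hi + 1):
            for q in range(1, hi + 1):
                if max(p, q) > lo:              # keep only the k-th shell
                    s += term(p, q)
        sums.append(s)
    return sums

N = 64
for k, s in enumerate(shell_sums(N)):
    print(k, s / N ** 2)                        # each shell should be of order N^2
```

With log N + 1 shells, each of order N², the total is of order N² log N.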
Appendix B: Calculating the Exact Value of E[τ ]
The exact value of the expected meeting time can be calculated in the following way. Suppose there are two walkers $a$ and $b$. We denote the state in which $a$ is at vertex $i$ while $b$ is at vertex $j$ by $S_{(i,j)}$, with index $(i - 1)N + j$. Thus, if the transition matrix for a single walker is $P$, then the transition matrix for the states of the two walkers is $Q = P \otimes P$, except for the $((i - 1)N + i)$-th rows (the absorbing states), which are all zeros except for the $((i - 1)N + i)$-th component. Let $\Lambda = \{(i - 1)N + i \mid i = 1, 2, \cdots, N\}$, and let $S_\Lambda$ be the set of absorbing states. $S(\tau)$ indicates the state at time $\tau$.
Recalling the definition of expectation, we have
\[ E[\tau] = \sum_{\tau=0}^{\infty} \tau \Pr[S(\tau) \in S_\Lambda, S(\tau - 1) \notin S_\Lambda] \tag{44} \]
which equals
\[ \begin{aligned} E[\tau] &= \sum_{\tau=0}^{\infty} \tau \sum_{k \in S_\Lambda} \sum_{l \notin S_\Lambda} \Pr[S(\tau) = k, S(\tau - 1) = l] \\ &= \sum_{\tau=0}^{\infty} \tau \sum_{k \in S_\Lambda} \sum_{l \notin S_\Lambda} \Pr[S(\tau) = k \mid S(\tau - 1) = l] \Pr[S(\tau - 1) = l] \\ &= \sum_{\tau=0}^{\infty} \tau \sum_{k \in S_\Lambda} \sum_{l \notin S_\Lambda} \Pr[S(\tau) = k \mid S(\tau - 1) = l]\; p_0 \cdot Q^{\tau - 1} \cdot e_l \\ &= p_0 \sum_{\tau=0}^{\infty} (\tau \cdot Q^{\tau - 1}) \cdot b \end{aligned} \tag{45} \]
where $b$ is a column vector with $N^2$ components, $b(j) = \sum_{i \in \Lambda} Q(j, i)$ if $j \notin \Lambda$, and $b(j) = 0$ if $j \in \Lambda$. Then, applying the series summation approach to matrices, we finally have
\[ E[\tau] = p_0 (I - B)^{-2}\, \tilde{b}. \tag{46} \]
where $B$ is the sub-matrix of $Q$ obtained by deleting the rows and columns with index in $\Lambda$, and $\tilde{b}$ is the sub-vector of $b$ obtained by deleting the entries with index in $\Lambda$.
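A minimal sketch of this exact calculation, under stated assumptions: the graph is the Petersen graph (chosen here only because its adjacency spectrum 3, 1 (five times), −2 (four times) is well known), each walker follows P = (I + A)/(d + 1), and instead of forming (I − B)^{-2} the code solves (I − B)T = 1, which gives the same expected absorption times because every row of B together with the corresponding entry of the absorption vector sums to one. All function names are hypothetical.

```python
from itertools import product

def petersen_graph():
    """Adjacency sets of the Petersen graph: outer 5-cycle, inner pentagram, spokes."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        for u, v in ((i, (i + 1) % 5), (5 + i, 5 + (i + 2) % 5), (i, 5 + i)):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def solve_linear_system(A, rhs):
    """Gauss-Jordan elimination with partial pivoting (modifies A and rhs in place)."""
    n = len(rhs)
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        rhs[c], rhs[pivot] = rhs[pivot], rhs[c]
        for r in range(n):
            if r != c and A[r][c] != 0.0:
                f = A[r][c] / A[c][c]
                for j in range(c, n):
                    A[r][j] -= f * A[c][j]
                rhs[r] -= f * rhs[c]
    return [rhs[r] / A[r][r] for r in range(n)]

def exact_meeting_time(adj):
    """E[tau] for two independent lazy walks started from independent uniform vertices."""
    n = len(adj)
    moves = {v: [v] + sorted(adj[v]) for v in adj}           # stay or move, equally likely
    states = [(i, j) for i in adj for j in adj if i != j]    # non-absorbing states
    index = {s: r for r, s in enumerate(states)}
    m = len(states)
    A = [[0.0] * m for _ in range(m)]                        # A = I - B
    for (i, j), r in index.items():
        A[r][r] += 1.0
        prob = 1.0 / (len(moves[i]) * len(moves[j]))
        for na, nb in product(moves[i], moves[j]):
            if na != nb:                                     # na == nb means the walkers met
                A[r][index[(na, nb)]] -= prob
    T = solve_linear_system(A, [1.0] * m)
    return sum(T) / n ** 2                                   # states (i, i) contribute 0

d = 3
exact = exact_meeting_time(petersen_graph())
nonprincipal_spectrum = [1] * 5 + [-2] * 4
conjectured = sum(1.0 / (1.0 - ((mu + 1) / (d + 1)) ** 2) for mu in nonprincipal_spectrum)
print(exact, conjectured)
```

On this particular graph the exact value and the conjectured spectral sum coincide; the Petersen graph is highly symmetric, so this is a consistency check rather than evidence for arbitrary regular graphs.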
Robust Influence Maximization
Zihan Tan
November 1, 2014
1
Introduction
The last decade has witnessed a tremendous increase in research on social networks, since the Internet, together with the large amount of data on it, has become more accessible to people. The study of influence propagation thus plays an important role in many applications, including research on viral marketing, epidemic spread and network advertising. Influence maximization serves as a central problem in influence propagation, and according to Chen et al. [1], among previous mathematical models for studying influence maximization, the independent cascade model and the linear threshold model turned out to be the most successful.
Approximation algorithms and hardness results for influence maximization on the linear cascade model were proposed in previous literature. D. Kempe et al. [2] proved that finding the optimal seed set for influence spread given the metric of the network is generally NP-hard, and proposed an approximation algorithm with ratio $1 - \frac{1}{e}$ using techniques of submodular optimization, while the optimality of this approximation ratio was proved by U. Feige [3].
In the independent cascade model, the probability on each edge is given as an exact real number in [0, 1]. However, in a real setting this is not the case. We need to learn the probability from previous records about certain nodes, and our knowledge of the probability is usually represented by an interval $[\hat{p} - \epsilon, \hat{p} + \epsilon]$, where $\hat{p}$ is the sample mean or some estimated value and $\epsilon$ is the half-width of the confidence interval. Thus, if we favor a seed set that performs well on influence maximization under all possible parameters on the network, we are considering the robust version of influence maximization.
A. Krause et al. [4] proposed a saturation algorithm for solving general robust submodular optimization with a problem-dependent approximation ratio. However, to carry it over to our problem, certain difficulties must be overcome. To be specific, their robust optimization only considers finitely many submodular functions which take only integer values, while the robust influence maximization problem concerns the minimum over infinitely many submodular functions which take real values.
1.1
Notations and Definitions
We specify some notation for the robust independent cascade model. Let $N = (V, E)$ be a network with $V$ denoting the set of vertices and $E$ denoting the set of edges. For every edge $e$, we have a probability interval $[l_e, r_e]$ ($0 \le l_e \le r_e \le 1$) indicating the range of the latent probability $p_e$ on this edge, which is unknown to us, where the latent probability is the probability that this edge is a live edge in an outcome graph generated by the independent cascade model. As a whole, let $\Theta = \prod_{e \in E} [l_e, r_e]$ be the parametric space of network $N$, and let $\theta = (p_e)_{e \in E}$ be an instance of the parameter, where $p_e \in [l_e, r_e]$ for every edge $e$. Specifically, let $\theta^- = (l_e)_{e \in E}$ and $\theta^+ = (r_e)_{e \in E}$ be the minimum and maximum parameters respectively.
For a parameter $\theta$ on the network, define the function $\sigma_\theta : 2^V \to \mathbb{R}^+$ to be the information spread function on parameter $\theta$: for a subset of nodes $S$ that we call source nodes, $\sigma_\theta(S)$ denotes the expected number of nodes that are activated, where the randomness is taken over the appearance of every edge according to parameter $\theta$.
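The spread function can be estimated by sampling live-edge graphs. The following is a minimal Monte Carlo sketch, not a tuned implementation; it assumes directed edges stored as a dictionary from (u, v) pairs to probabilities, and the function name and default sample size are hypothetical choices for this illustration.

```python
import random

def estimate_spread(edges, seeds, num_samples=2000, rng=None):
    """Monte Carlo estimate of sigma_theta(S): average number of nodes reachable from S via live edges."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(num_samples):
        live = {}
        for (u, v), p in edges.items():
            if rng.random() < p:                # edge (u, v) is live with probability p
                live.setdefault(u, []).append(v)
        reached, stack = set(seeds), list(seeds)
        while stack:                            # depth-first search over live edges
            u = stack.pop()
            for v in live.get(u, ()):
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        total += len(reached)
    return total / num_samples

theta = {("a", "b"): 0.5, ("b", "c"): 0.5}
print(estimate_spread(theta, {"a"}))
```

For this path the exact value is 1 + 0.5 + 0.25 = 1.75, so the printed estimate should be close to it.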
1.2
Three versions of Influence Maximization
In this section we briefly summarize three versions of Influence Maximization problems. They are
Influence Maximization, Robust Influence Maximization and Stochastic Influence Maximization.
The second one is the robust version of the original Influence Maximization problem, with two
interesting perspectives.
Problem 1. (Influence Maximization)
Given a graph $G = (V, E)$, a probability on each edge $\theta = \{p_e\}_{e \in E}$ and a fixed budget $k$, we are required to find a set of $k$ vertices $S \subset V$, $|S| = k$, such that the influence spread function $\sigma_\theta(S)$ is maximized, where $\sigma_\theta(S)$ is the expected number of nodes reached, and the randomness is taken according to the probability on every edge.
It was proved that the general Influence Maximization problem is NP-hard [2], and since the objective function $\sigma_\theta(S)$ is submodular, we have a $1 - \frac{1}{e}$ approximation using the standard greedy algorithm.
On the other hand, it was proved by Feige [3] that this constant cannot be improved.
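The standard greedy algorithm can be sketched as follows. This is an illustrative version only: it assumes directed edges given as a dictionary of probabilities, evaluates the spread on a fixed set of sampled live-edge graphs (a simplification chosen here so that the run is reproducible), and all names and default parameters are hypothetical.

```python
import random

def sample_live_graphs(edges, num_samples, rng):
    """Draw live-edge graphs; each edge (u, v) is kept independently with its probability."""
    worlds = []
    for _ in range(num_samples):
        live = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                live.setdefault(u, []).append(v)
        worlds.append(live)
    return worlds

def reach_count(live, seeds):
    """Number of nodes reachable from the seed set in one live-edge graph."""
    reached, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, ()):
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return len(reached)

def greedy_seed_set(nodes, edges, k, num_samples=200, seed=0):
    """Repeatedly add the node with the largest estimated spread when joined to the current set."""
    worlds = sample_live_graphs(edges, num_samples, random.Random(seed))
    def spread(S):
        return sum(reach_count(w, S) for w in worlds) / len(worlds)
    chosen = []
    for _ in range(k):
        best = max((v for v in nodes if v not in chosen),
                   key=lambda v: spread(chosen + [v]))
        chosen.append(best)
    return chosen

nodes = ["a", "a1", "a2", "a3", "b", "b1", "b2"]
edges = {("a", "a1"): 1.0, ("a", "a2"): 1.0, ("a", "a3"): 1.0,
         ("b", "b1"): 1.0, ("b", "b2"): 1.0}
print(greedy_seed_set(nodes, edges, 2))
```

On this toy instance of two disjoint stars the greedy choice is the two centres, picking the larger star first.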
Problem 2. (Robust Influence Maximization)
Given a graph $G = (V, E)$ and a fixed budget $k$, let the parameter space $\Theta = \times_{e \in E} [l_e, r_e]$ be the product of intervals indicating the ranges of the true probability on every edge. We are required to find a set of $k$ vertices $S \subset V$, $|S| = k$, such that for all possible $\theta \in \Theta$, the influence spread $\sigma_\theta(S)$ is comparable to the optimal solution, i.e. our algorithm should output $S$ such that
\[ g(S) = \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)} \]
is maximized, where $S_\theta^*$ is the optimal solution when the probability on every edge is given by $\theta$.
We state two perspectives on the Robust Influence Maximization problem, each of independent interest.
Perspective 1. Performance Directed
The performance directed perspective concerns how large the optimal $g(S)$ can be, i.e. how good the performance of our output can be. To some extent, it is the information-theoretic perspective of the problem, i.e. it explores the following expression, which we call the “optimal performance”:
\[ \max_{|S|=k} g(S) = \max_{|S|=k} \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)} \]
It concerns worst-case analysis: what it optimizes is the ratio between the performance of the output and that of the optimal solution in the worst case. If this value is good, we can claim that $S$ is a “universally” good approximate solution to the Influence Maximization problem. This is because all we know about the probability on every edge is $\Theta$, and the true $\theta$ could take an arbitrary value in $\Theta$; so if $g(S)$ is good (e.g. a constant), then under every possible instance of $\theta$, $S$ always gives us an approximation to the Influence Maximization problem. However, if the optimal $g(S)$ is bad (e.g. polynomially small in $n$), then even if our algorithm finds the best $S$, in the worst case its performance $\sigma_\theta(S)$ is poor compared to the optimal solution $\sigma_\theta(S_\theta^*)$.
Perspective 2. Non-Performance Directed
The non-performance directed perspective concerns how to find the optimal $S$, even if the performance of the optimal $S$ is bad. The argument is that the size of the optimal $g(S)$ is determined by the graph structure and the input parameter space; it is something we cannot change by designing an algorithm. What we can do is design an algorithm to find the optimal $S$ or an approximation to it.
It can be observed that the Non-Performance Directed perspective of Robust Influence Maximization is NP-hard in general, since when all $l_e = r_e$ it degenerates to the Influence Maximization problem, which is NP-hard.
Problem 3. (Stochastic Influence Maximization)
Given graph G = (V, E) and the fixed budget k, with distribution Φ = {φe }e∈E of probability on
every edge, we are required to find a set of k vertices S ⊂ V, |S| = k such that h(S) = Eθ∼Φ [σθ (S)]
is maximized.
It can be observed that the objective of Problem 3 is an integral combination of submodular
functions, then it is still a submodular function. Thus it could be approximated via standard greedy
algorithm. Note that we would need fast evaluation of h(S) in the algorithm, and this could be
done by Monte Carlo method with guaranteed accuracy.
2
Results on Performance Perspective
In this section we introduce our results on performance directed perspective of Robust Influence
Maximization problem. With some analysis it can be seen that constraints on Θ are needed.
2.1
Sensitivity of Influence Spread Function
In this subsection we analyze the sensitivity of the influence spread function σθ (S). We let δ be small
enough and explore how σθ (S) changes when a δ-perturbation is applied to its parameter θ.
First we recall the following lemma proved in [1]:
Lemma 1. (Sensitivity of Influence Spread Function)
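Given a graph $G$ and parametric space $\Theta$, for all $S \subseteq V$ and all $\theta_1, \theta_2 \in \Theta$ such that $\|\theta_1 - \theta_2\|_\infty \le \delta$,
\[ |\sigma_{\theta_1}(S) - \sigma_{\theta_2}(S)| \le f(\delta) \]
where $f(\delta) = |V| \cdot |E| \cdot \delta$, and $\|\theta_1 - \theta_2\|_\infty \le \delta$ means that $|p_e^{\theta_1} - p_e^{\theta_2}| \le \delta$ for all $e$.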
We then propose the following example showing that this bound cannot be improved when considering only the number of edges and nodes. By not improvable we mean that the order in n and m is not improvable, while the constant could be better.
Consider the graph $G = (V, E)$ where $V = A \cup B = \{a_1, \ldots, a_{n/2}, b_1, \ldots, b_{n/2}\}$ and there is an edge between any pair of vertices. $\Theta$ is defined as follows: for all $i, j \in [n/2]$, $[l_{(a_i,b_j)}, r_{(a_i,b_j)}] = [0, \delta]$, and for all $i \neq j \in [n/2]$, $[l_{(a_i,a_j)}, r_{(a_i,a_j)}] = [l_{(b_i,b_j)}, r_{(b_i,b_j)}] = [1 - \delta, 1]$. We are required to choose $k = 1$ vertex as the only source node.
Assume without loss of generality that an algorithm chooses $S = \{a_1\}$; we then calculate the sensitivity of the influence spread function. On one hand, under $\theta^-$, with high probability $a_1$ reaches all vertices in $A$ and no vertices in $B$, so
\[ \sigma_{\theta^-}(S) \approx \frac{n}{2} \]
On the other hand, when giving the high probability to every edge, if at least one of the $\frac{n^2}{4}$ edges between the two $\frac{n}{2}$-node cliques is live, then all vertices can be reached; otherwise only $\frac{n}{2}$ nodes are reached. Thus,
\[ \sigma_{\theta^+}(S) = \frac{n}{2}(1 - \delta)^{\frac{n^2}{4}} + n \cdot \big[1 - (1 - \delta)^{\frac{n^2}{4}}\big] \approx \frac{n}{2} + \frac{n}{2} \cdot \frac{n^2}{4} \cdot \delta \]
\[ \sigma_{\theta^+}(S) - \sigma_{\theta^-}(S) \approx \frac{n^3}{8} \cdot \delta \]
Note that this argument implies that we cannot avoid $O(n^3 \delta)$ when a $\delta$-perturbation is applied to the parameter and there is no additional constraint on the parameter. With the following small modification we can show that the sensitivity bound $mn \cdot \delta$ is not improvable.
Let the graph be $G = (V, E)$ where $V = A \cup B = \{a_1, \ldots, a_{n/2}, b_1, \ldots, b_{n/2}\}$ and $E = \{(a_i, a_{i+1}) \mid i \in [\frac{n}{2} - 1]\} \cup \{(b_i, b_{i+1}) \mid i \in [\frac{n}{2} - 1]\} \cup E_{int}$, where $E_{int}$ can be an arbitrary set of $m - n + 2$ edges with one endpoint in $A$ and the other endpoint in $B$. $\Theta$ is defined as follows: for all $i, j \in [n/2]$, $[l_{(a_i,b_j)}, r_{(a_i,b_j)}] = [0, \delta]$, and for all $i \in [\frac{n}{2} - 1]$, $[l_{(a_i,a_{i+1})}, r_{(a_i,a_{i+1})}] = [l_{(b_i,b_{i+1})}, r_{(b_i,b_{i+1})}] = [1 - \delta, 1]$. We are required to choose $k = 1$ vertex as the only source node. It is not hard to verify that as long as $m > 2n$, this construction illustrates the tightness of Lemma 1.
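For the two-clique example, the closed-form expressions can be evaluated directly to see how close the gap is to n³δ/8. This is a small numerical sketch; the function name is hypothetical, and it treats the lower-endpoint spread as exactly n/2, which is the approximation used in the text.

```python
def two_clique_spreads(n, delta):
    """Closed-form spreads for the seed a_1 in the two-clique example (n even)."""
    cross_edges = (n // 2) ** 2                       # edges between the two cliques
    p_bridge = 1.0 - (1.0 - delta) ** cross_edges     # at least one cross edge is live
    sigma_minus = n / 2                               # lower endpoint: cross edges never live
    sigma_plus = (n / 2) * (1.0 - p_bridge) + n * p_bridge
    return sigma_minus, sigma_plus

n, delta = 20, 1e-6
low, high = two_clique_spreads(n, delta)
print(high - low, n ** 3 * delta / 8)
```

For small δ the two printed values should nearly coincide.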
2.2
Main Results with Constraints on δ
In this subsection we use the propositions from the previous one to analyze the performance of RIM under different constraints on $\Theta$.
Proposition 1. If there is no constraint on the input parametric space $\Theta$, then $\max_{|S|=k} \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)}$ is at most $O(\frac{k}{n})$.
Proof. Let $G$ be an $n$-clique and for every $e \in E$, let $l_e = 0$ and $r_e = 1$. Suppose $S = \{v_1, \cdots, v_k\}$ is the output of the algorithm; let $p_e = 0$ for all $e \in E_S = \{e = (u, v) \mid u \in S \text{ or } v \in S\}$ and let $p_e = 1$ for all $e \notin E_S$. Then $\sigma_\theta(S) = k$ and $\sigma_\theta(S_\theta^*) = n - 1$.
Since we can regard $\Theta$ as our knowledge about the transmission probability on each edge, we can assume that we do have some meaningful knowledge. In other words, we should have some constraints on $\Theta$ so that the worst-case performance is not as poor as $O(\frac{k}{n})$.
The most instructive constraint that we can think of is the “uniform length constraint with constant δ”: for every $e \in E$, $r_e - l_e \le \delta$.
We have the following two results about the performance when we have non-trivial uniform-length constraints on $\Theta$:
Proposition 2. When $\delta = O(\frac{1}{n})$, for any deterministic algorithm with optimal performance $r$, $r = O(\frac{\log n}{n})$, where $n$ is the number of nodes in the network.
Proof. Consider the graph $G = (V, E)$ such that $V = A \cup B$, $|A| = |B| = \frac{n}{2}$ and $E = \{(u, v) \mid u, v \in A \text{ or } u, v \in B\}$, and let $E(A)$ be the set of edges with two endpoints in $A$ and $E(B)$ be defined similarly. The problem is to find a single vertex ($k = 1$) such that the influence spread is maximized. Let $p = \frac{2}{n}$, and let the input instance be $l_e = p - \epsilon$ and $r_e = p + \epsilon$ for every edge $e$, such that $[l_e, r_e]$ covers the critical interval of the Erdős–Rényi graph with $\frac{n}{2}$ nodes.
Now since every node looks the same to any algorithm, suppose the algorithm chooses a vertex $u \in A$, and consider the worst case $\theta$ where $p_e = l_e$ for every $e \in E(A)$ and $p_e = r_e$ for every $e \in E(B)$. The optimal solution is then an arbitrary vertex $v \in B$. Since $\sigma(\{u\}) = O(\log n)$ and $\sigma(\{v\}) = O(n)$, the ratio is $r = O(\frac{\log n}{n})$.
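If we allow the algorithm to be randomized, namely the output seed set $\tilde{S}$ is a random variable, the definition of optimal performance would be:
\[ r = \max_{\tilde{S}: |\tilde{S}| \le k} \min_{\theta \in \Theta} E_{\tilde{S}} \left[ \frac{\sigma_\theta(\tilde{S})}{\sigma_\theta(S_\theta^*)} \right] \tag{1} \]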
Proposition 3. When $\delta = O(\frac{1}{n})$, for any randomized algorithm with optimal performance $r$, $r = O(\frac{\log n}{\sqrt{n}})$, where $n$ is the number of nodes in the network.
Proof. Consider the graph $G = (V, E)$ such that $V = \cup_{1 \le i \le \sqrt{n}} A_i$, $|A_i| = \sqrt{n}$ and $E = \{(u, v) \mid u, v \in A_i \text{ for some } i\}$, and let $E(A_i)$ be the set of edges with two endpoints in $A_i$. The problem is to find a single vertex ($k = 1$) such that the influence spread is maximized. Let $p = \frac{1}{\sqrt{n}}$, and let the input instance be $l_e = p - \epsilon$ and $r_e = p + \epsilon$ for every edge $e$, such that $[l_e, r_e]$ covers the critical interval of the Erdős–Rényi graph with $\sqrt{n}$ nodes.
Now since every node appears to be the same to any algorithm, suppose the algorithm outputs a distribution on $[\sqrt{n}]$, i.e. $p_1 + p_2 + \cdots + p_{\sqrt{n}} = 1$. Without loss of generality let $p_1$ be the smallest one. Then consider the worst case $\theta$ where $p_e = r_e$ for every $e \in E(A_1)$ and $p_e = l_e$ for every $e \in E(A_i)$, $i \ge 2$. The optimal solution is then an arbitrary vertex $v \in A_1$.
Since
\[ \sigma(\mathrm{Alg}_\Theta) \le \frac{1}{\sqrt{n}} O(\sqrt{n}) + \Big(1 - \frac{1}{\sqrt{n}}\Big) O(\log n) = O(\log n) \]
and
\[ \sigma(\mathrm{OPT}) = O(\sqrt{n}), \]
the performance is
\[ r = O\Big(\frac{\log n}{\sqrt{n}}\Big) \]
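When adding a tighter constraint on $\Theta$, we can expect better performance of the information-theoretic solution. To be specific, using Lemma 1, we can prove the following proposition:
Proposition 4. For any graph $G$, if for all $e \in E$, $r_e - l_e \le \delta$, then for all $\theta_0 \in \Theta$, let
\[ \bar{\theta} = \arg\min_{\theta} \frac{\sigma_\theta(S_{\theta_0}^*)}{\sigma_\theta(S_\theta^*)} \]
Then,
\[ \frac{\sigma_{\bar\theta}(S_{\theta_0}^*)}{\sigma_{\bar\theta}(S_{\bar\theta}^*)} \ge 1 - \frac{2nm\delta}{\sigma_{\bar\theta}(S_{\bar\theta}^*)} \]
Proof.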
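One way the argument could go is sketched below. It is a reconstruction rather than necessarily the intended proof, and it assumes that Lemma 1 is applied with $f(\delta) = nm\delta$ (with $n = |V|$, $m = |E|$) and that the uniform length constraint gives $\|\bar\theta - \theta_0\|_\infty \le \delta$ for any two parameters in $\Theta$:

```latex
\begin{align*}
\sigma_{\bar\theta}(S^*_{\theta_0})
  &\ge \sigma_{\theta_0}(S^*_{\theta_0}) - nm\delta
     && \text{(Lemma 1 applied to } S^*_{\theta_0}\text{)} \\
  &\ge \sigma_{\theta_0}(S^*_{\bar\theta}) - nm\delta
     && \text{(optimality of } S^*_{\theta_0} \text{ under } \theta_0\text{)} \\
  &\ge \sigma_{\bar\theta}(S^*_{\bar\theta}) - 2nm\delta
     && \text{(Lemma 1 applied to } S^*_{\bar\theta}\text{)}
\end{align*}
```

Dividing both sides by $\sigma_{\bar\theta}(S^*_{\bar\theta})$ then gives the stated bound.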
If we impose a tighter constraint on $\Theta$, we know $p_e$ more accurately; since the standard greedy algorithm obtains a nearly constant approximation for Influence Maximization, it can also be applied here to get a similar constant performance. Thus, we obtain the following informal result table.
δ                   Information Theoretic              Polynomial Algorithm
1                   O(1/n)                             O(1/n)
∆1 = O(1/n)         O(log n / n); O(log n / √n)        O(log n / n); O(log n / √n)
∆2 = O(1/(mn))      1 − ε                              1 − 1/e − ε
0                   1                                  1 − 1/e
One thing we need to mention is that each value in the table is an upper bound or a lower bound on the real value (that is also why this table is informal). To be specific, we can have propositions of the following type: “When our input parametric space Θ satisfies the uniform length constraint with constant ∆1, the optimal performance of any deterministic algorithm cannot be larger than $O(\frac{\log n}{n})$” or “When our input parametric space Θ satisfies the uniform length constraint with constant ∆2, there exists an algorithm that achieves performance at least $1 - \frac{1}{e} - \epsilon$”.
2.3
Main Results with Constraint on Eigenvalue
2.3.1
The Largest Eigenvalue of an Undirected Graph
Results from research on the spread of the SIS model motivate us to use the spectrum of a graph to study the information spread in our model. We try to use the largest eigenvalue to help us detect the “critical threshold” of a graph. The following discussion describes our attempts to bring in graph spectra.
We only consider undirected weighted graphs, where the weight of $e$ is given by the attached probability $\theta_e$. Let $G$ be the undirected graph and $\theta$ be the influence. Let $P_\theta$ be the adjacency matrix induced by $(G, \theta)$. It is obvious that $P_\theta$ is symmetric. Let $\lambda_\theta$ be the largest eigenvalue of $P_\theta$; then $\lambda_\theta > 0$ by linear algebra. We define the average degree $d_\theta(v)$ of a vertex $v$ as
\[ d_\theta(v) := \sum_{u \in \mathrm{neighbor}(v)} \theta_{(v,u)} \]
With these notations, we have the following result which may be helpful in our discussion.
Lemma 2.
\[ \frac{\sum_{v \in V} d_\theta(v)}{n} \le \lambda_\theta \le \max_{v \in V} d_\theta(v) \]
Proof. First, notice that $P_\theta$ is symmetric, thus all eigenvalues $\lambda_i$ of $P_\theta$ are real numbers. Thus we can see
\[ \sum_{i=1}^{n} \lambda_i = \mathrm{tr}[P_\theta] = 0 \Rightarrow \lambda_\theta \ge 0 \]
Next, consider $v = (1, 1, \cdots, 1)^H$. Then
\[ \lambda_\theta \ge \frac{\|P_\theta v\|_2}{\|v\|_2} = \sqrt{\frac{\sum_{v \in V} d_\theta(v)^2}{n}} \ge \frac{\sum_{v \in V} d_\theta(v)}{n} \]
Then, let $u = (u_1, u_2, \cdots, u_n)^H$ be an eigenvector of $\lambda_\theta$, and pick $i$ such that $|u_i| \ge |u_j|$ for all $j$. Then
\[ \lambda_\theta \le \left| \frac{\sum_{j=1}^{n} \theta_{ij} u_j}{u_i} \right| \le \frac{\sum_{j=1}^{n} \theta_{ij} |u_j|}{|u_i|} \le d_\theta(i) \le \max_{v \in V} d_\theta(v) \]
Therefore, we have an easy way to estimate the largest eigenvalue. Moreover, this lemma suggests a way to detect the “critical threshold” of $(G, \Theta)$. $\theta^-$ and $\theta^+$ are defined as before. For ease of writing, let $\lambda^- = \lambda_{\theta^-}$ and $\lambda^+ = \lambda_{\theta^+}$.
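The bounds of Lemma 2 can be checked on a small weighted graph. The sketch below uses a shifted power iteration (iterating with P + I, a choice made here so that the iteration also behaves well on bipartite graphs) and assumes a connected undirected graph; the function names and the example weights are hypothetical.

```python
import math

def weighted_degrees(weights, nodes):
    """d_theta(v): sum of the edge probabilities at v, for undirected edges {(u, v): theta}."""
    deg = {v: 0.0 for v in nodes}
    for (u, v), w in weights.items():
        deg[u] += w
        deg[v] += w
    return deg

def largest_eigenvalue(weights, nodes, iterations=500):
    """Largest eigenvalue of the symmetric weighted adjacency matrix, via power iteration on P + I."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for (u, v), w in weights.items():
        P[idx[u]][idx[v]] = w
        P[idx[v]][idx[u]] = w
    x = [1.0] * n
    for _ in range(iterations):
        y = [x[i] + sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]  # (P + I) x
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(x, Px))      # Rayleigh quotient of the unit vector x

nodes = ["a", "b", "c"]
weights = {("a", "b"): 0.5, ("b", "c"): 0.9, ("a", "c"): 0.2}
deg = weighted_degrees(weights, nodes)
lam = largest_eigenvalue(weights, nodes)
print(sum(deg.values()) / len(nodes), lam, max(deg.values()))
```

The three printed values should appear in non-decreasing order, matching the two inequalities of the lemma.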
Theorem 1. We only consider the case of a seed set of size 1. Let $\|\theta^+ - \theta^-\|_\infty \le \delta$ and let $c$ be a constant in $(0, 1)$. Then
1. when $\lambda^- \ge n^c$ and $\delta$ has no constraint, we can only get an $O(\frac{1}{n^{1-c}})$-approximation.
2. when $\lambda^- \ge n^c$ and $\delta = \omega(\frac{1}{n^{1+c}})$, we can only get an $O(\frac{1}{n^{1-c}})$-approximation.
3. when $\lambda^- \ge n^c$, if we want to get a $(1 + \varepsilon)$-approximation, then $\delta = o(\frac{1}{n^2})$ is necessary.
4. when $\lambda^+ \le \alpha$, we can get a $\gamma$-approximation. (to be formed and proved)
The fourth claim is only a template, which is what we want to find and prove in the next step.
Here, we provide the proof of the first three claims.
Proof. For a fixed graph $G$ and influence uncertainty $\Theta$, we define
\[ \rho(S, k) = \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\max_{|S^*| \le k} \sigma_\theta(S^*)} \]
• For claim 1 and claim 2, we first prove that we can get an $O(\frac{1}{n^{1-c}})$-approximation. We just choose the vertex $v$ with the maximal average degree under influence $\theta^-$ as our seed set. Then for any $\theta \in \Theta$,
\[ \sigma_\theta(\{v\}) \ge d_\theta(v) \ge d_{\theta^-}(v) \ge \lambda^- \ge n^c \]
Thus we have
\[ \max_{|S|=1} \rho(S, 1) \ge \rho(\{v\}, 1) \ge \frac{1}{n^{1-c}} \]
Thus we can get an $O(\frac{1}{n^{1-c}})$-approximation. Now we show that there exists a pair $(G, \Theta)$ such that this approximation is optimal. Consider a complete graph $G$, and divide $G$ into $n^{1-c-\varepsilon}$ cliques of size $n^{c+\varepsilon}$. Consider an edge $e = (u, v)$: if $u, v$ belong to the same clique, $\theta_e \in [1 - \delta, 1]$; if $u, v$ belong to different cliques, $\theta_e \in [0, \delta]$. Now $\lambda^- \ge (1 - \delta) n^{c+\varepsilon} \ge n^c$ for large enough $n$. Assume that we choose $v$ as our seed, and consider the following $\theta$: let $K$ be the clique containing $v$ and let $e = (u, w)$; if $u \in V(K)$ and $w \notin V(K)$, then $\theta_e = \theta_e^- = 0$; otherwise, $\theta_e = \theta_e^+$. Then
\[ \sigma_\theta(\{v\}) = n^{c+\varepsilon} \]
However, suppose we choose a node $u \notin V(K)$, and consider only $G \setminus K$. If we treat every clique as a node, then we can treat the probability of an edge as
\[ p_{(K_i,K_j)} = 1 - (1 - \delta)^{n^{2c+2\varepsilon}} \approx (\le)\; n^{2c+2\varepsilon} \delta \]
However, we can choose $\varepsilon$ small enough and $n$ large enough such that
\[ p := p_{(K_i,K_j)} \ge n^{2c+\varepsilon} \delta \]
Then we in fact form a random graph $G(n^{1-c-\varepsilon}, p)$. If $\delta = \omega(\frac{1}{n^{1+c}})$, we have $p = \omega(\frac{1}{n^{1-c-\varepsilon}})$. Thus
\[ \sigma_\theta(\{u\}) = O(n) \]
Therefore, we have
\[ \rho(\{v\}, 1) \le O\Big(\frac{1}{n^{1-c-\varepsilon}}\Big) \]
Since $\varepsilon$ can be any small positive real number, this shows that the $O(\frac{1}{n^{1-c}})$-approximation is optimal.
• Assume $k = 1$ and the algorithm selects a single vertex $v$ in $K_1$. We use the previous construction of $\theta$, assigning $l_e$ to all edges incident to $K_1$ and $r_e$ to all other edges. It can be observed that
\[ \frac{\sigma_\theta(\{v\})}{\sigma_\theta(\{S^*\})} = \frac{n^c}{n^c + 2n^c (n^c n^c n^{1-c} n^{1-c} \delta)} = \frac{n^c}{n^c + 2n^c \cdot n^2 \delta} \]
This is because if there exists an edge linking two different cliques, the optimal solution leads to $2n^c$ nodes being reached, and the probability that such an edge exists is $(n^c n^c n^{1-c} n^{1-c} \delta) = n^2 \delta$ when $\delta$ is small enough. Thus, to get a $(1 + \varepsilon)$-approximation, it is necessary that $\delta = o(\frac{1}{n^2})$.
3
3.1
Critical Detector
What is Critical Threshold?
We try to figure out what is ”critical threshold” of a graph. When we fix the size of influence space
Θ, we can easily find that the critical threshold is decided by the position of interval, or in other
words, the value of θ − . Therefore, we try to use the value of θ − to detect the critical threshold.
First, we need to have a look at the influence spread function σ(·)(omit θ). Notice that we can
treat the spread process as follows: first, we random pick up edges by the appended probability θ,
then we figure out the vertices connected with seed set. In this way, we can treat G as a random
variable, thus we use configuration C to represent the possible result of random variable G. Then
we can get
X
σ(S) =
Pr[G = C]|C(S)|
C
here we use C(S) to represent the set of vertices connected with seed set in graph C. Then we pick
up an edge e, we can rewrite the formula as follows:
X
X
σ(S) =
θe (Pr[G = C|e ∈ E(C)]|C(S)|) +
(1 − θe ) (Pr[G = C|e 6∈ E(C)]|C(S)|)
C:e6∈E(C)
C:e∈E(C)
Notice that if we fix $\theta_{e'}$ for all $e' \neq e$ and change the value of $\theta_e$, then $\sigma(S)$ is linear in $\theta_e$, so it attains its maximal value at $\theta_e = 0$ or $\theta_e = 1$. In fact, $\sigma(S)$ attains its maximal value when $\theta = \theta^+$. Now, if we consider $\sigma_{\theta^+}(S) - \sigma_{\theta^-}(S)$ with fixed $\delta$ but non-fixed value of $\theta^-$, we can also write it as
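\begin{align*}
\sigma_{\theta^+}(S) - \sigma_{\theta^-}(S)
= {} & \sum_{C: e \in E(C)} \theta_e^- \left( \Pr_{\theta^+}[G = C \mid e \in E(C)] - \Pr_{\theta^-}[G = C \mid e \in E(C)] \right) |C(S)| \\
& + \sum_{C: e \notin E(C)} (1 - \theta_e^-) \left( \Pr_{\theta^+}[G = C \mid e \notin E(C)] - \Pr_{\theta^-}[G = C \mid e \notin E(C)] \right) |C(S)|.
\end{align*}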
In the same way, we find that $\sigma_{\theta^+}(S) - \sigma_{\theta^-}(S)$ attains its maximal value when $\theta_e^- = 0$ or $\theta_e^- = 1 - \delta$, namely at an end of the interval. But here we cannot easily find the actual value as we did for $\sigma_\theta(S)$. If this maximal value is very small, we can conclude that there is no critical threshold in our influence space. If we call a condition under which the threshold may occur a critical section, then a critical section should be some kind of 0-1 assignment.
3.2 Critical Detector
Following the discussion of the critical threshold, we want an easy way to detect whether the threshold has non-zero probability of occurring in our input $(G, \Theta)$. Therefore, we want to design an algorithm to detect this condition, which we call a critical detector. We have the following conjecture, which will help us develop an easy-to-use detector.
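Conjecture 1. Given input $(G, \Theta)$, we define
\[ \rho(S, k) = \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\max_{|S^*| \le k} \sigma_\theta(S^*)}, \]
then for all integers $k \in [1, n]$, we have
\[ \max_{v \in V} \rho(\{v\}, 1) \le \max_{S: |S| \le k} \rho(S, k). \]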
If this conjecture is right, we only need to consider $k = 1$ to detect the critical threshold, because if $\max_{v \in V} \rho(\{v\}, 1)$ is large enough, then we can assert that $\max_{S: |S| \le k} \rho(S, k)$ is large enough. Even when we only need to consider $k = 1$, $\max_{v \in V} \rho(\{v\}, 1)$ is still difficult to calculate. Thus we may use the following property to estimate it.
Property 1. Given input $(G, \Theta)$,
\[ \max_{v \in V} \rho(\{v\}, 1) \ge \frac{\max_{u \in V} \sigma_{\theta^-}(\{u\})}{\max_{v \in V} \sigma_{\theta^+}(\{v\})}. \]
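Proof. Let $u^*$ be the vertex that maximizes $\sigma_{\theta^-}(\{u\})$ and let $\theta^*$ be the parameter attaining the minimum in $\rho(\{u^*\}, 1)$. Then we have
\[ \max_{v \in V} \rho(\{v\}, 1) \ge \rho(\{u^*\}, 1) = \frac{\sigma_{\theta^*}(\{u^*\})}{\max_{|S^*| \le 1} \sigma_{\theta^*}(S^*)} \ge \frac{\sigma_{\theta^-}(\{u^*\})}{\max_{v \in V} \sigma_{\theta^+}(\{v\})} = \frac{\max_{u \in V} \sigma_{\theta^-}(\{u\})}{\max_{v \in V} \sigma_{\theta^+}(\{v\})}. \]
Notice that the value of $\frac{\max_{u \in V} \sigma_{\theta^-}(\{u\})}{\max_{v \in V} \sigma_{\theta^+}(\{v\})}$ can be calculated, thus we can put it in our detector algorithm.
4 Approximation Algorithm on Non-Performance Perspective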
In this section we apply the Saturation Algorithm provided in the paper "Robust Submodular Observation Selection" [4] to our problem. There are some obstacles in applying it directly; e.g., finitely many submodular functions are considered in [4], while infinitely many functions are considered here in RIM. We made an effort to overcome this, and our main result is shown in the following theorem.
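Theorem 2. For the Robust Influence Maximization problem, let
\[ S^* = \arg\max_{|S| \le k} \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)}. \]
For any small constant $\varepsilon > 0$, there exists an algorithm that outputs a seed set $S^g$ in time $O(mn^2 \log(m))$ such that
(1) $|S^g| \le \alpha \cdot |S^*|$,
(2)
\[ \min_{\theta \in \Theta} \frac{\sigma_\theta(S^g)}{\sigma_\theta(S_\theta^*)} \ge \beta \cdot \max_{A, |A| \le k} \min_{\theta \in \Theta} \frac{\sigma_\theta(A)}{\sigma_\theta(S_\theta^*)}, \tag{2} \]
where $\alpha = \log \frac{mn}{\min_{j,t} \prod_{e=(j,t)} (1 - r_e)} + 1$, $\beta = \frac{1-\varepsilon}{1+\varepsilon}\left(1 - \frac{1}{e}\right)$, and $m = \frac{n^3}{k\varepsilon}$.
In particular, if $r_e \le 1 - n^{-p}$ for all $e \in E$ and some constant $p$, then $\alpha = O(\log(n))$.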
The algorithm is similar to the Submodular Saturation Algorithm and contains the following steps:
1. Let $m = \frac{n^3}{k\varepsilon}$. Define parameters $\theta_1, \dots, \theta_m$ such that
\[ \theta_1 = \theta^-, \quad \theta_m = \theta^+, \quad \theta_{m,e} - \theta_{m-1,e} = \theta_{m-1,e} - \theta_{m-2,e} = \dots = \theta_{2,e} - \theta_{1,e}, \ \forall e \in E. \tag{3} \]
2. For each parameter $\theta_i$, use the basic greedy algorithm to obtain a seed set $S^g_{\theta_i}$.
3. Initialize the search interval: $c_{\min} = 0$, $c_{\max} = 1$.
4. In each iteration, pick the midpoint $c = (c_{\min} + c_{\max})/2$.
5. Obtain an approximate solution $\hat{S}$ using algorithm GPC with parameter $c$.
6. If $|\hat{S}| > \alpha k$, let $c_{\max} = c$.
7. If $|\hat{S}| \le \alpha k$, let $c_{\min} = c$ and $S^g = \hat{S}$.
8. Repeat steps 4 to 7 until $|c_{\max} - c_{\min}| \le \frac{1}{m}$.
4.1 The New GPC Algorithm
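For a fixed $c$, let
\[ F_{\theta_i,c}(S) = \min\left\{ \frac{\sigma_{\theta_i}(S)}{\sigma_{\theta_i}(S^g_{\theta_i})}, c \right\}, \qquad \bar{F}_c(S) = \frac{1}{m} \sum_{i=1}^{m} F_{\theta_i,c}(S). \tag{4} \]
The GPC Algorithm uses a greedy strategy to find an approximate solution of the following Submodular Covering Problem:
\[ \min_{A} |A| \quad \text{subject to } \bar{F}_c(A) = \bar{F}_c(V). \]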
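Algorithm 1 GPC Algorithm for a fixed $c$
Input: Graph $G$, parameter value set $\Theta$, $\{\theta_i\}_{i=1}^{m}$, $c$.
$S \leftarrow \emptyset$
while $c - \bar{F}_c(S) > 1/\min_j \prod_{e=(j,*)} (1 - r_e)$ do
    $v_0 \leftarrow \arg\max_{v \in V \setminus S} \bar{F}_c(S \cup \{v\})$
    $S \leftarrow S \cup \{v_0\}$
end while
output $S$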
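The interplay between the bisection on $c$ and the greedy cover can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm analyzed here: the set functions stand for abstract normalized monotone submodular functions (in place of $\sigma_{\theta_i}(S)/\sigma_{\theta_i}(S^g_{\theta_i})$), the greedy loop stops once the truncated average reaches its value on the whole ground set (up to a tolerance) instead of using the threshold of Algorithm 1, and the bisection runs for a fixed number of iterations. The names `coverage`, `gpc_cover` and `saturate` and the toy coverage instance are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple

SetFn = Callable[[FrozenSet[int]], float]


def coverage(cover: Dict[int, Set[str]], universe_size: int) -> SetFn:
    """Normalized coverage function: fraction of the universe covered by S."""
    def f(s: FrozenSet[int]) -> float:
        covered: Set[str] = set()
        for v in s:
            covered |= cover.get(v, set())
        return len(covered) / universe_size
    return f


def gpc_cover(ground: Sequence[int], funcs: List[SetFn], c: float,
              tol: float = 1e-9) -> FrozenSet[int]:
    """Greedily grow S until the truncated average matches its value on the ground set."""
    def f_bar(s: FrozenSet[int]) -> float:
        return sum(min(f(s), c) for f in funcs) / len(funcs)

    target = f_bar(frozenset(ground))
    s: FrozenSet[int] = frozenset()
    while f_bar(s) < target - tol and len(s) < len(ground):
        best = max((v for v in ground if v not in s),
                   key=lambda v: f_bar(s | {v}))
        s = s | {best}
    return s


def saturate(ground: Sequence[int], funcs: List[SetFn], k: int,
             alpha: float = 1.0, iters: int = 20) -> Tuple[FrozenSet[int], float]:
    """Bisection on c: keep the largest level whose greedy cover has size <= alpha * k."""
    c_min, c_max = 0.0, 1.0
    best: FrozenSet[int] = frozenset()
    for _ in range(iters):
        c = (c_min + c_max) / 2
        s = gpc_cover(ground, funcs, c)
        if len(s) > alpha * k:
            c_max = c
        else:
            c_min, best = c, s
    return best, c_min


# Toy instance: element 2 is the only single element that helps both functions.
f1 = coverage({0: {"a", "b"}, 2: {"a"}}, 2)
f2 = coverage({1: {"c", "d"}, 2: {"c"}}, 2)
seed_set, level = saturate([0, 1, 2], [f1, f2], k=1)
```

In this toy instance the bisection settles on the level 0.5 with the single-element set `{2}`, since any higher level forces the greedy cover to use more than one element.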
4.2 Performance of $S^g$
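Assume
\[ S_1 = \arg\min_{S} |S| \quad \text{s.t.} \quad \frac{\sigma_{\theta_i}(S)}{\sigma_{\theta_i}(S^g_{\theta_i})} \ge c, \ \forall i \in [m], \]
\[ S_2 = \arg\min_{S} |S| \quad \text{s.t.} \quad \min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)} \ge c. \tag{5} \]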
Lemmas 3 and 5 guarantee the good performance of $S^g$ stated in Theorem 2. The remaining analysis is the same as the one in "Robust Submodular Observation Selection" [4], so we omit it and focus on the two lemmas.
Lemma 3 shows that the size of $S^g$ is not large.
Lemma 3.
\[ |S^g| \le \alpha \cdot |S_2|, \tag{6} \]
where
\[ \alpha = \log \frac{mn}{\min_j \prod_{e=(j,*)} (1 - r_e)} + \text{const}. \tag{7} \]
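Proof. According to the result in Wolsey's paper,
\[ |S^g| \le \beta \cdot |S_1|, \tag{8} \]
where
\[ \beta = 1 + \log \frac{\max_{j \in V} \bar{F}_c(\{j\})}{\text{the minimum nonzero increase of } \bar{F}_c \text{ during the GPC algorithm}}. \tag{9} \]
Notice that any set $S$ with $\min_{\theta \in \Theta} \frac{\sigma_\theta(S)}{\sigma_\theta(S_\theta^*)} \ge c$ also satisfies, for all $\theta_i$,
\[ \frac{\sigma_{\theta_i}(S)}{\sigma_{\theta_i}(S^g_{\theta_i})} \ge \frac{\sigma_{\theta_i}(S)}{\sigma_{\theta_i}(S^*_{\theta_i})} \ge c. \tag{10} \]
Thus, we have $|S_1| \le |S_2|$.
Together with the following lemma:
Lemma 4. With the definitions above, $\beta \le \alpha$.
we have
\[ |S^g| \le \beta |S_1| \le \beta |S_2| \le \alpha |S_2|, \tag{11} \]
which completes the proof. The proof of Lemma 4 is given in the next subsection.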
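Lemma 5 shows that the objective value of $S^g$ is at least a constant fraction of the best objective value.
Lemma 5.
\[ \min_{\theta \in \Theta} \frac{\sigma_\theta(S^g)}{\sigma_\theta(S_\theta^*)} \ge \beta \cdot \max_{A, |A| \le k} \min_{\theta \in \Theta} \frac{\sigma_\theta(A)}{\sigma_\theta(S_\theta^*)}, \tag{12} \]
where
\[ \beta = \frac{1-\varepsilon}{1+\varepsilon}\left(1 - \frac{1}{e}\right). \tag{13} \]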
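Proof. Consider the time when $S^g$ is assigned in Step 7 and its value is not changed afterwards. Denote the feasible $c$ at this time by $c_0$.
According to the definition of the sequence $\{\theta_j\}_{j=1}^{m}$ in Step 1, for all $\theta \in \Theta$ there exists $i \in [m]$ such that
\[ \|\theta_i - \theta\|_\infty \le \frac{1}{2m}. \tag{14} \]
Then with Lemma 1, it implies
\[ \frac{|\sigma_\theta(S) - \sigma_{\theta_i}(S)|}{\sigma_\theta(S)} \le \frac{|E||V| \frac{1}{2m}}{k} \le \varepsilon. \tag{15} \]
Thus for all $\theta \in \Theta$, we have
\begin{align*}
(1 + \varepsilon)\sigma_\theta(S^g) \ge \sigma_{\theta_i}(S^g) & \ge c \cdot \sigma_{\theta_i}(S^g_{\theta_i}) \\
& \ge \left(1 - \frac{1}{e}\right) c \cdot \sigma_{\theta_i}(S^*_{\theta_i}) \\
& \ge \left(1 - \frac{1}{e}\right) c \cdot \sigma_{\theta_i}(S^*_{\theta}) \\
& \ge (1 - \varepsilon)\left(1 - \frac{1}{e}\right) c \cdot \sigma_\theta(S^*_\theta). \tag{16}
\end{align*}
Then,
\[ \min_{\theta \in \Theta} \frac{\sigma_\theta(S^g)}{\sigma_\theta(S_\theta^*)} \ge \frac{1-\varepsilon}{1+\varepsilon}\left(1 - \frac{1}{e}\right) \cdot c \ge \frac{1-\varepsilon}{1+\varepsilon}\left(1 - \frac{1}{e}\right) \cdot \min_{\theta} \frac{\sigma_\theta(S^*)}{\sigma_\theta(S_\theta^*)}. \tag{17} \]
4.3 Upper Bound of α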
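To control the size of the seed set we find, we need $\beta$ to be relatively small compared to $n$.
Proof. (proof of Lemma 4)
\begin{align*}
\beta & = 1 + \log \frac{\max_{j \in V} \bar{F}_c(\{j\})}{\text{the minimum nonzero increase of } \bar{F}_c \text{ during the GPC algorithm}} \\
& \le 1 + \log \frac{\max_{j \in V} \bar{F}_c(\{j\})}{\min_{j \in V, S \subseteq V, \bar{F}_c(S \cup \{j\}) \neq \bar{F}_c(S)} \left( \bar{F}_c(S \cup \{j\}) - \bar{F}_c(S) \right)}. \tag{18}
\end{align*}
Firstly, for all $j \in V$,
\begin{align*}
\bar{F}_c(\{j\}) = \frac{1}{m} \sum_{i=1}^{m} F_{\theta_i,c}(\{j\}) & \le \frac{1}{m} \sum_{i=1}^{m} \frac{\sigma_{\theta_i}(\{j\})}{\sigma_{\theta_i}(S^g_{\theta_i})} \\
& \le \frac{1}{1 - \frac{1}{e}} \cdot \frac{1}{m} \sum_{i=1}^{m} \frac{\sigma_{\theta_i}(\{j\})}{\sigma_{\theta_i}(S^*_{\theta_i})} \\
& \le \frac{1}{1 - \frac{1}{e}}. \tag{19}
\end{align*}
Secondly, we consider the value $\min_{j \in V, S \subseteq V, \bar{F}_c(S \cup \{j\}) \neq \bar{F}_c(S)} \left( \bar{F}_c(S \cup \{j\}) - \bar{F}_c(S) \right)$.
Actually we have
\begin{align*}
\frac{1}{m} \sum_{i=1}^{m} \frac{\sigma_{\theta_i}(S \cup \{j\}) - \sigma_{\theta_i}(S)}{\sigma_{\theta_i}(S^g_{\theta_i})} & \ge \frac{1}{m} \min_{j,S} \frac{\sigma_{\theta_i}(S \cup \{j\}) - \sigma_{\theta_i}(S)}{\sigma_{\theta_i}(S^g_{\theta_i})} \\
& \ge \frac{1}{mn} \min_{j,S} \left( \sigma_{\theta_i}(S \cup \{j\}) - \sigma_{\theta_i}(S) \right) \\
& \ge \frac{\min_j \prod_{e=(j,*)} (1 - r_e)}{mn}. \tag{20}
\end{align*}
According to Algorithm 1, $c - \bar{F}_c(S) \ge 1/\min_j \prod_{e=(j,*)} (1 - r_e)$. So
\[ \min_{j \in V, S \subseteq V, \bar{F}_c(S \cup \{j\}) \neq \bar{F}_c(S)} \left( \bar{F}_c(S \cup \{j\}) - \bar{F}_c(S) \right) \ge \frac{\min_j \prod_{e=(j,*)} (1 - r_e)}{mn}, \tag{21} \]
\[ \beta \le 1 + \log \frac{mn}{\left(1 - \frac{1}{e}\right) \min_j \prod_{e=(j,*)} (1 - r_e)} = 1 + \log \frac{mn}{\min_j \prod_{e=(j,*)} (1 - r_e)} + \log \frac{e}{e-1}. \tag{22} \]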
The algorithm provides a way to obtain a different kind of "approximation" for the Influence Difference Maximization problem: it enlarges the constraint size while maintaining the performance. The algorithm can deal with all graphs and most circumstances, unless $r_e = 1$ for some $e \in E$. However, we cannot use a randomized algorithm to get a seed set $S_1^g$ that satisfies
\[ |S_1^g| \le |S^*|, \qquad \min_{\theta \in \Theta} \frac{\sigma_\theta(S_1^g)}{\sigma_\theta(S_\theta^*)} \ge \gamma \cdot \max_{A, |A| \le k} \min_{\theta \in \Theta} \frac{\sigma_\theta(A)}{\sigma_\theta(S_\theta^*)}. \tag{23} \]
The reason is that the objective function is not submodular. This restricts the use of the algorithm. Tian Lin suggests that deleting nodes from $S^g$ might be helpful, but such a solution has not been found.
References
[1] C. Wei et al., Information and Influence Propagation in Social Networks, Morgan and Claypool, 2013.
[2] D. Kempe et al., Maximizing the Spread of Influence through a Social Network, Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
[3] U. Feige, A Threshold of ln n for Approximating Set Cover, J. ACM, 45(4), 1998.
[4] A. Krause et al., Robust Submodular Observation Selection, Journal of Machine Learning Research, 9 (2008), 2761-2801.
CPEMAB Research Report 1014
Zihan Tan
November 1, 2014
Problem 1. Given two biased coins indexed by 1 and 2, with unknown latent probabilities $p_1$ and $p_2$ (this pair is called the latent structure, and the two values could be equal), we are required to judge which coin has the larger latent probability by drawing samples.
An algorithm is said to be $(\varepsilon, \delta)$-correct if, for any latent structure, with probability at least $1 - \delta$ it gives the correct output (if $p_1 = p_2$ then both 1 and 2 are correct outputs).
Proposition 1. There does not exist an algorithm satisfying the following properties:
(1) There exists a horizon time $T$ such that the algorithm halts after $T$ steps.
(2) It is $(0, \delta)$-correct.
Proof. We prove by contradiction.
The following is proved in Sample Complexity (Theorem 1):
Lemma 1. There exist positive constants $c_1$, $c_2$, $\varepsilon_0$, and $\delta_0$, such that for every $n \ge 2$, $\varepsilon \in (0, \varepsilon_0)$, and $\delta \in (0, \delta_0)$, and for every $(\varepsilon, \delta)$-correct policy, there exists some $p \in [0, 1]^2$ such that
\[ E_p[T] \ge c_1 \frac{n}{\varepsilon^2} \log \frac{c_2}{\delta}. \]
Suppose such an algorithm $A$ exists. We choose $\varepsilon > 0$ sufficiently small such that $c_1 \frac{n}{\varepsilon^2} \log \frac{c_2}{\delta} > T$. Since $A$ is $(0, \delta)$-correct, it is also $(\varepsilon, \delta)$-correct. However, the lemma shows that its expected running time is larger than $T$, which is a contradiction.
Before proving the next proposition, some notation is in order to make the statements clear.
Definition 1. (realization; finite realization)
A pair of infinite 0-1 sequences $\tau = (a_1 a_2 \cdots a_n \cdots, b_1 b_2 \cdots b_n \cdots)$ is called a realization of two coins, i.e. when the algorithm makes the $i$th sample of coin 1, its outcome is $a_i$, and when the algorithm makes the $j$th sample of coin 2, its outcome is $b_j$.
A pair of finite 0-1 sequences $\phi = (a_1 a_2 \cdots a_n, b_1 b_2 \cdots b_n)$ is called a finite realization of two coins, with the same interpretation of $a_i$ and $b_j$. Let $n = |\phi|$ denote its size.
Since the output of the algorithm relies only on the outcomes of samples, for any realization the algorithm either never halts on it, or halts on some finite realization (which is a finite prefix of the realization). For a latent structure $(p_1, p_2)$, define a σ-algebra and a measure $P_{p_1,p_2}(\cdot)$ on it as follows:
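Definition 2. Let $\Omega$ be the set of all realizations. For a finite realization $\phi$, let $S_\phi = \{\tau \mid \phi \text{ is a prefix of } \tau\}$, and let $\mathcal{A} = \{S_\phi \mid \phi \text{ is a finite realization}\} \cup \{\emptyset\} \cup \{\Omega\}$; then $\mathcal{A}$ is a σ-algebra on $\Omega$.
Let $P_{p_1,p_2} : \mathcal{A} \to [0, 1]$ be such that $P_{p_1,p_2}(\emptyset) = 0$, $P_{p_1,p_2}(\Omega) = 1$ and, for $\phi = (a_1 \cdots a_n, b_1 \cdots b_n)$,
\[ P_{p_1,p_2}(S_\phi) = \prod_{1 \le i \le n} (1 - p_1 + a_i(2p_1 - 1))(1 - p_2 + b_i(2p_2 - 1)). \]
It is not hard to check that $P_{p_1,p_2}$ is a probability measure on $(\Omega, \mathcal{A})$.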
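To make the product formula concrete, here is a minimal Python sketch that evaluates the measure of a cylinder set and checks that the masses of all finite realizations of a given size add up to 1. The function names `cylinder_prob` and `total_mass` are illustrative assumptions; each factor equals $p$ when the observed bit is 1 and $1 - p$ when it is 0.

```python
from itertools import product


def cylinder_prob(p1, p2, a, b):
    """Measure of S_phi for phi = (a, b): product of per-sample likelihoods."""
    prob = 1.0
    for a_i, b_i in zip(a, b):
        prob *= (1 - p1 + a_i * (2 * p1 - 1)) * (1 - p2 + b_i * (2 * p2 - 1))
    return prob


def total_mass(p1, p2, n):
    """Sum of the measures of all finite realizations of size n."""
    outcomes = list(product((0, 1), repeat=n))
    return sum(cylinder_prob(p1, p2, a, b) for a in outcomes for b in outcomes)
```

For example, with two fair coins every finite realization of size 2 has measure $(1/2)^4$, and `total_mass(0.3, 0.8, 3)` sums to 1 up to rounding.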
Definition 3. (Halt with probability 1)
Let A be an algorithm, IA (τ ) be the indicator of halting, i.e. IA (τ ) = 1 if and only if A halts
on realization τ , which means A halts after observing some finite prefix φ of τ .
We say an algorithm halts with probability 1 if for all (p1 , p2 ), EPp1 ,p2 [IA ] = 1
Proposition 2. For any $\delta < \frac{1}{2}$, there does not exist an algorithm satisfying the following properties:
(1) With probability 1 it will halt.
(2) It is $(0, \delta)$-correct.
Proof. We prove by contradiction. Assume such an algorithm exists. First, without loss of generality, we assume that the algorithm has always made an equal number of samples of both coins when halting (if an algorithm halts after making $r$ samples of coin 1 and $t$ samples of coin 2 with $r > t$, we could let it make $r$ samples of both coins and only use the first $t$ records of coin 2).
Now we suppose that when the algorithm gets a particular finite realization, it either stops or continues to make another sample (some algorithms randomly choose to stop or continue; we exclude them in the following proof, although with a small modification of the proof they could be included).
Definition 4. (terminating finite realization)
Let φ be a finite realization, then it is a terminating finite realization for algorithm A if the
algorithm halts when observing some prefix of φ as the outcome of samples.
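According to the definition, for all $(p_1, p_2)$, let
\[ P_n(A) = \sum_{|\phi| = n;\ \phi \text{ is terminating for } A} P_{p_1,p_2}(S_\phi). \]
Then we have:
\[ \lim_{n \to \infty} P_n(A) = 1. \]
Choose $\varepsilon > 0$ such that $\varepsilon < \frac{1}{2} - \delta$. Let $p_1 = p_2 = \frac{1}{2}$; then there exists $N(\frac{\varepsilon}{2}) \in \mathbb{N}$ such that for all $n \ge N(\frac{\varepsilon}{2})$, $P_n(A) \ge 1 - \frac{\varepsilon}{2}$.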
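Now choose small enough $\varepsilon_1 > 0$ such that
\[ \sum_{|\phi| = N(\frac{\varepsilon}{2})} \left| P_{\frac{1}{2},\frac{1}{2}}(S_\phi) - P_{\frac{1}{2}+\varepsilon_1,\frac{1}{2}-\varepsilon_1}(S_\phi) \right| \le \frac{\varepsilon}{2} \]
and
\[ \sum_{|\phi| = N(\frac{\varepsilon}{2})} \left| P_{\frac{1}{2},\frac{1}{2}}(S_\phi) - P_{\frac{1}{2}-\varepsilon_1,\frac{1}{2}+\varepsilon_1}(S_\phi) \right| \le \frac{\varepsilon}{2}. \]
This can be done since $g(x) = \sum_{|\phi| = N(\frac{\varepsilon}{2})} \left| P_{\frac{1}{2},\frac{1}{2}}(S_\phi) - P_{\frac{1}{2}-x,\frac{1}{2}+x}(S_\phi) \right|$ is continuous and $g(0) = 0$.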
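The behaviour of $g$ near 0 can be checked numerically for a small horizon. The sketch below is only an illustration: it uses a hypothetical horizon `n = 3` in place of $N(\frac{\varepsilon}{2})$, and the function names are assumptions. It enumerates all finite realizations of size `n` and sums the absolute differences between the two product measures.

```python
from itertools import product


def cylinder_prob(p1, p2, a, b):
    """Measure of the cylinder set of the finite realization (a, b)."""
    prob = 1.0
    for a_i, b_i in zip(a, b):
        prob *= (1 - p1 + a_i * (2 * p1 - 1)) * (1 - p2 + b_i * (2 * p2 - 1))
    return prob


def g(x, n):
    """L1 distance between the measures for (1/2, 1/2) and (1/2 - x, 1/2 + x) at size n."""
    outcomes = list(product((0, 1), repeat=n))
    return sum(
        abs(cylinder_prob(0.5, 0.5, a, b) - cylinder_prob(0.5 - x, 0.5 + x, a, b))
        for a in outcomes
        for b in outcomes
    )
```

With `n = 3`, `g(0.0, 3)` is exactly 0 and `g(x, 3)` shrinks as `x` decreases, which is the property used to pick $\varepsilon_1$.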
Then, we set two latent structures. In the first structure $p_1 = \frac{1}{2} + \varepsilon_1$, $p_2 = \frac{1}{2} - \varepsilon_1$, while in the second structure $p_1 = \frac{1}{2} - \varepsilon_1$, $p_2 = \frac{1}{2} + \varepsilon_1$. It is immediate that outputting 1 is correct in the first structure and outputting 2 is correct in the second structure.
Then consider the output of the algorithm on these terminating finite realizations. For both latent structures, the probability that the algorithm stops within $N(\frac{\varepsilon}{2})$ steps is larger than $1 - \varepsilon$. Let $s_1$ be the probability of outputting 1 in the first latent structure and $s_2$ the probability of outputting 2 in the first latent structure; let $t_1$ be the probability of outputting 1 in the second latent structure and $t_2$ the probability of outputting 2 in the second latent structure. Since the output of the algorithm relies only on the realization that it observes,
\[ |s_1 - t_1| + |s_2 - t_2| \le \varepsilon. \]
Therefore, $|s_1 - t_1| \le \varepsilon$ and $|s_2 - t_2| \le \varepsilon$. Since $s_1 + s_2 \le 1$ and the algorithm is correct with probability at least $1 - \delta$, we obtain $s_1 \ge 1 - \delta$, then $s_2 \le \delta$ and $t_2 \le \delta + \varepsilon$, and therefore $t_1 \ge 1 - 2\varepsilon - \delta$. Hence the probability that the algorithm outputs correctly on the second latent structure is at most $2\varepsilon + \delta < 1 - \delta$, causing a contradiction. This finishes the proof.
On the Generalized Pagerank Model
Zihan Tan; Yang Song; Yuchen Yang
October 31, 2014
Abstract
Although various centrality measures have been investigated on different networks, a specialized and mature centrality for analyzing the influence of theoretical researchers has not appeared. Among previous methods, Bonacich's model is the most outstanding one. However, in his main equation the relationship matrix is simply the adjacency matrix. Our main idea is that one should build a particular weighted relationship based on reasonable hypotheses that characterize the real problem. In this paper we first build the Erdos network and analyze its properties. We then propose a new model called "Pairwise Evaluation Model" to measure influence, constructing a nontrivial weighted relationship matrix by taking both real influence (co-authorship) and virtual influence (fame) into account. This new model helps us obtain an objective relationship matrix when certain data are lacking. Following this idea, we then deal with two additional problems: we analyze the influence of movie actors in the film network and the influence of a paper in the citation network. Small variations of the "Pairwise Evaluation Model" are made according to reasonable hypotheses, which give us better results in these two additional problems.
Contents
1 Introduction and Background
  1.1 Previous Results
  1.2 Our Results
2 Erdos Network and Property Analysis
3 Pairwise Evaluation Model and Pagerank Model
  3.1 Explicit Expression of Models
  3.2 Interpretation of Models
    3.2.1 Essence of Pagerank Model
    3.2.2 Essence of Our Pairwise Evaluation Model
    3.2.3 Comparison and Our Improvement
  3.3 Mathematical Remarks
    3.3.1 Convergence of Series and Main Equation
    3.3.2 Numerical Method for Obtaining Approximation Solution for V
    3.3.3 Interpretation of Constraints
4 Results
  4.1 Influence Analysis in Erdos Network
    4.1.1 Algorithm and Results
    4.1.2 Remarks and Comments
  4.2 Influence Analysis in Film Actor Network
    4.2.1 Algorithm and Arguments
    4.2.2 Results and Comments
  4.3 Influence Analysis of Fundamental Papers
    4.3.1 Algorithm and Arguments
    4.3.2 Results
5 Essence and Understanding of Modeling Influence
  5.1 Science and understanding of modeling influence within a network
  5.2 Analysis on Individual Strategy
    5.2.1 Experiment and Results
    5.2.2 Analysis and Comments
6 Sensitivity Analysis, Strength and Weakness
  6.1 Sensitivity Analysis
    6.1.1 Experiment and Results
    6.1.2 Analysis and Comments
  6.2 Strength and Weakness
1 Introduction and Background
In the academic field, people tend to use various tools like SCI, H-factor, Impact factor or Google Scholar to evaluate peers. These are all network-based analysis tools. Long before the emergence of network science, there was already such a measurement in the mathematics community: the Erdos number. Paul Erdos published over 1400 papers and co-authored with more than 500 researchers. Because of his significant influence in the mathematics community, the Erdos number was created to measure the proximity to him in bibliographical terms. In this paper we investigate the influence of researchers in the network Erdos1, which contains all professors who have directly collaborated with Erdos. We develop a new model called "Pairwise Evaluation Model" to measure the influence, taking both real influence (co-authorship) and virtual influence (fame) into account, which will be discussed in section 3. The main idea is that different researchers have different views of a specific researcher, which differs from the idea in previous papers (a uniform influence coefficient). We then analyze the performance of the model in the movie actor network. For analyzing the influence of a paper or a journal, a small variation of the "Pairwise Evaluation Model" (combined with the pagerank model) is made to obtain better performance.
1.1 Previous Results
Intuitively, the node with the highest degree should be the most important in the graph. But there are cases where a node has many edges but is isolated from the major population. Centrality measurements are the classic tools for evaluating importance, and previous work has done a lot in this area.
Bonacich [1] proposed a family of centrality measures in 1987. Apart from centrality measurements such as betweenness and closeness, another parameter $\beta$ is included to take into account the global property of the network. For a given node, if $\beta$ is large, the centrality of the other nodes connected to it takes more weight in the evaluation of this node's importance. We develop our algorithm based on the same idea. However, this measurement can only take into account network-based internal information, making assumptions such as that all nodes in the graph are equal. Under the framework of the model, the measurement $c(\alpha, \beta)$ can be interpreted as the number of paths activated by sending a signal from a particular node. By comparing $c_i(\alpha, \beta)$ we can evaluate the importance of node $i$. We give further explanation and exploration in the model section.
Newman investigated the structure and properties of scientific collaboration networks. In his papers [2][3], some local and global statistics and their differences across bibliographic databases are investigated. Table 1 lists all statistics included in his work.
Watts and Strogatz first proposed the idea of the small-world model in 1998. Before that, a network was considered either completely random or totally regular. But this is not the case in many real networks, whose topology lies in between. This phenomenon is measured by a comparably large clustering coefficient and an unusually small characteristic path length, compared with a regular graph. The indication is that, akin to a regular graph, a small-world network is highly clustered, but the shortest distance between any pair of nodes is much smaller than in a regular lattice. This is caused by the existence of shortcuts that connect two far-away vertices. Many social networks, like the co-author network and the co-stardom network, have this small-world phenomenon. (We calculate L and C for our graphs to illustrate the small-world property of our network.)
Table 1: Statistics for the scientific collaboration networks
Statistics | Explanation | Results
Collaborators per author | significant people are more likely to have more collaborators | power law
Size of giant component | 80-90% of the total nodes in the network | information can circulate fast
Clustering coefficient | local communities, such as belonging to the same affiliation in the scientific community | strong clustering effect
Betweenness | a measurement of significance | funneling, very clear winner
Average distances | average of the shortest path between each pair of nodes | small world effect
1.2 Our Results
We constructed the Erdos1 network and analyzed its properties from the classical perspective, namely computing several kinds of centrality and giving certain distributions. For analyzing the influence of individuals in this co-authorship network, we propose a new model called "Pairwise Evaluation", which is an improved version of the Pagerank model. The new model takes both "real influence" and "virtual influence" into consideration, which is different from the method used in the Pagerank model (the Pagerank model simply uses the adjacency matrix as the relationship matrix). This improvement is indeed necessary in a researcher network because individual emotion matters. Let $R$ denote the real influence matrix and $V$ denote the virtual influence matrix. By defining a certain random process to simulate the spread of information, we obtain the main equation of our model:
\[ \alpha_2 V^2 + (\alpha_1 R - I)V + I = 0. \]
The main idea of our improvement is that, although the Pagerank model gives us an outstanding way of measuring influence, we still need to find the main feature of a particular network and characterize it mathematically when investigating this network. In short, we should go down from the general layer to the specific layer when faced with a particular problem.
Following this main idea, we then investigate influence in the paper-citation network and in the film-actor network. We develop another two improved versions of the Pagerank model, based on the observation that, generally speaking, the influence of a movie or a paper decreases along the timeline, i.e. the earlier it was published, the less influential it is now.
Finally we give our own understanding of modeling influence within a network.
Based on what we have done, we think that the main idea of modeling influence is twofold: make clear which particular property of influence you really care about, and simulate or assign it mathematically on the graph (using a random process) to obtain a measurement.
The argument above is rather intuitive, but it can be practical and essential, as we suggest, in dealing with certain problems. Besides, random processes and related concentration inequalities may be a powerful tool in this field.
At the end of this paper, a sensitivity analysis is carried out and strengths and weaknesses are pointed out. We also provide insights for future work.
2 Erdos Network and Property Analysis
Figure 1: Erdos1 network
Figure 1 shows part of the Erdos network. We delete all researchers with degree less than 5 to obtain this simplified version. We then compute classical centralities and the degree distribution of this network; see Table 2, Table 3 and Table 4.
Generally speaking, the graph is not dense. The 511 researchers have only 1639 collaborations, giving a density of 0.013 and an average degree of 6.415. Another measurement, the number of weakly connected components, also illustrates its sparseness: in this graph, there are 42 weakly connected components. A further illustration is given by the clustering coefficient, whose value is 0.343.
Table 2: Result in betweenness
(a) Betweenness distribution [figure]
(b) Top rank in betweenness
Rank | Name
1 | HARARY, FRANK
2 | SOS, VERATURAN
3 | RUBEL, LEEALBERT
4 | STRAUS, ERNSTGABOR

Table 3: Result in closeness
(a) Closeness distribution [figure]
(b) Top rank in closeness
Rank | Name
1 | HENRIKSEN, MELVIN
2 | GILLMAN, LEONARD
3 | BOES, DUANECHARLES
4 | GAAL,STEVENA. (GAL,ISTVANSANDOR)

Table 4: Result in degree
(a) Closeness distribution [figure]
(b) Top rank in degree
Rank | Name
1 | ALON, NOGAM
2 | HARARY, FRANK
3 | GRAHAM, RONALDLEWIS
4 | BOLLOBAS, BELA
However, edges are not uniformly distributed. There are 1639 edges in the network. Less than 25 percent of them (389) are not related to the top 100 researchers, and less than half of them (791) are not related to the top 50 researchers. (Here top researchers are the nodes with the highest degree.) We can conclude that there is a central group of researchers who dominate most collaborations.
3 Pairwise Evaluation Model and Pagerank Model
In the following discussion we denote the adjacency matrix of the co-authorship network by $A$, i.e. $A_{ij} = A_{ji} = 1$ if researchers $i$ and $j$ have coauthored some paper, and $A_{ij} = A_{ji} = 0$ if not. Let $d_i$ be the degree of researcher $i$, namely the degree of vertex $i$ in the graph.
3.1 Explicit Expression of Models
Phillip Bonacich [1] investigated a fundamental family of influence measures:
\[ c(\alpha, \beta) = \alpha (I - \beta A)^{-1} A \mathbf{1}. \]
In this formula $c(\alpha, \beta) \in \mathbb{R}^n$ is the influence vector, whose $i$th coordinate is a real number in $[0, 1]$ representing the degree of influence of researcher $i$; $\alpha$ and $\beta$ are the parameters of his model, which can be adapted to different problems; and $\mathbf{1}$ is the $n$-dimensional vector with all entries equal to 1.
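A minimal Python sketch of this measure is given below. Instead of inverting $I - \beta A$, it sums the series $\alpha \sum_{k \ge 1} \beta^{k-1} A^k \mathbf{1}$, which equals $\alpha (I - \beta A)^{-1} A \mathbf{1}$ whenever $|\beta|$ times the largest eigenvalue magnitude of $A$ is below 1. The function names, the number of series terms and the three-node path used as an example are assumptions made for illustration only.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def bonacich(adjacency, alpha, beta, terms=200):
    """Approximate c(alpha, beta) = alpha * sum_{k>=1} beta^(k-1) A^k 1 by a truncated series."""
    n = len(adjacency)
    term = mat_vec(adjacency, [1.0] * n)  # first term: A 1
    centrality = [0.0] * n
    for _ in range(terms):
        centrality = [c_i + alpha * t_i for c_i, t_i in zip(centrality, term)]
        term = [beta * t for t in mat_vec(adjacency, term)]  # next term: beta * A * term
    return centrality


# Hypothetical example: path graph 0 - 1 - 2, where the middle node should rank highest.
path_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = bonacich(path_adjacency, alpha=1.0, beta=0.3)
```

The result satisfies the fixed-point relation $c = \alpha A \mathbf{1} + \beta A c$, which gives a quick way to validate the truncation.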
Phillip Bonacich's model was later generalized into the Pagerank algorithm, which is a highly influential and well-performing model in the research and application of search engines. The basic idea is the following: in a graph, a vertex is important and influential if the following two conditions are satisfied.
• Its degree is large.
• Most neighboring nodes are important and influential.
We propose a model taking both real influence and virtual influence into account. Our main idea is that if researcher $i$ has collaborated with $j$, then they have "real influence" on each other. Besides, every researcher also has "virtual influence" on other researchers. This is the case when $i$ has not collaborated with $j$ but has heard of $j$ or has read $j$'s papers. Although they have not co-authored a paper, they know each other to some degree.
From our perspective, the virtual influence between different pairs of researchers need not be the same. The virtual influence of $s$ on $t$ is denoted by a real number $v_{st} \in [0, 1]$ in the following discussion.
Thus, we separate the influence in the network into two parts: real influence and virtual influence. To be mathematically specific, we denote them by two matrices, $R$ and $V$. Given a network, these two matrices are determined in the following way:
$R$ is simply a normalized version of the adjacency matrix $A$, namely
\[ R = \mathrm{diag}\{d_1^{-1}, d_2^{-1}, \cdots, d_n^{-1}\} A. \]
$V$ is then determined by the following formula:
\[ \alpha_2 V^2 + (\alpha_1 R - I)V + I = 0, \]
where $\alpha_1$ and $\alpha_2$ are flexible parameters of the model, which should satisfy $\alpha_1 + \alpha_2 \le 1$. Besides, these two parameters should not be large, so that the following normalization properties of $V$ are satisfied.
• Restricted Anonymity: for all $i$, $\sum_{j=1}^{n} V_{ij} \le 1$.
• Restricted Input: for all $j$, $\sum_{i=1}^{n} V_{ij} \le 1$.
3.2 Interpretation of Models
3.2.1 Essence of Pagerank Model
The essence and intuition of Bonacich's model (we call it the "pagerank model" in the following) are as follows.
Influence is the ability to spread a message. Bonacich defined a random process to simulate the spreading of a message from each node in the network. Every researcher sends a message to all his neighbors; then, with probability $\beta$, a communication, once sent, is transmitted by any receiving individual to any of his contacts. Thus, the expected total number of communications caused by individual $i$ is proportional to the ability of spreading a message from individual $i$, which can be written as the following formula.
\[ c(\alpha, \beta) = \alpha \sum_{k=1}^{\infty} \beta^{k-1} R^k \mathbf{1} = \alpha (R\mathbf{1} + \beta R^2 \mathbf{1} + \beta^2 R^3 \mathbf{1} + \cdots) \]
3.2.2 Essence of Our Pairwise Evaluation Model
Following the main idea and manner of the pagerank model, we also define a random process to simulate the spreading of a message. Differently, we do not measure the expected number of total paths. We measure the expected total amount of information, which is defined as follows: if a researcher receives the same message from $k$ of his neighbors, then the amount of information he gets is $k$.
The random process proceeds in the following manner.
First, researcher $s$ receives a message and spreads it according to the following rule. What we care about is the expected amount of information that $t$ receives in the process.
When any person $i$ receives this message, he randomly sends it to others. Specifically, he performs the following two "random-spread" actions independently, each with a certain probability:
• Local Spread (with probability $\alpha_1$)
He sends the message uniformly at random and independently to his neighbors (not including himself), i.e. with probability $R_{ij}$ (also independently) he passes this message to his neighbor $j$. This step is well-defined due to the normalization property of $R$.
• Global Spread (with probability $\alpha_2$)
He sends the message independently to all researchers (including himself) according to the virtual influence coefficients, i.e. researcher $j$ gets the message from person $i$ with probability $v_{ij}$ independently.
Following the random process, for the pair $(s, t)$, we compute the expected total amount of information received by $t$ and sent from $s$, which by definition is also the ability of spreading a message from individual $s$ to individual $t$, namely $v_{st}$.
The above argument holds for every pair $s, t$. We can therefore write down the following formula.
V =
∞
X
(α1 R + α2 V )k
k=0
Once convergence is established, summing the series gives V = (I − α1 R − α2 V)^{-1}, which
rearranges to our main equation:
$$\alpha_2 V^{2} + (\alpha_1 R - I)V + I = 0$$
Instead of solving it analytically, we use a numerical method to obtain an approximate solution
for V (described in the next subsection; all entries of V are also guaranteed to be non-negative).
However, we are required to output the degree of influence of every researcher in the network.
Let
$$M = \beta R + (1 - \beta) V$$
be the relation influence matrix of the co-authorship network, where β is a flexible parameter of
the model.
We then output an influence vector u from the influence matrix M . Using the method of eigenvector
centrality, we take the unique non-negative eigenvector, whose existence is guaranteed by the
Perron-Frobenius Theorem, to be the influence vector:
$$M u = \lambda u$$
namely, ui represents the influence coefficient of researcher i.
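As a minimal sketch of the eigenvector-centrality step, the dominant eigenvector of a positive matrix can be approximated by power iteration. The 2×2 matrix below is a hypothetical stand-in for M, and the function name `power_iteration` is an illustrative assumption; the text does not state which eigen-solver was actually used.

```python
def power_iteration(m, iters=1000, tol=1e-12):
    """Approximate the dominant eigenvalue and eigenvector of a positive matrix.

    The eigenvector is normalized so that its entries sum to 1.
    """
    n = len(m)
    u = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * u[j] for j in range(n)) for i in range(n)]
        lam = sum(w)  # u sums to 1, so sum(M u) estimates lambda
        w = [x / lam for x in w]
        if max(abs(a - b) for a, b in zip(w, u)) < tol:
            return lam, w
        u = w
    return lam, u


# Hypothetical influence matrix (not data from the paper).
M_example = [[1.0, 2.0],
             [3.0, 2.0]]
lam, u = power_iteration(M_example)
```

For this example the exact dominant eigenvalue is 4 with eigenvector proportional to (2, 3), so `u` approaches (0.4, 0.6).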
3.2.3 Comparison and Our Improvement
Our model differs from the pagerank model mainly in two features.
First, the uniformity of the probability of transmitting a message is dropped. In the pagerank
model, every researcher transmits the message to any of his neighbors with the same probability β.
However, this may not be the case in a real co-author network. A researcher is more likely to send
a message to someone who has more influence on him (for example, someone he admires) than to a
person he is not familiar with, even if they have collaborated. The degree of closeness is not a
fundamental factor in the pagerank model, but in our model it is captured by the Virtual Influence
Matrix V .
Second, the pagerank model only allows an individual to send the message to his neighbors, which
may also not be the case in a real network. Suppose a researcher has come up with an interesting
problem: he may communicate directly with experts in that field rather than first telling his
friends or the individuals he has collaborated with. In our model, the GlobalSpread phase
allows this to happen.
3.3 Mathematical Remarks
3.3.1 Convergence of Series and Main Equation
We are left with the following matrix equation:
$$V = \sum_{k=0}^{\infty} (\alpha_1 R + \alpha_2 V)^{k}$$
First, the convergence of the right-hand side is guaranteed by a simple mathematical argument,
which we omit for lack of space.
Thus, we obtain our main equation:
$$\alpha_2 V^{2} + (\alpha_1 R - I)V + I = 0$$
3.3.2 Numerical Method for Obtaining Approximation Solution for V
We use an iterative method to get an approximate solution to this equation instead of solving
it analytically:
• Let f(V) = α2 V^2 + α1 RV + I. The main equation asks for a fixed point of the matrix function f,
i.e. f(V) = V.
• f is a contraction mapping on a specific set of V, so fixed-point iteration is employed
to find a solution.
• Let V0 be the uniform distribution matrix, which means every entry of V0 is 1/n, and let
Vk+1 = f(Vk) for k ≥ 0.
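The recurrence above can be sketched in a few lines of plain Python. The 3-node matrix R, the values α1 = 0.2 and α2 = 0.1, the stopping tolerance and the helper names are all assumptions chosen for illustration; the iteration need not converge for arbitrary parameters, so the sketch raises an error after a fixed number of steps.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def f(v, r, alpha1, alpha2):
    """One application of f(V) = alpha2 * V^2 + alpha1 * R V + I."""
    n = len(v)
    v2 = mat_mul(v, v)
    rv = mat_mul(r, v)
    return [[alpha2 * v2[i][j] + alpha1 * rv[i][j] + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def solve_v(r, alpha1, alpha2, tol=1e-12, max_iter=1000):
    """Iterate V_{k+1} = f(V_k) starting from the uniform matrix V_0 = (1/n)."""
    n = len(r)
    v = [[1.0 / n] * n for _ in range(n)]
    for _ in range(max_iter):
        nxt = f(v, r, alpha1, alpha2)
        change = max(abs(nxt[i][j] - v[i][j])
                     for i in range(n) for j in range(n))
        v = nxt
        if change < tol:
            return v
    raise RuntimeError("iteration did not converge; try smaller alpha1, alpha2")


# Hypothetical 3-node network with a row-normalized relationship matrix R.
R = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
V = solve_v(R, alpha1=0.2, alpha2=0.1)
```

A quick sanity check is that f(V) − V is numerically zero for the returned matrix, which is the same as saying that V satisfies the main equation.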
3.3.3 Interpretation of Constraints
Several constraints are imposed in our model. We explain them here:
• Restricted Anonymity and Restricted Input are required to keep the operator norm bounded,
so that the series converges.
• α1 + α2 ≤ 1 is necessary for the convergence of the series. This constraint also holds when
every researcher is required to randomly choose one of the two spread actions.
Rank  Name                             Score
1     RODL, VOJTECH                    4.84428
2     GRAHAM, RONALD LEWIS             4.82455
3     FUREDI, ZOLTAN                   4.54946
4     BOLLOBAS, BELA                   4.54344
5     TUZA, ZSOLT                      4.25501
6     SPENCER, JOEL HAROLD             3.93599
7     SOS, VERA TURAN                  3.71115
8     HARARY, FRANK*                   3.67796
9     GYARFAS, ANDRAS                  3.64732
10    FAUDREE, RALPH JASPER, JR.       3.5356
11    LOVASZ, LASZLO                   3.50772
12    SZEMEREDI, ENDRE                 3.45698
13    CHUNG, FAN RONG KING (GRAHAM)    3.45627
14    PACH, JANOS                      3.39278
15    HAJNAL, ANDRAS                   3.13349
16    NESETRIL, JAROSLAV               3.06854
17    SCHELP, RICHARD H.               2.98698
18    SIMONOVITS, MIKLOS               2.95897
19    KOSTOCHKA, ALEXANDR V.           2.85383
20    BABAI, LASZLO                    2.84392
Table 5: Rank of Mathematicians in Erdos1 network
4 Results
4.1 Influence Analysis in Erdos Network
4.1.1 Algorithm and Results
Algorithm 1 Erdos Network Analysis
Construct adjacency matrix A
for i = 1 → n do
    for j = i → n do
        if both i and j have collaborated with Erdos more than twice, and i and j have ever
        collaborated then
            Aij = Aji = 2
        end if
    end for
end for
Use Bonacich’s model to compute the influence vector with β = 0.01 and R = A
We list the top 20 mathematicians in Table 5.
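The two steps of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Table 5: the names `weighted_adjacency` and `bonacich`, the argument `erdos_papers` (number of joint papers with Erdos per researcher), and the 3-node path graph are all hypothetical. The score is the truncated series c(α, β) = α Σ β^{k−1} R^k 1, which converges only when β is below the reciprocal of the spectral radius of R.

```python
def weighted_adjacency(adj, erdos_papers, threshold=2):
    """Copy adj, doubling edges between two frequent Erdos coauthors."""
    n = len(adj)
    a = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if (adj[i][j] and erdos_papers[i] > threshold
                    and erdos_papers[j] > threshold):
                a[i][j] = a[j][i] = 2
    return a


def bonacich(r, alpha, beta, tol=1e-12, max_terms=10000):
    """Sum alpha * beta^(k-1) * R^k * 1 over k >= 1 until terms are negligible."""
    n = len(r)
    term = [float(sum(row)) for row in r]  # first term: R 1
    total = [0.0] * n
    for _ in range(max_terms):
        total = [c + alpha * t for c, t in zip(total, term)]
        if max(abs(t) for t in term) < tol:
            return total
        term = [beta * sum(r[i][j] * term[j] for j in range(n))
                for i in range(n)]
    raise ValueError("series did not converge; beta may be too large")


# Hypothetical path graph 0 - 1 - 2 and hypothetical Erdos paper counts.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
A = weighted_adjacency(adj, erdos_papers=[3, 5, 1])
plain_scores = bonacich(adj, alpha=1.0, beta=0.1)
weighted_scores = bonacich(A, alpha=1.0, beta=0.1)
```

In the example only the edge between nodes 0 and 1 is doubled, because node 2 has too few joint papers with Erdos.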
4.1.2 Remarks and Comments
Some remarks are in order to better explain our algorithm.
• Why choose β = 0.01
By computation we find that the spectral radius of R is 36. Following the sensitivity analysis
in the next section, to make all entries in the influence vector positive it is necessary that
β ≤ 1/36. The relationships in a researchers’ network are positive, so we take β = 0.01 (the
sensitivity analysis in the next section shows that this is reasonable). This value makes the
series convergent.
It remains to argue that β is reasonable, i.e. that it is not so small that it has little impact
on the equation. We consider the Jordan canonical form of R:
$$R = P^{-1} J P$$
Then the formula can be written as
$$C(\alpha, \beta) = \alpha P^{-1} (J + \beta J^{2} + \cdots) P \mathbf{1}$$
It can be observed that although β is small, the ratio of norms between consecutive terms is
approximately 0.36 (β times the spectral radius, 0.01 × 36), which is not negligible.
• The Construction of Relationship Matrix
Erdos is the real centre of this network, i.e. all researchers have collaborated with Erdos.
We therefore make the following plausible hypotheses.
• A pair of researchers who have collaborated more than once are familiar with each other.
• A pair of researchers who have both collaborated with Erdos more than once, and who have
collaborated with each other, are familiar with each other.
These hypotheses guide us to break the uniformity in the weights of relationships in the following
manner: we give such a pair a doubled weight in their relationship, namely Aij = Aji = 2.
• Main Idea Extracted from Our Model
Our main idea is that, when faced with a particular network, one should characterize its
features, construct a proper weighted relationship matrix and then apply Bonacich’s model to
compute influence. It is our belief that a well-characterized model will perform well.
However, in this problem we are given only the adjacency matrix and the number of collaborations
of each researcher with Erdos. Further description of the relationships is lacking. Our model in
Section 3 gives a method to explore the relationships more deeply without any other information,
but it is not used in analyzing this problem. The reason is twofold: the approximate computation
was not successful, and we are already given some data about the relationships, namely the number
of collaborations with Erdos.
4.2 Influence Analysis in Film Actor Network
4.2.1 Algorithm and Arguments
We choose the field of movie actors to implement our algorithm. Our data is downloaded directly
from a website and is extracted from IMDB (Internet Movie Database). To collect the data, we
first maintain a list of 500 famous actors (the influence is computed from the movies of 2006 and
2007 that each actor appeared in, using the method below). We then search for all collaborations
between these 500 actors to determine the relationship matrix.
Our main idea in designing the algorithm is to change the trivial relationship matrix (R = A)
used in the pagerank model into a non-trivial one. For co-authorship networks we took “pairwise
evaluation” into account to get the influence matrix, because “pairwise evaluation” is the main
characteristic of a co-authorship network. In the movie-actor field, we investigate another
important characteristic in the following way.
Recall that the influence coefficients (entries of M) measure the degree of influence between
researchers. Here the coefficients should measure the degree to which two actors know each
other. Thus, we need to determine a rule for the degree of familiarity.
We make the following two hypotheses:
• The familiarity between a pair of actors in a movie decreases as the total number of actors
in this movie increases.
• Among the actors in a movie, the familiarity between every pair is nearly the same.
The second hypothesis seems reasonable but is not fully convincing. In fact we lack data to
measure the familiarity of every pair, which is the same situation as in problem (2): we lack data
to measure the influence between a pair of researchers who have co-authored. What we know is
just whether or not they have ever collaborated.
Following the argument above, our algorithm is given below. The set of all selected actors is
denoted by A = {a1 , · · · , an }. Let the movies be represented by F1 , · · · , Fk , where Fr includes
all the main actors (not necessarily the selected actors) in that movie. Let y(Fr ) be the year in
which movie Fr was released. The familiarity matrix is denoted by the symmetric matrix M , with
mij = mji ∈ R+ being the degree of familiarity between actors ai and aj .
Algorithm 2 Actor-Rank
for i = 1 → n do
    for j = 1 → n do
        mij = 0
    end for
end for
for r = 1 → k do
    for all (s, t) ∈ Fr , s ≠ t do
        increase both mst and mts by 1 / ((2008 − y(Fr )) |Fr |)
    end for
end for
Use the Pagerank formula with R = M to get the ranking vector
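The accumulation loop of Algorithm 2 might look like the sketch below. The data layout (a list of `(year, cast)` pairs), the function name and the toy actors are assumptions for illustration. The sketch visits each unordered pair of selected actors once per movie, which is one reading of the loop over (s, t), and it assumes every release year is earlier than the reference year 2008 so that the weight is well defined.

```python
from itertools import combinations


def familiarity_matrix(selected, movies, ref_year=2008):
    """Build the symmetric familiarity matrix M for the selected actors.

    selected: list of actor names (the set A).
    movies:   list of (year, cast) pairs; cast lists all main actors,
              including actors that are not in `selected`.
    """
    index = {name: i for i, name in enumerate(selected)}
    n = len(selected)
    m = [[0.0] * n for _ in range(n)]
    for year, cast in movies:
        # Older and larger movies contribute less familiarity per pair.
        weight = 1.0 / ((ref_year - year) * len(cast))
        present = [index[a] for a in cast if a in index]
        for s, t in combinations(present, 2):
            m[s][t] += weight
            m[t][s] += weight
    return m


# Hypothetical data: three selected actors and two movies.
selected = ["A", "B", "C"]
movies = [(2006, ["A", "B", "X", "Y"]),
          (2007, ["A", "C"])]
M = familiarity_matrix(selected, movies)
```

Here the 2006 movie with four main actors contributes 1/(2·4) to the pair (A, B), while the 2007 two-actor movie contributes 1/(1·2) to the pair (A, C).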
4.2.2 Results and Comments
Rank  Name                   Influence Factor
1     McKeown, Denis         2.66057
2     Stone, Sharon          2.64451
3     Lowe, Crystal          2.56269
4     Baumel, Shane          2.54767
5     Cage, Nicolas          2.52578
6     Sykes, Wanda           2.33047
7     Castro, Mary           2.31565
8     Koechner, David        2.31457
9     Lang, Michelle (V)     2.27214
10    Mann, Danny (I)        2.2532
11    Voronina, Irina        2.2352
12    Scott, Codie (I)       2.22132
13    Tatasciore, Fred       2.18808
14    Campbell, Adam (IV)    2.15699
15    Halse, Jody            2.14141
16    Summers, Stewart       2.03297
17    Kebbel, Arielle        2.00451
18    Wood, Elijah           2.00262
19    Caudle, Dr. Melissa    1.97262
20    Gyllenhaal, Maggie     1.92831
Table 6: Top 20 movie stars
We summarize the results for the top 20 movie stars in Table 6.
Some analysis of our results is in order.
On one hand, our result is supported by the following facts:
Stone, Sharon was nominated for an Academy Award for Best Actress and won a Golden Globe
Award for Best Actress in a Motion Picture Drama for her performance in Casino. Cage, Nicolas
received an Academy Award, a Golden Globe, and a Screen Actors Guild Award. Lowe, Crystal is
known for her scream queen roles, such as Ashlyn Halperin in Final Destination 3.
On the other hand, part of our result is surprising. A vivid example is that Baumel, Shane is in
the top 5 even though he is just a child. This might be because he took part in 7 movies in 2006,
more than most other actors. Also, some famous actors, including Leonardo DiCaprio, are ranked
below 100, which does not seem reasonable.
One reason for the surprising result is that all actors in the same movie get exactly the same
influence from it. This hypothesis leads us to overestimate the influence of many minor actors.
It makes those who appear in more movies more influential, without regard to the importance of
their roles.
Another reason is that we give all movies the same ability to lift the influence of their
actors. However, this is not convincing in real life. Take Leonardo DiCaprio as an example: the
number of movies he starred in is not that large, which causes this famous actor to be ranked
low, yet most of his movies are well-received and influential. Unfortunately, the quality of a
movie is not measured in our model.
Compared with our work on the Erdos network, our model performs better on the researcher network
than on the actor network. We believe the main reason is that we do not know who the major actors
in a film are. In theoretical research, one must have a deep understanding of the question in
order to be one of the authors of a paper. Thus, co-authorship implies a strong connection between
researchers, whereas appearing in the same movie does not imply a concrete connection between
actors.
4.3 Influence Analysis of Fundamental Papers
4.3.1 Algorithm and Arguments
Since a paper does not have “individual emotion”, the Pairwise Evaluation model is not suitable
here. However, the standard Pagerank algorithm seems to perform well here because it is
designed to evaluate the importance of nodes in a network, which has a lot in common with the
influence of papers. We make small changes to Pagerank to get our algorithm below.
Initially we have 16 papers on the list, and we number them from 1 to 16 according to their order
in the list. For a paper s, the age of s (namely the number of years from publication to 2014) is
denoted by y(s). Let c(s) be the average number of citations per year of paper s, namely the
number of times paper s is cited by others divided by the age of the paper. Let m(t) be the number
of papers among the 16 selected papers that cite paper t.
w(t) is defined to be the weight (namely, influence) of paper t.
We finally rank all the papers by weight. A paper with more weight is considered to be
more influential.
Some remarks are in order to explain the changes we made to the original pagerank
model.
Algorithm 3 Adjusted Paper-rank
for t = 1 → 16 do
    Construct set CITt = {s | s cites t}
end for
for all leaf nodes s do
    w(s) = c(s)
end for
for all internal nodes t do
    w(t) = (1/2) c(t)
    for all child nodes s of t do
        w(t) += (1 / (2 m(t))) w(s)
    end for
end for
• Citation is necessary
Lacking the capacity to gather a large amount of data, our network is relatively small, and it
is also a DAG. If we do not add citations to the weight, the weight of a leaf node will be 0, which
is unreasonable and also bad for the further analysis of the original papers. In that case all
analysis becomes meaningless. Thus, some “basic” weight must be added to each leaf node. The
citation count is without doubt the best and most general choice.
• Taking average of weight is necessary
Specifically, when we want to compute the weight of a certain node v, the formula is:
$$w(v) = \frac{1}{2}\left( c(v) + \frac{1}{m(v)} \sum_{i=1}^{m(v)} w(u_i) \right)$$
where the ui are the selected papers that cite paper v. We argue that taking the average
of the weights is necessary. Directly taking the sum would make it impossible for the weight of a
child to exceed that of its parent node, which is not the case in real life. In addition, taking
the average reduces the instability brought by selecting such a small group of papers.
• Timeline is necessary
Some papers were published recently while others were published thirty years ago. Assuming that
the number of citations increases uniformly over time, it is better to divide the number of
citations by the paper's age (the number of years since publication).
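The weight recursion discussed in the three remarks above can be sketched with a memoized traversal of the citation DAG. The dictionary layout, the function name `paper_weights` and the three-paper example are assumptions for illustration; `cited_by[t]` plays the role of CIT_t and its length the role of m(t).

```python
def paper_weights(citations_per_year, cited_by):
    """Compute w(t) for every paper t in a citation DAG.

    citations_per_year: dict paper -> c(t), citations divided by age.
    cited_by:           dict paper -> list of selected papers citing it.
    """
    weights = {}

    def w(t):
        if t not in weights:
            citers = cited_by.get(t, [])
            if not citers:
                # Leaf node: no selected paper cites it, so w(t) = c(t).
                weights[t] = citations_per_year[t]
            else:
                # Internal node: average c(t) with the mean weight of citers.
                mean_citer = sum(w(s) for s in citers) / len(citers)
                weights[t] = 0.5 * (citations_per_year[t] + mean_citer)
        return weights[t]

    for t in citations_per_year:
        w(t)
    return weights


# Hypothetical data: paper 3 cites papers 1 and 2, paper 2 cites paper 1.
c = {1: 100.0, 2: 40.0, 3: 10.0}
cited_by = {1: [2, 3], 2: [3]}
weights = paper_weights(c, cited_by)
```

In this toy example paper 3 is a leaf with weight 10, paper 2 gets (40 + 10)/2 = 25, and paper 1 gets (100 + (25 + 10)/2)/2 = 58.75.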
4.3.2 Results
Our result for ranking fundamental papers is summarized in Table 7.
Rank  Article Name                                               Score
1     Statistical mechanics of complex networks                  1034.63
2     Collective dynamics of ‘small-world’ networks              985.702
3     Emergence of scaling in random networks                    864.223
4     The structure of scientific collaboration networks         623.007
5     Scientific collaboration networks: II                      575.891
6     On Random Graphs                                           558.505
7     The structure and function of complex networks             531.117
8     On properties of a well-known graph                        482.774
9     Navigation in a small world                                400.973
10    Identity and search in social networks                     300.35
11    Power and Centrality: A family of measures                 210.765
12    Networks, influence, and public opinion formation          97.1429
13    Models of core/periphery structures                        39.5
14    Identifying sets of key players in a network               38
15    Social network thresholds in the diffusion of innovations  27.6667
16    Statistical models for social networks                     11
Table 7: Rank of 16 papers
5 Essence and Understanding of Modeling Influence
5.1 Science and understanding of modeling influence within a network
We use a random process model to evaluate influence within a co-authorship network, a film-actor
network and a citation network. Some reflection is in order on general methods for modeling
influence within a network.
Based on what we have done, we think that the main idea of modeling influence is twofold:
make clear which particular property of influence you really care about, and simulate or assign it
mathematically on the graph (using a random process) to obtain a measurement.
First we should know what influence is. According to the definition in (5), influence is the
ability to alter or sway an individual’s or a group’s thoughts, beliefs, or actions. However, these
abilities are hard to measure from a general point of view. It is known that in particular social
networks influence can be measured practically (6). But from a theoretical perspective, (1) tells
us that we may not be able to explicitly model the process of persuading others to change their
behavior, especially when we do not have all of the necessary data in one place. What should we do
then? Simplifying the definition of influence from a mathematical point of view might be a good
choice.
The problem setting is rather clear: a graph with simple edges. Then how should we make the
definition? It depends on what information we want to extract from the graph. Following this idea,
classical definitions are made in a deterministic manner. For example, if we think that a vertex
is influential when it is closer to the other vertices than most vertices are, closeness centrality
is a good mathematical measurement; if we agree that a vertex is influential when most paths
linking other pairs of nodes pass through it, betweenness centrality is a good theoretical tool.
However, this is not the case for a co-authorship network. We insist that influence within a
co-authorship network is the capacity to spread information, i.e. a researcher is influential if,
whenever he comes up with an idea or a problem, most other researchers will learn of it, and even
follow it.
To measure the capacity of spreading information, Bonacich [1] proposed an outstanding model. He
defined a random process and claimed that the expected size of the region that receives the message
is a good measurement of an individual’s influence. This is intuitively appealing and of great
importance. It guides us to simulate mathematically what we care about on the graph. The random
process is the essential tool of this methodology, and of our work as well. What is original in
our work is that we establish a new framework to compute the relationship matrix, different from
Bonacich’s method, which just uses the trivial adjacency matrix. Our work is to some extent better
because we take “pairwise evaluation” into account, which characterizes the features of a
co-authorship network.
The methodology of random processes is fundamental and effective. This can also be seen from the
fact that the pagerank algorithm has dominated search engines for such a long time. People have
even said that the main formula of pagerank is the wealthiest formula of all.
Thus, making clear what property you really care about in the graph is what lets your argument
make sense mathematically, and simulating it on the graph (through a random process or otherwise)
is what lets your analysis make sense mathematically. From our perspective, these are the two key
points in modeling influence within a network.
5.2 Analysis on Individual Strategy
5.2.1 Experiment and Results
Knowing the most influential persons in a network, one can adopt a strategy to boost
his/her influence rapidly. We propose some strategies and design an experiment to check their
performance. Here are the strategies that could benefit a newcomer.
• Strategy 1. Collaborate with some of the most influential researchers.
• Strategy 2. Collaborate with one of the most influential researchers and his close
collaborators.
In a network where the most influential persons are in different connected components or weakly
connected components, the two strategies above are completely different. However, in the Erdos
network, the two strategies are nearly the same.
We design the following experiment to check the performance of the two strategies.
We first add a new node s to the original Erdos network. This node represents a new researcher who
has just entered the network. Due to limited time and resources, this researcher is allowed to
collaborate with T other researchers. Using different strategies to add edges to other nodes
gives us different graphs, and in each we compute the influence coefficient of the new node. It is
necessary to include a trivial baseline strategy: the newcomer just chooses his collaborators at
random. Specifically, the three strategies are stated below:
• Strategy 1. The new node links to the T most influential nodes in the original graph.
• Strategy 2. The new node links to the most influential node and its (T − 1) most influential
neighbors in the original graph.
• Strategy 3. The new node chooses T nodes in the original graph uniformly at random to be its
neighbors.
Since strategy 3 is non-deterministic, we repeat the experiment 100 times to get the mean
influence measurement. For different T , the results are given below. The number in the table
is the average ranking of the newcomer.
T     Strategy 1    Strategy 2    Random Strategy
6     133.0000      133.0000      234.5000
9     92.0000       92.0000       169.0200
12    70.0000       71.0000       129.1100
18    47.0000       48.0000       86.2000
Table 8: Sensitivity to an Extra Vertex
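The newcomer experiment can be sketched on a toy graph as follows. Everything here is an illustrative assumption rather than the setup behind Table 8: the 6-node star graph, β = 0.1, the fixed number of series terms, the random seed and the function names. The influence score is the Bonacich series with α = 1.

```python
import random


def bonacich(adj, beta, terms=200):
    """Truncated series: sum over k >= 1 of beta^(k-1) * A^k * 1."""
    n = len(adj)
    term = [float(sum(row)) for row in adj]
    score = [0.0] * n
    for _ in range(terms):
        score = [s + t for s, t in zip(score, term)]
        term = [beta * sum(adj[i][j] * term[j] for j in range(n))
                for i in range(n)]
    return score


def newcomer_rank(adj, neighbors, beta):
    """Rank (1 = most influential) of a new node linked to `neighbors`."""
    n = len(adj)
    grown = [row[:] + [0] for row in adj] + [[0] * (n + 1)]
    for v in neighbors:
        grown[v][n] = grown[n][v] = 1
    score = bonacich(grown, beta)
    # Count original nodes that are strictly more influential (ties ignored).
    return 1 + sum(1 for s in score[:n] if s > score[n] + 1e-12)


def top_nodes(adj, beta, t):
    """Strategy 1: the t most influential nodes of the original graph."""
    score = bonacich(adj, beta)
    return sorted(range(len(adj)), key=lambda v: -score[v])[:t]


# Hypothetical star graph: node 0 is the centre, nodes 1..5 are leaves.
star = [[0, 1, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0, 0] for _ in range(5)]
rank_top = newcomer_rank(star, top_nodes(star, 0.1, 1), 0.1)
rank_leaf = newcomer_rank(star, [1], 0.1)

# Strategy 3: average rank over 100 random choices of one collaborator.
rng = random.Random(0)
random_ranks = [newcomer_rank(star, rng.sample(range(6), 1), 0.1)
                for _ in range(100)]
mean_random_rank = sum(random_ranks) / len(random_ranks)
```

On this toy graph, linking to the centre puts the newcomer directly behind it, while linking to a leaf leaves the newcomer ranked last.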
5.2.2 Analysis and Comments
It can be observed that strategies 1 and 2 do not give the newcomer a far larger influence than
the random strategy. In the Erdos network, the most influential nodes are adjacent, and thus
strategies 1 and 2 have approximately the same performance. As the number of new edges increases,
strategy 1 begins to take the lead.
We can conclude from the experiment that it is beneficial to choose collaborators according to
how influential they are. There is, however, a flaw in our approach: we use an algorithm based on
a certain influence measure to compare the performance of strategies 1 and 2, while the main
idea of strategies 1 and 2 is to choose collaborators who are influential according to the
same influence measure. This is to some degree a circular argument. However, it is reasonable as
long as the influence measure is good, which the previous sections support.
It is indeed reasonable and beneficial to use network analysis to lift one’s influence. As
indicated by this experiment, collaborating with the most influential researchers in a
particular field is helpful.
6 Sensitivity Analysis, Strength and Weakness
6.1 Sensitivity Analysis
6.1.1 Experiment and Results
Our sensitivity analysis consists of the following three parts. In each part we add a
perturbation to a parameter or to the structure, and then observe the change in the influence
vector. It turns out that our model is stable, and thus robust.
• Sensitivity to an Extra Vertex
In the first part we add a new vertex to the Erdos1 network. We also add 6 edges from it, whose
other endpoints are chosen uniformly at random from the original 511 nodes; 6 is the average
number of edges per node in the original graph.
For the new graph G we compute the influence using our model and algorithm, and rank the
researchers according to their influence coefficients. We then measure the difference between this
ranking and the original ranking. To be specific, some notation is in order.
Let t1 , · · · , tn be the previous ranking of the Erdos network, i.e. ti is the researcher ranked
i-th. In the perturbed network we let q(t) denote the new rank of researcher t. A measurement
of the difference between the two rankings could be:
$$d(m) = \frac{1}{m} \sum_{i=1}^{m} |q(t_i) - i|$$
This measurement mainly tells us the difference among the top m researchers, which is the most
important for a kibitzer. We repeat the experiment 100 times (in each experiment we randomly draw
6 researchers out of 511 to be the neighbors of the new node) and compute the mean and the variance
of d, for m = 30, 100, 511. The result is listed below.
m      E[d(m)]    Max d(m)    Min d(m)
30     0.0200     0.2000      0.0000
100    0.0485     0.2000      0.0000
511    1.2638     1.8885      0.6732
Table 9: Sensitivity to an Extra Vertex
• Sensitivity for Extra Edges
In the second part of the sensitivity analysis we do not change the vertex set of G. Instead we
randomly add edges to the network. The endpoints of every edge are chosen uniformly at random from
the 511 nodes. If there is already an edge between a chosen pair, we simply increase its weight by
1 unit (Aij += 1). We add a total of 30 edges (approximately 2 percent of the original number of
edges) and then measure the difference in the influence vector using the same methodology as in
part 1. After repeating the experiment 100 times, the result is given below:
m      E[d(m)]    Max d(m)    Min d(m)
30     3.5268     4.7515      2.3875
100    3.5470     4.9393      2.3209
511    3.5267     4.8611      2.1331
Table 10: Sensitivity to an Extra Vertex
• Sensitivity of Perturbation in Parameter
In the third part we simply perturb the value of the parameter β. The experiment is done in
the following cases.
(a) β: 0.01 → 0.015
m      E[d(m)]    Var[d(m)]
30     2.7667     9.7023
100    7.8900     70.3413
511    17.7691    337.8211
(b) β: 0.01 → 0.02
m      E[d(m)]    Var[d(m)]
30     1.4667     2.5333
100    13.9693    3.5200
511    7.9726     78.3679
Table 11: sensitivity in β
(a) β: 0.30 → 0.31
m      E[d(m)]     Var[d(m)]
30     343.7667    18137
100    276.27      21175
511    175.3933    20284
(b) β: 0.40 → 0.41
m      E[d(m)]     Var[d(m)]
30     355.2000    22611
100    279.7000    24773
511    179.6830    18864
Table 12: sensitivity in β
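The ranking-difference measure d(m) used throughout these experiments can be sketched as below. The function names and the four-node score vectors are hypothetical; node ids are assumed to be shared between the original and the perturbed network, so an extra vertex simply appears as an additional entry in the new score list.

```python
def ranking(scores):
    """Return node ids ordered from most to least influential."""
    return sorted(range(len(scores)), key=lambda v: -scores[v])


def rank_displacement(old_scores, new_scores, m):
    """d(m): mean absolute rank change of the original top-m nodes."""
    old_order = ranking(old_scores)  # old_order[i - 1] is t_i
    new_rank = {node: pos
                for pos, node in enumerate(ranking(new_scores), start=1)}
    return sum(abs(new_rank[old_order[i - 1]] - i)
               for i in range(1, m + 1)) / m


# Hypothetical influence scores before and after a perturbation.
old = [0.9, 0.7, 0.5, 0.3]
new = [0.9, 0.4, 0.5, 0.3]
d2 = rank_displacement(old, new, 2)
d4 = rank_displacement(old, new, 4)
```

In the example nodes 1 and 2 swap places, so one of the top two nodes moves by one position and d(2) = 0.5; over all four nodes two positions change in total and d(4) = 0.5 as well.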
6.1.2 Analysis and Comments
From the experiments in the previous section we obtain the following analysis and comments.
• When β does not change, small changes to the structure of the graph cannot cause a huge
difference in the influence measurement, i.e. our model is not sensitive to perturbations of the
network structure.
• When β does not change, the smaller m is, the smaller d(m) is, namely a smaller top group is
more stable when the perturbation is on G’s structure.
• Sensitivity to the parameter β is a little more complex.
If β varies within the reasonable region (where every β makes the series
$\sum_{k=0}^{\infty} \beta^{k} R^{k+1} \mathbf{1}$ convergent), a small perturbation of β brings
a small change in the influence measurement. However, when the variation of β is dangerous, i.e.
some value of β makes the series $\sum_{k=0}^{\infty} \beta^{k} R^{k+1} \mathbf{1}$ divergent,
the model gives very unstable output, as indicated by our experimental results. Perhaps Bonacich
should add one more condition to his methodology: β should be less than the reciprocal of the
spectral radius of R, because a large β destroys the stability of the model.
6.2 Strength and Weakness
The strengths of our model have been discussed several times in previous sections. We summarize
them here:
• Originality
We propose a model that has not appeared before. The most influential previous work on modeling
collaboration networks might be Bonacich’s influence measures and the Pagerank algorithm, which
follows Bonacich’s model. We modify this model, changing the trivial “relationship matrix” R into
a more specific one M , which takes both real influence and virtual influence into account.
This change is based on reasonable hypotheses and the main features of a researchers’ network.
• Characterize the Features of Certain Networks
Two main changes are made to the Pagerank model, based on the following crucial observations about
the network.
First, the uniformity of the probability of transmitting a message is dropped. We replace the
relationship matrix by a weighted influence matrix, computed from a matrix equation that is given
by a well-defined random process.
Second, the pagerank model only allows an individual to send the message to his neighbors; this
restriction is also dropped here. In our model, the GlobalSpread phase allows messages to reach
non-neighbors.
In the analysis of film actors and fundamental papers, crucial observations also led to
improvements in the models, and thus better performance was obtained.
• Mathematical Simulation
A random process is a powerful tool for analyzing abstract definitions on a graph. To be specific,
when we want to measure some kind of ability, we can define a certain “amount of information” and
use a random process to do the computation. Random processes are also well studied in theory.
Several methods, such as Brownian motion and Markov processes, have been successfully applied in
industry.
Our model has some weaknesses, though. They are analyzed below.
• Time Complexity
Recall that our main equation is the following:
$$\alpha_2 V^{2} + (\alpha_1 R - I)V + I = 0$$
Given the adjacency matrix A we can find R immediately. However, solving for V is a hard problem,
since no analytical techniques have been developed to solve high-degree matrix equations
efficiently. It is NP-hard from the perspective of theoretical computer science.
• Cannot Guarantee a Good Approximation Solution for V
For data analysis, we can only use approximation methods to obtain a pseudo-solution,
based on the following map:
$$f(V) = \alpha_2 V^{2} + \alpha_1 R V + I$$
The recurrence method specified in the model section is then employed.
However, f is not a contraction mapping for all V . If we want the recurrence method to converge,
α1 and α2 need to be small enough, which reduces the flexibility of the model. Even with small α1
and α2 , the recurrence method can still diverge.
These two mathematical flaws are important. Further work on solving matrix equations analytically
or on obtaining approximate solutions is needed.
References
[1] Phillip Bonacich. Power and centrality: A family of measures. American Journal of Sociology, pages 1170–1182, 1987.
[2] Mark E. J. Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.
[3] Mark E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.
Variants of Prophet Inequalities
Zihan Tan; Fu li
1 Introduction
The classic prophet inequality problem is the following: given a class C of random variables
X = (X1 , X2 , · · · ), find the universal inequalities valid for all X in C which compare the
expected supremum of the sequence with the optimal stopping value of the sequence. To be specific,
let M denote the expected supremum:
$$M = M(X) = E[\sup_{n} X_n]$$
and let V denote the optimal stopping value (over the set T = T (X) of stopping rules for X):
$$V = V(X) = \sup_{t \in T} E[X_t]$$
If the random variables are independent and only take non-negative values, there is a
celebrated result:
$$V \le M \le 2V$$
and it is known that this bound is tight.
In this research report, we study two variants of this problem (mentioned in the survey [1]),
called the time-average payoff and the time-discount payoff.
Problem 1.
If Y1 , · · · , Yn are i.i.d. r.v.’s taking values in [0, 1], and $X_j = \frac{1}{j} \sum_{i=1}^{j} Y_i$,
then what is the advantage of M over V ? In other words, find the minimal value k such that:
$$V \le M \le kV$$
Problem 2.
If Y1 , · · · , Yn are i.i.d. r.v.’s taking values in [0, 1], 0 < α < 1 is the discount factor, and
$X_j = \sum_{i=1}^{j} \alpha^{j-i} Y_i$, then what is the advantage of M over V ? In other words,
find the minimal value k such that:
$$V \le M \le kV$$
In the following sections, we show our results for these two problems respectively. For each
problem, we first analyze the case n = 2 and find the optimal value of k, and also show
that it is hard to find the value of k for n ≥ 3 since the distribution can be arbitrary. With
some additional arguments we conjecture that the optimal advantage k for n = 2 is exactly the
optimal advantage for arbitrary n. Then we focus on the best stopping rule when the distribution
of the Yi ’s is given. Finally we carry out some computations for the uniform case, i.e. the
Yi ’s are uniformly distributed in [0, 1].
2 Time-Average payoff
2.1 Analysis for 2 Random Variables
Assume n = 2, and let p(·) be the probability distribution of $Y_1$ and $Y_2$. Then the best stopping rule is a threshold strategy. To be specific, we have the following proposition:
Proposition 1. (Threshold Strategy)
The best stopping rule has the following form:
Probe $Y_1$; if it is larger than a value T (fixed in advance with respect to p(·), in fact T = E[Y]), then stop at $Y_1$; otherwise continue to probe $Y_2$.
Proof.
Any stopping rule can be written as a (possibly randomized) map $f : [0, 1] \to \{0, 1\}$ whose argument is the value of $Y_1$: after observing $Y_1$, the rule decides whether or not to continue to probe $Y_2$. The map may be randomized since the strategy may be randomized. Here f(x) = 0 indicates that the rule stops at $Y_1$, while f(x) = 1 indicates that it probes $Y_2$.
A randomized map can be written as a distribution over deterministic maps. Thus, it suffices to show that the threshold strategy is the best among all deterministic strategies.
In the following let f be a deterministic map, and let E = E[Y].
\[ E[\text{payoff of } f] = \int_{y: f(y)=0} y\,p(y)\,dy + \int_{y: f(y)=1} \frac{1}{2}(y + E)\,p(y)\,dy \]
Thus, it is clear that if $y \ge \frac{1}{2}(y + E)$, i.e. $y \ge E$, we should set f(y) = 0, and otherwise we should set f(y) = 1. This completes the proof.
We formally state our result for the case n = 2 as the following theorem.
Theorem 1.
If $Y_1, Y_2$ are i.i.d. r.v.'s taking values in [0, 1], and for j = 1, 2, $X_j = \frac{1}{j}\sum_{i=1}^{j} Y_i$, then
\[ V \le M \le (6 - 2\sqrt{6})V. \]
Proof.
From the previous proposition on the best stopping rule, we obtain the following expression for V:
\[ V = \int_{E}^{1} y\,p(y)\,dy + \int_{0}^{E} \frac{1}{2}(y + E)\,p(y)\,dy = \frac{1}{2}E + \frac{1}{2}\int_{E}^{1} y\,p(y)\,dy + \frac{1}{2}E\int_{0}^{E} p(y)\,dy \]
For the stopping rule with complete foresight, it is clear that it should stop at $Y_1$ if $y_1 \ge y_2$, and continue to probe $Y_2$ otherwise. Since the probability of $y_1 \ge y_2$ is $\frac{1}{2}$ (this is true for all continuous p(·), and also holds, with some additional argument, when the distribution is discrete), we have the following expression for M:
\[ M = \frac{1}{2}E + \frac{1}{2}E[\max\{y_1, y_2\}] \]
From [1] we have the following dilation lemma:
Lemma 1. (Dilation Lemma)
Let X be any integrable r.v. and $-\infty < a < b < +\infty$. Let $(X)_a^b$ be a r.v. satisfying $(X)_a^b = X$ if $X \notin [a, b]$, $(X)_a^b = a$ with probability $(b - a)^{-1}\int_{X \in [a,b]} (b - X)\,dP(x)$, and $(X)_a^b = b$ with probability $(b - a)^{-1}\int_{X \in [a,b]} (X - a)\,dP(x)$. This $(X)_a^b$ is called the dilation of X on the interval [a, b], and the following two properties hold.
(1) $E[X] = E[(X)_a^b]$.
(2) If Y is any r.v. independent of both X and $(X)_a^b$, then $E[\max\{X, Y\}] \le E[\max\{(X)_a^b, Y\}]$.
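The two properties of the dilation lemma can be checked on a small discrete example. The sketch below is illustrative only: the helper names `dilate`, `mean` and `expected_max` are hypothetical, the distributions are finite dictionaries `{value: probability}`, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction as Fr
from itertools import product


def dilate(dist, a, b):
    """Dilation of a discrete distribution {value: prob} on the interval [a, b]."""
    out = {}
    for x, p in dist.items():
        if a <= x <= b:
            # Split the mass at x between the two endpoints so the mean is preserved.
            out[a] = out.get(a, 0) + p * (b - x) / (b - a)
            out[b] = out.get(b, 0) + p * (x - a) / (b - a)
        else:
            out[x] = out.get(x, 0) + p
    return out


def mean(dist):
    return sum(x * p for x, p in dist.items())


def expected_max(dx, dy):
    """E[max{X, Y}] for independent discrete X and Y."""
    return sum(px * py * max(x, y)
               for (x, px), (y, py) in product(dx.items(), dy.items()))


# Assumed toy data: X and Y uniform on {0, 1/4, 1/2, 3/4, 1}; dilate X on [1/4, 3/4].
X = {Fr(k, 4): Fr(1, 5) for k in range(5)}
Y = dict(X)
D = dilate(X, Fr(1, 4), Fr(3, 4))

print(mean(X), mean(D))                       # property (1): equal means
print(expected_max(X, Y), expected_max(D, Y)) # property (2): the second is not smaller
```

In this toy instance the mass at 1/2 is pushed to the endpoints 1/4 and 3/4, which is exactly why the near-optimal distributions in the proof below concentrate on a few points.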
Since we want to find the maximal advantage ratio $\sup_{p(\cdot)} \frac{M(Y_1, Y_2)}{V(Y_1, Y_2)}$, and V depends only on three values, namely $E$, $\int_{E}^{1} y\,p(y)\,dy$ and $\int_{0}^{E} p(y)\,dy$, we just need to focus on those p(·) that maximize the ratio when the three values are fixed.
According to the dilation lemma, such a near-optimal p(·) should be discrete and of the following form (ϵ below is a sufficiently small positive value, used only to make the resulting distribution have the same three values as before; it is omitted in the computation):
\[ p(0) = a;\quad p(E - \epsilon) = b;\quad p(E + \epsilon) = c;\quad p(1) = d;\quad a + b + c + d = 1;\quad (b + c)E + d = E. \]
Thus, it remains to figure out the maximal advantage ratio as a function of a, b, c, d.
Since a + b + c + d = 1 and (b + c)E + d = E, we have $E = \frac{d}{a+d}$.
\[ E[\max\{Y_1, Y_2\}] = [1 - (1-d)(1-d)] \cdot 1 + [2(b+c)(1-d) - (b+c)^2] \cdot E = (2-d)d + (1-a-d)(1-d+a)E \]
\[ V = \frac{1}{2}E + \frac{1}{2}(cE + d) + \frac{1}{2}E(a + b) = \frac{1}{2}E(2 - d) + \frac{1}{2}d \]
\[ \text{ratio} = \frac{M}{V} = \frac{\frac{1}{2}E + \frac{1}{2}\left((2-d)d + (1-a-d)(1-d+a)E\right)}{\frac{1}{2}E(2-d) + \frac{1}{2}d} = 1 + \frac{a(1-a-d)}{2+a} \]
It remains to maximize $\frac{a(1-a-d)}{2+a}$ subject to a, d ≥ 0 and a + d ≤ 1. It is clear that we should take d as small as possible (d → 0); writing t = a + 2,
\[ \frac{a(1-a-d)}{2+a} \le \frac{a(1-a)}{2+a} = \frac{-t^2 + 5t - 6}{t} = 5 - \left(t + \frac{6}{t}\right) \le 5 - 2\sqrt{6}. \]
It can be seen that we can set a, b, c, d appropriately such that this ratio is arbitrarily close to $6 - 2\sqrt{6}$.
Thus, $k = 6 - 2\sqrt{6}$, which completes our proof.
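As a sanity check on the closed-form ratio, the following sketch evaluates M and V by direct enumeration for a three-point distribution and compares the result with $1 + \frac{a(1-a-d)}{2+a}$. It is an illustration under stated assumptions, not part of the proof: the two atoms at E − ϵ and E + ϵ are merged into a single atom at the mean (where stopping and continuing give the same payoff), and the function names are hypothetical.

```python
import math
from itertools import product


def ratio_three_point(a, d):
    """M/V for Y in {0, m, 1} with P(0)=a, P(1)=d, P(m)=1-a-d and m equal to the mean."""
    m = d / (a + d)                      # solves (1 - a - d) * m + d = m
    support = [(0.0, a), (m, 1.0 - a - d), (1.0, d)]
    mean = sum(y * p for y, p in support)
    # Best rule without foresight: keep Y1, or accept the expected average (Y1 + E) / 2.
    V = sum(p * max(y, (y + mean) / 2.0) for y, p in support)
    # Prophet: picks the better of X1 = Y1 and X2 = (Y1 + Y2) / 2 after seeing both.
    M = sum(p1 * p2 * max(y1, (y1 + y2) / 2.0)
            for (y1, p1), (y2, p2) in product(support, repeat=2))
    return M / V


def closed_form(a, d):
    return 1.0 + a * (1.0 - a - d) / (2.0 + a)


bound = 6.0 - 2.0 * math.sqrt(6.0)
a_star = math.sqrt(6.0) - 2.0            # maximizer of a(1 - a)/(2 + a), i.e. t = sqrt(6)
print(ratio_three_point(0.4, 0.1), closed_form(0.4, 0.1))
print(ratio_three_point(a_star, 1e-9), bound)
```

Taking d very small while keeping a near $\sqrt{6} - 2$ drives the enumerated ratio towards $6 - 2\sqrt{6}$, matching the statement that the bound is approached but attained only in the limit.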
Remark 1.
This methodology does not work for the case n ≥ 3, since the expression of M is not clean and is thus hard to analyze when the distribution p(·) is arbitrary.
By the weak law of large numbers (WLLN), as n goes to infinity $X_n$ is highly concentrated around E, and the speed of convergence is given by the Chernoff bound. Thus, if a strategy stops at a large index with non-negligible probability, then its expected payoff should be close to E. For example, the following strategy gives an expected payoff of approximately E:
If $X_i \le E$, continue to probe $Y_{i+1}$; if $X_i > E$, stop at $Y_i$.
Let $k_n$ be the maximal ratio when the number of Y's is n.
Thus, the maximal ratio should not be sensitive to n when n is sufficiently large, and it should converge to some value.
On the other hand, it is intuitive that the more random variables we have, the less advantage the prophet should have: the payoff is the average of the first t values, so an additional random variable gives the prophet less advantage when n is large than when n is small.
Therefore, the maximal ratio should appear among the first few terms of $\{k_n\}_{n=1}^{+\infty}$, and the sequence should converge. Based on this observation, we propose the following conjecture.
Conjecture 1.
If $Y_1, Y_2, \cdots$ are i.i.d. r.v.'s taking values in [0, 1], and $X_j = \frac{1}{j}\sum_{i=1}^{j} Y_i$, then
\[ V \le M \le (6 - 2\sqrt{6})V, \]
i.e. $k_n \le 6 - 2\sqrt{6}$ for all n ≥ 3.
2.2 Best Strategy
Since it is hard to explore the optimal advantage ratio when n ≥ 3, we turn to explore what the best stopping rule should be. In the case n = 2 we have proved that the best strategy is a threshold strategy with threshold equal to E[Y]; we will prove similar results for the case n ≥ 3 in this section.
First we give a formal definition of a strategy. A strategy is defined to be a (possibly randomized) map $f : [0, 1]^n \to [n]$; namely, $f(y_1, \cdots, y_n) = t$ means that on the realization $y_1, \cdots, y_n$ of the random variables, the strategy stops at $y_t$. Since the strategy is not equipped with foresight, assuming that it is deterministic, the following property should be satisfied:
\[ \text{if } f(y_1, \cdots, y_n) = t, \text{ then } \forall y'_{t+1}, \cdots, y'_n \in [0, 1],\ f(y_1, \cdots, y_t, y'_{t+1}, \cdots, y'_n) = t, \]
which means that whether or not f(Y) takes value t depends only on the first t coordinates of Y. Thus, we can define another function $g_t : [0, 1]^t \to \{0, 1\}$, where $g_t(y_1 \cdots y_t) = 0$ means that the strategy stops at $Y_t$, and $g_t(y_1 \cdots y_t) = 1$ means that the strategy goes on to probe $Y_{t+1}$. Note that $g_t$ is only defined on sequences $(y_1, \cdots, y_t)$ with $g_i(y_1, \cdots, y_i) = 1$ for all 0 < i < t. It is clear that f is equivalent to the family $\{g_i\}_{i=1}^{n}$.
Since a randomized projection is indeed a probability distribution of deterministic projection,
it suﬃces to ﬁgure out the best deterministic stopping rule.
We have the following propositions for best strategy. In the following we suppose probability
distribution for Yi is continuous. But in fact the proposition is also true with additional speciﬁcation
when distribution is discrete.
Proposition 2. (Threshold Strategy)
In every round, f should be a threshold strategy. To be specific, if $(y_1, \cdots, y_t)$ satisfies $g_i(y_1, \cdots, y_i) = 1$ for all 0 < i < t, then there exists a threshold $T_t$ such that $g_t(y_1, \cdots, y_t) = 1_{y_t < T_t}$, where $1_{y_t < T_t}$ is the indicator function of the event $y_t < T_t$; that is, the strategy stops at $Y_t$ exactly when $y_t \ge T_t$.
The proof is straightforward, and similar to that of Proposition 1.
Proposition 3. (Uniform Threshold Strategy)
f should be a uniform threshold strategy. To be specific, there exist $T_1, \cdots, T_{n-1}$ such that for all $(y_1, \cdots, y_t)$ satisfying $g_i(y_1, \cdots, y_i) = 1$ for all 0 < i < t, $g_t(y_1, \cdots, y_t) = 1_{y_t < T_t - \sum_{i=1}^{t-1} y_i}$, where $1_E$ is the indicator function of the event E; that is, the strategy stops at $Y_t$ exactly when $y_1 + \cdots + y_t \ge T_t$.
The proof is straightforward, and similar to that of Proposition 1.
Proposition 4. (Decreasing Thresholds)
The uniform thresholds $T_1, \cdots, T_{n-1}$ should satisfy the following decreasing property:
\[ T_1 \ge \frac{T_2}{2} \ge \frac{T_3}{3} \ge \cdots \ge \frac{T_{n-1}}{n-1} = E[Y] \]
2.3 Analysis for Uniform Distribution
To obtain more properties of the problem, we continue our exploration by restricting the distribution of every random variable $Y_i$ to the uniform distribution.
Consider n random variables $Y_1, \cdots, Y_n$, and let $f : [0, 1]^n \to [n]$ be a strategy without foresight. If the n random variables take the values $y_1, \cdots, y_n$, denote by $U_f(y_1, \ldots, y_n)$ the value obtained when the strategy f stops at $Y_{f(y_1, \ldots, y_n)}$. Therefore,
\[ U_f(y_1, \ldots, y_n) = \frac{\sum_{i=1}^{f(y_1, \ldots, y_n)} y_i}{f(y_1, \ldots, y_n)}. \]
We call this value $U_f(y_1, \ldots, y_n)$ the utility of the strategy f on $y_1, \ldots, y_n$.
In the following we calculate $\max_f E_{Y_1, \cdots, Y_n} U_f(y_1, \ldots, y_n)$, the maximum expected utility that a strategy without foresight can reach.
By Section 2.2, the best strategy without foresight can be described by thresholds $T_1, \cdots, T_n$. Given the thresholds, to decide whether the strategy should stop at $Y_i$ it is enough to know whether the sum of the values of $Y_1, \cdots, Y_i$ exceeds the threshold. Therefore, after $Y_1, \cdots, Y_{i-1}$ have taken their values, the maximum utility of the strategy depends only on the sum of the random value $Y_i$ and the fixed values of $Y_1, \cdots, Y_{i-1}$.
Therefore, for i ∈ [n], let $U_i$ be the function computing the maximum utility when the values of $Y_1, \cdots, Y_i$ are already given. Then $U_i$ is determined by a single variable $t_i$ representing the sum of $Y_1, \cdots, Y_i$. So $U_i : [0, i] \to [0, 1]$ is defined as follows:
\[ U_i(t_i) = \begin{cases} \frac{t_i}{i}, & t_i \ge T_i; \\ E_{Y_{i+1}} U_{i+1}(t_i + Y_{i+1}), & t_i < T_i. \end{cases} \]
That is, for 1 ≤ i < n,
\[ U_i(t_i) = \begin{cases} \frac{t_i}{i}, & t_i \ge T_i; \\ \int_{t_i}^{t_i+1} U_{i+1}(y)\,dy, & t_i < T_i, \end{cases} \]
and for i = n, $U_n(t_n) = \frac{t_n}{n}$.
We can prove the following lemma.
Lemma 2. $\max_f E_{Y_1, \cdots, Y_n} U_f(y_1, \ldots, y_n) = E_{t_1} U_1(t_1)$.
Proof. Similar to the proof of Proposition 1.
Now $U_n, U_{n-1}, \cdots, U_1$ are all determined and can be computed sequentially. However, the exact expressions are too complex to write down for general n.
Therefore, we only discuss n = 3, 4 to begin with.
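When n = 3,
\[ U_3(t_3) = \frac{t_3}{3} \]
\[ U_2(t_2) = \begin{cases} \frac{t_2}{2}, & t_2 \ge 1; \\ \int_{t_2}^{t_2+1} U_3(y)\,dy = \int_{t_2}^{t_2+1} \frac{y}{3}\,dy = \frac{2t_2+1}{6}, & t_2 < 1. \end{cases} \]
And $T_2 = 1$.
\[ U_1(t_1) = \begin{cases} t_1, & t_1 \ge T_1; \\ \int_{t_1}^{t_1+1} U_2(y)\,dy, & t_1 < T_1, \end{cases} \]
that is,
\[ U_1(t_1) = \begin{cases} t_1, & t_1 \ge T_1; \\ \int_{t_1}^{1} \frac{2y+1}{6}\,dy + \int_{1}^{1+t_1} \frac{y}{2}\,dy = \frac{1}{12}t_1^2 + \frac{1}{3}t_1 + \frac{1}{3}, & t_1 < T_1. \end{cases} \]
Thus $T_1$ is the root of $t_1 = \frac{1}{12}t_1^2 + \frac{1}{3}t_1 + \frac{1}{3}$, i.e. $T_1 = 4 - 2\sqrt{3} \approx 0.5359$.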
When n = 4,
\[ U_4(t_4) = \frac{t_4}{4} \]
\[ U_3(t_3) = \begin{cases} \frac{t_3}{3}, & t_3 \ge 1.5; \\ \int_{t_3}^{t_3+1} U_4(y)\,dy = \frac{2t_3+1}{8}, & t_3 < 1.5. \end{cases} \]
\[ U_2(t_2) = \begin{cases} \frac{t_2}{2}, & t_2 \ge T_2; \\ \int_{t_2}^{1.5} \frac{2y+1}{8}\,dy + \int_{1.5}^{t_2+1} \frac{y}{3}\,dy = \frac{t_2^2}{24} + \frac{5t_2}{24} + \frac{25}{96}, & 0.5 \le t_2 < T_2; \\ \int_{t_2}^{t_2+1} \frac{2y+1}{8}\,dy = \frac{t_2+1}{4}, & t_2 < 0.5. \end{cases} \]
Thus $T_2 = 3.5 - \sqrt{6}$. Please see Figure 1 for the intuition of the graph of the above function.
[Figure 1: The green line is y/2 and the blue line is the quadratic curve. Their intersection point is $3.5 - \sqrt{6} \approx 1.05$.]
Then
\[ U_1(t_1) = \begin{cases} t_1, & t_1 \ge T_1; \\ \int_{t_1}^{t_1+1} U_2(y)\,dy = F_{U_2}(t_1 + 1) - F_{U_2}(t_1), & t_1 < T_1, \end{cases} \]
where $F_{U_2} = \int U_2(y)\,dy$, namely
\[ F_{U_2}(y) = \begin{cases} 0.125y^2 + 0.25y, & 0 < y \le 0.5; \\ 0.0138889y^3 + 0.104167y^2 + 0.260417y - 0.00173611, & 0.5 < y \le 1.05051; \\ 0.25y^2 + 0.126998, & 1.05051 < y \le 2. \end{cases} \]
Thus $T_1$ is the root of $t_1 = F_{U_2}(t_1 + 1) - F_{U_2}(t_1)$.
Solving this equation numerically gives $T_1 = 0.553772$.
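The backward recursion for $U_n, U_{n-1}, \cdots, U_1$ can also be carried out numerically for any n. The sketch below is one possible implementation under stated assumptions, not the computation used in the report: the functions $U_i$ are sampled on a uniform grid over the running sum, integrals use the trapezoid rule, each threshold is located by linear interpolation at the first grid point where stopping is at least as good as continuing, and the name `time_average_thresholds` is hypothetical.

```python
def time_average_thresholds(n, steps_per_unit=2000):
    """Approximate thresholds [T_1, ..., T_{n-1}] on the running sum t_i
    for i.i.d. Uniform[0, 1] variables and payoff X_j = (Y_1 + ... + Y_j) / j."""
    m = steps_per_unit
    h = 1.0 / m
    # U[k] samples U_{i+1}(k * h); start from U_n(t) = t / n on [0, n].
    U = [k * h / n for k in range(n * m + 1)]
    thresholds = [0.0] * (n - 1)
    for i in range(n - 1, 0, -1):
        # F[k] approximates the integral of U_{i+1} over [0, k * h] (trapezoid rule).
        F = [0.0]
        for k in range(1, len(U)):
            F.append(F[-1] + 0.5 * h * (U[k - 1] + U[k]))
        size = i * m + 1
        cont = [F[k + m] - F[k] for k in range(size)]  # expected value of probing Y_{i+1}
        stop = [k * h / i for k in range(size)]        # value of stopping now, t_i / i
        # First grid point where stopping is at least as good as continuing.
        k = next(j for j in range(size) if stop[j] >= cont[j])
        g0 = stop[k - 1] - cont[k - 1]
        g1 = stop[k] - cont[k]
        thresholds[i - 1] = (k - 1 + (-g0) / (g1 - g0)) * h
        # U_i equals t_i / i at or above the threshold and the continuation value below it.
        U = [stop[j] if j >= k else cont[j] for j in range(size)]
    return thresholds


print(time_average_thresholds(3))
print(time_average_thresholds(4))
```

For n = 3 and n = 4 the returned values can be compared with the closed forms derived above ($T_2 = 1$, $T_1 = 4 - 2\sqrt{3}$, and $T_2 = 3.5 - \sqrt{6}$, $T_1 \approx 0.553772$); a finer grid tightens the agreement at the cost of running time.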
3 Time-Discount Payoff
In this section, we study another payoff function for this problem (mentioned in the survey [1]).
Problem 3.
If $Y_1, \cdots, Y_n$ are i.i.d. r.v.'s taking values in [0, 1], $0 < \alpha < 1$ is the discount factor, and $X_j = \sum_{i=1}^{j} \alpha^{j-i} Y_i$, then what is the advantage of M over V? In other words, find the minimal value k such that
\[ V \le M \le kV. \]
In the following sections, we show our results for this problem. In subsection 1, we analyze the
case that n = 2 and ﬁnd the optimal value for k, and also show that it is hard to ﬁnd the value of k
for n ≥ 3 since the distribution can be arbitrary. However, in subsection 2 we make some analysis
for inﬁnite stage game for this payoﬀ function. We also focus on what is the best stopping rule
when given the distribution for all Yi ’s. In subsection 3 we do some computation on the uniform
distribution case, i.e. Yi ’s are uniformly distributed in [0, 1].
3.1 Analysis for 2 Random Variables
Assume n = 2, and let p(·) be the probability distribution of $Y_1$ and $Y_2$. Then the best stopping rule is a threshold strategy. To be specific, we have the following proposition:
Proposition 5. (Threshold Strategy)
The best stopping rule has the following form:
Let E be the expectation of $Y_i$. Probe $Y_1$; if it is larger than the value $\frac{E}{1-\alpha}$, then stop at $Y_1$; otherwise continue to probe $Y_2$.
Proof.
Given $Y_1$, since the expectation of $Y_2$ is E, the expected payoff of continuing to probe $Y_2$ is $\alpha Y_1 + E$. Thus, it is better to continue probing $Y_2$ if and only if $\alpha Y_1 + E \ge Y_1$, which is equivalent to $Y_1 \le \frac{E}{1-\alpha}$.
We formally state our result for the case n = 2 as the following theorem.
Theorem 2.
If $Y_1, Y_2$ are i.i.d. r.v.'s taking values in [0, 1], and for j = 1, 2, $X_j = \sum_{i=1}^{j} \alpha^{j-i} Y_i$, then
\[ V \le M \le \frac{2}{1+\alpha}V. \]
Proof.
From the previous proposition on the best stopping rule, we obtain the following expression for V:
\[ V = \int_{\frac{E}{1-\alpha}}^{1} y\,p(y)\,dy + \int_{0}^{\frac{E}{1-\alpha}} (\alpha y + E)\,p(y)\,dy = \alpha\int_{0}^{\frac{E}{1-\alpha}} y\,p(y)\,dy + E\int_{0}^{\frac{E}{1-\alpha}} p(y)\,dy + \int_{\frac{E}{1-\alpha}}^{1} y\,p(y)\,dy \]
For the stopping rule with complete foresight, it is clear that it should stop at $Y_1$ if $y_1 \ge \alpha y_1 + y_2$, and continue to probe $Y_2$ otherwise. Thus, we have the following expression for M:
\[ M = E[\max\{\alpha Y_1 + Y_2, Y_1\}] = E[\alpha Y_1] + E[\max\{Y_2, (1 - \alpha)Y_1\}] \]
Since we want to find the maximal advantage ratio $\sup_{p(\cdot)} \frac{M(Y_1, Y_2)}{V(Y_1, Y_2)}$, and V depends on three values, namely $E$, $\int_{\frac{E}{1-\alpha}}^{1} y\,p(y)\,dy$ and $\int_{0}^{\frac{E}{1-\alpha}} p(y)\,dy$, we just need to focus on those p(·) that maximize the ratio when the three values are fixed.
According to the dilation lemma, such a near-optimal p(·) should be discrete and of the following form:
If $\frac{E}{1-\alpha} < 1$, then
\[ p(0) = a;\quad p\left(\frac{E}{1-\alpha} - \epsilon\right) = b;\quad p\left(\frac{E}{1-\alpha} + \epsilon\right) = c;\quad p(1) = d;\quad a + b + c + d = 1;\quad (b + c)\frac{E}{1-\alpha} + d = E; \]
If $\frac{E}{1-\alpha} \ge 1$, then
\[ p(0) = a,\quad p(1) = 1 - a. \]
Since a + b + c + d = 1 and $(b + c)\frac{E}{1-\alpha} + d = E$, we have $E = \frac{d(1-\alpha)}{a+d-\alpha}$.
Since M involves the maximum of the two random variables $Y_2$ and $(1 - \alpha)Y_1$, it is also necessary to compare $(1 - \alpha)$ with $\frac{E}{1-\alpha}$.
Thus, it remains to figure out the maximal advantage ratio as a function of a, b, c, d in every case.
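Case 1. $\frac{E}{1-\alpha} \ge 1$
The best strategy is to always probe both $Y_1$ and $Y_2$, and E = 1 − a.
\[ V = \int_{0}^{1} (\alpha y + E)\,p(y)\,dy = (1 + \alpha)E \]
\[ M = a^2 \cdot 0 + 2(1-a)a \cdot 1 + (1-a)^2 \cdot (1 + \alpha) = (1-a)(1+\alpha)E + 2aE \]
\[ \frac{M}{V} = \frac{(1-\alpha)a + (1+\alpha)}{1+\alpha} \]
We can let a → 1 and find that $\frac{M}{V} \to \frac{2}{1+\alpha}$.
Case 2. $(1 - \alpha) < \frac{E}{1-\alpha} < 1$
\[ p(0) = a;\quad p\left(\frac{E}{1-\alpha} - \epsilon\right) = b;\quad p\left(\frac{E}{1-\alpha} + \epsilon\right) = c;\quad p(1) = d;\quad a + b + c + d = 1;\quad (b + c)\frac{E}{1-\alpha} + d = E; \]
Since a + b + c + d = 1 and $(b + c)\frac{E}{1-\alpha} + d = E$, we have $E = \frac{d(1-\alpha)}{a+d-\alpha}$, and a > α.
\[ V = \alpha E + (a + b)E + (1 - \alpha)\left(c \cdot \frac{E}{1-\alpha} + d\right) = (1 - \alpha)d\,\frac{1+a}{a+d-\alpha} \]
\[ M = \alpha E + E[\max\{Y_2, (1-\alpha)Y_1\}] = \alpha E + d + (1-a-d)\frac{E}{1-\alpha} + a(1-\alpha)E = \frac{(1-\alpha)d}{a+d-\alpha}\,(1 + \alpha + (1-\alpha)a) \]
\[ \frac{M}{V} = 1 - \alpha + \frac{2\alpha}{1+a} \]
Let a → α and the ratio goes to $1 - \alpha + \frac{2\alpha}{1+\alpha}$.
Case 3. $\frac{E}{1-\alpha} \le 1 - \alpha$
\[ V = \alpha E + (a + b)E + (1 - \alpha)\left(c \cdot \frac{E}{1-\alpha} + d\right) = (1 - \alpha)d\,\frac{1+a}{a+d-\alpha} \]
\[ M = \alpha E + E[\max\{Y_2, (1-\alpha)Y_1\}] = \alpha E + d + (1-a-d)\left(\frac{E}{1-\alpha}(1-d) + d(1-\alpha)\right) + a(1-\alpha)E \]
Through tedious computation we found that this case does not give a better bound than the previous two cases.
Since $\frac{2}{1+\alpha} > 1 - \alpha + \frac{2\alpha}{1+\alpha}$, the best bound for the advantage ratio is $\frac{2}{1+\alpha}$.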
Remark 2.
This methodology does not work for the case n ≥ 3, since the expression of V is not clean (we would need to discuss $2^n$ different cases, each with a completely different formula) and is thus hard to analyze when the distribution p(·) is arbitrary.
3.2 Infinite-Stage Game and Best Strategy
3.2.1 Infinite-Stage Game
Consider the infinite-stage version of the problem: we are given infinitely many independent identically distributed random variables and are allowed to stop whenever we want. We have the following proposition for the best strategy.
Proposition 6. The best strategy is a uniform threshold strategy: whenever your current payoff is higher than a previously fixed $T_\infty$, you should stop. Moreover, $T_\infty > \frac{E}{1-\alpha}$.
Proof. It is easy to see that the best strategy is a threshold strategy. To show that it is uniform, it suffices to point out that in every round, if your current payoff is X, then after one more probe your payoff would be αX + Y, and there are still infinitely many choices ahead. This is independent of the number of probes that you have already made. Thus, the threshold should be uniform.
To see that it is strictly larger than $\frac{E}{1-\alpha}$, note that the payoff of the strategy "not to stop" is at least αX + E; if this is larger than X, then of course you should not stop here.
3.2.2 Best Strategy for Finite-Stage Game
Since it is hard to explore the optimal advantage ratio when n ≥ 3, we turn to explore what the best stopping rule should be. In the case n = 2 we have proved that the best strategy is a threshold strategy with threshold equal to $\frac{E}{1-\alpha}$; we will prove similar results for the case n ≥ 3 in this section.
Since a randomized projection is indeed a probability distribution of deterministic projection,
it suﬃces to ﬁgure out the best deterministic stopping rule.
We have the following propositions for best strategy. In the following we suppose probability
distribution for Yi is continuous. But in fact the proposition is also true with additional speciﬁcation
when distribution is discrete.
Proposition 7. (Threshold Strategy)
In every round, the player should play a threshold strategy.
The proof is straightforward, and similar to that of Proposition 1.
We also state the following proposition without proof, since it is easy to verify.
Proposition 8. (Decreasing Thresholds)
The thresholds $T_1, \cdots, T_{n-1}$ should satisfy the following decreasing property:
\[ T_\infty > T_1 > T_2 > T_3 > \cdots > T_{n-1} = \frac{E}{1-\alpha} \]
3.2.3 Generality for Threshold Strategy
For this kind of stage-style probing game, it is very likely that the best strategy is a threshold strategy. To be more specific, we have the following natural proposition.
Proposition 9.
Let $y_i$ be the value probed in round i, and suppose you stop at $y_n$. If the payoff function $f(y_1, \cdots, y_n)$ is increasing in each of its coordinates, then the best strategy is a threshold strategy.
3.3 Analysis for Uniform Distribution
In this situation, we calculate the thresholds precisely.
Let $f : [0, 1]^n \to [n]$ be a strategy. By the previous section, f can be described by n thresholds $T_1, \ldots, T_n$.
Then define n piecewise functions $U_1, \ldots, U_n$, where each function $U_i$ represents the expected payoff obtained by operating according to the threshold $T_i$ on the given input $Y_1, \ldots, Y_i$. Namely, with $X_i$ denoting the current payoff,
\[ U_i(X_i) = \begin{cases} X_i, & X_i \ge T_i; \\ E_{y \sim [0,1]} U_{i+1}(\alpha X_i + y), & X_i < T_i. \end{cases} \]
By a discussion similar to the one before, the best strategy f is determined by the optimal thresholds $T_1, \ldots, T_n$, and each optimal threshold $T_i$ is also locally optimal on the part of the possible sequence $Y_1, \ldots, Y_i$. Therefore $T_i$ should make $U_i$ reach its maximum. Namely
\[ T_i = \arg\max_{T_i} U_i. \]
Note that once $T_i$ is fixed, we can represent $U_i$ exactly. Thus, we can calculate the value $E_{y \text{ iid on } [0,1]} U_i(\alpha X_{i-1} + y)$ and then find $T_{i-1} = \arg\max_{T_{i-1}} U_{i-1}$. That is, starting from $U_n(Y_1, \ldots, Y_n) = \sum_{j=1}^{n} \alpha^{n-j} Y_j$, we can calculate $T_{n-1}, T_{n-2}, \ldots, T_1$ inductively.
Focusing on each $U_i$, we can simplify. Since we only want to find the $T_i$ maximizing $U_i$, $T_i$ should be less than $X_i$ when $X_i \ge E_{y \sim [0,1]} U_{i+1}(\alpha X_i + y)$, and larger than $X_i$ otherwise. By the previous discussion, $X_i$ and $E_{y \sim [0,1]} U_{i+1}(\alpha X_i + y)$, as functions of $X_i$, are both monotone and have exactly one intersection. Therefore, $T_i$ is just the intersection, namely the root in $[0, \sum_{j=0}^{i} \alpha^j]$ of the equation
\[ X_i = E_{y \sim [0,1]} U_{i+1}(\alpha X_i + y) = \int_{\alpha X_i}^{\alpha X_i + 1} U_{i+1}(y)\,dy. \]
Therefore, we formulate the above discussion as Algorithm 1.
Algorithm 1: Computing the best thresholds $T_1, \ldots, T_{n-1}$ for n variables $Y_1, \ldots, Y_n$
  Initially, let $T_i = 0$, ∀i ∈ [n − 1]; let U(X) = X;
  for i from n − 1 to 1 do
    Let $T_i$ be the root in [0, 1] of $X = \int_{\alpha X}^{\alpha X+1} U(y)\,dy$; if no root can be found, let $T_i = 1$;
    Update U(X) = X if $X \ge T_i$, and $U(X) = \int_{\alpha X}^{\alpha X+1} U(y)\,dy$ if $X < T_i$;
  Return $T_1, \ldots, T_{n-1}$;
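Furthermore, we give an example run of the algorithm for n = 3.
When n = 3,
\[ U_3(X) = X, \]
and $T_2$ is the root of
\[ X = \int_{\alpha X}^{\alpha X+1} y\,dy = \alpha X + \frac{1}{2}, \]
so
\[ T_2 = \frac{1}{2(1-\alpha)}. \]
Hence
\[ U(X) = \begin{cases} X, & X \ge \frac{1}{2(1-\alpha)}; \\ \alpha X + \frac{1}{2}, & X < \frac{1}{2(1-\alpha)}. \end{cases} \]
Next, $T_1$ is the root of
\[ X = \int_{\alpha X}^{\alpha X+1} U(y)\,dy = \begin{cases} \alpha X + \frac{1}{2}, & X > \frac{1}{2\alpha(1-\alpha)}; \\ \frac{1}{2}\left((\alpha^2 - \alpha^3)X^2 + \alpha X + \frac{5-4\alpha}{4-4\alpha}\right), & \frac{1}{2\alpha(1-\alpha)} - \frac{1}{\alpha} \le X \le \frac{1}{2\alpha(1-\alpha)}; \\ \alpha^2 X + \frac{1+\alpha}{2}, & X < \frac{1}{2\alpha(1-\alpha)} - \frac{1}{\alpha}. \end{cases} \]
When 0 < α < 1/2,
\[ \frac{1}{2\alpha(1-\alpha)} - \frac{1}{\alpha} \le 0 < 1 < \frac{1}{2\alpha(1-\alpha)}. \]
Thus we only need to consider the equation
\[ X = \frac{1}{2}\left((\alpha^2 - \alpha^3)X^2 + \alpha X + \frac{5-4\alpha}{4-4\alpha}\right). \]
Thus
\[ T_1 = \frac{0.5\left(\alpha^2 - 2\sqrt{\alpha^5 - 3\alpha^4 + 2\alpha^3 + 2\alpha^2 - 3\alpha + 1} - 3\alpha + 2\right)}{\alpha^4 - 2\alpha^3 + \alpha^2}. \]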
For α in the interval (0, 1/2), the thresholds are plotted in Figure 2.
[Figure 2: The red line is $T_2 = \frac{1}{2(1-\alpha)}$ and the blue line is $T_1$.]
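The backward induction of Algorithm 1 can be sketched numerically as follows. This is an illustration under explicit assumptions rather than the report's own implementation: U is sampled on a uniform grid over $[0, \frac{1}{1-\alpha}]$ (an interval that contains every reachable payoff and is mapped into itself by X ↦ αX + y), the root is searched over that whole interval instead of [0, 1], integrals use the piecewise-linear interpolant of U, and the name `discounted_thresholds` is hypothetical.

```python
def discounted_thresholds(n, alpha, grid=4000):
    """Approximate thresholds [T_1, ..., T_{n-1}] for i.i.d. Uniform[0, 1] variables
    and discounted payoff X_j = sum_i alpha**(j - i) * Y_i (stop when X_i >= T_i)."""
    upper = 1.0 / (1.0 - alpha)          # no payoff X_i can exceed 1 / (1 - alpha)
    h = upper / grid
    xs = [k * h for k in range(grid + 1)]
    U = xs[:]                             # U_n(X) = X: at the last stage we must stop

    def integral_to(F, U, x):
        # Exact integral over [0, x] of the piecewise-linear interpolant of U.
        k = min(int(x / h), grid - 1)
        frac = x / h - k
        return F[k] + h * frac * (U[k] + 0.5 * frac * (U[k + 1] - U[k]))

    thresholds = []
    for _ in range(n - 1):
        F = [0.0]
        for k in range(1, grid + 1):
            F.append(F[-1] + 0.5 * h * (U[k - 1] + U[k]))
        # Continuation value: integral of U over [alpha * X, alpha * X + 1].
        cont = [integral_to(F, U, alpha * x + 1.0) - integral_to(F, U, alpha * x)
                for x in xs]
        # First grid point where stopping (value X) is at least as good as continuing.
        k = next(j for j in range(grid + 1) if xs[j] >= cont[j])
        g0 = xs[k - 1] - cont[k - 1]
        g1 = xs[k] - cont[k]
        thresholds.append(xs[k - 1] + h * (-g0) / (g1 - g0))
        U = [xs[j] if j >= k else cont[j] for j in range(grid + 1)]
    thresholds.reverse()                  # computed as T_{n-1}, ..., T_1
    return thresholds


alpha = 0.3
T1, T2 = discounted_thresholds(3, alpha)
print(T1, T2, 1.0 / (2.0 * (1.0 - alpha)))
```

For n = 3 and α = 0.3 the two returned values can be compared with $T_2 = \frac{1}{2(1-\alpha)}$ and with the closed form for $T_1$ given above.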
References
[1] Theodore P Hill and Robert P Kertz. A survey of prophet inequalities in optimal stopping
theory. Contemporary Mathematics, 125(1):191, 1992.
On the Compressed Sensing of Microblog
Zihan Tan
1 Abstract
Microblogs are nowadays very popular among people in different regions. Every day a large amount of information flows through the microblog network, which is of great value for both theoretical research and economic analysis. Such networks also have a rather important property, the power law, which is common to a great many large-scale networks. We formalize the microblog network as a graph model and analyze it from the perspective of compressed sensing. We design a model for operating the whole network and give an algorithm that recovers specific values from a large amount of input using only a small amount of space.
Keywords: Compressed Sensing, Power Law, Networks, Microblog
2 Introduction
In the past decades, research on networks has grown rapidly. The common method consists of two parts: modeling the problem as a network and analyzing the properties of the network. This research has a common feature: it deals with large amounts of data. When it comes to data, the problem mainly separates into two components: data gathering and data analysis. Compressed sensing, a standard method, is related to both components.
Research on compressed sensing mainly contains two parts: theory and application. The theory part draws on advanced numerical analysis. According to the survey by Yuan Yao (2009), the theory part is now related to homogeneous spaces and Fourier analysis. The goal is to find a set of vectors to serve as the sensing basis and to obtain a sparse sensing vector; wavelet analysis is also a good tool for this.
However, more research is on the application part, because compressed methods can give good data-extraction results and accurate analysis of large-scale data, which in turn lets us better understand the structure and properties of specific classes of networks. Victor Hugo (1993), with data courtesy of Knuth, studied networks including different types of relationships. Coauthorship networks between scientists were studied by Newman (2006), followed by Jon Kleinberg later.
Applied research on compressed sensing has two important parts: data gathering and data analysis. It suffices to design efficient and good algorithms. Xiaoye Jiang, Yuan Yao, Han Liu and Leonidas Guibas (2011) set up a new modeling framework and connected two seemingly different areas: network data analysis and compressed sensing. Fundamental work on data gathering was done by Chong Luo, Feng Wu, Jun Sun and Chang Wei Chen (2009).
Our research concerns the network of microblogs and analyzes the forwarding behavior of microblogs. The network follows a power law; we both initiate the method of data gathering and design the algorithm for sensing and recovering, and then give informal proofs of the correctness of our model.
3 Model Setting
Consider a microblog network as a general directed graph with n nodes. Every node represents a specific client, and there is an edge from node u to node v iff client u has friended or followed client v. Every node u is associated with a vector $p_u$, representing the client's personality.
A microblog x is also associated with a vector $m^x$, of the same form as the personality vector, representing the features of this microblog. This vector changes as the microblog is forwarded. We say that a diffusion $\varphi_x$ of a microblog x is a set of 3-tuples $(i, u, m^x_{u,i})$, meaning that client u is the ith person to forward the original microblog x, and after this forwarding the vector of x is changed to $m^x_{u,i}$.
There is also a family of matrices called transition matrices, $\{T^i\}_{i=1}^{n}$, which govern how the vector of a microblog changes. Each $T^i$ is a matrix with n rows and n columns, one for each client.
The transition formula is the following: when the ith person u sees the microblog x forwarded by the (i − 1)th person v, with the vector already changed to $m^x_{v,i-1}$, then after u forwards this microblog, the vector of microblog x changes into
\[ m^x_{u,i} = \frac{1}{1 + T^i_{uv}}\,m^x_{v,i-1} + \frac{T^i_{uv}}{1 + T^i_{uv}}\,p_v \]
(i.e.· · · )
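The transition formula is a convex combination of the incoming microblog vector and a personality vector, weighted by the transition entry. The following minimal sketch shows one forwarding step; the function name `forward_update`, the list representation of vectors, and the assumption that the transition entry is non-negative are all illustrative choices not fixed by the text.

```python
def forward_update(m_prev, p, t_uv):
    """One forwarding step: m_new = m_prev / (1 + t_uv) + t_uv * p / (1 + t_uv).

    m_prev: microblog vector before forwarding (list of floats)
    p:      personality vector appearing in the transition formula
    t_uv:   the transition-matrix entry, assumed non-negative here
    """
    if len(m_prev) != len(p):
        raise ValueError("microblog and personality vectors must have the same length")
    w = t_uv / (1.0 + t_uv)              # weight on the personality vector
    return [(1.0 - w) * m + w * q for m, q in zip(m_prev, p)]


# With t_uv = 1 the new vector is the plain average of the two inputs;
# with t_uv = 0 the microblog vector is left unchanged.
print(forward_update([1.0, 0.0], [0.0, 1.0], 1.0))
print(forward_update([1.0, 0.0], [0.0, 1.0], 0.0))
```

A larger transition entry pulls the microblog vector further towards the personality vector, which is the intuition behind using these entries as sensing weights later on.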
4 Algorithm
The input is a large amount of information on all personality vectors together with a lot of diffusion information. We are allowed to use only a small amount of space (i.e. $o(n^2)$ space), and we need to output all the vectors using the compressed sensing method.
Our algorithm can be divided into three parts. First, we memorize the transition matrices. Then we use the transition matrices to obtain the sensing information. Finally, we use a specific compressed sensing algorithm to recover all the vectors.
4.1 Memorizing the Transition Matrices
We memorize the transition matrices using the standard power iteration method, with some additional techniques.
Given a matrix $T^i$ that is not symmetric, we first split it into two symmetric components and memorize them separately. Let $L_i$ and $R_i$ be the following matrices:
\[ L_i = \begin{pmatrix} T^i_{11} & T^i_{21} & \cdots & T^i_{n1} \\ T^i_{21} & T^i_{22} & \cdots & T^i_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ T^i_{n1} & T^i_{n2} & \cdots & T^i_{nn} \end{pmatrix} \quad (1) \]
\[ R_i = \begin{pmatrix} T^i_{11} & T^i_{12} & \cdots & T^i_{1n} \\ T^i_{12} & T^i_{22} & \cdots & T^i_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ T^i_{1n} & T^i_{2n} & \cdots & T^i_{nn} \end{pmatrix} \quad (2) \]
We memorize $L_i$ and $R_i$ in the following form: we memorize log n eigenvalues and their corresponding eigenvectors, namely the log n pairs $(\lambda_j, t_j)$. Finally we concatenate the two parts to obtain the approximated matrix $T'^i$.
(i.e.· · · )
4.2 Sensing Information
We combine the first entries of all personality vectors into a large vector $g^1$, with $g^1_u = p^u_1$. We choose a microblog that has been forwarded k log n times and first try to establish the family of transition vectors $t^x_u$ (row vectors). The transition vector is computed as follows.
A transition vector $t^x_u$ is an n-dimensional row vector. If u is the ith person to forward this microblog x and the previous i − 1 persons are $v_1, v_2, \cdots, v_{i-1}$, then we use a computing sequence to find $t^x_u$:
\[ (t^x_u)_1 = (1, 0, 0, \cdots, 0) \]
\[ (t^x_u)_{j+1} = \frac{1}{1 + T^j_{v_j v_{j+1}}}\,(t^x_u)_j + \frac{T^j_{v_j v_{j+1}}}{1 + T^j_{v_j v_{j+1}}}\,(0, 0, \cdots, p^{v_{j+1}}_1, 0, 0, \cdots, 0) \]
(i.e. · · · )
for all 0 < j < i, where the unique nonzero entry is the (j + 1)th entry of the vector.
Then our sensing matrix is
\[ S_x = \begin{pmatrix} t^x_u \\ t^x_v \\ \vdots \\ t^x_w \end{pmatrix} \]
We use this sensing matrix to sense the vector $g^1$ and obtain the sensing information equality
\[ S_x g^1 = I^1_x \quad (3) \]
\[ I^1_x = (m^x_{v_1}, m^x_{v_2}, \cdots, m^x_{v_{i-1}}, m^x_u, 0, 0, \cdots, 0)^T \]
4.3 Recovering
We use the standard linear programming method to recover $g^1$ from the sensing information equality
\[ S_x g^1 = I^1_x. \]
We then try log n more microblogs and take the average of the recovered vectors $g^1$ as our final result. Likewise we can recover $g^2, g^3, \cdots$, and it only remains to rearrange them to get all the vectors.
Note on Satisfiability and Evolution
2014.08.13
The evolution model is the following. Suppose f is the fitness function of a genotype, an n-variable function defined on $\{-1, 1\}^n$ taking values in [1, 1 + ε], where ε is a small enough constant (this is due to weak selection). The frequency of genotypes follows a product distribution determined by an n-dimensional vector $u = (u_1, \cdots, u_n)$ (called the feature vector), $u_i \in [0, 1]$. Let $P_u$ be the probability distribution on the cube $\{-1, 1\}^n$ such that
\[ P_u(x) = P_u(x_1, \cdots, x_n) = \prod_{1 \le i \le n} \left(\frac{1}{2} + \frac{1}{2}u_i \cdot (-1)^{x_i}\right) \]
The frequencies of genotypes evolve over generations according to Nagylaki's Theorem. Mathematically speaking, let $u^t$ be the feature vector in the tth generation; the recurrence equation is the following. For all i,
\[ u^{t+1}_i = \frac{E_X[X_i f(X)]}{E_X[f(X)]} \]
where X follows the distribution $P_{u^t}(\cdot)$.
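For small n the recurrence can be iterated by brute-force enumeration of the cube. The sketch below is illustrative and rests on one explicit assumption: the product distribution is parameterized so that $\Pr[x_i = 1] = \frac{1 + u_i}{2}$ (hence $E[X_i] = u_i$), which matches the mixed strategy used in the coordination-game view later in this note. The function names are hypothetical.

```python
from itertools import product


def prob(u, x):
    """Product distribution with Pr[x_i = 1] = (1 + u_i) / 2 (assumed convention)."""
    p = 1.0
    for ui, xi in zip(u, x):
        p *= (1.0 + ui * xi) / 2.0
    return p


def mean_fitness(f, u):
    """E[f(X)] under the product distribution, i.e. the multilinear extension at u."""
    return sum(prob(u, x) * f(x) for x in product((-1, 1), repeat=len(u)))


def evolve(f, u):
    """One generation: u_i <- E[X_i f(X)] / E[f(X)]."""
    cube = list(product((-1, 1), repeat=len(u)))
    weights = [prob(u, x) * f(x) for x in cube]
    total = sum(weights)
    return [sum(w * x[i] for w, x in zip(weights, cube)) / total
            for i in range(len(u))]


eps = 0.01
# Toy fitness (an assumption for illustration): genotypes with x_1 = 1 are slightly fitter.
f = lambda x: 1.0 + eps if x[0] == 1 else 1.0
u = [0.2, -0.4, 0.0]
v = evolve(f, u)
print(u, v)
print(mean_fitness(f, u), mean_fitness(f, v))
```

In this toy example only the first coordinate moves, since the fitness depends on $x_1$ alone and the coordinates are independent; enumeration costs $2^n$ evaluations of f, so the sketch is only practical for small n.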
We further explore the evolving phenomenon in the following sections. We first describe the evolution model from different perspectives, which helps us connect this novel problem with some other classical fields. Then we show some calculations and experimental results and extract several basic facts about this evolution model.
1 Understanding From Different Perspectives
1.1 Center of Mass
In this section we consider the case that f only takes values in {1, 1 + ε}. Then all vertices of the cube are divided into two sets: the high set (all vectors whose fitness is 1 + ε) and the low set (all vectors whose fitness is 1).
The evolution of genotypes can be characterized from the perspective of center of mass in the following way. At every step $u^t$ determines a product distribution on the vertices of the cube, and the probabilities can be considered as weights on those vertices. Then we multiply the weight at every high point by 1 + ε and leave the weight at low points unchanged. After that we calculate the center of mass of this system, regarding probability as the mass at each vertex, and let this point be $u^{t+1}$.
To see things more clearly, every time $u^t$ determines a product distribution on the vertices of the cube, we can calculate two centers called the "high center" ($h^t$) and the "low center" ($l^t$), where the high center is the center of mass of all high points and the low center is the center of mass of all low points. It is immediate that the high center, the low center, $u^t$ and $u^{t+1}$ lie on the same line. And it can be observed that
\[ u^{t+1} = u^t + \alpha(h^t - l^t) \]
where α is a small number related to ε.
We say $u^t$ is stable if $u^{t+1} = u^t$. Another remark is that $u^t$ is stable if and only if $h^t = l^t$.
However, if the pattern of f is not regular enough (1 and 1 + ε are distributed in a messy way), it is also hard to compute $h^t$ and $l^t$ in every round. Thus, this understanding does not help us much.
1.2
Coordination Game
We can also regard the evolution rule as a coordination game. We have n players and each player
has two action −1 and 1. One generation means one round of the game, and in each round every
1+ut 1−ut
player choose a mixed strategy ( 2 i , 2 i ).
Each player update his strategy in the following way: player i calculates the expected payoff
for every single choice in the last round: E[payoff | uti = 1] and E[payoff | uti = 1], where the
expectation is taking on other player’s randomness in the strategy. And he updates the probability
distribution in the next round so that:
1 + ut+1
(1 + uti )E[payoff|uti = 1]
i
=
(1 − uti )E[payoff|uti = −1]
1 − ut+1
i
This is a multiplicative weight updating rule for strategies. Not much is known about the
monotonicity of the payoff when all n players update their strategies simultaneously in the next round. However,
after many numerical experiments, no counterexample has been found. Thus, it is plausible that the
payoff is non-decreasing under the updating rule.
For a stable point, we claim that it must be a mixed Nash equilibrium of the game. This is
true since no player has an incentive to change his current mixed strategy in order to get a higher
payoff.
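The updating rule can be tried out numerically. The following is a minimal sketch, not the code used for the experiments mentioned above: it assumes f is given as a table on the vertices of $\{-1, 1\}^n$, reads "E[payoff | action]" as the average of f over the other players' mixed strategies, and solves the ratio equation for the new $u_i$. All function names and the toy fitness table are illustrative assumptions.

```python
import itertools

def vertex_weight(x, u, skip=None):
    """Probability of vertex x under the product distribution given by u.

    Coordinate j equals +1 with probability (1 + u[j]) / 2. Passing `skip`
    leaves out one coordinate, giving the weight of the other players only.
    """
    w = 1.0
    for j, (xj, uj) in enumerate(zip(x, u)):
        if j != skip:
            w *= (1 + xj * uj) / 2
    return w

def multilinear(f, u):
    """Multilinear extension: expected value of f under the product distribution."""
    return sum(vertex_weight(x, u) * fx for x, fx in f.items())

def conditional_payoff(f, u, i, action):
    """Expected payoff when player i plays `action`, averaged over the others."""
    return sum(vertex_weight(x, u, skip=i) * fx
               for x, fx in f.items() if x[i] == action)

def update(f, u):
    """One round of the multiplicative-weights rule, solved for the new u_i."""
    new_u = []
    for i, ui in enumerate(u):
        up = (1 + ui) * conditional_payoff(f, u, i, 1)
        down = (1 - ui) * conditional_payoff(f, u, i, -1)
        # (1 + u') / (1 - u') = up / down  =>  u' = (up - down) / (up + down)
        new_u.append((up - down) / (up + down))
    return new_u

# Toy fitness on the 2-cube: value 1 everywhere except 1 + eps at (1, 1).
eps = 0.1
f = {x: 1.0 for x in itertools.product((-1, 1), repeat=2)}
f[(1, 1)] = 1.0 + eps

u = [0.0, 0.0]
for _ in range(5):
    nxt = update(f, u)
    print(round(multilinear(f, u), 6), "->", round(multilinear(f, nxt), 6))
    u = nxt
```

On this toy table the printed payoff increases from round to round, which is consistent with the conjecture but of course proves nothing.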
2 Monotonicity
By monotonicity we mean that $\tilde f(u^{t+1}) \ge \tilde f(u^t)$ for all $u^t$, where $\tilde f$ is the multi-linear extension of
f. We consider two special cases and analyze monotonicity in each of them, and we conjecture
that monotonicity holds in general.
In the general case there should be no limit on either the number of genes or the number of alleles.
However, it is hard to prove or disprove monotonicity in that generality.
Case 1. Two-Player, Two-Action, Relaxed Value of f
By relaxed value of f we mean that f may take any value in $[1, 1 + \epsilon]$ on the vertices. In this case we
assume the payoff table is
\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \qquad (1) \]
By computing the payoff in round t and in round t + 1 exactly, we can prove that monotonicity holds.
Case 2. N-Player, Two-Action, Constrained Value of f
In this case we have not proved monotonicity, but we can understand it in
another way: by putting the evolution in a continuous form.
The frequencies of genotypes evolve over generations, like a game played in rounds. In reality, however,
evolution is a continuous process. We can regard the feature vector as a function of time
t, so that the evolution rule becomes a differential equation. By the center-of-mass viewpoint
above, the direction in which u(t) moves should be the same as that of the vector h(t) − l(t).
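By calculation we obtain the following formula:
\[ \frac{\partial \tilde f}{\partial u_i} = \frac{2\epsilon}{(1 + u_i)(1 - u_i)} \cdot \det \begin{pmatrix} H_i^+ & H_i^- \\ L_i^+ & L_i^- \end{pmatrix} \]
where $H_i^+ = \Pr\{x \mid x_i = 1, f(x) = 1 + \epsilon\}$, $H_i^- = \Pr\{x \mid x_i = -1, f(x) = 1 + \epsilon\}$, and $L_i^+$, $L_i^-$ are
defined similarly for the low points. Note that $H_i^+$ does not carry the index t, but it is defined as the sum of
certain probabilities at time t. For the i-th coordinate of h(t) − l(t) we have
\[ (h(t) - l(t))_i = \frac{2}{(H_i^+ + H_i^-)(L_i^+ + L_i^-)} \cdot \det \begin{pmatrix} H_i^+ & H_i^- \\ L_i^+ & L_i^- \end{pmatrix} \]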
Thus, in every coordinate the gradient of $\tilde f$ at u and the direction in which u moves are positive multiples
of the same determinant, so the angle between the two vectors is acute. This means that if we change the
evolution rule into a continuous one, monotonicity can be proved. This leads us to believe that monotonicity
also holds for the discrete rule in this case.
In the general case, we believe monotonicity holds, since many numerical experiments have been carried out and
no counterexample has been found.
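For reference, the expression for h(t) − l(t) in terms of the determinant of the 2 × 2 matrix with rows $(H_i^+, H_i^-)$ and $(L_i^+, L_i^-)$ can be checked in a few lines. The derivation below is a sketch under the assumption that $H_i^\pm$, $L_i^\pm$ are joint probabilities and that $h_i$, $l_i$ (the i-th coordinates of the two centers) are the corresponding conditional means of $x_i$.

```latex
% Assumption: h_i and l_i are conditional means of x_i over high and low points.
\begin{align*}
h_i &= \frac{H_i^+ - H_i^-}{H_i^+ + H_i^-}, \qquad
l_i = \frac{L_i^+ - L_i^-}{L_i^+ + L_i^-}, \\
h_i - l_i
  &= \frac{(H_i^+ - H_i^-)(L_i^+ + L_i^-) - (L_i^+ - L_i^-)(H_i^+ + H_i^-)}
          {(H_i^+ + H_i^-)(L_i^+ + L_i^-)} \\
  &= \frac{2\,(H_i^+ L_i^- - H_i^- L_i^+)}{(H_i^+ + H_i^-)(L_i^+ + L_i^-)} .
\end{align*}
```

The numerator is twice the determinant, and the prefactor is positive whenever both high and low points have positive probability.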
3 Convergence
We analyze the convergence of the evolution rule from two aspects: the point to which an initial u
converges and the path along which it converges. Unfortunately, no theoretical result has been proved and
we rely on numerical experiments to form conjectures.
3.1 Endpoint of Convergence
When f only takes constrained values at the vertices, it is believed that almost all initial points in
$[-1, 1]^n$ converge to some point with highest fitness (also known as satisfiability). Here "almost
all" means that the Lebesgue measure of the set of such points is $2^n$.
Here is the convergence diagram of all f when n = 2.
It can be observed that some initial points do not converge to a point with highest fitness, i.e.
there exist stable points with low fitness.
When f takes relaxed values at the vertices, it is possible that, with non-zero Lebesgue measure, the
initial point converges to a point with low fitness. The following two 2-dimensional convergence
diagrams serve as examples.
3.2 Path of Convergence
The following two diagrams show the convergence paths when n = 2.
Impossibility for Stack Mutual Exclusion
1 Problem and Conclusion
Consider the problem in which 2 processes $p_0$, $p_1$ want to implement mutual exclusion using only a
shared stack. Their legal operations on the stack are pop() and push(). When a process applies
pop() to the stack, the top entry of the stack is removed and the return value is the content of
this entry. When a process pushes an entry onto the stack, this entry is added to the stack
and becomes the new top entry; the push operation has no return value.
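The setting can be modelled in a few lines. The sketch below is only an illustration of the objects involved (a shared stack, a schedule as a sequence of process indices, and the stack size after each step); the two toy processes and the convention that pop() on an empty stack returns None are assumptions, not part of the note.

```python
class SharedStack:
    """Shared stack offering only push() and pop()."""

    def __init__(self, initial=()):
        self.items = list(initial)

    def push(self, entry):
        self.items.append(entry)        # push has no return value

    def pop(self):
        # Assumption: pop() on an empty stack returns None.
        return self.items.pop() if self.items else None


def run_schedule(schedule, processes, stack):
    """Execute a finite schedule; schedule[j] names the process taking step j.

    Returns the stack size after every step.
    """
    sizes = []
    for pid in schedule:
        processes[pid](stack)           # the chosen process performs one operation
        sizes.append(len(stack.items))
    return sizes


def p0_step(stack):                     # hypothetical process 0: always pushes
    stack.push("a")

def p1_step(stack):                     # hypothetical process 1: always pops
    stack.pop()


sizes = run_schedule([0, 0, 1, 0, 1, 1, 1], [p0_step, p1_step], SharedStack())
print(sizes)
```

A real algorithm would of course make each process a state machine whose next operation depends on earlier return values; the proof only uses the sequence of stack sizes along particular schedules.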
Given the above setting, we would like to show that no algorithm satisfies both the mutual
exclusion property and the no-starvation property. To be specific, we are going to
prove the following theorem.
Theorem 1.
For any algorithm that has the mutual exclusion property and any initial state of the stack, there exists
an infinite schedule $s = (s_1, s_2, \cdots)$ in which every process performs infinitely many operations
and which nevertheless causes starvation, i.e. some process is able to enter the critical
section only finitely many times. Here, letting $I[s_j = 0]$ be the indicator of the event $s_j = 0$, the
requirement that every process performs infinitely many operations means:
\[ \lim_{i \to \infty} \sum_{j=1}^{i} I[s_j = 0] = +\infty, \qquad \lim_{i \to \infty} \sum_{j=1}^{i} I[s_j = 1] = +\infty \]
2 Proof
Let the stack alphabet be $\Gamma$, and let $ST(s)$ be the content of the stack after the execution of a
finite schedule s ($ST(s)$ is an array). Assume every process runs a Turing machine and acts
according to its transition function.
Assume the progress property is satisfied, i.e. some process eventually enters
the critical section (otherwise the theorem is already proved). Without loss of generality we have
a schedule $s_1$ such that after the execution of $s_1$, $p_0$ enters the critical section for the first time.
Now we consider $|ST(s_1 \| 1^n)| = w$ and let $\mathrm{minstack}_1(n) = \inf_{k \ge n} |ST(s_1 \| 1^k)|$. It is immediate
that the sequence $\{\mathrm{minstack}_1(n)\}_{n=1}^{\infty}$ is non-decreasing. There are three cases for this sequence:
(1) $\lim_{n \to \infty} \mathrm{minstack}_1(n) = \infty$
(2) $\lim_{n \to \infty} \mathrm{minstack}_1(n) = a > 0$
(3) $\lim_{n \to \infty} \mathrm{minstack}_1(n) = 0$
For each of these three cases, we prove that there exists an infinite schedule in which starvation happens
to at least one process, while each process still takes infinitely
many steps.
Case 1. $\lim_{n \to \infty} \mathrm{minstack}_1(n) = \infty$
For every $i \ge |ST(s_1 \| 1^n)| = w$, let $r_i = \arg\min_{r \ge 0} \{\mathrm{minstack}_1(r) = i\}$. Consider the following
infinite schedule:
\[ s_1 \| 1^{r_{w+1}} 0 \| 1^{r_{w+2}} 0 \| \cdots \]
Observe that every process takes infinitely many steps in this schedule. We claim
that in this schedule $p_1$ is starved, i.e. it can never enter the critical section.
We compare the constructed schedule with the schedule $s_1 \| 1^*$ and prove that
every return value of $p_1$'s operations is exactly the same in both. This is because in the constructed schedule,
after the execution of $s_1 \| 1^{r_i}$, $p_1$ never again visits the entries that are in the stack
at that moment, i.e. by the definition of $r_i$, $p_1$ will not pop any of these entries. Hence the single operation
of $p_0$ does not prevent $p_1$ from "living in his own world": the next operation of $p_1$ is necessarily a
push, and the pop operations afterwards only return what he has pushed from then on.
Thus, we claim that in the constructed schedule, $p_1$ is not able to enter the critical section.
This is because in the schedule $s_1 \| 1^*$, $p_1$ cannot enter the critical section, since otherwise the mutual
exclusion property would be violated, and identical return values lead $p_1$ to behave identically.
Case 2. $\lim_{n \to \infty} \mathrm{minstack}_1(n) = a > 0$
This case is very similar to Case 1. Since there are infinitely many r such that the size of the
stack is $|ST(s_1 \| 1^r)| = a$, we let $r_i$ be the sequence of such indices in increasing order, i.e. $r_{i+1} > r_i$
for all i.
As in the previous case, we construct the following infinite schedule and claim that $p_1$ is
starved forever.
\[ s_1 \| 1^{r_1} 0 \| 1^{r_2} 0 \| \cdots \]
The proof is exactly the same as in Case 1, and we omit it here.
Case 3. $\lim_{n \to \infty} \mathrm{minstack}_1(n) = 0$
Since there are infinitely many r such that the size of the stack is $|ST(s_1 \| 1^r)| = 0$, we let $r_i$ be the
sequence of such indices in increasing order, i.e. $r_{i+1} > r_i$ for all i.
Let $p_0$ leave the critical section after the execution of $s_1 \| 1^{r_1}$, i.e. we consider the
schedule $s_1 \| 1^{r_1} \| 0^*$.
• If process 0 cannot enter the critical section in this infinite schedule.
Consider the analogous sequence $\{\mathrm{minstack}_0(n)\}_{n=1}^{\infty}$ for process $p_0$. To be specific, let
$|ST(s_1 \| 1^{r_1})| = w'$ and let $\mathrm{minstack}_0(n) = \inf_{k \ge n} |ST(s_1 \| 1^{r_1} \| 0^k)|$. The three cases are discussed
as follows:
(1) $\lim_{n \to \infty} \mathrm{minstack}_0(n) = \infty$
The following schedule starves $p_0$. (For $i \ge |ST(s_1 \| 1^n)| = w'$, let $r'_i =
\arg\min_{r \ge 0} \{\mathrm{minstack}_0(r) = i\}$.)
\[ s_1 \| 1^{r_1} \| 0^{r'_{w'+1}} 1 \| 0^{r'_{w'+2}} 1 \| \cdots \]
(2) $\lim_{n \to \infty} \mathrm{minstack}_0(n) = b > 0$
The following schedule starves $p_0$. (Since there are infinitely many r such
that the size of the stack is $|ST(s_1 \| 1^{r_1} \| 0^r)| = b$, we let $r'_i$ be the sequence of such indices in increasing order,
i.e. $r'_{i+1} > r'_i$ for all i.)
\[ s_1 \| 1^{r_1} \| 0^{r'_1} 1 \| 0^{r'_2} 1 \| \cdots \]
(3) $\lim_{n \to \infty} \mathrm{minstack}_0(n) = 0$.
The following schedule starves both $p_0$ and $p_1$. (Since there are infinitely
many r such that the size of the stack is $|ST(s_1 \| 1^{r_1} \| 0^r)| = 0$, we let $r'_i$ be the sequence of such indices in
increasing order, i.e. $r'_{i+1} > r'_i$ for all i.)
\[ s_1 \| 1^{r_1} \| 0^{r'_1} \| 1^{r_2} \| 0^{r'_2} \cdots \]
• If process 0 can enter the critical section in this infinite schedule.
That is, there exists some $t_1$ such that after the execution of $s_1 \| 1^{r_1} \| 0^{t_1}$, $p_0$ enters the critical
section again. Now let $p_1$ run from this point on. We define $\mathrm{minstack}_1(n)$ analogously,
and its limit must be 0 (otherwise we are back in Case 1 or Case 2 and the claim is
already proved). Thus, there exists $r_2$ such that $|ST(s_1 \| 1^{r_1} \| 0^{t_1} \| 1^{r_2})| = 0$. We let $p_0$
leave the critical section again and then run on its own, and ask whether it can enter the
critical section once more. If not, we are in the previous bullet; if yes, we
let $p_1$ run again, and so on.
We obtain the following schedule, which starves $p_1$.
\[ s_1 \| 1^{r_1} \| 0^{t_1} \| 1^{r_2} \| 0^{t_2} \cdots \]
After the execution of $s_1 \| 1^{r_1} \| 0^{t_1} \| \cdots \| 1^{r_i} \| 0^{t_i}$, $p_0$ is in the critical section, and after the
execution of $s_1 \| 1^{r_1} \| 0^{t_1} \| \cdots \| 1^{r_i}$, the stack is empty.
Consequently, combining the analysis of all cases, we complete the proof of the theorem.
Notes on New Definitions for Differential Privacy
Zihan Tan
2014.06.10
1 Motivation
Differential privacy has been the prevailing definition of privacy in recent years and has been studied
through many mechanisms. However, the curse of dimensionality often prevents a differentially private
mechanism from performing well. We observe that differential privacy is a demanding
definition, probably because it is an information-theoretic property. In this note we attempt to
develop a new computational-theoretic definition of privacy based on statistical distance. The
material is immature and several proofs are missing.
2 Statistical Distance
Definition 1. (Statistical Distance)
Let p and q be probability density functions on R. Their statistical distance is defined
to be:
\[ s(p, q) := \frac{1}{2} \int_{-\infty}^{\infty} |p(x) - q(x)| \, dx \]
If the support of the distributions is not R, we simply integrate over the support.
The meaning of the statistical distance between two distributions is clear: suppose you are given a
real number $t \in R$ and are told that t is a sample from one of two distributions p and q, each chosen
with probability $\frac12$. You are then asked to guess which distribution it was sampled from. The optimal strategy is to
compare p(t) and q(t): if p(t) is larger, guess p, and otherwise guess q. With the factor $\frac12$ in Definition 1, it can be
computed that the probability that your guess is right is exactly $\frac12 + \frac12 s(p, q)$.
Intuitively speaking, statistical distance represents the ability of an adversary to
distinguish between two distributions given one-shot access. It can thus be used as the basis of a definition
of privacy.
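The guessing experiment is easy to check on a small discrete example. The sketch below replaces densities on R by probability vectors on a finite set (an assumption made only to keep the example short) and assumes the sample comes from p or q with probability 1/2 each. With the factor 1/2 in the definition of s, the success probability of the optimal rule works out to 1/2 + s(p, q)/2.

```python
def statistical_distance(p, q):
    """Half the L1 distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def optimal_guess_success(p, q):
    """Success probability of guessing the distribution with the larger mass
    at the observed point, when p and q are equally likely a priori."""
    return 0.5 * sum(max(pi, qi) for pi, qi in zip(p, q))

# Illustrative distributions on a two-point set.
p = [0.5, 0.5]
q = [0.9, 0.1]

s = statistical_distance(p, q)
print(s, optimal_guess_success(p, q), 0.5 + s / 2)
```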
First we give the following definitions and propositions, which are not hard to prove and are
essential for understanding the new privacy definition.
Definition 2. (Product Distribution)
Let p, q be two distributions on R. We define their product distribution $p \cdot q$ to be the distribution
on $R^2$ whose density function is $(p \cdot q)(x, y) = p(x) q(y)$.
Proposition 1. Let p, q be distributions on R, then
\[ s(p^{m+n}, q^{m+n}) \le s(p^m, q^m) + s(p^n, q^n) \]
Proposition 2. Let p, q, r be distributions on R, then
\[ s(p, q) \le s(p, r) + s(q, r) \]
3 Statistical Differential Privacy
We give the new definition of privacy based on statistical distance.
Definition 3. (Statistical Privacy)
A mechanism M is said to be (t, p)-statistically private if, for every pair of neighboring databases
D, D', with M(D) and M(D') denoting the distributions of outputs on inputs D and D' respectively, the
probability that any polynomial-time adversary passes the following experiment is at most
$\frac12 + p$.
Experiment: The adversary is given t output samples (sampled from the distribution M(D) or
M(D')) and is asked to guess which one the data were sampled from. It passes the experiment if and
only if it guesses right.
In this new definition we basically require that, for any pair of neighboring databases,
the ability of a poly-time adversary to distinguish the output distributions is bounded by p.
Some propositions and remarks are in order to better understand the new definition.
Remark 1. Note that when t = 1, the probability that an adversary (poly-time, since it only needs
to query M(D) and M(D') once each) guesses right is just $\frac12 + \frac12 s(M(D), M(D'))$. This
means we need to bound the largest statistical distance between output distributions over all pairs
of neighboring databases. When t > 1, we care about the statistical distance between $M(D)^t$
and $M(D')^t$, which are two product distributions.
Proposition 3. $\epsilon$-differential privacy implies $(1, \frac12 (e^{\epsilon} - 1))$-statistical privacy; but (1, p)-statistical
privacy does not imply $\epsilon$-differential privacy for any $\epsilon$.
The intuition here is that good statistical privacy allows that for some $x \in R$, p(x) > 0 while
q(x) = 0. However, this is forbidden in differential privacy.
Proposition 4. $(\epsilon, \delta)$-differential privacy implies $(1, \delta + \frac12 (e^{\epsilon} - 1))$-statistical privacy, and
$(1, \delta + \frac12 (e^{\epsilon} - 1))$-statistical privacy implies $(\epsilon, 2\delta + e^{\epsilon} + e^{-\epsilon} - 2)$-differential privacy.
This proposition shows that $(\epsilon, \delta)$-differential privacy and statistical privacy are equivalent in
some sense. However, the orders of magnitude of the parameters differ. For example, it is often
the case that $\epsilon = O(1)$ and $\delta = O(\frac{1}{n^2})$, but then $e^{\epsilon} + e^{-\epsilon} - 2 = O(1)$, so the two notions are not
really equivalent.
Proposition 5. (1, p)-statistical privacy implies (t, tp)-statistical privacy.
This is a direct corollary of Proposition 1.
Here are some negative results about statistical privacy.
Remark 2. (Computational-Theoretic or not)
It seems that the new definition is stated for poly-time adversaries only. But in fact we have a
characterization of statistical privacy in terms of statistical distance, which means the poly-time constraint
on the adversary does not play a role. This shows that our new definition is essentially still
an information-theoretic one, and certain mechanisms cannot give good performance when the
extrinsic dimension of the data is large.
Remark 3. (Flavor of Parameters)
In the definition of $(\epsilon, \delta)$-differential privacy we usually require $\epsilon = O(1)$ and $\delta = o(\frac1n)$; from
Proposition 4 we can then only deduce a very weak statistical privacy property.
Besides, consider a mechanism that takes $D = \{x_1, \cdots, x_n\}$ as input and outputs a uniformly random
$x_i$. It is a rather silly mechanism, since it always releases the entire private record
of some client, but it satisfies $(1, \frac1n)$-statistical privacy. This means that when t = 1, $p = \frac1n$ is not good enough.
However, if $p = o(\frac1n)$, for example $p = \frac{1}{n^2}$, we are restricting the mechanism
so much that it gives very similar outputs for all inputs, which in turn leads to poor performance.
To be specific, for two completely different databases $D_1$, $D_2$, applying Proposition 2 along a chain of n
neighboring databases shows that their output distributions have statistical distance at most on the order of $\frac1n$.
This is generally not good for the accuracy of the mechanism.
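The "silly" mechanism in Remark 3 can be checked directly. The following sketch assumes a database of distinct records and a neighboring database obtained by replacing one record; the record values are hypothetical. It computes the statistical distance between the two output distributions.

```python
from collections import Counter

def output_distribution(database):
    """Output distribution of a mechanism that releases one uniformly random record."""
    n = len(database)
    return {value: count / n for value, count in Counter(database).items()}

def statistical_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

n = 10
D = list(range(n))              # hypothetical database with distinct records
D_neighbor = D[:-1] + [999]     # neighboring database: one record replaced

s = statistical_distance(output_distribution(D), output_distribution(D_neighbor))
print(s)
```

The distance between the two output distributions is 1/n even though every output reveals one record in full, which is the tension the remark points out.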
4 Transportation Distance
Another well-known distance between probability distributions is the transportation distance (a.k.a.
earth-mover distance or Wasserstein metric). It could possibly also be adapted into a definition of privacy. However, unlike statistical
distance, it does not come with a clean operational interpretation.
It is worth mentioning that transportation distance is defined via a coupling, an idea that
could probably be utilized for a privacy definition.
Notes on Approximability of Random Priority
2014.06.09
1 Problem and Algorithm
1.1 Facility Location
We have n clients and l facilities located in a d-dimensional space. We would like to assign
clients to the facilities so that they can get the service they want. Facility i has a capacity $c_i$,
indicating the largest number of clients that can be assigned to it (assuming $\sum_{1 \le i \le l} c_i = n$). However, the locations of the clients are unknown to us, and will be reported by them after we set up
the assignment mechanism. Every client wants to minimize the distance from his true location to
the facility that he is assigned to. We want to design a truthful mechanism $\mathcal{M}$ so that the social
welfare is optimized to some degree, where the social welfare is measured by the total distance (smaller is better):
\[ \text{Social Welfare} = \sum_{1 \le i \le n} d(i, \mathcal{M}(i)) \]
Here $\mathcal{M}(i)$ represents the facility that i is matched to under the mechanism $\mathcal{M}$, and $d(\cdot, \cdot)$ is the
distance in the high-dimensional space.
1.2 Random Priority
We state the following mechanism, called Random Priority (also known as Random Serial Dictatorship), without proving its truthfulness.
We choose a permutation uniformly at random, and we let the clients report in the order given
by it. When we receive a reported location, we assign the nearest available facility to it. The
mechanism terminates once every client is matched.
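The mechanism is short enough to simulate. The sketch below is an illustration restricted to clients and facilities on a line, with facilities given as (position, capacity) pairs; the function name, the data layout and the tie-breaking by facility index are assumptions. It also assumes the total capacity is at least the number of clients.

```python
import random

def random_priority(clients, facilities, rng=random):
    """Random Priority on a line.

    clients    -- list of reported positions
    facilities -- list of (position, capacity) pairs
    Returns (assignment, total_distance), where assignment[i] is the index of
    the facility given to client i.
    """
    remaining = [cap for _, cap in facilities]
    order = list(range(len(clients)))
    rng.shuffle(order)                      # uniformly random permutation
    assignment = [None] * len(clients)
    total = 0.0
    for i in order:
        # Nearest facility that still has free capacity (ties: lowest index).
        j = min((k for k, cap in enumerate(remaining) if cap > 0),
                key=lambda k: abs(clients[i] - facilities[k][0]))
        remaining[j] -= 1
        assignment[i] = j
        total += abs(clients[i] - facilities[j][0])
    return assignment, total

# Smallest instance of the kind studied below: one client at 1, two at 2,
# one facility slot slightly left of 0 and two slots at 2.
clients = [1, 2, 2]
facilities = [(-0.01, 1), (2, 2)]
print(random_priority(clients, facilities, random.Random(0)))
```

Averaging the returned total over many random seeds gives an empirical estimate of the expected cost of the mechanism on an instance.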
2 An example
In this note we give a lower bound on the approximation ratio of the Random Priority scheme. To be
specific, we prove the following theorem.
Theorem 1.
The worst-case approximation ratio of the Random Priority mechanism is $\Omega(n^{0.29})$.
Proof.
Consider the following example.
Suppose there are l facilities and $n = 3^l$ clients located on a line (mostly at integer points, as
you will see). Let $c_i$ and $f_i$ be the number of clients and of facilities located at point i, respectively.
\[ c_1 = 1;\ c_2 = 3 - 1;\ c_{2^2} = 3^2 - 3^1;\ \cdots;\ c_{2^l} = 3^l - 3^{l-1} \]
\[ f_{-\epsilon} = 1;\ f_2 = 3 - 1;\ f_{2^2} = 3^2 - 3^1;\ \cdots;\ f_{2^l} = 3^l - 3^{l-1} \]
For other i, $c_i = f_i = 0$, and $\epsilon$ is a sufficiently small number.
The optimal assignment for this example is straightforward: for $t \ge 2$, we assign the clients at t to
the facility at t, and we assign the client at 1 to the facility at $-\epsilon$. Thus, the total cost is $1 + \epsilon$.
We then compute the cost given by the Random Priority mechanism.
For the $c_1 + c_2 = 3$ clients at points 1 and 2, the probability that the client at 1 comes
last among the 3 is $\frac13$. Thus, the probability that it is assigned to the facility at 2 is $(1 - \frac13)$, which also
means that one of the clients at 2 is then assigned somewhere else.
A similar analysis applies for $i \ge 1$ to the $c_1 + c_2 + \cdots + c_{2^i}$ clients at points up to $2^i$. Let $s_i = \sum_{j=0}^{i} c_{2^j}$;
then the probability that one of them is assigned to the facility at $2^{i+1}$ is $\prod_{k=1}^{i} (1 - \frac{s_k}{s_{k+1}}) = (\frac23)^i$.
By linearity of expectation, the expected cost given by Random Priority is:
\[ \sum_{i=0}^{l} 2^i \cdot \left(\frac{2}{3}\right)^{i+1} \]
This is approximately $(\frac43)^l$, and note that n is approximately $3^l$. From the fact that
\[ \left(\frac43\right)^3 = \frac{64}{27} \le 3 \le \frac{256}{81} = \left(\frac43\right)^4 \]
we know that the cost is between $O(n^{\frac14})$ and $O(n^{\frac13})$.
In the next section, we optimize the constant in the example to obtain the inapproximability
ratio stated in the theorem.
3 Optimization of Example and Remarks
We use the notation defined in the previous section. First we optimize the example as follows. Let
$r = \frac{s_i}{s_{i-1}}$ for all $i \ge 1$. Then $n = O(r^l)$, and the expected cost given by Random Priority is
approximately $(2(1 - \frac1r))^l$. We would like to find r such that
\[ f(r) = \log_r 2\left(1 - \frac{1}{r}\right) \]
is maximized; the example with parameter r then gives a lower bound of order $n^{f(r)}$ on the approximation ratio.
Since f is a transcendental function, we do not expect a closed-form expression
for the maximizer r or for the maximal value of f(r). Reading them off the graph of the function, we
obtain that the maximizer is approximately 4.4 and the optimal value is approximately 0.29.
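The maximizer can also be located numerically instead of being read off a graph. The sketch below uses golden-section search, which is a technique chosen here for illustration and not something the note itself uses; it assumes f is unimodal on the search interval [2.5, 10], which is an assumption about the shape of the curve.

```python
import math

def f(r):
    """Exponent log_r(2 (1 - 1/r)) of the optimized example."""
    return math.log(2 * (1 - 1 / r)) / math.log(r)

def golden_section_max(func, lo, hi, tol=1e-9):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            a = c          # the maximum lies in [c, b]
        else:
            b = d          # the maximum lies in [a, d]
    return (a + b) / 2

r_star = golden_section_max(f, 2.5, 10.0)
print(r_star, f(r_star))
```

The result agrees with the values quoted above: a maximizer close to 4.4 and a maximal exponent between 0.29 and 0.30.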
We then give some informal remarks suggesting that, in some sense, this example is optimal. The remarks
are not rigorous, but they may give some intuition for proving an upper bound, or indicate where
a breakthrough might be found in constructing new examples.
Remark 1. Line
This example seems optimal in its line structure, since for every metric space the only nontrivial constraint on distances is the triangle inequality, and equality holds in it when the three
points lie on a line. Thus the "extreme" case for an example might have the line structure.
Remark 2. Parameters for Client Assignment
Back to our expression for the cost:
\[ \text{cost} = \left(1 - \frac{s_0}{s_1}\right) \cdot 2^0 + \left(1 - \frac{s_0}{s_1}\right)\left(1 - \frac{s_1}{s_2}\right) \cdot 2^1 + \cdots + \left(1 - \frac{s_0}{s_1}\right)\left(1 - \frac{s_1}{s_2}\right) \cdots \left(1 - \frac{s_{l-1}}{s_l}\right) \cdot 2^l \]
First, we need this series to be divergent, since a convergent series gives a constant cost,
which is small compared with n and therefore cannot give a good lower bound on the
approximation ratio.
Second, when this series is divergent, it is often the case that the last term dominates the whole
series. In other words, it is of the same order as the sum. Consider $(1 - \frac{s_0}{s_1})(1 - \frac{s_1}{s_2}) \cdots (1 - \frac{s_{l-1}}{s_l}) \cdot 2^l$.
It can be observed that $(1 - a)(1 - b) \le (1 - \sqrt{ab})^2$, so we tend to make the ratios between $s_i$ and
$s_{i+1}$ about the same, since $s_0 = 1$ and $s_l = n$ are fixed.
Following these two observations, we arrive at the example given above. Although
$n^{0.29}$ might not be the optimal lower bound for all truthful mechanisms, it seems to be the best
lower bound we can obtain for Random Priority.
Machine Learning: Technical Report of Conway’s Inverse
Game of Life
Instructed by Liwei Wang
Due on Jan 14th, 2013
H. Jiachen, T. Zihan, H. Heping, Z. Zeyu
1 Formulation of the Problem
The Game of Life is a cellular automaton created by mathematician John Conway in 1970. The game consists of a
board of cells that are either on or off. One creates an initial configuration of these on/off states and observes how it
evolves. There are four simple rules to determine the next state of the game board, given the current state.
1. Any live cell with fewer than two live neighbors dies, as if caused by under-population.
2. Any live cell with two or three live neighbors lives on to the next generation.
3. Any live cell with more than three live neighbors dies, as if by overcrowding.
4. Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction.
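The four rules translate directly into a forward step function. The sketch below is a minimal illustration on a finite board stored as a list of 0/1 rows; treating cells outside the board as dead is an assumption, since the rules themselves do not fix a boundary condition.

```python
def life_step(board):
    """Apply one generation of Conway's rules to a finite 0/1 board.

    Cells outside the board are treated as dead (an assumption).
    """
    rows, cols = len(board), len(board[0])

    def live_neighbors(r, c):
        # Count live cells among the up to 8 surrounding cells.
        return sum(board[rr][cc]
                   for rr in range(max(0, r - 1), min(rows, r + 2))
                   for cc in range(max(0, c - 1), min(cols, c + 2))
                   if (rr, cc) != (r, c))

    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = live_neighbors(r, c)
            if board[r][c] == 1:
                nxt[r][c] = 1 if n in (2, 3) else 0   # rules 1 to 3
            else:
                nxt[r][c] = 1 if n == 3 else 0         # rule 4
    return nxt

# A "blinker": a row of three live cells that flips to a column and back.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
print(life_step(blinker))
```

The inverse problem studied in this report asks for a plausible input to `life_step` (applied several times) given only its output.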
2 Observations about Conway's Reverse Game of Life
Three important observations were made after some experiments. These features lead us to design better learning
algorithms.
• Locality
The state of a cell depends mostly on the origin configuration of its neighborhood, since by the rules the
state in the next round is determined by the states of the surrounding 9 cells in the current round. This locality
guides us to make decisions based on the neighboring configuration.
• Clustering
After some experiments with the game, an important observation is made: the more rounds are played, the more clustered the
final configuration is. This also resembles the behavior of living beings: people tend to live together locally rather than separately. This
feature to some extent characterizes Conway's Game of Life.
• Non-Uniqueness
Given an origin state, the forward evolution is deterministic, but given a resulting state, the initial state is not
unique. Thus, merely finding a feasible solution is not meaningful. Instead, origin configurations with high likelihood are
what we need.
3 Ideas and Framework of Algorithm
3.1 Basic Notions
Some definitions are in order first to let the following algorithms and arguments make sense.
Our data is generated in the following way: a randomly sampled configuration is evolved for 5 rounds, and the resulting
configuration is taken as the "base" configuration; our input is the configuration obtained from the "base" configuration
after 5 rounds.
The following diagram clarifies our terms for the configurations and cell states:
Initial Configuration → Origin Configuration → Resulting Configuration
Initial State → Origin State → Resulting State
3.2 Stream of Algorithms
In this section, we introduce every algorithm that we have come up with in this project. Some of them have certain
flaws, and new ideas and constructions were devised to overcome them.
• Local Perfect Trace Back
After some experiments, the clustering observation first came to mind. The origin state is the result
of a randomly sampled initial configuration after 5 rounds, so it is stable in some sense, i.e. with good probability
the origin state is already well clustered. Thus, it is helpful to find a Local Perfect Trace Back. The specific
algorithm is the following:
First we cluster the resulting state; then for every cluster we run an algorithm to figure out a feasible
local origin configuration for it. Finally we combine these local configurations into our guess of the origin configuration.
Drawback:
Sometimes the performance of clustering is not satisfying, because the clusters are so large that it is
rather inefficient to find the local perfect trace back. On the other hand, the Non-Uniqueness observation tells us that
finding a perfect trace back is not really necessary, and sometimes it deviates a lot from the true origin. Even if we get a perfect trace
back for the whole configuration by combining partial ones, some mutation may happen at the connection/intersection of clusters, so that the resulting configuration of our guess is not always good.
• Local Likelihood for Single Guess
Following the Non-Uniqueness observation, the likelihood given the neighboring configuration is computed to help determine the state of a cell in the origin configuration. Thus, given the resulting configuration, we focus on one cell and
decide its origin state based on the neighboring configuration.
What is local? An explicit size has to be fixed to make the algorithm work. However, there is a trade-off
between the following two considerations when deciding the size.
(1) A small size limits the accuracy of our knowledge of the likelihood, and therefore lowers the probability of
guessing right.
(2) A large size needs a lot of storage, and given a resulting state our algorithm becomes inefficient
in producing an output, since it has to search deep into the data tree (otherwise the deep training is meaningless).
After some experiments we set t = 5, i.e. we decide the origin state of a cell based on the 5 × 5
neighborhood in the resulting configuration. t = 5 also makes our knowledge symmetric in every direction.
Drawback:
On one hand, the coverage of 5 × 5 configurations is not good, where the coverage is the ratio between the number
of configurations that appear and the number of possible configurations ($2^{25}$). This makes our guess unavailable for
some input resulting configurations. This drawback can be overcome by replacing the lookup of 5 × 5 local
configurations with a data tree developed in BFS style.
On the other hand, the following problem occurs too often, so the overall performance is not satisfying.
\[ \begin{pmatrix} 0.4 & 0.4 \\ 0.4 & 0.4 \end{pmatrix} \qquad (1) \]
At some 2 × 2 local configurations, the likelihood of being alive is approximately 40 percent for each cell. According to the
method of maximum likelihood we should then guess "dead" for every cell, but the true origin local configuration is usually one of
the following two:
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad (2) \]
This local error has a huge impact on performance. It arises because each time we guess a single cell rather than
a small local configuration, so the relations between neighboring cells are not taken into account.
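A toy calculation makes the difference between the two kinds of guess concrete. The joint distribution below is hypothetical: the numbers are chosen only so that every cell has marginal probability 0.4 of being alive, as in the situation described above.

```python
# Hypothetical joint distribution over 2x2 origin blocks, written as flat tuples
# (top-left, top-right, bottom-left, bottom-right). The numbers are illustrative.
joint = {
    (1, 0, 0, 1): 0.4,   # one diagonal alive
    (0, 1, 1, 0): 0.4,   # the other diagonal alive
    (0, 0, 0, 0): 0.2,   # everything dead
}

# Per-cell marginal probability of being alive.
marginals = [sum(p for block, p in joint.items() if block[k] == 1)
             for k in range(4)]

# Single-cell maximum likelihood: guess alive only when the marginal exceeds 1/2.
single_guess = tuple(int(m > 0.5) for m in marginals)

# Block-wise maximum likelihood: pick the most probable 2x2 block as a whole.
block_guess = max(joint, key=joint.get)

print(marginals, single_guess, joint.get(single_guess, 0.0))
print(block_guess, joint[block_guess])
```

In this toy case the cell-by-cell guess outputs the all-dead block, which is the least likely of the three patterns, while the block-wise guess outputs one of the diagonals. Note that this compares the chance of reproducing the whole block exactly; it does not by itself say which guess has the lower per-cell error rate.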
• Simulated Annealing for Local Likelihood for Single Guess
The first change we made to overcome the lack of relations in the Single Guess learning algorithm is Simulated
Annealing. We use the standard Local Likelihood for Single Guess to get a guessed origin configuration, evolve it for
5 rounds to get a resulting state, compare it with the input resulting state, add certain alive cells according to the
likelihood, and then use the new resulting state as the input and run Local Likelihood for Single Guess again.
• Local Likelihood for Local Guess
Following the Clustering observation and the drawback of the Local Likelihood for Single Guess algorithm, it is
helpful to decide several neighboring cells simultaneously rather than a single cell. It is observed that the
method of maximum likelihood often outputs four dead cells even if the likelihood of being alive is about 40 percent
for each of them, in which case it is highly probable that two of them are alive.
We first build a database containing, for each configuration, the likelihood that the central cell is alive. The likelihoods are
stored in a tree. The method of maximum likelihood is employed to make the final decisions. The likelihoods also help to stop the
growth of the data tree, given the limited storage. The details are listed below:
(1) Considering the limit of storage and the requirement of efficiency, we focus on deciding a 2 × 2 local configuration
based on our knowledge of the surrounding 4 × 4 local configuration in the input resulting configuration.
(2) We compute the likelihood of every possible 2 × 2 outcome for the local configuration, given that the resulting
4 × 4 local configuration is the input local configuration. Conditional probabilities are used in guessing the origin
local configuration.
3.3 Training
The data tree is developed in BFS style. In every round a new layer is generated, which means a wider neighborhood
is taken into consideration. When the likelihood at some leaf node goes beyond the threshold probability, that node stops
developing.
Example: w = 6, t = 2, D = 1000000 maps, p = 0.42, f = 400
Algorithm 1: Train
Input: Search width w, train width t, training data D, probability threshold p, frequency threshold f
Output: Binary search tree
TreeHead.EndSearch ← false
for TreeDepth d ← 1 to w² − 1 do
    for All the w × w rectangles r_i in the end maps of D do
        Node ← TreeHead
        c ← the center cell of r_i
        while Node.HaveSon & Node.Depth < d do
            if c is dead then
                Node ← Node.Left
            else
                Node ← Node.Right
            c ← Next c (Spiral from the center of r_i)
        if Node.Depth = d & Node.EndSearch = false then
            Node ← Node.Left/Right (Depends on c. If there is no son, new one)
            r'_i ← the corresponding t × t rectangle in the start map (The rectangle is the center of r_i)
            Node.Config(r'_i).Freq++
            Node.Freq++
    for Every leaf node Node where Node.Depth = d + 1 do
        Node.EndSearch = true
        for r'_i ∈ {0, 1}^(t²) do
            if Node.Freq > f & | Node.Config(r'_i).Freq / Node.Freq − 1/2 | < p then
                Node.EndSearch = false
return TreeHead
3.4 Explicit Algorithm Box
Method of maximum likelihood is employed to make decisions on local cells. Data for computing is extracted from
our training tree.
Algorithm 2: Run
Input: End map E, search width w, TreeHead (with train width t), TreeHead' (with train width 1), frequency
threshold f'
Output: Start map S
 1 for Each cell c in E do
 2     c' ← c
 3     Node ← TreeHead'
 4     while Node.HaveSon do
 5         if c' is dead then
 6             Node ← Node.Left
 7         else
 8             Node ← Node.Right
 9         c' ← Next c' (Spiral from the center c)
10     if | Node.Config(1).Freq / Node.Freq − 1/2 | > p' then
11         S.Position(c) ← sgn( Node.Config(1).Freq / Node.Freq − 1/2 )
12     else
13         Find all the t × t rectangles r_i covering c
14         Use the method in line 2-10 (but using TreeHead and c' ← the center of r_i) to find all the t × t
           configuration probabilities in r_i
15         Calculate the most likely configuration of the (2t − 1) × (2t − 1) rectangle with center c
16         S.Position(c) ← the answer
17 return S
4 Experiment and Result
We have tried three mechanisms so far. Here are the results.
We first tried generating the tree described in the previous section. Because the growth speed of the tree may be limited, we first used a small training set (100 ∗ 1024 configurations per layer). The resulting error rate
was 12.35, which placed us seventh at the time. Afterwards, we tried determining cells in block style,
that is, considering the overall probability of a 2 ∗ 2 grid. Unfortunately, this works better than the naive tree-generating
mechanism only in the one-step case.
Mechanism \ Steps      1       2       3       4       5      Avg.
Tree-Gen(1)          9.99   12.21   12.96   13.24   13.36   12.35
Blockwise            9.68   12.24   13.04   13.37   13.48   12.36
Tree-Gen(2)          9.51   11.79   12.66   13.10   13.29   12.07
Table 1: Performance of three mechanisms we have tried.
More effort was devoted to the ideas of simulated annealing and conditional probability. Unfortunately, due to limited time we did not arrive at a version that works well, so we decided to construct a bigger
tree using the first tree-generation idea. This time we used 100 ∗ 16384 configurations per layer to construct the BST.
Although the tree became considerably large, performance improved. The resulting error rate turned
out to be 12.07 on my laptop, and 12.1038 on Kaggle. We are now 9th on the leaderboard.
A bigger tree may lead to another leap in performance, but we do not have enough computational resources to afford it.
Moreover, I think there is still a chance to significantly improve the performance of block-style estimation.