Lecture 6

Solving SDPs using Multiplicative Weights*

* Lecturer: Thomas Vidick. Scribe: Ehsan Abbasi.

In this lecture, we first propose an algorithm for solving semidefinite programs and then apply it to the MAXCUT problem as an example. As you will see, the method needs an oracle with specific properties, so we will show how to build such an oracle for the MAXCUT problem. Finally, we investigate the quality of the SDP relaxation for a more general class of discrete quadratic programs.

6.1 Semidefinite Programming

As we saw in Lecture 4, the canonical form for an SDP is

$$\sup\ B \bullet X \quad \text{s.t.} \quad A_i \bullet X \le c_i,\ \ i = 1, \dots, m, \qquad X \succeq 0.$$

Recall that we also added the assumption that $A_1 = I$ and $c_1 = R$, implying that $\mathrm{Tr}(X) \le R$ for any feasible $X$. The dual problem associated with this SDP is

$$\inf\ c^T y \quad \text{s.t.} \quad \sum_{i=1}^m y_i A_i - B \succeq 0, \qquad y \ge 0.$$

Now suppose that we have the following oracle, which is going to help us in our algorithm.

Oracle: Given $X \succeq 0$ and $\delta > 0$, return either

(i) $X$ is primal feasible with objective value greater than or equal to $(1-\delta)\alpha$, or

(ii) $y \in \mathbb{R}^m$ such that $y \ge 0$, $X \bullet \big(\sum_i y_i A_i - B\big) \ge 0$, $\big\|\sum_i y_i A_i - B\big\| \le \sigma$ and $c^T y \le \alpha$.

Recall that $\sigma$ is called the width of the oracle. Assuming such an oracle is given to us, we introduced the following algorithm.

Algorithm: Run MMWA using $\varepsilon = \frac{\delta\alpha}{2\sigma R}$ and loss matrices

$$M^{(t)} = \frac{\sum_i y_i^{(t)} A_i - B + \sigma I}{2\sigma}$$

(thus $0 \preceq M^{(t)} \preceq I$), starting from $X^{(0)} = n^{-1} I$. At each step, update $X$ using the MMW update rule from the previous lecture.

In this algorithm, if the oracle returns (i) we are done. Otherwise we have the following theorem.

Theorem 6.1. If the oracle does not fail (in returning $y$) for $T = \frac{8\sigma^2 R^2 \log n}{\delta^2\alpha^2}$ iterations, then $\bar y = \frac{\delta\alpha}{R} e_1 + \frac{1}{T}\sum_t y^{(t)}$ is dual feasible with objective value at most $(1+\delta)\alpha$.

Proof. First let's check the guarantee on the objective value:

$$c^T \bar y = \frac{\delta\alpha}{R}\, c^T e_1 + \frac{1}{T}\sum_{t=1}^{T} c^T y^{(t)} \le \delta\alpha + \alpha = (1+\delta)\alpha,$$

where we used $c_1 = R$ and the guarantee $c^T y^{(t)} \le \alpha$ for any $y^{(t)}$ returned by the oracle.
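Now we check feasibility. Every $y^{(t)}$ returned by the oracle satisfies $y^{(t)} \ge 0$, so $\bar y \ge 0$. It remains to check that $\sum_i \bar y_i A_i - B \succeq 0$. The oracle guarantees $X^{(t)} \bullet \big(\sum_i y_i^{(t)} A_i - B\big) \ge 0$, and the MMW iterates have trace $1$, so $M^{(t)} \bullet X^{(t)} \ge \frac{1}{2}$ for every $t$. From the MMW theorem we know that for any unit vector $v$, and in particular for the eigenvector of $\sum_t M^{(t)}$ associated with its smallest eigenvalue,

$$\frac{T}{2} \le \sum_{t=1}^{T} M^{(t)} \bullet X^{(t)} \overset{(1)}{\le} (1+\varepsilon)\sum_{t=1}^{T} v^T M^{(t)} v + \frac{\log n}{\varepsilon}$$

$$\overset{(2)}{=} (1+\varepsilon)\,\lambda_n\Big(\sum_{t=1}^{T} \frac{\sum_i y_i^{(t)} A_i - B + \sigma I}{2\sigma}\Big) + \frac{\log n}{\varepsilon}$$

$$\overset{(3)}{=} \frac{1+\varepsilon}{2\sigma}\,\lambda_n\Big(\sum_i \Big(\sum_t y_i^{(t)}\Big) A_i - T B\Big) + \frac{(1+\varepsilon)T}{2} + \frac{\log n}{\varepsilon},$$

where $\lambda_n$ denotes the smallest eigenvalue. Here (2) holds by our choice of $v$, and (3) uses properties of eigenvalues of symmetric matrices, specifically $\lambda_i((A + bI)/c) = (\lambda_i(A) + b)/c$ for any $b$ and any $c > 0$. Rearranging terms and dividing by $T$,

$$-\frac{\sigma\varepsilon}{1+\varepsilon} - \frac{2\sigma\log n}{\varepsilon(1+\varepsilon)T} \le \lambda_n\Big(\sum_i \Big(\frac{1}{T}\sum_t y_i^{(t)}\Big) A_i - B\Big) \overset{(4)}{=} \lambda_n\Big(\sum_i \bar y_i A_i - B\Big) - \frac{\delta\alpha}{R}$$

$$\Rightarrow \quad -\frac{4\sigma\log n}{\varepsilon(1+\varepsilon)T} \le \lambda_n\Big(\sum_i \bar y_i A_i - B\Big) - \frac{\delta\alpha}{R},$$

where (4) is because of the way we defined $\bar y$ in the theorem (recall that $A_1 = I$), and the implication uses that the choices of $\varepsilon$ and $T$ give $T = 2\log n/\varepsilon^2$, so the two terms on the left-hand side are equal. Given the choice of parameters made in the theorem you can check that $\frac{\delta\alpha}{R} - \frac{4\sigma\log n}{\varepsilon(1+\varepsilon)T} = \frac{2\sigma\varepsilon^2}{1+\varepsilon} > 0$, so the smallest eigenvalue of $\sum_i \bar y_i A_i - B$ is positive, meaning this is a PSD matrix and $\bar y$ is feasible, as required.

6.2 Application to the MAXCUT problem

In this part, we use the MMWA algorithm to solve the MAXCUT SDP introduced in Lecture 4. We saw that for a given undirected graph $G = (V, E)$, assuming $G$ is $d$-regular (i.e. each vertex has degree exactly $d$), the size of the largest cut could be written as

$$\mathrm{MAXCUT}(G) = \frac{|E|}{2} + \frac{1}{2}\sup_{x_i \in \{\pm 1\}} \sum_{(i,j)\in E} x_i x_j \le \frac{|E|}{2} + \frac{1}{2}\sup_{\substack{u_i \in \mathbb{R}^{2n} \\ \|u_i\| = 1}} \sum_{i,j} A_{i,j}\, u_i \cdot u_j,$$

where $A$ is the (symmetrized) adjacency matrix of the graph $G$, which has $1/2$ for every entry $(i,j)$ and $(j,i)$ associated to an edge $\{i,j\}$ and zeros elsewhere. The relaxation on the right-hand side can be written in standard form in the following way:

$$\sup\ B \bullet X \quad \text{s.t.} \quad E_i \bullet X \le 1,\ \ i = 1, \dots, n, \qquad X \succeq 0,$$

where $B = \frac{d}{4} I + \frac{A}{2}$ and $E_i$ is a matrix whose $i$th diagonal entry is one and the others are zero.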
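To make the standard form concrete, here is a minimal sketch (not part of the lecture) that builds $B$ and the $E_i$ for a small $d$-regular graph and checks that, for $X = xx^T$ with $x \in \{\pm 1\}^n$, the value $B \bullet X$ agrees with the $\pm 1$ expression $\frac{|E|}{2} + \frac{1}{2}\sum_{(i,j)\in E} x_i x_j$ from the display above. The helper names `maxcut_sdp_data` and `frob`, the use of NumPy, and the 5-cycle example are assumptions chosen for illustration.

```python
import numpy as np

def maxcut_sdp_data(n, edges, d):
    """Build B = (d/4) I + A/2 and the constraint matrices E_i.

    A is the symmetrized adjacency matrix with entry 1/2 at (i, j) and
    (j, i) for every edge {i, j}, as in the text. Hypothetical helper.
    """
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 0.5
    B = (d / 4) * np.eye(n) + A / 2
    # E_i has a single one at diagonal position i and zeros elsewhere.
    E = [np.diag(np.eye(n)[i]) for i in range(n)]
    return B, E

def frob(P, Q):
    """Matrix inner product P . Q = sum_{i,j} P_ij Q_ij."""
    return float(np.sum(P * Q))

# Illustrative instance: the 5-cycle, which is 2-regular.
n, d = 5, 2
edges = [(i, (i + 1) % n) for i in range(n)]
B, E = maxcut_sdp_data(n, edges, d)

# A +/-1 assignment and its rank-one matrix X = x x^T.
x = np.array([1, -1, 1, -1, 1])
X = np.outer(x, x)

lhs = frob(B, X)
rhs = len(edges) / 2 + 0.5 * sum(x[i] * x[j] for i, j in edges)
print(lhs, rhs)
```

Any $\pm 1$ vector gives $E_i \bullet X = X_{ii} = 1$, so such an $X$ is feasible for the SDP in standard form.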
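Since the adjacency matrix $A$ has at most $d$ non-zero entries, each equal to $1/2$, in every row, it satisfies $\|A\| \le d/2$. Thus we have $\|B\| \le \frac{d}{2}$ and $B \succeq 0$.

Observations: If $\alpha$ is the optimal value for the SDP we have

$$\frac{|E|}{2} = \frac{nd}{4} \le \alpha \le |E| = \frac{nd}{2}.$$

The first inequality follows since there is always a cut of size $|E|/2$ (a random cut will cut half the edges), and the second follows from the bound on the norm of $B$.

Now our goal is to design an oracle $O$ that we can use in the algorithm that we proposed to solve SDPs using MMWA. In other words, given $X \succeq 0$ such that $\mathrm{Tr}(X) = n$ ($n$ plays the role of $R$ in our algorithm), find $y \ge 0$ such that $c^T y = \sum_i y_i \le \alpha$ (in this problem all of the entries of $c$ are ones) and $X \bullet \big(\sum_i y_i E_i - B\big) \ge 0$. We design the oracle by distinguishing the following cases:

First Case: If $B \bullet X \le \alpha$, let $y_i = \frac{\alpha}{n} \ge 0$ for all $i$. Then $c^T y = \sum_i y_i = n \cdot \frac{\alpha}{n} = \alpha \le \alpha$. Besides,

$$X \bullet \Big(\sum_i y_i E_i - B\Big) = \sum_i y_i X_{ii} - X \bullet B = \frac{\alpha}{n}\mathrm{Tr}(X) - X \bullet B = \alpha - X \bullet B \ge 0.$$

Second Case: Suppose $B \bullet X = \lambda\alpha > \alpha$ (so $\lambda > 1$). We also have $\lambda \le 2$ because $B \bullet X \le \|B\|\,\mathrm{Tr}(X) \le \frac{d}{2} n \le 2\alpha$. We already know that $B \bullet X > \alpha$, so if $X$ is feasible then we have case (i) for the oracle and we are done: we found a very good feasible solution. Otherwise define

$$S = \{i : X_{ii} > \lambda\}, \qquad K = \sum_{i \in S} X_{ii}.$$

$S$ is the set of indices whose constraint is violated by a large amount (since $\lambda > 1$), and $K$ is the sum of the violated diagonal entries of $X$. Now consider the following two cases.

If $K > \frac{\delta\lambda}{4} n$, then let

$$y_i = \begin{cases} \frac{\lambda\alpha}{K} & \text{if } i \in S, \\ 0 & \text{if } i \notin S. \end{cases}$$

Then obviously $y \ge 0$ and

$$c^T y = \sum_i y_i = \frac{\lambda\alpha}{K}|S| \overset{(5)}{\le} \frac{\lambda\alpha}{K}\cdot\frac{K}{\lambda} = \alpha,$$

where (5) holds since $K \ge \lambda|S|$ from the way we defined $K$ and $S$. Besides,

$$X \bullet \Big(\sum_i y_i E_i - B\Big) = \frac{\lambda\alpha}{K}\sum_{i \in S} X_{ii} - X \bullet B = \frac{\lambda\alpha}{K} K - X \bullet B = \lambda\alpha - X \bullet B = 0.$$

Finally, in the other case, when there are only a few constraints violated ($K \le \frac{\delta\lambda}{4} n$), assume without loss of generality (permuting the rows and columns if necessary) that the first $|S|$ diagonal entries of $X$ correspond to those $i \in S$, so we can write

$$X = \begin{pmatrix} X_{S,S} & X_{S,\bar S} \\ X_{\bar S,S} & X_{\bar S,\bar S} \end{pmatrix}.$$

Now define a new matrix $\bar X$ as

$$\bar X = \frac{1}{\lambda}\begin{pmatrix} 0 & 0 \\ 0 & X_{\bar S,\bar S} \end{pmatrix}.$$

A diagonal block extracted from a PSD matrix is PSD, so $\bar X \succeq 0$. Besides, $\bar X_{ii} \le 1$ for every $i$, thus $\bar X$ is primal feasible. It remains to evaluate its objective value.

Claim: $B \bullet \bar X \ge (1 - 3\delta)\alpha$.

Write $X_{i,j} = u_i \cdot u_j$ for vectors $u_i$ (this is possible since $X$ is PSD). From the definition of $\lambda$,

$$\alpha - B \bullet \bar X = B \bullet \Big(\frac{1}{\lambda} X - \bar X\Big) \overset{(6)}{=} \frac{1}{\lambda}\sum_{(i,j) \in T} B_{i,j} X_{i,j} \le \frac{1}{\lambda}\sum_{(i,j) \in T} B_{i,j}\|u_i\|\|u_j\| \overset{(7)}{\le} \frac{3}{\lambda}\sum_{i \in S} d\,\|u_i\|$$

$$\overset{(8)}{\le} \frac{3d}{\lambda}\sqrt{|S|}\sqrt{\sum_{i \in S}\|u_i\|^2} \overset{(9)}{=} \frac{3d}{\lambda}\sqrt{|S| K} \overset{(10)}{\le} \frac{3d}{\lambda}\cdot\frac{K}{\sqrt{\lambda}} \overset{(11)}{\le} \frac{3\delta n d}{4\sqrt{\lambda}} \overset{(12)}{\le} 3\delta\alpha,$$

where in (6) we introduced $T = (S \times S) \cup (S \times \bar S) \cup (\bar S \times S)$, and the inequality that follows uses $B_{i,j} \ge 0$ and $u_i \cdot u_j \le \|u_i\|\|u_j\|$. (7) is because $\|u_j\| \le 1$, $T$ consists of three sets, and the summation over each set is less than $d$. (8) is a result of the Cauchy-Schwarz inequality, and (9) is because $\|u_i\|^2 = X_{ii}$ and $\sum_{i \in S} X_{ii} = K$. (10) uses $K \ge \lambda|S|$. (11) follows from our assumption $K \le \frac{\delta\lambda}{4} n$, and (12) is because $\frac{nd}{4} \le \alpha$ and $\lambda \ge 1$. Thus finally we have $B \bullet \bar X \ge (1 - 3\delta)\alpha$.

So the oracle works. How good is it? First note that it runs very fast. We have only three cases to distinguish between, and in each one we check a linear constraint. Thus the running time is linear in the number of edges $m$ of the graph.

Next we need to bound the width of the oracle:

$$\Big\|\sum_i y_i E_i - B\Big\| \le \left\|\begin{pmatrix} y_1 & & 0 \\ & \ddots & \\ 0 & & y_n \end{pmatrix}\right\| + \|B\| \le \max_i |y_i| + \frac{d}{2}.$$

In order to find $\max_i |y_i|$ we should check all of the cases. For example, for the case when $K > \frac{\delta\lambda}{4} n$, we have

$$y_i \le \frac{\lambda\alpha}{K} \le \frac{4}{\lambda\delta n}\cdot\lambda\cdot\frac{nd}{2} = \frac{2d}{\delta} \le \frac{4d}{\delta}.$$

Thus $\max_i y_i = O(d/\delta)$ and the width is $O(d/\delta)$.

6.3 General quadratic programs

Consider the following problem:

$$\alpha = \sup_{\substack{x_i, y_j \in \{\pm 1\} \\ i = 1, \dots, n \\ j = 1, \dots, m}} \sum_{i,j} A_{i,j}\, x_i y_j,$$

where $A \in \mathbb{R}^{n \times m}$. Just like the MAXCUT problem this is an NP-hard problem (as you'll show in homework, MAXCUT is a special case). We will see how a good approximation can be obtained in polynomial time. For this we propose the following relaxation:

$$\alpha \le \beta = \sup_{\substack{u_i, v_j \in \mathbb{R}^{m+n} \\ \|u_i\| = \|v_j\| = 1}} \sum_{i,j} A_{i,j}\, u_i \cdot v_j.$$

It is not obvious that this program is an SDP, and we will get back to it later. The interesting point here is the following theorem.

Theorem 6.2.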
Given $u_i$'s and $v_j$'s achieving the optimum in the above SDP, there exists a polynomial-time algorithm that produces $x_i$'s and $y_j$'s in $\{\pm 1\}$ such that

$$\sum_{i,j} A_{i,j}\, x_i y_j \ge C\beta,$$

where $C$ is a universal constant.

There are different methods to prove the theorem, which yield different values of $C$. For example, in your homework you will develop an algorithm that achieves $C \approx 0.56$. The best value for $C$ is the inverse of Grothendieck's constant $K_G$, which can be defined as

$$K_G = \inf\Big\{ C : \forall m, n,\ \forall A \in \mathbb{R}^{n \times m},\ \sup_{u_i, v_j} \sum_{i,j} A_{i,j}\, u_i \cdot v_j \le C \sup_{x_i, y_j} \sum_{i,j} A_{i,j}\, x_i y_j \Big\},$$

where the first supremum is over unit vectors $u_i, v_j$ and the second is over $x_i, y_j \in \{\pm 1\}$.

Now let's rewrite the above problem as an SDP in the form

$$\sup\ B \bullet Z \quad \text{s.t.} \quad A_i \bullet Z \le c_i, \qquad Z \succeq 0.$$

If the $u_i$'s are the columns of $U$ and the $v_j$'s are the columns of $V$, then define

$$Z = (U\ V)^T (U\ V) = \begin{pmatrix} [u_i \cdot u_j] & [u_i \cdot v_j] \\ [v_i \cdot u_j] & [v_i \cdot v_j] \end{pmatrix} \in \mathbb{R}^{(m+n) \times (m+n)}.$$

Trivially $Z$ is a PSD matrix (it is a Gram matrix), and its diagonal elements are the squared norms $\|u_i\|^2$ and $\|v_j\|^2$, which should be at most one. Thus we let $c_i = 1$ and $A_i = E_i$ for $i = 1, \dots, n + m$, where $E_i$ is a matrix whose $i$th diagonal entry is one and the others are zeros. Finally, for the objective value, we define

$$B = \frac{1}{2}\begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix},$$

and this problem is equivalent to the relaxed problem that we introduced for our original quadratic optimization problem, except now it is in standard SDP form.
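The rewriting above can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly drawn unit vectors (the helper `unit_columns`, the sizes $n = 3$, $m = 4$, and the random seed are illustrative choices, not part of the lecture). It forms the Gram matrix $Z$ and the block matrix $B$, and confirms that $B \bullet Z$ equals $\sum_{i,j} A_{i,j}\, u_i \cdot v_j$, that $Z$ has unit diagonal, and that $Z$ is PSD up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
A = rng.standard_normal((n, m))  # coefficient matrix of the quadratic program

def unit_columns(k, dim, rng):
    """Return a dim x k matrix whose columns are random unit vectors.

    Hypothetical helper used only to generate test vectors.
    """
    M = rng.standard_normal((dim, k))
    return M / np.linalg.norm(M, axis=0)

U = unit_columns(n, n + m, rng)  # columns are u_1, ..., u_n
V = unit_columns(m, n + m, rng)  # columns are v_1, ..., v_m

# Gram matrix Z = (U V)^T (U V), of size (n + m) x (n + m).
W = np.hstack([U, V])
Z = W.T @ W

# Objective matrix B = (1/2) [[0, A], [A^T, 0]].
B = 0.5 * np.block([[np.zeros((n, n)), A],
                    [A.T, np.zeros((m, m))]])

sdp_objective = float(np.sum(B * Z))     # B . Z
direct = float(np.sum(A * (U.T @ V)))    # sum_{i,j} A_ij (u_i . v_j)
print(sdp_objective, direct)
```

The factor $\frac{1}{2}$ in $B$ compensates for the fact that each product $u_i \cdot v_j$ appears twice in $Z$, once in the upper-right block and once in the lower-left block.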