Supplement A to "Consistency of Sparse PCA in High Dimension, Low Sample Size Contexts"

Dan Shen^{a,1,*}, Haipeng Shen^{a,2}, J. S. Marron^{a,3}

^a Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599

Keywords: Sparse PCA, High Dimension, Low Sample Size, Consistency

* Corresponding author.
Email addresses: dshen@live.unc.edu (Dan Shen), haipeng@email.unc.edu (Haipeng Shen), marron@email.unc.edu (J.S. Marron).
1 Partially supported by NSF grants DMS-0606577 and DMS-0854908.
2 Partially supported by NSF grants DMS-0606577, CMMI-0800575, and DMS-1106912.
3 Partially supported by NSF grants DMS-0606577 and DMS-0854908.

Preprint submitted to the Journal of Multivariate Analysis, September 3, 2012.

1. Proofs of Lemma 7.1 and Theorems 3.1, 3.2 and 3.3

The proof of Lemma 7.1 is given in Section 1.1. The proofs of Theorems 3.1, 3.2 and 3.3 are given in Sections 1.2, 1.3 and 1.4, respectively.

1.1. Proof of Lemma 7.1

Note that for every $\tau > 0$,

P\left( C_{\lfloor d^\beta \rfloor}^{-1} \max_{1 \le i \le \lfloor d^\beta \rfloor} |\xi_i| > \tau \right) = P\left( \max_{1 \le i \le \lfloor d^\beta \rfloor} |\xi_i| > C_{\lfloor d^\beta \rfloor} \tau \right)    (1.1)
  \le P\left( \left\{ \max_{1 \le i \le \lfloor d^\beta \rfloor} \xi_i > C_{\lfloor d^\beta \rfloor} \tau \right\} \cup \left\{ \max_{1 \le i \le \lfloor d^\beta \rfloor} (-\xi_i) > C_{\lfloor d^\beta \rfloor} \tau \right\} \right)
  \le P\left( \max_{1 \le i \le \lfloor d^\beta \rfloor} \xi_i > C_{\lfloor d^\beta \rfloor} \tau \right) + P\left( \max_{1 \le i \le \lfloor d^\beta \rfloor} (-\xi_i) > C_{\lfloor d^\beta \rfloor} \tau \right)
  = 2 P\left( \max_{1 \le i \le \lfloor d^\beta \rfloor} \xi_i > C_{\lfloor d^\beta \rfloor} \tau \right)
  \le 2 \left[ 1 - P\left( \bigcap_{i=1}^{\lfloor d^\beta \rfloor} \left\{ \xi_i \delta_{ii}^{-1/2} \le c (\log(\lfloor d^\beta \rfloor))^{\delta} \right\} \right) \right],

where $c$ is a positive constant. Since

\sum_{i=1}^{\lfloor d^\beta \rfloor} \left[ 1 - \Phi\left( c (\log(\lfloor d^\beta \rfloor))^{\delta} \right) \right] \longrightarrow 0, \quad \text{as } d \to \infty,

it then follows from Proposition 7.1 that

P\left( \bigcap_{i=1}^{\lfloor d^\beta \rfloor} \left\{ \xi_i \delta_{ii}^{-1/2} \le c (\log(\lfloor d^\beta \rfloor))^{\delta} \right\} \right) \longrightarrow 1, \quad \text{as } d \to \infty.    (1.2)

From (1.1) and (1.2), we obtain

C_{\lfloor d^\beta \rfloor}^{-1} \max_{1 \le i \le \lfloor d^\beta \rfloor} |\xi_i| \xrightarrow{p} 0, \quad \text{as } d \to \infty.

This finishes the proof of Lemma 7.1.
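As a sanity check on the mechanism behind (1.1) and (1.2), the short simulation below is a minimal sketch, not part of the proof: for $m = \lfloor d^\beta \rfloor$ standardized Gaussian variables, the union bound together with the Gaussian tail gives $m[1 - \Phi(c(\log m)^{\delta})] \to 0$ whenever $\delta > 1/2$, so the maximum scaled by $(\log m)^{\delta}$ tends to zero, although only at a logarithmic rate. The values of $\beta$, $\delta$ and $c$, and the use of unit-variance $\xi_i$, are illustrative assumptions; the normalizing sequence $C_{\lfloor d^\beta \rfloor}$ itself is defined in the main paper.

```python
# Minimal Monte Carlo sketch (not part of the proof) of the union-bound/Gaussian-tail
# mechanism in (1.1)-(1.2).  Assumptions for illustration only: unit-variance xi_i,
# beta = 0.6, delta = 0.75, c = 1.
import math
import numpy as np

rng = np.random.default_rng(0)
beta, delta, c = 0.6, 0.75, 1.0

for d in (10**3, 10**5, 10**7):
    m = int(d ** beta)                       # m = floor(d^beta)
    t = c * math.log(m) ** delta             # threshold c (log m)^delta
    # m * (1 - Phi(t)); this union bound tends to 0 because delta > 1/2
    union_bound = m * 0.5 * math.erfc(t / math.sqrt(2.0))
    # the scaled maximum of m independent standard Gaussians
    scaled_max = np.max(np.abs(rng.standard_normal(m))) / math.log(m) ** delta
    print(f"d={d:>8}  m={m:>7}  m*(1-Phi(c(log m)^delta))={union_bound:.3e}  "
          f"max|xi|/(log m)^delta={scaled_max:.3f}")
```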
1.2. Proof of Theorem 3.1

Assume that $\hat{u}_1 = \breve{u}_1^p / \|\breve{u}_1^p\|$, where the entries of $\breve{u}_1^p$ are given by

\breve{u}_{i,1}^p = \tilde{u}_{i,1} 1_{\{|\tilde{u}_{i,1}| > \zeta'\}} + \varsigma_i 1_{\{|\tilde{u}_{i,1}| > \zeta\}},    (1.3)

where the $\tilde{u}_{i,1}$ are defined in (2.1) of the main paper, and the expressions of $\zeta'$ and $\varsigma_i$ depend on the specific penalty function used in RSPCA. The following proof covers general penalties; more details are provided for the soft-thresholding, hard-thresholding, and SCAD penalties towards the end of this section.

Denote $\breve{u}_{i,1} = \tilde{u}_{i,1} 1_{\{|\tilde{u}_{i,1}| > \zeta'\}}$, so that

\breve{u}_{i,1}^p = \breve{u}_{i,1} + \varsigma_i 1_{\{|\tilde{u}_{i,1}| > \zeta\}}.    (1.4)

Note that

|\langle \hat{u}_1, u_1 \rangle| = \left| \frac{\sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1}^p u_{i,1}}{\sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1}^p)^2}} \right| = \left| \frac{\lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1}^p u_{i,1}}{\lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1}^p)^2}} \right|.    (1.5)

Below we need to bound the numerator and the denominator of (1.5).

We start with the numerator. From (1.4), and since $|\varsigma_i| \le c\zeta$, it follows that

\lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1} u_{i,1} - c \zeta \lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} |u_{i,1}| \le \lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1}^p u_{i,1} \le \lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1} u_{i,1} + c \zeta \lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} |u_{i,1}|.    (1.6)

Since $\zeta'$ satisfies condition (d) in Theorem 2.2, the $\breve{u}_{i,1}$ here have the same property as the $\breve{u}_{i,1}$ that appear in the proofs of Theorems 2.2 and 2.3. Recall (7.9) and (7.12) from those proofs, which are displayed below:

\lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1} u_{i,1} = \tilde{v}_1^T \tilde{W}_1 \left( 1 + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right) \right),    (1.7)

and

\lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1})^2} = \left| \tilde{v}_1^T \tilde{W}_1 \right| \left( 1 + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right) \right),    (1.8)

where $\kappa \in [0, \alpha - \eta - \theta)$ and $\kappa'$ satisfies $d^{(\kappa' + \eta - \alpha)/2} \zeta = o(1)$.

Note that $\beta \le \eta$, which, together with the Cauchy-Schwarz inequality and $\|u_1\| = 1$, gives

d^{\kappa'/2} \zeta \lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} |u_{i,1}| \le d^{\kappa'/2} \zeta \lambda_1^{-1/2} d^{\beta/2} \le c d^{(\kappa' + \eta - \alpha)/2} \zeta = o(1).

It then follows that

\zeta \lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} |u_{i,1}| = o\left( d^{-\kappa'/2} \right).    (1.9)

Combining (1.6), (1.7) and (1.9), we have

\lambda_1^{-1/2} \sum_{i=1}^{\lfloor d^\beta \rfloor} \breve{u}_{i,1}^p u_{i,1} = \tilde{v}_1^T \tilde{W}_1 \left( 1 + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right) \right).    (1.10)

Similarly, for the denominator we have

\lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1})^2} - c \zeta \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}} \le \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1}^p)^2} \le \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1})^2} + c \zeta \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}}.    (1.11)

Next, we show that

\zeta \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}} = o_p\left( d^{-\kappa'/2} \right).    (1.12)

For a fixed $\tau$, observe that

P\left( d^{\kappa'} \zeta^2 \lambda_1^{-1} \sum_{i=\lfloor d^\beta \rfloor + 1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}} \ge \tau \right)
  \le \sum_{i=\lfloor d^\beta \rfloor + 1}^{d} P\left( 1_{\{|\tilde{u}_{i,1}| > \zeta\}} \ge d^{-\kappa'} \zeta^{-2} \lambda_1 d^{-1} \tau \right)
  \le \sum_{i=\lfloor d^\beta \rfloor + 1}^{d} \sum_{j=1}^{n} P\left( |h_{i,j} \sigma_{i,j}^{-1}| > c d^{\theta} \zeta \right)
  \le n \left( d - \lfloor d^\beta \rfloor \right) \int_{c d^{\theta} \zeta}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx \longrightarrow 0, \quad \text{as } d \to \infty,

which yields

\zeta \lambda_1^{-1/2} \sqrt{\sum_{i=\lfloor d^\beta \rfloor + 1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}} = o_p\left( d^{-\kappa'/2} \right).    (1.13)

In addition, note that

\zeta \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}} \le \zeta \lambda_1^{-1/2} \sqrt{\sum_{i=1}^{\lfloor d^\beta \rfloor} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}} + \zeta \lambda_1^{-1/2} \sqrt{\sum_{i=\lfloor d^\beta \rfloor + 1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}}
  \le \zeta \lambda_1^{-1/2} d^{\beta/2} + \sqrt{\zeta^2 \lambda_1^{-1} \sum_{i=\lfloor d^\beta \rfloor + 1}^{d} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}}.    (1.14)

(1.12) then follows from (1.13), (1.14) and the fact that $\zeta \lambda_1^{-1/2} d^{\beta/2} = o(d^{-\kappa'/2})$.

Combining (1.8), (1.11) and (1.12), we have

\lambda_1^{-1/2} \sqrt{\sum_{i=1}^{d} (\breve{u}_{i,1}^p)^2} = \left| \tilde{v}_1^T \tilde{W}_1 \right| + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right).    (1.15)

Furthermore, (1.5), (1.10) and (1.15) give

|\langle \hat{u}_1, u_1 \rangle| = \frac{\left| \tilde{v}_1^T \tilde{W}_1 \right| + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right)}{\left| \tilde{v}_1^T \tilde{W}_1 \right| + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right)} = 1 + o_p\left( d^{-\min\{\kappa,\kappa'\}/2} \right),

which means that $\hat{u}_1$ is consistent with $u_1$ with convergence rate $d^{\min\{\kappa,\kappa'\}/2}$.

In addition, recall that $d^{(\kappa' + \eta - \alpha)/2} \zeta = o(1)$. If $\zeta = o(d^{(\alpha - \eta - \kappa)/2})$, then we can take $\kappa' = \kappa$, and therefore $\hat{u}_1$ is consistent with $u_1$ with convergence rate $d^{\kappa/2}$.

The above proof covers the three cases of using the soft-thresholding, hard-thresholding, or SCAD penalty in the RSPCA procedure, as discussed in Shen and Huang (2008) [3]. Below we give the corresponding estimator (1.3) for each penalty.

• For the soft-thresholding penalty, let $\breve{u}_1^{\mathrm{soft}} = h_{\zeta}^{\mathrm{soft}}(X_{(d)} \tilde{v}_1)$; then the entries of $\breve{u}_1^{\mathrm{soft}}$ are $\breve{u}_{i,1}^{\mathrm{soft}} = \tilde{u}_{i,1} 1_{\{|\tilde{u}_{i,1}| > \zeta\}} - \mathrm{sign}(\tilde{u}_{i,1}) \zeta 1_{\{|\tilde{u}_{i,1}| > \zeta\}}$.

• For the hard-thresholding penalty, let $\breve{u}_1^{\mathrm{hard}} = h_{\zeta}^{\mathrm{hard}}(X_{(d)} \tilde{v}_1)$; then the entries of $\breve{u}_1^{\mathrm{hard}}$ are $\breve{u}_{i,1}^{\mathrm{hard}} = \tilde{u}_{i,1} 1_{\{|\tilde{u}_{i,1}| > \zeta\}}$.

• Finally, for the SCAD penalty, let $\breve{u}_1^{\mathrm{SCAD}} = h_{\zeta}^{\mathrm{SCAD}}(X_{(d)} \tilde{v}_1)$; then the entries of $\breve{u}_1^{\mathrm{SCAD}}$ are

  \breve{u}_{i,1}^{\mathrm{SCAD}} = \tilde{u}_{i,1} 1_{\{|\tilde{u}_{i,1}| > a\zeta\}} + \varsigma_i 1_{\{|\tilde{u}_{i,1}| > \zeta\}},

  where

  \varsigma_i = \begin{cases} \mathrm{sign}(\tilde{u}_{i,1}) (|\tilde{u}_{i,1}| - \zeta), & \text{if } \zeta < |\tilde{u}_{i,1}| \le 2\zeta, \\ \frac{(a-1) \tilde{u}_{i,1} - \mathrm{sign}(\tilde{u}_{i,1}) a\zeta}{a-2}, & \text{if } 2\zeta < |\tilde{u}_{i,1}| \le a\zeta, \end{cases}

  with $a > 2$ a tuning parameter as defined in Fan and Li (2001) [1]. Note that $|\varsigma_i| < a\zeta$.

Since $\breve{u}_1^{\mathrm{soft}}$, $\breve{u}_1^{\mathrm{hard}}$ and $\breve{u}_1^{\mathrm{SCAD}}$ are special cases of $\breve{u}_1^p$ in (1.3), the corresponding RSPCA estimator $\hat{u}_1$ (i.e., the normalized version of $\breve{u}_1^{\mathrm{soft}}$, $\breve{u}_1^{\mathrm{hard}}$ or $\breve{u}_1^{\mathrm{SCAD}}$) is consistent with $u_1$; furthermore, if $\zeta = o(d^{(\alpha - \eta - \kappa)/2})$, the corresponding estimator is consistent with $u_1$ with convergence rate $d^{\kappa/2}$. This finishes the proof of Theorem 3.1. The three thresholding rules are transcribed into code in the sketch following this section.
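To make the three penalty-specific rules concrete, the following minimal sketch transcribes the componentwise maps from the bullets above and applies them to data from a single-spike model, reporting the inner product in (1.5). The function names, the SCAD value a = 3.7 (a commonly used choice), the threshold level, and the model parameters are illustrative assumptions rather than values from the paper; see Shen and Huang (2008) [3] for the actual RSPCA procedure.

```python
# Componentwise thresholding maps from the bullets above, plus a small single-spike
# demonstration of the normalized estimator hat u_1 and |<hat u_1, u_1>| from (1.5).
# Function names, a = 3.7, the threshold level and the model parameters are illustrative.
import numpy as np

def h_soft(u, zeta):
    """Soft thresholding: u_i 1{|u_i| > zeta} - sign(u_i) * zeta * 1{|u_i| > zeta}."""
    return np.where(np.abs(u) > zeta, u - np.sign(u) * zeta, 0.0)

def h_hard(u, zeta):
    """Hard thresholding: u_i 1{|u_i| > zeta}."""
    return np.where(np.abs(u) > zeta, u, 0.0)

def h_scad(u, zeta, a=3.7):
    """SCAD thresholding (Fan and Li, 2001 [1]): u_i 1{|u_i| > a zeta} + varsigma_i 1{|u_i| > zeta}."""
    absu = np.abs(u)
    out = np.where(absu > a * zeta, u, 0.0)                          # untouched beyond a*zeta
    band1 = (absu > zeta) & (absu <= 2 * zeta)                       # sign(u)(|u| - zeta)
    out = np.where(band1, np.sign(u) * (absu - zeta), out)
    band2 = (absu > 2 * zeta) & (absu <= a * zeta)                   # ((a-1)u - sign(u) a zeta)/(a-2)
    out = np.where(band2, ((a - 1) * u - np.sign(u) * a * zeta) / (a - 2), out)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, alpha, beta = 2000, 25, 1.5, 0.4            # spike lambda_1 ~ d^alpha, ~d^beta nonzero loadings
    m = int(d ** beta)
    u1 = np.zeros(d)
    u1[:m] = 1.0 / np.sqrt(m)                         # sparse unit-norm first eigenvector
    X = np.sqrt(d ** alpha) * np.outer(u1, rng.standard_normal(n)) + rng.standard_normal((d, n))
    v1_tilde = np.linalg.svd(X, full_matrices=False)[2][0]   # first right singular vector
    u1_tilde = X @ v1_tilde                                  # tilde u_1 = X_(d) tilde v_1
    for name, h in (("soft", h_soft), ("hard", h_hard), ("SCAD", h_scad)):
        breve = h(u1_tilde, zeta=np.sqrt(2.0 * np.log(d)))   # illustrative threshold level
        u1_hat = breve / np.linalg.norm(breve)               # normalized estimator hat u_1
        print(f"{name:>4}: |<hat u_1, u_1>| = {abs(u1_hat @ u1):.4f}")
```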
1.3. Proof of Theorem 3.2

First, we note that

\tilde{v}_1^{\mathrm{new}} = \frac{\lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) + \lambda_1^{-1/2} X_{(d)}^T u_1}{\left\| \lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) + \lambda_1^{-1/2} X_{(d)}^T u_1 \right\|} = \frac{\lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) + \tilde{W}_1}{\left\| \lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) + \tilde{W}_1 \right\|}.    (1.16)

Jung and Marron (2009) [2] showed that $c_d^{-1} \hat{\lambda}_1 \xrightarrow{p} 1$ as $d \to \infty$, where $c_d = n^{-1} \sum_{i=1}^{d} \lambda_i \sim d^{\alpha}$. In addition, since $\lambda_1 \sim d^{\alpha}$ and $\|\hat{u}_1^{\mathrm{old}} - u_1\| = o_p(d^{-\kappa/2})$ with $\kappa \ge 1 - \alpha$, it follows that $\lambda_1^{-1/2} \hat{\lambda}_1^{1/2} \|\hat{u}_1^{\mathrm{old}} - u_1\| = o_p(1)$. Therefore, we have

\left\| \lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) \right\| \le n \lambda_1^{-1/2} \hat{\lambda}_1^{1/2} \|\hat{u}_1^{\mathrm{old}} - u_1\| = o_p(1),

which yields

\lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) + \tilde{W}_1 = \tilde{W}_1 + o_p(1),    (1.17)

and

\left\| \lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) + \tilde{W}_1 \right\| \le \left\| \lambda_1^{-1/2} X_{(d)}^T (\hat{u}_1^{\mathrm{old}} - u_1) \right\| + \|\tilde{W}_1\| = \|\tilde{W}_1\| + o_p(1).    (1.18)

Combining (1.16), (1.17) and (1.18), we have

\tilde{v}_1^{\mathrm{new}} = \frac{\tilde{W}_1 + o_p(1)}{\|\tilde{W}_1\| + o_p(1)} \xrightarrow{p} \frac{\tilde{W}_1}{\|\tilde{W}_1\|}, \quad \text{as } d \to \infty.

This finishes the proof of Theorem 3.2.

1.4. Proof of Theorem 3.3

From Theorem 3.2, we note that $\tilde{v}_1^{\mathrm{new}} \xrightarrow{p} \tilde{W}_1 / \|\tilde{W}_1\|$ as $d \to \infty$. Hence, to prove Theorem 3.3, we can modify the proofs of Theorems 3.1 and 3.2 by replacing $\hat{u}_1$ with $\hat{u}_1^{\mathrm{new}}$ and $\tilde{v}_1$ with $\tilde{v}_1^{\mathrm{new}}$ in those proofs.
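To see the update analyzed in (1.16)-(1.18), and reused in Section 1.4, in one place, the following is a small self-contained sketch of the alternating iteration: $v$ is updated proportionally to $X_{(d)}^T \hat{u}_1$ (the $\lambda_1^{-1/2}$ factor in (1.16) cancels after normalization), and $\hat{u}_1$ is obtained by thresholding $X_{(d)} v$ and normalizing. The soft threshold (repeated from the previous sketch), the threshold level, the stopping rule and the single-spike test data are illustrative assumptions; this is a power-iteration-style reading of the displayed quantities, not the exact RSPCA implementation of Shen and Huang (2008) [3].

```python
# Alternating update sketch: v proportional to X^T u_hat, u_hat from thresholding X v.
# All names, the threshold choice and the stopping rule are illustrative assumptions.
import numpy as np

def soft(u, zeta):
    # soft-thresholding rule from the previous sketch, repeated so this block is self-contained
    return np.where(np.abs(u) > zeta, u - np.sign(u) * zeta, 0.0)

def sparse_pc_iteration(X, zeta, n_iter=50, tol=1e-8):
    """Alternate u- and v-updates on the d x n matrix X; return (u_hat, v_tilde)."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]   # start from the ordinary PCA direction
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        u_new = soft(X @ v, zeta)                     # threshold tilde u_1 = X_(d) v
        nrm = np.linalg.norm(u_new)
        if nrm == 0.0:                                # threshold removed every entry; stop
            break
        u_new /= nrm                                  # normalized sparse direction u_hat
        v = X.T @ u_new                               # proportional to the update in (1.16)
        v /= np.linalg.norm(v)
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    return u, v

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n, alpha, beta = 2000, 25, 1.5, 0.4
    m = int(d ** beta)
    u1 = np.zeros(d)
    u1[:m] = 1.0 / np.sqrt(m)
    X = np.sqrt(d ** alpha) * np.outer(u1, rng.standard_normal(n)) + rng.standard_normal((d, n))
    u_hat, _ = sparse_pc_iteration(X, zeta=np.sqrt(2.0 * np.log(d)))
    print("|<u_hat, u_1>| =", abs(u_hat @ u1))
```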
References

[1] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360, 2001.

[2] S. Jung and J. S. Marron. PCA consistency in high dimension, low sample size context. The Annals of Statistics, 37(6B):4104–4130, 2009.

[3] H. Shen and J. Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6):1015–1034, 2008.