
Supplement A to “Consistency of Sparse PCA in High
Dimension, Low Sample Size Contexts”
Dan Shen$^{a,1,*}$, Haipeng Shen$^{a,2}$, J.S. Marron$^{a,3}$
$^{a}$ Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
Keywords: Sparse PCA, High Dimension, Low Sample Size, Consistency
1. Proofs of Lemma 7.1 and Theorems 3.1, 3.2 and 3.3
The proof of Lemma 7.1 is given in Section 1.1. The proofs of Theorems
3.1, 3.2 and 3.3 are shown in Sections 1.2, 1.3 and 1.4 respectively.
1.1. Proof of Lemma 7.1
Note that for every τ > 0
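$$
\begin{aligned}
P\Big[C_{\lfloor d^\beta\rfloor}^{-1}\max_{1\le i\le\lfloor d^\beta\rfloor}|\xi_i|>\tau\Big]
&= P\Big[\max_{1\le i\le\lfloor d^\beta\rfloor}|\xi_i|>C_{\lfloor d^\beta\rfloor}\tau\Big] \\
&\le P\Big[\Big\{\max_{1\le i\le\lfloor d^\beta\rfloor}\xi_i>C_{\lfloor d^\beta\rfloor}\tau\Big\}\cup\Big\{\max_{1\le i\le\lfloor d^\beta\rfloor}(-\xi_i)>C_{\lfloor d^\beta\rfloor}\tau\Big\}\Big] \\
&\le P\Big[\max_{1\le i\le\lfloor d^\beta\rfloor}\xi_i>C_{\lfloor d^\beta\rfloor}\tau\Big]+P\Big[\max_{1\le i\le\lfloor d^\beta\rfloor}(-\xi_i)>C_{\lfloor d^\beta\rfloor}\tau\Big] \\
&= 2P\Big[\max_{1\le i\le\lfloor d^\beta\rfloor}\xi_i>C_{\lfloor d^\beta\rfloor}\tau\Big] \\
&\le 2\Big(1-P\Big[\bigcap_{i=1}^{\lfloor d^\beta\rfloor}\Big\{\xi_i\delta_{ii}^{-1/2}\le c\big(\log(\lfloor d^\beta\rfloor)\big)^{\delta}\Big\}\Big]\Big),
\end{aligned}
\tag{1.1}
$$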

$^{*}$ Corresponding author. Email addresses: dshen@live.unc.edu (Dan Shen), haipeng@email.unc.edu (Haipeng Shen), marron@email.unc.edu (J.S. Marron)
$^{1}$ Partially supported by NSF grants DMS-0606577 and DMS-0854908.
$^{2}$ Partially supported by NSF grants DMS-0606577, CMMI-0800575, and DMS-1106912.
$^{3}$ Partially supported by NSF grants DMS-0606577 and DMS-0854908.
Preprint submitted to Journal of Multivariate Analysis, September 3, 2012

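where $c$ is a positive constant. Since
$$
\sum_{i=1}^{\lfloor d^\beta\rfloor}\Big[1-\Phi\Big(c\big(\log(\lfloor d^\beta\rfloor)\big)^{\delta}\Big)\Big]\longrightarrow 0,\quad\text{as } d\to\infty,
$$
it then follows from Proposition 7.1 that
$$
P\Big[\bigcap_{i=1}^{\lfloor d^\beta\rfloor}\Big\{\xi_i\delta_{ii}^{-1/2}\le c\big(\log(\lfloor d^\beta\rfloor)\big)^{\delta}\Big\}\Big]\longrightarrow 1,\quad\text{as } d\to\infty.
\tag{1.2}
$$
From (1.1) and (1.2), we obtain
$$
C_{\lfloor d^\beta\rfloor}^{-1}\max_{1\le i\le\lfloor d^\beta\rfloor}|\xi_i|\xrightarrow{p}0,\quad\text{as } d\to\infty.
$$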
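The summability condition used in this proof can be checked numerically. The sketch below evaluates $\lfloor d^\beta\rfloor\,[1-\Phi(c(\log\lfloor d^\beta\rfloor)^{\delta})]$ for growing $d$, taking $\Phi$ to be the standard normal distribution function. The values $\beta=0.5$, $c=1$ and $\delta=1$ are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
import math


def normal_upper_tail(x):
    """Return 1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound_term(d, beta=0.5, c=1.0, delta=1.0):
    """Evaluate floor(d^beta) * (1 - Phi(c * (log floor(d^beta))^delta)).

    beta, c and delta are assumed illustrative values, not the paper's.
    """
    m = math.floor(d ** beta)
    threshold = c * math.log(m) ** delta
    # The summand does not depend on i, so the sum is m times one tail.
    return m * normal_upper_tail(threshold)


dims = [10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8]
terms = [union_bound_term(d) for d in dims]
for d, t in zip(dims, terms):
    print(f"d = {d:>9d}: sum of tails = {t:.3e}")
```

Under these assumed values the printed terms shrink rapidly as $d$ grows, which is the behaviour the argument relies on before applying Proposition 7.1.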
1.2. Proof of Theorem 3.1
Assume that $\hat u_1=\breve u_1^{p}/\|\breve u_1^{p}\|$, and the entries of $\breve u_1^{p}$ are given by
$$
\breve u^{p}_{i,1}=\tilde u_{i,1}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta'\}}+\varsigma_i\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}},
\tag{1.3}
$$
where $\tilde u_{i,1}$ are defined in (2.1) of the main paper, and the expressions of $\zeta'$ and $\varsigma_i$ depend on the specific penalty function used in RSPCA. The following proof covers general penalties. More details are provided for the soft-thresholding, hard-thresholding, and SCAD penalties towards the end of this section.
Denote $\breve u_{i,1}=\tilde u_{i,1}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta'\}}$, and it follows that
$$
\breve u^{p}_{i,1}=\breve u_{i,1}+\varsigma_i\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}.
\tag{1.4}
$$
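Note that
$$
|\langle\hat u_1,u_1\rangle|
=\frac{\Big|\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u^{p}_{i,1}u_{i,1}\Big|}{\sqrt{\sum_{i=1}^{d}(\breve u^{p}_{i,1})^2}}
=\frac{\Big|\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u^{p}_{i,1}u_{i,1}\Big|}{\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}(\breve u^{p}_{i,1})^2}}.
\tag{1.5}
$$
Below we need to bound the denominator and the numerator of (1.5).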
We start with the numerator. From (1.4), it follows that
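$$
\begin{aligned}
\Big|\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u_{i,1}u_{i,1}\Big|-c\zeta\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}|u_{i,1}|
&\le\Big|\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u^{p}_{i,1}u_{i,1}\Big| \\
&\le\Big|\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u_{i,1}u_{i,1}\Big|+c\zeta\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}|u_{i,1}|.
\end{aligned}
\tag{1.6}
$$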
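Since $\zeta'$ satisfies condition (d) in Theorem 2.2, it follows that $\breve u_{i,1}$ have the same property as the $\breve u_{i,1}$ that appeared in the proofs of Theorems 2.2 and 2.3. Recall (7.9) and (7.12) from those proofs, which are displayed below:
$$
\lambda_1^{-1/2}\Big|\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u_{i,1}u_{i,1}\Big|=|\tilde v_1^{T}\tilde W_1|+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big),
\tag{1.7}
$$
and
$$
\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}(\breve u_{i,1})^2}=|\tilde v_1^{T}\tilde W_1|+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big),
\tag{1.8}
$$
where $\kappa\in[0,\alpha-\eta-\theta)$ and $\kappa'$ satisfies $d^{\frac{\kappa'+\eta-\alpha}{2}}\zeta=o(1)$.
Note that $\beta\le\eta$, which suggests that
$$
d^{\frac{\kappa'}{2}}\zeta\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}|u_{i,1}|\le d^{\frac{\kappa'}{2}}\zeta\lambda_1^{-1/2}d^{\frac{\beta}{2}}\le c\,d^{\frac{\kappa'+\eta-\alpha}{2}}\zeta=o(1).
$$
It then follows that
$$
\zeta\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}|u_{i,1}|=o\big(d^{-\frac{\kappa'}{2}}\big).
\tag{1.9}
$$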
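Combining (1.6), (1.7) and (1.9), we have
$$
\Big|\lambda_1^{-1/2}\sum_{i=1}^{\lfloor d^\beta\rfloor}\breve u^{p}_{i,1}u_{i,1}\Big|=|\tilde v_1^{T}\tilde W_1|+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big).
\tag{1.10}
$$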
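Similarly for the denominator, we have
$$
\begin{aligned}
\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}(\breve u_{i,1})^2}-c\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}
&\le\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}(\breve u^{p}_{i,1})^2} \\
&\le\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}(\breve u_{i,1})^2}+c\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}.
\end{aligned}
\tag{1.11}
$$
Next, we will show that
$$
\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}=o_p\big(d^{-\frac{\kappa'}{2}}\big).
\tag{1.12}
$$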
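For a fixed $\tau$, observe that
$$
\begin{aligned}
P\Big[d^{\kappa'}\zeta^{2}\lambda_1^{-1}\sum_{i=\lfloor d^\beta\rfloor+1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}\ge\tau\Big]
&\le\sum_{i=\lfloor d^\beta\rfloor+1}^{d}P\Big[\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}\ge d^{-\kappa'}\zeta^{-2}\lambda_1 d^{-1}\tau\Big] \\
&\le\sum_{i=\lfloor d^\beta\rfloor+1}^{d}\sum_{j=1}^{n}P\Big[|h_{i,j}\sigma_{i,j}^{-1}|>c\,d^{\theta}\zeta\Big] \\
&\le n\big(d-\lfloor d^\beta\rfloor\big)\int_{c\,d^{\theta}\zeta}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{x^{2}}{2}\Big)\,dx\longrightarrow 0,\quad\text{as } d\to\infty,
\end{aligned}
$$
which yields
$$
\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=\lfloor d^\beta\rfloor+1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}=o_p\big(d^{-\frac{\kappa'}{2}}\big).
\tag{1.13}
$$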
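In addition, note that
$$
\begin{aligned}
\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}
&\le\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{\lfloor d^\beta\rfloor}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}+\zeta\lambda_1^{-1/2}\sqrt{\sum_{i=\lfloor d^\beta\rfloor+1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}} \\
&\le\zeta\lambda_1^{-1/2}d^{\frac{\beta}{2}}+\sqrt{\zeta^{2}\lambda_1^{-1}\sum_{i=\lfloor d^\beta\rfloor+1}^{d}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}}.
\end{aligned}
\tag{1.14}
$$
(1.12) then follows from (1.13), (1.14) and the fact that $\zeta\lambda_1^{-1/2}d^{\frac{\beta}{2}}=o\big(d^{-\frac{\kappa'}{2}}\big)$.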
Combining (1.8), (1.11), and (1.12), we have
$$
\lambda_1^{-1/2}\sqrt{\sum_{i=1}^{d}(\breve u^{p}_{i,1})^2}=|\tilde v_1^{T}\tilde W_1|+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big).
\tag{1.15}
$$
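Furthermore, (1.5), (1.10) and (1.15) suggest that
$$
|\langle\hat u_1,u_1\rangle|
=\frac{|\tilde v_1^{T}\tilde W_1|+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big)}{|\tilde v_1^{T}\tilde W_1|+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big)}
=1+o_p\Big(d^{-\frac{\min\{\kappa,\kappa'\}}{2}}\Big),
$$
which means that $\hat u_1$ is consistent with $u_1$ with convergence rate $d^{\frac{\min\{\kappa,\kappa'\}}{2}}$.
In addition, note that $d^{\frac{\kappa'+\eta-\alpha}{2}}\zeta=o(1)$. If $\zeta=o\big(d^{\frac{\alpha-\eta-\kappa}{2}}\big)$, then we can take $\kappa'=\kappa$. Therefore, $\hat u_1$ is consistent with $u_1$ with convergence rate $d^{\frac{\kappa}{2}}$.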
The above proof covers the three cases of using the soft-thresholding, hard-thresholding, or SCAD penalty in the RSPCA procedure, as discussed in Shen and Huang (2008) [3]. Below we provide more details on the estimator (1.3) corresponding to each penalty.
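• For the soft-thresholding penalty, let $\breve u_1^{\mathrm{soft}}=h_\zeta^{\mathrm{soft}}(X_{(d)}\tilde v_1)$; then the entries of $\breve u_1^{\mathrm{soft}}$ are defined by
$$
\breve u^{\mathrm{soft}}_{i,1}=\tilde u_{i,1}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}-\mathrm{sign}(\tilde u_{i,1})\,\zeta\,\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}.
$$
• For hard-thresholding, let $\breve u_1^{\mathrm{hard}}=h_\zeta^{\mathrm{hard}}(X_{(d)}\tilde v_1)$; then the entries of $\breve u_1^{\mathrm{hard}}$ are
$$
\breve u^{\mathrm{hard}}_{i,1}=\tilde u_{i,1}\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}}.
$$
• Finally, for the SCAD penalty, let $\breve u_1^{\mathrm{SCAD}}=h_\zeta^{\mathrm{SCAD}}(X_{(d)}\tilde v_1)$; then the entries $\breve u^{\mathrm{SCAD}}_{i,1}$ are
$$
\breve u^{\mathrm{SCAD}}_{i,1}=\tilde u_{i,1}\mathbf 1_{\{|\tilde u_{i,1}|>a\zeta\}}+\varsigma_i\mathbf 1_{\{|\tilde u_{i,1}|>\zeta\}},
$$
where
$$
\varsigma_i=
\begin{cases}
\mathrm{sign}(\tilde u_{i,1})(|\tilde u_{i,1}|-\zeta), & \text{if } \zeta<|\tilde u_{i,1}|\le 2\zeta,\\[4pt]
\dfrac{(a-1)\tilde u_{i,1}-\mathrm{sign}(\tilde u_{i,1})\,a\zeta}{a-2}, & \text{if } 2\zeta<|\tilde u_{i,1}|\le a\zeta,
\end{cases}
$$
with $a>2$ being a tuning parameter as defined in Fan and Li (2001) [1]. Note that $|\varsigma_i|<a\zeta$.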
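The three thresholding rules above, and the way each fits the general form (1.3), can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, $\varsigma_i$ is taken to be zero when $|\tilde u_{i,1}|>a\zeta$ (the cases above leave it undefined there), and the default $a=3.7$ is the value commonly used with SCAD rather than one prescribed in this supplement.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def hard_threshold(u, zeta):
    return u if abs(u) > zeta else 0.0


def soft_threshold(u, zeta):
    return u - sign(u) * zeta if abs(u) > zeta else 0.0


def scad_threshold(u, zeta, a=3.7):
    au = abs(u)
    if au <= zeta:
        return 0.0
    if au <= 2 * zeta:
        return sign(u) * (au - zeta)
    if au <= a * zeta:
        return ((a - 1) * u - sign(u) * a * zeta) / (a - 2)
    return u


def penalty_parts(u, zeta, rule, a=3.7):
    """Return (zeta_prime, varsigma) for the general form (1.3)."""
    au = abs(u)
    if rule == "hard":
        return zeta, 0.0
    if rule == "soft":
        return zeta, -sign(u) * zeta
    # SCAD: zeta_prime = a * zeta; varsigma assumed 0 beyond a * zeta.
    if au <= 2 * zeta:
        return a * zeta, sign(u) * (au - zeta)
    if au <= a * zeta:
        return a * zeta, ((a - 1) * u - sign(u) * a * zeta) / (a - 2)
    return a * zeta, 0.0


def from_parts(u, zeta, rule, a=3.7):
    """Evaluate u * 1{|u| > zeta'} + varsigma * 1{|u| > zeta}."""
    zeta_prime, varsigma = penalty_parts(u, zeta, rule, a)
    return u * (abs(u) > zeta_prime) + varsigma * (abs(u) > zeta)


OPERATORS = {"hard": hard_threshold, "soft": soft_threshold, "scad": scad_threshold}
print(soft_threshold(3.0, 1.0), hard_threshold(0.5, 1.0), scad_threshold(1.5, 1.0))
```

Evaluating `from_parts` and the matching entry of `OPERATORS` on a grid of inputs gives the same values, which is the sense in which the three estimators are special cases of (1.3).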
Since $\breve u_1^{\mathrm{soft}}$, $\breve u_1^{\mathrm{hard}}$ and $\breve u_1^{\mathrm{SCAD}}$ are just special cases of $\breve u_1^{p}$ in (1.3), it follows that the corresponding RSPCA estimator $\hat u_1$ (i.e., the normalized vector of $\breve u_1^{\mathrm{soft}}$, $\breve u_1^{\mathrm{hard}}$ or $\breve u_1^{\mathrm{SCAD}}$) is consistent with $u_1$; furthermore, if $\zeta=o\big(d^{\frac{\alpha-\eta-\kappa}{2}}\big)$, the corresponding estimator is consistent with $u_1$ with a convergence rate of $d^{\frac{\kappa}{2}}$. This finishes the proof of Theorem 3.1.
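The effect described by Theorem 3.1 can be illustrated with a small simulation. The sketch below is a toy experiment under assumptions that are not taken from the paper: a single-spike model with covariance $\lambda_1u_1u_1^{T}+I_d$ and $\lambda_1=d^{\alpha}$, a sparse $u_1$ with five equal nonzero entries, $\tilde v_1$ computed as the leading eigenvector of $X_{(d)}^{T}X_{(d)}$ by power iteration, and a hand-picked threshold $\zeta=5$. All function and parameter names are hypothetical.

```python
import math
import random


def leading_right_singular_vector(X, iters=200):
    """Power iteration on the n x n matrix X^T X (X is stored as d rows)."""
    n = len(X[0])
    gram = [[sum(row[j] * row[k] for row in X) for k in range(n)] for j in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(gram[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def abs_cosine(a, b):
    """Absolute inner product of a and b after normalising both."""
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def simulate(d=2000, n=10, alpha=0.8, support=5, zeta=5.0, seed=0):
    rng = random.Random(seed)
    lam1 = d ** alpha  # assumed spike size
    u1 = [1.0 / math.sqrt(support)] * support + [0.0] * (d - support)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Row i of X holds variable i across the n samples: spike plus unit noise.
    X = [[math.sqrt(lam1) * u1[i] * z[j] + rng.gauss(0.0, 1.0) for j in range(n)]
         for i in range(d)]
    v1 = leading_right_singular_vector(X)
    u_tilde = [sum(row[j] * v1[j] for j in range(n)) for row in X]
    u_hard = [x if abs(x) > zeta else 0.0 for x in u_tilde]  # hard thresholding
    return abs_cosine(u_tilde, u1), abs_cosine(u_hard, u1)


cos_raw, cos_hard = simulate()
print(f"|<u_tilde, u1>| without thresholding: {cos_raw:.3f}")
print(f"|<u_hat, u1>| with hard thresholding: {cos_hard:.3f}")
```

In this toy setting the thresholded direction is expected to sit much closer to $u_1$ than the unthresholded one, because thresholding removes the noise accumulated over the $d-\lfloor d^\beta\rfloor$ zero coordinates of $u_1$.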
1.3. Proof of Theorem 3.2
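First, we note that
$$
\begin{aligned}
\tilde v_1^{\mathrm{new}}
&=\frac{\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)+\lambda_1^{-1/2}X_{(d)}^{T}u_1}{\big\|\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)+\lambda_1^{-1/2}X_{(d)}^{T}u_1\big\|} \\
&=\frac{\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)+\tilde W_1}{\big\|\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)+\tilde W_1\big\|}.
\end{aligned}
\tag{1.16}
$$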
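Jung and Marron (2009) [2] showed that $c_d^{-1}\hat\lambda_1\xrightarrow{p}1$, as $d\to\infty$, where $c_d=n^{-1}\sum_{i=1}^{d}\lambda_i\sim d$. In addition, since $\lambda_1\sim d^{\alpha}$ and $\|\hat u_1^{\mathrm{old}}-u_1\|=o_p\big(d^{-\frac{\kappa}{2}}\big)$, where $\kappa\ge 1-\alpha$, it follows that $\lambda_1^{-1/2}\hat\lambda_1^{1/2}\|\hat u_1^{\mathrm{old}}-u_1\|=o_p(1)$.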
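Therefore, we have
$$
\big\|\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)\big\|\le n\lambda_1^{-1/2}\hat\lambda_1^{1/2}\|\hat u_1^{\mathrm{old}}-u_1\|=o_p(1),
$$
which yields
$$
\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)+\tilde W_1=\tilde W_1+o_p(1),
\tag{1.17}
$$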
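and
$$
\begin{aligned}
\big\|\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)+\tilde W_1\big\|
&\le\big\|\lambda_1^{-1/2}X_{(d)}^{T}(\hat u_1^{\mathrm{old}}-u_1)\big\|+\|\tilde W_1\| \\
&=\|\tilde W_1\|+o_p(1).
\end{aligned}
\tag{1.18}
$$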
Combining (1.16), (1.17) and (1.18), we have
$$
\tilde v_1^{\mathrm{new}}=\frac{\tilde W_1+o_p(1)}{\|\tilde W_1\|+o_p(1)}\xrightarrow{p}\frac{\tilde W_1}{\|\tilde W_1\|},\quad\text{as } d\to\infty.
$$
This finishes the proof of Theorem 3.2.
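The update analysed in this proof, $\tilde v_1^{\mathrm{new}}\propto X_{(d)}^{T}\hat u_1^{\mathrm{old}}$ as read off from (1.16), can also be tried numerically. The following sketch compares $\tilde v_1^{\mathrm{new}}$ with $\tilde W_1/\|\tilde W_1\|$, where $\tilde W_1=\lambda_1^{-1/2}X_{(d)}^{T}u_1$. The data model (a single spike plus unit noise), the perturbation used to build $\hat u_1^{\mathrm{old}}$, and all names and parameter values are assumptions made for illustration only.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def simulate_update(d=2000, n=10, alpha=0.8, support=5, eps=0.05, seed=1):
    rng = random.Random(seed)
    lam1 = d ** alpha  # assumed spike size
    u1 = [1.0 / math.sqrt(support)] * support + [0.0] * (d - support)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Row i of X holds variable i across the n samples.
    X = [[math.sqrt(lam1) * u1[i] * z[j] + rng.gauss(0.0, 1.0) for j in range(n)]
         for i in range(d)]
    # Old estimate: u1 plus a perturbation of Euclidean norm about eps.
    u_old = normalize([u1[i] + eps * rng.gauss(0.0, 1.0) / math.sqrt(d)
                       for i in range(d)])
    # New right vector: X^T u_old, normalised.
    v_new = normalize([sum(X[i][j] * u_old[i] for i in range(d)) for j in range(n)])
    # Limit direction: W1 / ||W1|| with W1 = lam1^{-1/2} X^T u1.
    w1 = normalize([sum(X[i][j] * u1[i] for i in range(d)) / math.sqrt(lam1)
                    for j in range(n)])
    return sum(a * b for a, b in zip(v_new, w1))


cos_new = simulate_update()
print(f"<v_new, W1/||W1||> = {cos_new:.6f}")
```

With a small perturbation the inner product should be very close to one, in line with (1.17) and (1.18): the term involving $\hat u_1^{\mathrm{old}}-u_1$ is negligible next to $\tilde W_1$.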
1.4. Proof of Theorem 3.3
From Theorem 3.2, we note that $\tilde v_1^{\mathrm{new}}\xrightarrow{p}\tilde W_1/\|\tilde W_1\|$, as $d\to\infty$. Hence, to prove Theorem 3.3, we can modify the proofs of Theorems 3.1 and 3.2 by replacing $\hat u_1$ with $\tilde u_1^{\mathrm{new}}$ and $\tilde v_1$ with $\tilde v_1^{\mathrm{new}}$ in those proofs.
[1] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood
and its oracle properties. Journal of the American Statistical Association,
96(456):1348–1360, 2001.
[2] S. Jung and J.S. Marron. PCA consistency in high dimension, low sample
size context. The Annals of Statistics, 37(6B):4104–4130, 2009.
[3] H. Shen and J.Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis,
99(6):1015–1034, 2008.