Face Recognition with Only One Training Sample

Proceedings of the 25th Chinese Control Conference
7-11 August, 2006, Harbin, Heilongjiang
Chong Lu (1,2), Wanquan Liu (2) and Senjian An (2)
1. Dept. of Computer Science, YiLi Normal College, Yining, China 835000
E-mail: cluxjyn@gmail.com
2. Dept. of Computing, Curtin University of Technology, WA 6102
E-mail: {lu,wanquan,senjian}@curtin.edu.au
Abstract: In this paper, we compare the face recognition performance of five different methods that use only one training
sample per person. Firstly, we investigate the Singular Value Decomposition (SVD) of the face image and propose an augmenting
algorithm that generates a group of training samples from a single sample; face recognition is then performed on these samples
with the Discrete Cosine Transform (DCT) and with Two-Dimensional Principal Component Analysis (2DPCA). Secondly, we apply DCT
directly to the single training sample. Thirdly, we use DCT to generate several low-frequency matrices in the frequency domain,
convert them back into the spatial domain as independent training images, and then use 2DPCA for face recognition. Finally, we
use DCT to generate several low-frequency matrices in the frequency domain and use DCT for face recognition. Experiments on the
AMP and Yale face databases show that DCT+2DPCA produces better results on the AMP database, while SVD+2DPCA produces better
results on the Yale database.
Key Words: SVD, 2DPCA, DCT, face recognition, classification
1 INTRODUCTION
Face recognition has received extensive attention as one
of the most significant applications of image
understanding [1],[2],[3]. Research into face recognition
has flourished in recent years due to the increased need for
surveillance with more robust systems, and it has attracted a
multidisciplinary research effort, in particular for
techniques based on PCA [4],[5] and 2DPCA [6]. Those
approaches usually use a large and representative set of training
samples per person to enhance the recognition rate under
variations in illumination, pose, facial expression,
make-up, etc. However, large training sets cannot be
guaranteed in practice, for example in identity card verification
or passport verification. In such situations, only one
frontal image per person, captured under controlled
lighting conditions, is available for training. Some
face recognition algorithms have been proposed to solve
the face recognition problem with only a single training
image [7],[8]. SPCA [7] combines the original training
image with its derived image, obtained by perturbing the image
matrix's singular values, and then performs PCA on the
joined images. (PC)^2A [8] combines the original training
image with its vertical and horizontal projections and then
performs PCA on the enriched version of the image.
In this paper, we propose an algorithm based on an analysis of the
Singular Value Decomposition (SVD) of the face image. Firstly,
we combine the largest singular value and the next several largest
singular values with their corresponding feature spaces to generate more
training samples. Then DCT and 2DPCA are performed
on all the derived training images available. In order to
compare the results, we performed five
types of experiments, as detailed in this paper. Experiments
on the AMP and Yale face databases [9] show that
DCT+2DPCA produces much better results on the AMP
database, and that SVD+2DPCA and DCT+2DPCA produce better
results on the Yale database.
2 FACE RECOGNITION ALGORITHM
2.1 2DPCA
In this section, we briefly outline the standard procedure for
building the eigenvector space from a set of training
images. We represent the input images as matrices
$A_i \in R^{m \times n}$, $i = 1, 2, \ldots, M$, where $m \times n$ is the
image size in pixels and $M$ is the number of images.
We adopt the following criterion as in [6]:
$$J(X) = \mathrm{tr}(S_X) = \mathrm{tr}\{X^T [E(A - EA)^T (A - EA)] X\} \qquad (1)$$
where $S_X$ is the covariance matrix of the images $A_i$ ($i = 1, 2, \ldots, M$)
projected with the projection matrix $X$, and $E$ is the expectation
of a stochastic variable.
In fact, the covariance matrix $G \in R^{n \times n}$ with $M$
images is:
$$G = \frac{1}{M} \sum_{j=1}^{M} (A_j - \bar{A}_M)^T (A_j - \bar{A}_M)$$
where $\bar{A}_M = \frac{1}{M} \sum_{i=1}^{M} A_i$ is the mean image matrix of the
$M$ training samples. Alternatively, the criterion in (1) can
be expressed as:
$$J(X) = \mathrm{tr}(X^T G X)$$
where $X$ is a unitary column vector. The $X$ that
maximizes the criterion is called the optimal projection
axis. The optimal projection $X_{opt}$ is a set of unitary
vectors that maximizes $J(X)$, i.e. the eigenvectors of $G$
corresponding to the largest eigenvalues. We usually choose
a subset of only $d$ eigenvectors, corresponding to the $d$ largest
eigenvalues, to be included in the model, that is
$$\{X_1, \ldots, X_d\} = \arg\max J(X)$$
satisfying $X_i^T X_j = 0$ ($i \neq j$; $i, j = 1, 2, \ldots, d$). Each image
can thus be optimally approximated in the least-squares
sense up to a predefined reconstruction error. For face
recognition, every input image $A_i$ is projected to a point
in the $d$-dimensional subspace spanned by the selected
eigen matrix $X$ [10].
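The 2DPCA procedure outlined in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used in the experiments: the function names `fit_2dpca` and `project` are hypothetical, and the images are assumed to be stacked in an array of shape (M, m, n).

```python
import numpy as np

def fit_2dpca(images, d):
    """Return the mean image and the d leading eigenvectors of G.

    images: array of shape (M, m, n); d: number of projection axes kept.
    """
    A = np.asarray(images, dtype=float)
    mean = A.mean(axis=0)
    centered = A - mean
    # G = (1/M) * sum_j (A_j - mean)^T (A_j - mean), an n x n matrix
    G = np.einsum("jki,jkl->il", centered, centered) / A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(G)   # eigenvalues in ascending order
    X = eigvecs[:, ::-1][:, :d]            # d leading eigenvectors as columns
    return mean, X

def project(image, X):
    """Feature matrix Y = A X of shape (m, d)."""
    return np.asarray(image, dtype=float) @ X

# Usage on random stand-in data (assumed shapes, not face images).
rng = np.random.default_rng(0)
demo_images = rng.random((6, 8, 5))
demo_mean, demo_X = fit_2dpca(demo_images, d=2)
demo_Y = project(demo_images[0], demo_X)
```

Because G is symmetric, `np.linalg.eigh` returns orthonormal eigenvectors, so the constraint $X_i^T X_j = 0$ for $i \neq j$ holds by construction.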
When given a testing image $B$, we also project $B$
to a point in the subspace with the eigen matrix $X$,
namely $Y_B = BX$. Then a nearest neighbor classifier is
used for classification:
$$d(Y_{i_0}, Y_B) = \min_i \| Y_i - Y_B \|$$
where $B$ is regarded as the individual represented by
$A_{i_0}$.
2.2 DCT
The DCT has been widely applied to solve numerous
problems in the digital signal processing community
[11]. The DCT of an $m \times n$ image $f(x,y)$ is defined by
$$C(u,v) = \frac{2}{\sqrt{mn}} a(u) a(v) \sum_{x=0}^{n-1} \sum_{y=0}^{m-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2n}\right] \cos\left[\frac{(2y+1)v\pi}{2m}\right]$$
for $u = 0, 1, \ldots, n-1$, $v = 0, 1, \ldots, m-1$, and the inverse transform is
defined by
$$f(x,y) = \frac{2}{\sqrt{mn}} \sum_{u=0}^{n-1} \sum_{v=0}^{m-1} a(u) a(v) C(u,v) \cos\left[\frac{(2x+1)u\pi}{2n}\right] \cos\left[\frac{(2y+1)v\pi}{2m}\right]$$
for $x = 0, 1, \ldots, n-1$, $y = 0, 1, \ldots, m-1$, where $a(\omega) = 1/\sqrt{2}$ for
$\omega = 0$ and $a(\omega) = 1$ otherwise.
For an $m \times n$ image, we have an $m \times n$ DCT
coefficient matrix covering all the spatial frequency
components of the image. The most significant facial
features, such as the eyes and mouth, the hair and the face outline, can
be preserved by a very small number of low-frequency
DCT coefficients. Those facial features are more stable
than the variable high-frequency facial features.
The feature matrices therefore contain the low-to-mid frequency
DCT coefficients. To recognize a particular
input face, the system compares this face's feature matrices
to the feature matrices of the database faces using a
Euclidean distance nearest-neighbor classifier [14],[15].
The Euclidean distance between a training image and
the test image is
$$d = \| A_i - B_i \|, \quad i = 1, 2, \ldots, n$$
where $n$ is the number of low-frequency matrices per test image.
A match is obtained by minimizing $d$.
2.3 SVD
Let $A \in R^{m \times n}$ denote a face image. The SVD of $A$, expressed as
[U,D,V] = svd(A), produces a diagonal matrix $D$ of the
same dimension as $A$, with nonnegative diagonal elements
in decreasing order, and unitary matrices $U$ and $V$ so that
$A = U D V^T = U_{m \times m} D_{m \times n} V^T_{n \times n}$ [13],[7]. We
can approximate $A$ by the following combinations:
$$A_1 = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T$$
$$A_2 = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \lambda_3 u_3 v_3^T$$
$$A_3 = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \lambda_3 u_3 v_3^T + \lambda_4 u_4 v_4^T$$
where $u_i$ denotes the $i$-th column of $U$, $v_i^T$ denotes the
transpose of the $i$-th column of $V$, and $\lambda_i$ denotes the $i$-th diagonal
value of $D$. Therefore, we can produce
several images from one image in this way. Then we
can use 2DPCA and DCT to do face classification based on
these derived images.
2.4 Face recognition
Based on the derived training samples, we can use five
approaches to do face recognition, as shown in Chart 1.
Chart 1: Five approaches to do face recognition
Approach 1: We use SVD to obtain the new training
images $A_1$, $A_2$, $A_3$ from one sample $A$, and then use DCT
to do face recognition. We denote this method as
SVD+DCT.
Approach 2: We use SVD to obtain the new training
images $A_1$, $A_2$, $A_3$ from one sample $A$, and then use
2DPCA to do face recognition. We denote this method as
SVD+2DPCA.
Approach 3: We use DCT to transform the training sample
into the frequency domain and then use DCT to do
classification. We denote this method as DCT directly.
Approach 4: We use DCT to generate some
low-frequency matrices, such as 8 × 8 or 16 × 16
matrices, and then convert them into the spatial domain with
different low-frequency blocks. Then we use 2DPCA for
classification. We denote this method as DCT+2DPCA.
Approach 5: We use DCT to generate some
low-frequency matrices, and use these low-frequency
matrices to do classification. We denote this
method as DCT+DCT.
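The two ways of deriving extra training images described in Sections 2.2 and 2.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the DCT is written as an orthonormal matrix product so that only NumPy is needed, and the low-frequency block is assumed to be zero-padded to the full coefficient size before the inverse transform, so that the derived image keeps the size of the original.

```python
import numpy as np

def svd_derived_images(A, ranks=(2, 3, 4)):
    """Reconstructions A1, A2, A3 built from the k largest singular values."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    return [(U[:, :k] * s[:k]) @ Vt[:k, :] for k in ranks]

def dct_matrix(N):
    """Orthonormal DCT matrix: entry (u, x) is a(u) * sqrt(2/N) * cos((2x+1)u*pi/(2N))."""
    idx = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * idx[None, :] + 1) * idx[:, None] * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)   # a(0) = 1/sqrt(2)
    return C

def dct2(f):
    """Two-dimensional DCT of an image."""
    rows, cols = f.shape
    return dct_matrix(rows) @ f @ dct_matrix(cols).T

def idct2(C):
    """Inverse two-dimensional DCT."""
    rows, cols = C.shape
    return dct_matrix(rows).T @ C @ dct_matrix(cols)

def low_frequency_image(f, block):
    """Keep the top-left block x block coefficients (assumed zero-padding) and invert."""
    coeffs = dct2(np.asarray(f, dtype=float))
    kept = np.zeros_like(coeffs)
    kept[:block, :block] = coeffs[:block, :block]
    return idct2(kept)

# Usage on a random stand-in image (assumed size, not a face image).
rng = np.random.default_rng(1)
demo = rng.random((8, 6))
A1, A2, A3 = svd_derived_images(demo)
smooth = low_frequency_image(demo, block=4)
```

For Approaches 3 and 5, the block `dct2(image)[:block, :block]` itself could serve as the feature matrix instead of the reconstructed image.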
3 EXPERIMENTAL RESULTS AND ANALYSIS
In this section, we carry out several experiments to
demonstrate the performance of the five proposed techniques.
As to the databases, we use two types of input images. One
is the AMP face database, which contains 975 images of 13
individuals (each person has 75 different images) under
various facial expressions and lighting conditions, with
each image being cropped and resized to 64 × 64 pixels
in this experiment. The other database is the Yale face
database [9], which contains 640 face images of 10 people,
including frontal views of the face with different facial
expressions and lighting conditions. With these databases, we
conduct five experiments. We use only the first image
per person as the training sample. The others are used for
testing.
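The single-sample protocol described above can be sketched as follows. The array layout `(people, images_per_person, height, width)` and the function names are assumptions made for illustration, and raw pixels stand in for the features that the five approaches would supply.

```python
import numpy as np

def single_sample_split(images):
    """First image of each person is the training sample; the rest are test images."""
    return images[:, 0], images[:, 1:]

def nearest_neighbor_counts(train_feats, test_feats):
    """Return (true, false) counts for a Euclidean nearest-neighbor classifier.

    train_feats: (P, ...) one feature array per person.
    test_feats:  (P, T, ...) T test feature arrays per person.
    """
    P, T = test_feats.shape[:2]
    gallery = train_feats.reshape(P, -1)
    probes = test_feats.reshape(P * T, -1)
    labels = np.repeat(np.arange(P), T)
    # Squared Euclidean distances via |p|^2 + |g|^2 - 2 p.g
    d2 = ((probes ** 2).sum(axis=1)[:, None]
          + (gallery ** 2).sum(axis=1)[None, :]
          - 2.0 * probes @ gallery.T)
    predicted = d2.argmin(axis=1)
    true = int((predicted == labels).sum())
    return true, P * T - true

# Usage on synthetic stand-in data: 3 well-separated "people", 5 images each.
rng = np.random.default_rng(0)
base = np.arange(3, dtype=float)[:, None, None, None]
images = base + 0.01 * rng.standard_normal((3, 5, 4, 4))
train, test = single_sample_split(images)
counts = nearest_neighbor_counts(train, test)
```

With 75 images for each of 13 people, this split yields 13 training images and 13 × 74 = 962 test images; with 64 images for each of 10 people, it yields 10 × 63 = 630 test images.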
Firstly: We use SVD to obtain the new training images
$A_1$, $A_2$, $A_3$ from only one sample $A$, and
then use DCT to do classification. The results on the AMP database are shown in Figure 1,
and the best result is shown in Table 1 (AMP). The corresponding results on the
Yale database are shown in Table 1 (Yale) and Figure 2.
Secondly: We use SVD to obtain the new training
images $A_1$, $A_2$, $A_3$ from only one sample $A$, and then use
2DPCA to do classification. The results are shown in Table 2 (AMP)
and Figure 1. We may also obtain more or fewer than three
new training images, but only two new images combined
with the original image gave the best result on the
AMP database in our experiments. Only $A_1$ combined
with the original image produces the best result on the Yale
database, as shown in Table 2 (Yale) and Figure 2.
Thirdly: We use DCT to transform the training image to the
frequency domain as the training sample and do
classification based on this one training sample. The
results are shown in Table 3 (AMP) for the AMP database and
Table 3 (Yale) for the Yale database. We may also obtain more or fewer
than three new training images, but only three new images
combined with the original image gave a better result
on the AMP database in our experiments, as shown in Figure 1
and Figure 2.
Fourthly: We use DCT to generate some low-frequency
matrices, such as 8 × 8 or 16 × 16 matrices, and then
convert them into the spatial domain. We combined three
converted spatial images with the original image as
training samples. Then we use 2DPCA to do classification;
the results are shown in Table 4 (AMP). We may also obtain more
or fewer than two new training images, but only two new
images, from the 8 × 8 and 16 × 16 matrices, combined
with the original image gave the best result on the AMP
database in our experiments. The corresponding results on the Yale database
are shown in Table 4 (Yale).
Finally: We use DCT to generate some low-frequency
matrices and do classification with these low-frequency
blocks. The results are shown in Table 5 (AMP). We may also
obtain more or fewer than two new training images, but only
two new images, from the 8 × 8 and 16 × 16 matrices,
combined with the original image gave the best result on
the AMP database in our experiments. On the Yale database, only two new
images, from the 16 × 16 and 24 × 24 matrices,
combined with the original image gave the best result.
The results are shown in Table 5 (Yale).

Table 1: SVD + DCT (AMP)
Individual    1    2    3    4    5    6    7    8    9   10   11   12   13  Total
True         74   73   74   74   59   47   74   74   74   74   74   74   74    919
False         0    1    0    0   15   27    0    0    0    0    0    0    0     43

Table 2: SVD + 2DPCA (AMP)
Individual    1    2    3    4    5    6    7    8    9   10   11   12   13  Total
True         74   60   65   74   63   74   74   54   74   74   74   72   74    901
False         0   14    9    0   11    0    0   20    0    0    0    2    0     57

Table 3: DCT directly (AMP)
Individual    1    2    3    4    5    6    7    8    9   10   11   12   13  Total
True         74   70   74   73   54   46   74   74   74   74   74   74   74    909
False         0    4    0    1   20   28    0    0    0    0    0    0    0     53

Table 4: DCT + 2DPCA (AMP)
Individual    1    2    3    4    5    6    7    8    9   10   11   12   13  Total
True         74   73   74   70   60   58   74   74   74   74   74   74   74    927
False         0    1    0    4   14   16    0    0    0    0    0    0    0     35

Table 2: SVD + 2DPCA (Yale)
Individual    1    2    3    4    5    6    8    9   10   11  Total
True         32   34   43   27   48   63   49   56   39   24    415
False        31   29   20   36   15    0   14    7   24   39    215

Table 5: DCT + DCT (AMP)
Individual    1    2    3    4    5    6    7    8    9   10   11   12   13  Total
True         74   74   74   68   64   62   74   74   74   74   74   74   74    934
False         0    0    0    6   10   12    0    0    0    0    0    0    0     28

Table 1: SVD + DCT (Yale)
Individual    1    2    3    4    5    6    8    9   10   11  Total
True         28   32   32   23   29   52   44   44   63   25    372
False        35   31   31   40   34   11   19   19    0   38    258

Table 3: DCT directly (Yale)
Individual    1    2    3    4    5    6    8    9   10   11  Total
True         28   32   33   23   29   52   45   44   63   26    375
False        35   31   30   40   34   11   18   19    0   37    255

Table 4: DCT + 2DPCA (Yale)
Individual    1    2    3    4    5    6    8    9   10   11  Total
True         28   34   31   25   32   39   56   46   63   32    386
False        35   29   32   38   31   24    7   17    0   31    244

Table 5: DCT + DCT (Yale)
Individual    1    2    3    4    5    6    8    9   10   11  Total
True         27   33   32   23   29   52   43   44   63   26    372
False        36   30   31   40   34   11   20   19    0   37    258

Figure 1: Comparison of five approaches on the AMP database (recognition accuracy versus number of features; curves: low-frequency DCT, SVD+DCT, SVD+2DPCA, DCT+2DPCA, DCT directly)
Figure 2: Comparison of five approaches on the Yale database (recognition accuracy versus number of features; curves: low-frequency DCT, SVD+DCT, SVD+2DPCA, DCT+2DPCA, DCT directly)
From Figure 1 and Figure 2, we can see that DCT+2DPCA
is very stable across the two databases. On the Yale database,
where the images vary strongly, SVD+2DPCA performs very well and
can deal with different illumination conditions and expressions.
DCT+DCT works well on the AMP database, in which the
faces are comparatively easy to recognize.
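As a rough aid for reading the tables, the True/False totals can be converted into recognition rates. The short sketch below copies the totals from the "Total" column of each table; the dictionary layout and the function name are assumptions made for illustration, and the rate is simply True / (True + False).

```python
# (true, false) totals copied from the "Total" column of each table.
totals = {
    "AMP": {
        "SVD+DCT": (919, 43),
        "SVD+2DPCA": (901, 57),
        "DCT directly": (909, 53),
        "DCT+2DPCA": (927, 35),
        "DCT+DCT": (934, 28),
    },
    "Yale": {
        "SVD+DCT": (372, 258),
        "SVD+2DPCA": (415, 215),
        "DCT directly": (375, 255),
        "DCT+2DPCA": (386, 244),
        "DCT+DCT": (372, 258),
    },
}

def recognition_rates(table):
    """Map each method to true / (true + false)."""
    return {name: t / (t + f) for name, (t, f) in table.items()}

rates = {db: recognition_rates(table) for db, table in totals.items()}

for db, by_method in rates.items():
    for name, rate in sorted(by_method.items(), key=lambda kv: -kv[1]):
        print(f"{db:5s} {name:13s} {rate:.4f}")
```

The tables report the best configuration found for each approach, so these rates correspond to single points on the curves in Figures 1 and 2 rather than to the full curves.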
4 CONCLUSION
In this paper, we showed how to derive more training
samples using SVD and DCT. To enhance the classification
performance with only a single training sample, each
original training image is combined with the images
reconstructed by our proposed algorithm, and then 2DPCA and
DCT are performed on the derived training images. We
also performed experiments on one training sample with
different schemes. Experiments on the AMP and Yale face
databases show that DCT+2DPCA is quite stable, because
it combines frequency-domain knowledge
with a spatial-domain technique. Further, the
performance based on the derived training samples is
better than that based on only one training sample.
REFERENCES
[1] W. Zhao, R. Chellappa, A. Rosenfeld, and P. J. Phillips, "Face
Recognition: A Literature Survey", Technical Report
CAR-TR-948, Univ. of Maryland, CfAR, 2000.
[2] Lei Zhang and Dimitris Samaras, "Face Recognition from a
Single Training Image under Arbitrary Unknown Lighting
Using Spherical Harmonics", Transactions on Pattern
Analysis and Machine Intelligence, vol. 28, no. 3, March
2006.
[3] Ronny Tjahyadi, Wanquan Liu and Svetha Venkatesh,
"Automatic Parameter Selection for Eigenface", Proceedings
of the 6th International Conference on Optimization Techniques
and Applications (ICOTA 2004).
[4] M. A. Grudin, "On Internal Representations in Face
Recognition Systems", Pattern Recognition, vol. 33, no. 7,
1161-1177, 2000.
[5] L. Zhao and Y. Yang, "Theoretical Analysis of Illumination
in PCA-Based Vision Systems", Pattern Recognition,
vol. 32, no. 4, 547-564, 1999.
[6] Jian Yang et al., "Two-Dimensional PCA: a new approach to
appearance-based face representation and
recognition", IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 26, no. 1, 131-137, Jan. 2004.
[7] D. Zhang, S. Chen, and Z. H. Zhou, "A new face recognition
method based on SVD perturbation for single example
image per person", Applied Mathematics and Computation,
163(2), 895-907, 2005.
[8] J. Wu and Z. H. Zhou, "Face recognition with one training
image per person", Pattern Recognition Letters, 23(14),
1711-1719, 2002.
[9] ftp://plucky.cs.yale.edu/CVC/pub/images/yalefaceB/Tarsets
[10] L. Sirovich and M. Kirby, "Low-Dimensional Procedure for
Characterization of Human Faces", J. Optical Soc. Am., vol. 4,
519-524, 1987.
[11] N. Ahmed, T. Natarajan, and K. Rao, "Discrete cosine
transform", IEEE Trans. on Computers, 23(1), 90-93, 1974.
[12] Z. Pan, A. G. Rust, and H. Bolouri, "Image redundancy
reduction for neural network classification using discrete
cosine transforms", IEEE Neural Networks, vol. 3, 149-154,
2000.
[13] J. J. Gerbrands, "On the relationships between SVD, KLT
and PCA", Pattern Recognition, 14, 375-381, 1981.
[14] R. O. Duda and P. E. Hart, "Pattern Classification and
Scene Analysis", Wiley, New York, NY, 1973.
[15] Ziad M. Hafed and Martin D. Levine, "Face Recognition
Using the Discrete Cosine Transform", International Journal
of Computer Vision, 43(3), 167-188, 2001.