Model-Based Training Set Synthesis for Vector Quantization

Proceedings of the IASTED International Conference on
Signal and Image Processing
October 18-21, 1999, Nassau, Bahamas
Dorin Comaniciu
Department of Electrical and Computer Engineering
Rutgers University, Piscataway, NJ 08854-8058, USA
comanici@caip.rutgers.edu
Abstract

We propose an adaptive vector quantization scheme based on the statistical modeling of AC cosine coefficients with mixtures of Gaussian distributions. The model parameters are used to synthesize training vector sets whose underlying distribution resembles that of the transformed data. Since the model parameters are also sent to the decoding side, both the encoder and the decoder can derive the same training set and codebook. Experiments with several test images showed that the codebooks obtained from synthesized data are effective for the vector quantization of transformed data, the entire procedure resulting in high quality image compression.

Keywords: Image Coding, Training Set Synthesis, Transform Vector Quantization, Expectation-Maximization.

1 Introduction

The nonstationary nature of image data often produces a significant statistical difference between the image being coded and the training set the codebook was designed for. Even for large training sets of vectors, the input structure might not be reflected in the current codebook, which leads to significant distortions. On the other hand, an adaptive codebook typically requires large amounts of side information for the transmission of new codewords [1].

The synthesis of the training set [2] has recently been described as a method to indirectly specify an adaptive codebook. The idea is to fit a statistical model to the input data, estimate the model parameters, and use them to synthesize a training vector set which approximates the input. Thus, by sending the model parameters to the decoding side, the decoder can derive the adaptive codebook. Overall, the technique results in low bit rate encoding with good reconstruction quality. A simple model based on one-dimensional histograms was employed in [2].

In this paper we improve the training set synthesis by modeling the density of AC cosine coefficients with mixtures of Gaussian distributions. We show that the codebooks obtained from synthesized data are effective for the vector quantization of transformed data. High quality image compression is obtained since the codebook optimization takes into account the particular statistics of the input.

The organization of the paper is as follows. Section 2 defines the Vector Quantization with Training Set Synthesis (VQ-TSS) paradigm. The implementation of VQ-TSS in the transform domain is discussed in Section 3. Experimental results and comparisons are given in Section 4.

2 Vector Quantization with Training Set Synthesis

The block diagram of a vector quantizer which uses training set synthesis is shown in Figure 1.

Figure 1: Vector quantization with training set synthesis. (a) Encoding side. (b) Decoding side.
At the encoding side (Figure 1a) the input data is first fitted to a statistical model. The best-fit parameters, named training set parameters (TSP), are used to synthesize a training set (TS) with statistics similar to the input. The codebook C, populated according to the generalized Lloyd algorithm (GLA) [3], is then employed to vector quantize the input data. Only the set of codeword indices I and the TSP are stored or transmitted.

Figure 1b presents the decoding side. The received TSP are used to synthesize the TS, which is further employed to derive the codebook. An approximate reconstruction of the original data is finally obtained based on the indices I and the codebook C.

The advantage of VQ-TSS is that very little side information, represented by the TSP, has to be transmitted. Thus, complete codebook adaptation is accomplished with only a small increase in the bit rate.
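The encoder/decoder symmetry described above can be illustrated with a toy scalar example. This is only a sketch under simplifying assumptions that are not part of the paper: the TSP are reduced to a mean and a standard deviation of a single Gaussian, the training set is drawn with a fixed random seed (a stand-in for the lattice-based synthesis of Section 3.4), and the function names `fit_tsp`, `synthesize_training_set`, `gla`, `encode`, and `decode` are hypothetical.

```python
import numpy as np

def fit_tsp(data):
    """Encoder side: estimate the TSP (here only mean and standard deviation)."""
    return float(np.mean(data)), float(np.std(data))

def synthesize_training_set(tsp, n=4000, seed=0):
    """Both sides: regenerate the same training set from the TSP (fixed seed)."""
    mean, std = tsp
    rng = np.random.default_rng(seed)
    return rng.normal(mean, std, size=n)

def gla(train, n_codewords=8, n_iter=30):
    """Generalized Lloyd algorithm for scalar codewords."""
    # Initial codebook (assumption): evenly spaced quantiles of the training set
    codebook = np.quantile(train, (np.arange(n_codewords) + 0.5) / n_codewords)
    for _ in range(n_iter):
        nearest = np.abs(train[:, None] - codebook[None, :]).argmin(axis=1)
        for c in range(n_codewords):
            members = train[nearest == c]
            if members.size:                 # keep the old codeword if the cell is empty
                codebook[c] = members.mean()
    return codebook

def encode(data):
    data = np.asarray(data, dtype=float)
    tsp = fit_tsp(data)
    codebook = gla(synthesize_training_set(tsp))
    indices = np.abs(data[:, None] - codebook[None, :]).argmin(axis=1)
    return indices, tsp                      # only I and the TSP are transmitted

def decode(indices, tsp):
    codebook = gla(synthesize_training_set(tsp))   # same TS, hence same codebook
    return codebook[indices]

data = np.random.default_rng(1).laplace(0.0, 20.0, size=1000)
indices, tsp = encode(data)
reconstruction = decode(indices, tsp)
```

The codebook itself never travels between the two functions; only `indices` and `tsp` do, which mirrors the side-information argument made in the text.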
3 Implementation

We describe and analyze below the DCT domain implementation of the proposed method, called transform VQ-TSS (TVQ-TSS). The modeling is less complex in the transform domain, where the coefficients are (almost) decorrelated and typically have highly peaked histograms centered around zero. Note that the numerical and graphical examples in this section correspond to the 512 × 512 gray level image Lena.

3.1 Transform Block Classification and Bit Allocation Scheme

Let us consider the DCT of the $B \times B$ blocks representing the input image. Following the usual notation, we denote the coefficient in the upper-left corner of a block as the DC coefficient, while the remaining coefficients are named AC coefficients.

To increase the adaptation, we use the procedure described in [4], which classifies the transformed blocks into $n_C$ equally populated classes according to the energy of the AC coefficients. The overhead information in bits/pixel due to block classification is

$$R_{BC} = \begin{cases} \left( \lfloor \log_2 (n_C - 1) \rfloor + 1 \right) / B^2, & \text{if } n_C > 1 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer. For example, $n_C = 4$ classes and $B = 8$ give $R_{BC} = 2/64 \approx 0.031$ bits/pixel.

A bit allocation scheme that gives real and positive bit rates is derived in [5] by supposing that each vector component is optimally encoded (in the distortion-rate sense). The overall distortion is minimized subject to the positive allocation restriction and an imposed bit rate $R_{AC}$. The scheme assigns to the AC coefficient in position $(u, v)$, belonging to class $c$ and having the variance $\sigma_c^2(u, v)$, a number of bits equal to

$$R_c(u, v) = \begin{cases} \frac{1}{2} \log_2 \frac{\sigma_c^2(u, v)}{\lambda}, & \text{if } 0 < \lambda < \sigma_c^2(u, v) \\ 0, & \text{otherwise} \end{cases} = \max\left\{ 0, \frac{1}{2} \log_2 \frac{\sigma_c^2(u, v)}{\lambda} \right\} \qquad (2)$$

where $\lambda$ is the solution of

$$\frac{1}{2} \sum_{\sigma_c^2(u, v) > \lambda} \log_2 \frac{\sigma_c^2(u, v)}{\lambda} = R_{AC}. \qquad (3)$$

We compute the bit allocation for each vector of DCT coefficients by summing the values derived in equation (2) and rounding the result.

3.2 Vector Formation

Each energy class is treated separately after the allocation of bits. The DCT block is decomposed into the DC coefficient and 17 vectors taken in zigzag order and denoted by $v_1 = (AC_1, AC_2)^\top$, $v_2 = (AC_3, AC_4, AC_5)^\top$, ..., $v_{17} = (AC_{61}, AC_{62}, AC_{63})^\top$. The maximum vector dimension is $k_{max} = 4$, due to efficiency constraints resulting from the GLA (we explain this limitation in Section 3.4). In addition, the first two vectors have lower dimension (2 and 3, respectively) to limit the size of the corresponding codebook. Recall that the size of a codebook depends exponentially on the number of bits allocated for a certain vector, which is equal to the sum of all allocations received by the vector components.

Table 1: The estimates after 100 EM iterations of the a priori probabilities, means, and standard deviations (square roots of the variances) corresponding to the first two AC coefficients of the highest energy class of image Lena. (a) Coefficient $AC_1$. (b) Coefficient $AC_2$.

(a)
m    $P_{1m}$    $\mu_{1m}$    $\sigma_{1m}$
1    0.304       -71.649       146.186
2    0.218       -112.055      60.971
3    0.133       123.358       31.939
4    0.343       104.668       163.452

(b)
m    $P_{2m}$    $\mu_{2m}$    $\sigma_{2m}$
1    0.196       14.206        164.537
2    0.145       -22.670       33.696
3    0.088       5.819         14.701
4    0.569       3.116         90.617

3.3 Modeling the AC Coefficients

Most of the methods in the literature assume that the AC coefficients are statistically independent and model them with a Laplacian distribution [6], Gaussian or Gamma distributions [7], a Generalized Gaussian [8], or a Mixture of Gaussian Distributions (MGD) [9].

We further employ the MGD model, which captures the input statistics better than models relying on one elementary distribution. According to MGD, if $x = (x_1, \ldots, x_k)^\top$ is a vector of AC coefficients resulting from the above vector formation, the PDF of its $j$th component is given by

$$f_j(x_j) = \sum_{m=1}^{M_j} P_{jm}\, g_{jm}(x_j), \qquad j = 1, \ldots, k, \qquad (4)$$

where $M_j$ is the number of Gaussians employed in modeling and $g_{jm}$ is the Gaussian distribution having a priori probability $P_{jm}$, mean $\mu_{jm}$, and variance $\sigma_{jm}^2$, with $\sum_{m=1}^{M_j} P_{jm} = 1$.

The maximum-likelihood estimates of $P_{jm}$, $\mu_{jm}$, $\sigma_{jm}^2$, with $m = 1, \ldots, M_j$, are part of the training set parameters and are obtained by differentiating the logarithm of the likelihood function. The iterative procedure that solves the likelihood equations is the expectation maximization (EM) algorithm [10].

The derivation of the best value for $M_j$ (in the maximum likelihood sense) requires multiple runs of the EM algorithm, which induces additional complexity. Therefore, we have selected $M_j$ off-line as being equal to the number of Gaussians that maximizes the compression performance. For most of the images we tested, $M_j = M = 4$ proved to be a good solution.
The estimated parameters corresponding to the first and second AC coefficients of the highest energy class of image Lena are shown in Table 1a and Table 1b, respectively.
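The EM iteration used to estimate the priors, means, and standard deviations of a one-dimensional Gaussian mixture can be sketched as follows. This is a minimal illustration of the standard EM updates for Gaussian mixtures, not the author's C implementation; the function name `em_gmm_1d`, the quantile-based initialization, and the small variance floor are assumptions made for this sketch.

```python
import numpy as np

def em_gmm_1d(x, n_components=4, n_iter=100):
    """Fit a 1-D Gaussian mixture with EM; returns (priors, means, stds)."""
    x = np.asarray(x, dtype=float)
    # Initialization (assumption): equal priors, means at spread-out quantiles, global std
    priors = np.full(n_components, 1.0 / n_components)
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    stds = np.full(n_components, x.std())
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample
        z = (x[:, None] - means[None, :]) / stds[None, :]
        dens = priors * np.exp(-0.5 * z**2) / (stds * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means, and variances from the responsibilities
        nk = resp.sum(axis=0) + 1e-12
        priors = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / nk
        stds = np.sqrt(np.maximum(var, 1e-6))   # variance floor (assumption)
    return priors, means, stds
```

Applied to the samples of one AC coefficient of one energy class with `n_components=4` and `n_iter=100`, such a routine would produce one panel of a table like Table 1; the exact values depend on the initialization.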
Figure 2 shows the PDFs of the same coefficients derived with equation (4). In the figure, we compare the MGD result with estimates obtained through nonparametric analysis with the optimal Epanechnikov kernel [11]. The two curves are very close to each other. In addition, the PDF of the first coefficient is bimodal and asymmetric, which justifies the modeling based on a mixture of distributions.
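A comparison of this kind can be reproduced in outline with the sketch below. It evaluates the mixture of equation (4) with the Table 1a parameters and a one-dimensional Epanechnikov kernel estimate with window width h = 30 (the value quoted in the Figure 2 caption). The function names are hypothetical, and because the Lena coefficients are not available here, the kernel estimate is computed on synthetic samples drawn from the mixture itself.

```python
import numpy as np

# Table 1a (coefficient AC1): a priori probabilities, means, standard deviations
P = np.array([0.304, 0.218, 0.133, 0.343])
MU = np.array([-71.649, -112.055, 123.358, 104.668])
SIGMA = np.array([146.186, 60.971, 31.939, 163.452])

def mgd_pdf(x, priors=P, means=MU, stds=SIGMA):
    """Mixture of Gaussians density, equation (4)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - means) / stds
    return (priors * np.exp(-0.5 * z**2) / (stds * np.sqrt(2.0 * np.pi))).sum(axis=1)

def epanechnikov_kde(x, samples, h=30.0):
    """Kernel density estimate with K(u) = 0.75 (1 - u^2) for |u| <= 1."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - np.asarray(samples, dtype=float)[None, :]) / h
    kernel = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return kernel.mean(axis=1) / h

# Synthetic stand-in for the measured coefficients (assumption)
rng = np.random.default_rng(0)
component = rng.choice(4, size=500, p=P / P.sum())
samples = rng.normal(MU[component], SIGMA[component])

grid = np.arange(-2000.0, 2001.0, 2.0)
model_curve = mgd_pdf(grid)
kernel_curve = epanechnikov_kde(grid, samples)
```

Plotting `model_curve` and `kernel_curve` against `grid` gives two curves analogous to those of Figure 2a, up to the sampling noise of the synthetic data.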
The joint PDF corresponding to the vector $v_1 = (AC_1, AC_2)^\top$ of the highest energy class is presented in Figure 3a. For comparison, Figure 3b shows the 2-dimensional Epanechnikov density estimate of the same data. The two surfaces have the same global features, each exhibiting two significant modes.
Figure 2: The PDFs corresponding to the parameters from Table 1 and the PDFs derived through nonparametric analysis with the optimal kernel of window width h = 30. (a) Coefficient $AC_1$. (b) Coefficient $AC_2$. Axes: coefficient value (horizontal), normalized density (vertical). Curves: MGD, Epanechnikov.
3.4 Training Set Synthesis and Codebook Generation

The joint PDF of a vector $x = (x_1, \ldots, x_k)^\top$ whose components are assumed to be statistically independent is equal to the product of the marginal densities. With the marginal densities modeled according to (4), the joint PDF of $x$ is given by

$$f(x) = \prod_{j=1}^{k} f_j(x_j) = \prod_{j=1}^{k} \sum_{m=1}^{M_j} P_{jm}\, g_{jm}(x_j). \qquad (5)$$

Figure 3: The joint PDF of the vector $v_1 = (AC_1, AC_2)^\top$ of the highest energy class derived from image Lena. (a) MGD model. (b) 2-dimensional Epanechnikov estimate. Axes: AC 1 and AC 2 (horizontal), normalized density (vertical).

To generate a training set whose underlying distribution is approximated by (5), we uniformly sample the space covered by $x$ using a $k$-dimensional cubic lattice $\{x_q\}_{q=1 \ldots L}$ with minimum point separation $\delta$. Then, we associate with each lattice point $x_q$ the weight $f(x_q)$. If $[x_{j,min}, x_{j,max}]$ is the range of values for the $j$th component of $x$, then the number of samples for dimension $j$ is $l_j = \lfloor (x_{j,max} - x_{j,min}) / \delta \rfloor$. The number of lattice points $L$ is the product of the number of samples for each dimension

$$L = \prod_{j=1}^{k} \lfloor (x_{j,max} - x_{j,min}) / \delta \rfloor. \qquad (6)$$

To reduce the error caused by sampling, the value of $\delta$ should be small. However, equation (6) shows that the number of lattice points is inversely proportional to the $k$th power of $\delta$. Since the number of lattice points $L$ and the vector dimensionality $k$ determine the speed of the GLA for codebook generation, we limited their values to $L_{max} = 50{,}000$ and $k_{max} = 4$, which resulted in an overall compression/decompression time of only a few seconds.
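The lattice construction and the weighting by the joint PDF can be sketched as follows. The sketch picks the smallest separation for which the point count of equation (6) stays within the limit and weights each point by the product of the marginal mixtures of equation (5). Several details are assumptions rather than statements from the paper: the function names, the placement of samples at cell centers, the normalization of the weights to unit sum, and the coefficient ranges used in the usage lines.

```python
import numpy as np

def mixture_pdf(x, priors, means, stds):
    """Marginal density of equation (4) evaluated at the scalars in x."""
    priors, means, stds = (np.asarray(a, dtype=float) for a in (priors, means, stds))
    z = (np.asarray(x, dtype=float)[:, None] - means) / stds
    return (priors * np.exp(-0.5 * z**2) / (stds * np.sqrt(2.0 * np.pi))).sum(axis=1)

def synthesize_lattice(marginals, ranges, l_max=50_000):
    """Return lattice points, normalized weights, and the point separation delta."""
    extents = np.array([hi - lo for lo, hi in ranges], dtype=float)
    k = len(ranges)
    # With this delta the product of the floors in equation (6) cannot exceed l_max
    delta = (extents.prod() / l_max) ** (1.0 / k)
    counts = np.floor(extents / delta).astype(int)
    while counts.prod() > l_max:             # guard against floating-point rounding
        delta *= 1.001
        counts = np.floor(extents / delta).astype(int)
    # Samples at cell centers (assumption); the last dimension varies fastest
    axes = [lo + delta * (np.arange(n) + 0.5) for (lo, _), n in zip(ranges, counts)]
    points = np.stack([g.ravel() for g in np.meshgrid(*axes, indexing="ij")], axis=1)
    weights = np.ones(len(points))
    for j, (priors, means, stds) in enumerate(marginals):
        weights *= mixture_pdf(points[:, j], priors, means, stds)   # equation (5)
    return points, weights / weights.sum(), delta

# Marginals of v1 = (AC1, AC2) taken from Table 1; the ranges are illustrative only
ac1 = ([0.304, 0.218, 0.133, 0.343], [-71.649, -112.055, 123.358, 104.668],
       [146.186, 60.971, 31.939, 163.452])
ac2 = ([0.196, 0.145, 0.088, 0.569], [14.206, -22.670, 5.819, 3.116],
       [164.537, 33.696, 14.701, 90.617])
points, weights, delta = synthesize_lattice([ac1, ac2], [(-600.0, 800.0), (-500.0, 500.0)])
```

Because the construction is fully deterministic given the mixture parameters and the ranges, an encoder and a decoder running it on the same inputs obtain the same weighted training set.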
The lattice points and their weights constitute the training set used as input to the GLA algorithm. An efficient prediction-based implementation of GLA can be achieved by taking into account that the lattice points form an ordered and uniformly spaced set. Thus, there is a high probability that two lattice points with successive indices are allocated to the same codeword. The search for the closest codeword to the current lattice point can therefore be performed in a small neighborhood of the codeword associated with the previous lattice point.
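A weighted variant of the GLA that accepts lattice points and their weights can be sketched as below. For brevity it uses a full nearest-codeword search rather than the prediction-based neighborhood search described above, so it illustrates the weighted centroid update only. The function name `weighted_gla` and the initialization from evenly spaced lattice indices are assumptions.

```python
import numpy as np

def weighted_gla(points, weights, n_codewords, n_iter=50):
    """Generalized Lloyd iteration on weighted training vectors (full search)."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Initial codebook (assumption): lattice points at evenly spaced indices
    start = np.linspace(0, len(points) - 1, n_codewords).astype(int)
    codebook = points[start].copy()
    for _ in range(n_iter):
        # Nearest-neighbor partition of the lattice points
        dist2 = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dist2.argmin(axis=1)
        # Centroid update: each point counts in proportion to its weight f(x_q)
        for c in range(n_codewords):
            mask = nearest == c
            mass = weights[mask].sum()
            if mass > 0.0:                   # keep the old codeword for empty cells
                codebook[c] = (weights[mask, None] * points[mask]).sum(axis=0) / mass
    return codebook
```

The distance matrix has L × (codebook size) entries, which is one way to see why the number of lattice points and the codebook size dominate the running time.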
3.5 Encoder and Decoder Schemes

Figure 4a presents the block diagram of the TVQ-TSS encoder, which performs the following operations:

- The input image is partitioned into 8 × 8 blocks and the 2-dimensional DCT is computed for each block.
- The DC coefficient is uniformly quantized and the resulting values are DPCM encoded and transmitted.
- The transformed blocks are classified into 4 equally populated classes and the bits are allocated according to Section 3.1. The bit allocation map and the class indices are transmitted to the decoder.
- The TSP are estimated and the codebooks derived as described in Sections 3.3 and 3.4. The vector quantization of the transform data yields the set of codeword indices I, which are transmitted together with the TSP.
- Finally, error analysis and reduction is performed. The largest E errors are considered and their positions inside the DCT blocks are coded and transmitted. The error reduction is achieved using 2 correcting values (one positive and one negative) for all errors. Each DCT block has a one-bit flag that shows whether corrections are applied inside the block. Two additional bits are required for each correction. One indicates the sign of the correction. The other shows whether the next correction belongs to the same block as the current correction.

Figure 4b presents the block diagram of the TVQ-TSS decoder. The processing starts with training set synthesis based on the received TSP, followed by codebook derivation and decoding of the codeword indices I. The error information is then used to reduce a selected set of errors. The inverse transformation of the corrected data produces an approximate replica of the original image.

Figure 4: Block diagram of TVQ-TSS compression. (a) Encoder. (b) Decoder.

4 Experimental Results

We tested the new compression method on a Sun Ultra 60 Workstation (C implementation). The images used for testing are available via anonymous ftp to whitechapel.media.mit.edu under /pub/testimages. They are all 512 × 512 pixel monochrome still images with 256 gray levels.

A first set of results is presented in Table 2, which contains the peak signal to noise ratios (PSNRs) of the images in the test set after compression/decompression at 0.28 bits/pixel. The coding time for one image was less than 5 seconds, while the decoding took about 3 seconds. Figure 5 shows the encoded image Lena at 0.25 and 0.5 bits/pixel, respectively.
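The paper does not spell out the PSNR formula; the sketch below uses the conventional definition for 8-bit images (peak value 255), which is presumably what the reported figures are based on. The function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal to noise ratio in dB between two images of equal shape."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    mse = np.mean(diff**2)
    if mse == 0.0:
        return float("inf")                  # identical images
    return float(10.0 * np.log10(peak**2 / mse))

# A constant error of 5 gray levels gives MSE = 25
flat = np.zeros((4, 4), dtype=np.uint8)
value = psnr(flat, flat + 5)
```

Converting to floating point before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise introduce.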
Table 2: Coding performance for TVQ-TSS at a bit rate of 0.28 bits/pixel.

Image      PSNR (dB)    Image      PSNR (dB)
Al         32.87        Goldhill   30.08
Aero       29.65        Jet        30.85
Baboon     23.13        Lena       32.51
Bank       28.02        Loco       25.75
Barbara    26.59        London     32.25
Boat       30.11        Oleh       32.93
Couple     38.88        Pyramid    31.98
Einstein   34.14        Regan      32.05
Face       31.54        Wedding    30.57
Girl       33.94        Zelda      35.95
The PSNR-based comparisons presented in Figure 6 show that the performance of TVQ-TSS is better than that of the JPEG standard. The improvement in PSNR is almost 1 dB for the Lena image. One can also observe that (for the same image) our method performs better than three other recent techniques which employ vector quantization of the DCT coefficients (classified VQ in the transform domain [12], VQ with variable block-size [13], and additive vector decoding of transform coded images [14]). The computational complexity is higher for TVQ-TSS; however, as we mentioned before, the processing time is only a few seconds on a standard workstation. It is worth noting that TVQ-TSS (which is based on a fixed-rate allocation scheme) performs very close to techniques based on variable rate VQ (see for example the entropy-coded lattice vector quantizer reported in [15]).

Figure 5: TVQ-TSS results: image Lena. (a) 0.25 bits/pixel, 31.84 dB. (b) 0.5 bits/pixel, 34.92 dB.

Figure 6: Coding performance for image Lena. Axes: bit rate (bits/pixel, horizontal), PSNR (dB, vertical). Curves: TVQ-TSS, JPEG, and three reference techniques (plot legend labels: Ref [29], Ref [30], Ref [50]).

5 Conclusions

This paper introduced a new method to achieve codebook adaptation to the input statistics with a small amount of side information. We presented a transform domain implementation of Vector Quantization with Training Set Synthesis and showed that its performance is competitive with other compression schemes based on VQ and transform coding.

References

[1] M. Lightstone and S.K. Mitra, "Image-adaptive vector quantization in an entropy-constrained framework", IEEE Trans. Image Process., Vol. 6, 1997, 441-450.
[2] D. Comaniciu, "Training Set Synthesis for Entropy-Constrained Transform Vector Quantization", Proc. IEEE ICASSP, Atlanta, Vol. 4, 1996, 2036-2039.
[3] Y. Linde, A. Buzo, and R.M. Gray, "An algorithm for vector quantizer design", IEEE Trans. Commun., Vol. COM-28, 1980, 84-95.
[4] W.H. Chen and H. Smith, "Adaptive coding of monochrome and color images", IEEE Trans. Commun., Vol. COM-25, 1977, 1285-1292.
[5] A. Segall, "Bit allocation and encoding for vector sources", IEEE Trans. Inform. Theory, Vol. IT-22, 1976, 162-169.
[6] R.C. Reininger and J. Gibson, "Distribution of the two-dimensional DCT coefficients for images", IEEE Trans. Commun., Vol. COM-31, 1983, 835-839.
[7] A.N. Netravali and B.G. Haskell, Digital Pictures, Representation and Compression, Plenum Press, New York, 1989.
[8] K.A. Birney and T.R. Fisher, "On the modeling of DCT and subband image data for compression", IEEE Trans. Image Process., Vol. 4, 1995, 186-193.
[9] D. Comaniciu, R. Grisel, and F. Astrade, "Medical image compression using mixture distributions and optimal quantizers", Proc. IASTED ICSIP, Las Vegas, 1995, 89-92.
[10] R.A. Redner and H.F. Walker, "Mixture densities, maximum likelihood and the EM algorithm", SIAM Review, Vol. 26, 1984, 195-239.
[11] D. Comaniciu and P. Meer, "Distribution free decomposition of multivariate data", Pattern Analysis and Applications, Vol. 2, No. 1, 1999, 22-30.
[12] J.W. Kim and S.U. Lee, "A transform domain classifier vector quantizer for image coding", IEEE Trans. Circuits Syst. Video Technol., No. 2, 1992, 3-14.
[13] M.H. Lee and G. Crebbin, "Classified vector quantization with variable block-size DCT models", IEE Proc. Vis. Image Sig. Process., Vol. 141, 1994, 39-48.
[14] S.W. Wu and A. Gersho, "Additive vector decoding of transform coded images", IEEE Trans. Image Processing, Vol. 7, 1998, 794-803.
[15] Z.M. Yusof and T.R. Fisher, "An entropy-coded lattice vector quantizer for transform and subband image coding", IEEE Trans. Image Processing, Vol. 5, 1996, 289-298.